This repository is the official implementation of the paper "Unveiling Privacy, Memorization, and Input Curvature Links" (ICML 2024). It contains the experiments for the three links described in the paper.
If you use this GitHub repo, please consider citing our work:
@inproceedings{
ravikumar2024unveiling,
title={Unveiling Privacy, Memorization, and Input Curvature Links},
author={Ravikumar, Deepak and Soufleri, Efstathia and Hashemi, Abolfazl and Roy, Kaushik},
booktitle={Forty-first International Conference on Machine Learning},
year={2024},
url={https://openreview.net/forum?id=4dxR7awO5n}
}
Unveiling Privacy, Memorization, and Input Curvature Links: ICML paper link. You may also be interested in our other memorization work, "Memorization Through the Lens of Curvature of Loss Function Around Samples" (ICML 24 Spotlight): paper, code.
We recommend using conda as the environment manager. The environment file needed to run our experiments is provided as environment.yml. Please note that this environment needs both PyTorch and TensorFlow installed, which sometimes causes conflicts.
Download the analysis_checkpoints directory from here (link to be updated later) and place it in the root folder.
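Since both frameworks have to coexist in one environment, a quick check after creating the environment can save time. The following is a minimal sketch and is not part of this repository; the function name `framework_status` is hypothetical. It only reports whether the `torch` and `tensorflow` packages can be located, and does not detect version conflicts between them.

```python
import importlib.util


def framework_status(packages=("torch", "tensorflow")):
    """Return {package_name: bool} telling whether each package can be located.

    find_spec only looks the package up; it does not import it, so this
    check is fast but cannot detect version or CUDA conflicts.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, found in framework_status().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If either package is reported as missing, recreating the environment from environment.yml is likely the simplest fix.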
- Please download the datasets (CIFAR10, CIFAR100 and ImageNet) from their respective sources.
- Please download imagenet_index.npz from Feldman and Zhang and place it in build_fz_imagenet/
- Use build_imagenet.py in the build_fz_imagenet directory to convert ImageNet to a TFRecord dataset.
- Place the datasets in the following directory structure:
+-- data_dir
| +-- CIFAR10
| |
| +-- CIFAR100
| |
| +-- imagenet
- Set the path in config.json's data_dir variable, and set the path for the TFRecord ImageNet dataset in libdata/indexed_tfrecords.py on lines 49 and 51.
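The directory layout above can be verified before launching any experiment. This is a minimal sketch, assuming only the three subdirectory names shown in the tree; the helper `missing_dataset_dirs` is hypothetical and is not provided by this repository.

```python
from pathlib import Path

# Subdirectory names taken from the directory structure shown in this README.
EXPECTED_SUBDIRS = ("CIFAR10", "CIFAR100", "imagenet")


def missing_dataset_dirs(data_dir):
    """Return the expected dataset subdirectories that do not exist under data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # "data_dir" is a placeholder; use the path you set in config.json.
    missing = missing_dataset_dirs("data_dir")
    print("All dataset directories found." if not missing else f"Missing: {missing}")
```

An empty list means all three dataset directories are in place; the check does not inspect the contents of each directory.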
In our experiments we use pretrained models from Feldman and Zhang. Please download the models and set the path in config.json's fz_model_dir for both cifar100 and imagenet.
You can also download pretrained private models here (link to be updated later) and set config.json's private_model_dir variable.
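A small check can confirm that the config.json entries mentioned so far have been filled in. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that fz_model_dir is a nested object keyed by dataset name (cifar100, imagenet), which may not match the actual layout of config.json.

```python
import json

# Dotted paths to the entries named in this README. Nesting fz_model_dir by
# dataset name is an ASSUMPTION about how config.json is organized.
REQUIRED_ENTRIES = (
    "data_dir",
    "fz_model_dir.cifar100",
    "fz_model_dir.imagenet",
    "private_model_dir",
)


def missing_entries(config, required=REQUIRED_ENTRIES):
    """Return the dotted paths that are absent or empty in a parsed config dict."""
    missing = []
    for dotted in required:
        node = config
        for key in dotted.split("."):
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if not node:  # covers a missing key, None, and an empty string
            missing.append(dotted)
    return missing


def check_config_file(path="config.json"):
    """Load a JSON config file and return its missing entries."""
    with open(path) as f:
        return missing_entries(json.load(f))
```

Calling `check_config_file()` from the repository root returns an empty list when every listed entry is set. Other entries used later, such as fz_precomputed_score_dir, can be added to `REQUIRED_ENTRIES` in the same dotted form.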
Here we use the CIFAR100 and ImageNet datasets. This experiment uses pretrained models from Feldman and Zhang. Ensure these models are downloaded and the paths are updated in config.json.
To run CIFAR100 results:
- In config.json, set fz_precomputed_score_dir's cifar100 path.
- Run the following command to calculate the curvature scores for the cifar100 FZ models:
python calc_curv_fz_models.py
- Run analyze_cifar100_curv_fz.ipynb to obtain the results presented in the paper.
To run ImageNet results:
- In config.json, set fz_precomputed_score_dir's imagenet path.
- Run the following command to calculate the curvature scores for the imagenet FZ models. --start_idx is the seed at which the curvature calculations start and --stop_idx is the seed at which they stop. The maximum --stop_idx is 2000, since FZ models for imagenet are available for 2000 seeds.
python calc_curv_fz_imagenet_models.py --start_idx 0 --stop_idx 100
- Run analyze_imagenet_curv_fz.ipynb to obtain the results presented in the paper.
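Covering all 2000 ImageNet seeds takes several invocations with different index ranges. The sketch below generates one command per chunk so they can be run in sequence or dispatched to separate jobs. It is not part of the repository: the function name is hypothetical, and it assumes that --stop_idx is exclusive, so that consecutive chunks can share a boundary value. If the script treats --stop_idx as inclusive, the ranges need adjusting.

```python
def imagenet_curvature_commands(total_seeds=2000, chunk_size=100):
    """Build one command line per chunk of seeds.

    ASSUMPTION: --stop_idx is exclusive, so chunk k covers seeds
    [k * chunk_size, (k + 1) * chunk_size).
    """
    commands = []
    for start in range(0, total_seeds, chunk_size):
        stop = min(start + chunk_size, total_seeds)
        commands.append(
            "python calc_curv_fz_imagenet_models.py "
            f"--start_idx {start} --stop_idx {stop}"
        )
    return commands


if __name__ == "__main__":
    # Print the commands, e.g. to redirect them into a shell script.
    print("\n".join(imagenet_curvature_commands()))
```

With the default arguments this yields 20 commands, the first of which matches the example command above.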
- Training private models. To recreate our experiments, run
sh scripts/create_privacy_vs_curve_dp_train_jobs.sh > train_privacy_vs_curve_dp.sh
to generate the script that runs the training code for all the seeds and all the target privacy budgets.
- Run the training:
sh train_privacy_vs_curve_dp.sh
- The next step is to compute the curvature scores. Make sure to set private_precomputed_score_dir's paths for cifar100 and cifar10 in config.json.
- Run
sh ./scripts/create_privacy_vs_curve_scorer_job.sh > private_curve_scorer.sh
to create the script file that runs the curvature calculations for all the seeds, target epsilons, and datasets.
- To run the curvature calculations, run
sh private_curve_scorer.sh
- Run loss_v_priv.py to get the effect of privacy on the convergence loss bound result from the paper.
- Run curv_privacy.ipynb to get the curvature vs privacy results from the paper.
- Training private models. To recreate our experiments, run
sh scripts/create_privacy_vs_curve_dp_train_jobs.sh > train_privacy_vs_memorization_dp.sh
to generate the script that runs the training code for all the seeds and all the target privacy budgets.
- Run the training:
sh train_privacy_vs_memorization_dp.sh
- Run the following command to get the results presented in the paper:
python calc_private_memorization.py