GitHub - DeepakTatachar/Privacy-Memorization-Curvature: Accepted at ICML 2024. This is the official implementation for the paper "Unveiling Privacy, Memorization, and Input Curvature Links"

Code Repository for Unveiling Privacy, Memorization and Input Loss Curvature

This repository is the official implementation of the paper "Unveiling Privacy, Memorization, and Input Curvature Links". It contains the experiments for the three links described in the paper.

Accepted at ICML 2024. If you use this GitHub repository, please consider citing our work:

@inproceedings{
    ravikumar2024unveiling,
    title={Unveiling Privacy, Memorization, and Input Curvature Links},
    author={Ravikumar, Deepak and Soufleri, Efstathia and Hashemi, Abolfazl and Roy, Kaushik},
    booktitle={Forty-first International Conference on Machine Learning},
    year={2024},
    url={https://openreview.net/forum?id=4dxR7awO5n}
}

Paper: "Unveiling Privacy, Memorization, and Input Curvature Links" (ICML 2024). You may also be interested in our other memorization work, "Memorization Through the Lens of Curvature of Loss Function Around Samples" (ICML 24 Spotlight).

Setup the environment

We recommend using conda as the environment manager. The environment needed to run our experiments is provided in environment.yml. Note that this environment requires both PyTorch and TensorFlow, which sometimes causes conflicts. Download the analysis_checkpoints directory from here (link to be updated later) and place it in the root folder.
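Because the environment needs both PyTorch and TensorFlow, a quick check after creating it can save time later. The following is a minimal sketch, not part of the repository: the helper name check_frameworks is hypothetical, and it only tests whether the packages can be located, not whether they import without conflicts.

```python
import importlib.util


def check_frameworks(names=("torch", "tensorflow")):
    """Return {module_name: bool}, True when the module can be located.

    find_spec only locates the package; it does not import it, so a
    version conflict between the two frameworks may still surface later.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_frameworks().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated conda environment; if either package is reported as missing, recreate the environment from environment.yml before continuing.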

Setup Datasets

Please download the datasets (CIFAR10, CIFAR100 and ImageNet) from their respective sources.
Please download imagenet_index.npz from Feldman and Zhang and place it in build_fz_imagenet/
Use build_imagenet.py in the build_fz_imagenet directory to convert ImageNet to a TFRecord dataset.
Place the datasets in the following directory structure:

  +-- data_dir
  |     +-- CIFAR10
  |     |     
  |     +-- CIFAR100
  |     |     
  |     +-- imagenet

Set the path to data_dir in config.json's data_dir variable, and set the path to the TFRecord ImageNet dataset in libdata/indexed_tfrecords.py (lines 49 and 51).
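To confirm the directory structure before launching experiments, a small check along these lines may help. This is a sketch under assumptions: it assumes data_dir is a top-level string entry in config.json, and the function name missing_dataset_dirs is hypothetical (it is not part of the repository).

```python
import json
from pathlib import Path

# Folder names taken from the directory structure shown above.
EXPECTED_SUBDIRS = ("CIFAR10", "CIFAR100", "imagenet")


def missing_dataset_dirs(config_path="config.json"):
    """Return the expected dataset folders that are absent under data_dir."""
    with open(config_path) as f:
        config = json.load(f)
    # Assumption: "data_dir" is a top-level string key in config.json.
    data_dir = Path(config["data_dir"])
    return [name for name in EXPECTED_SUBDIRS if not (data_dir / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    print("All dataset folders found." if not missing else f"Missing: {missing}")
```

The check only looks at folder names; it does not verify the contents of each dataset or the TFRecord paths set in libdata/indexed_tfrecords.py.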

Setup Pretrained Models

In our experiments we use pretrained models from Feldman and Zhang. Please download the models and set the path in config.json's fz_model_dir for both cifar100 and imagenet.

You can also download pretrained private models here (link to be updated later) and set the config.json's private_model_dir variable.
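Since several scripts depend on the model paths in config.json, it can be useful to check them up front. The sketch below is illustrative only: it assumes fz_model_dir holds one path per dataset (for example cifar100 and imagenet) and that private_model_dir is either a string or a similar mapping. The actual layout of config.json may differ, and the helper names are hypothetical.

```python
import json
from pathlib import Path

# Config entries named in this README; their exact structure is an assumption.
MODEL_KEYS = ("fz_model_dir", "private_model_dir")


def leaf_paths(value, prefix):
    """Yield (dotted_key, path_string) for every string nested under value."""
    if isinstance(value, str):
        yield prefix, value
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, f"{prefix}.{key}")


def missing_model_paths(config, keys=MODEL_KEYS):
    """List config entries that are absent, empty, or point to a missing path."""
    problems = []
    for key in keys:
        if key not in config:
            problems.append(key)
            continue
        for dotted, path in leaf_paths(config[key], key):
            if not path or not Path(path).exists():
                problems.append(dotted)
    return problems


if __name__ == "__main__":
    with open("config.json") as f:
        print(missing_model_paths(json.load(f)) or "All model paths exist.")
```

The same function can be pointed at other path entries, such as fz_precomputed_score_dir, by passing a different keys tuple.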

Running Experiments for Link 1 (Memorization and Curvature)

Here we use the CIFAR100 and ImageNet datasets. This experiment uses pretrained models from Feldman and Zhang. Ensure these models are downloaded and their paths are updated in config.json.

To run CIFAR100 results:

In config.json set fz_precomputed_score_dir's cifar100 path.
Run the following command to calculate the curvature scores for the cifar100 FZ models:
```
python calc_curv_fz_models.py
```
Run analyze_cifar100_curv_fz.ipynb to obtain the results presented in the paper.

To run ImageNet results:

In config.json set fz_precomputed_score_dir's imagenet path.
Run the following command to calculate the curvature scores for the imagenet FZ models. --start_idx is the seed at which the curvature calculations start and --stop_idx is the seed at which they stop. The maximum --stop_idx is 2000, since FZ models for imagenet are available for 2000 seeds.
```
python calc_curv_fz_imagenet_models.py --start_idx 0 --stop_idx 100
```
Run analyze_imagenet_curv_fz.ipynb to obtain the results presented in the paper.
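Covering all 2000 seeds in one run can take a long time, so the seed range can be split into chunks and run as separate jobs. The following sketch prints one command per chunk. It assumes --stop_idx is exclusive, so consecutive chunks share a boundary value; please check calc_curv_fz_imagenet_models.py to confirm this. The helper names are hypothetical.

```python
MAX_SEEDS = 2000  # this README states FZ ImageNet models exist for 2000 seeds


def seed_chunks(chunk_size, total=MAX_SEEDS):
    """Split [0, total) into consecutive (start_idx, stop_idx) pairs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, total))
        for start in range(0, total, chunk_size)
    ]


def chunk_commands(chunk_size, total=MAX_SEEDS):
    """Build one command line per chunk, using the flags shown above."""
    return [
        f"python calc_curv_fz_imagenet_models.py --start_idx {a} --stop_idx {b}"
        for a, b in seed_chunks(chunk_size, total)
    ]


if __name__ == "__main__":
    for command in chunk_commands(100):
        print(command)
```

Each printed line can be submitted as its own job, for example on separate GPUs.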

Running Experiments for Link 2 (Privacy and Curvature)

First, train the private models. To recreate our experiments, run
```
sh scripts/create_privacy_vs_curve_dp_train_jobs.sh > train_privacy_vs_curve_dp.sh
```
This generates a script that runs the training code for all the seeds and all the target privacy budgets.
Then run the training:
```
sh train_privacy_vs_curve_dp.sh
```
The next step is to compute the curvature scores. Make sure to set the private_precomputed_score_dir paths for cifar100 and cifar10 in config.json.
Then run
```
sh ./scripts/create_privacy_vs_curve_scorer_job.sh > private_curve_scorer.sh
```
This creates a script file that runs the curvature calculations for all the seeds, target epsilons and datasets.
To run the curvature calculations, run
```
sh private_curve_scorer.sh
```
Run loss_v_priv.py to reproduce the paper's result on the effect of privacy on the convergence loss bound.
Run curv_privacy.ipynb to reproduce the curvature vs privacy results from the paper.
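The command-line steps for Link 2 can also be chained from one Python driver that stops at the first failure. This is a convenience sketch, not part of the repository: run_steps and LINK2_STEPS are hypothetical names, and the commands are copied from the steps above. It defaults to a dry run that only lists the commands.

```python
import shlex
import subprocess

# (command, file that captures stdout or None), in the order given above.
LINK2_STEPS = [
    (["sh", "scripts/create_privacy_vs_curve_dp_train_jobs.sh"], "train_privacy_vs_curve_dp.sh"),
    (["sh", "train_privacy_vs_curve_dp.sh"], None),
    (["sh", "./scripts/create_privacy_vs_curve_scorer_job.sh"], "private_curve_scorer.sh"),
    (["sh", "private_curve_scorer.sh"], None),
    (["python", "loss_v_priv.py"], None),
]


def run_steps(steps, dry_run=True):
    """Run each step in order and return the command lines that were handled.

    With dry_run=True nothing is executed. Otherwise check=True makes
    subprocess.run raise CalledProcessError on the first failing step.
    """
    log = []
    for argv, stdout_path in steps:
        log.append(shlex.join(argv) + (f" > {stdout_path}" if stdout_path else ""))
        if dry_run:
            continue
        if stdout_path:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)
    return log


if __name__ == "__main__":
    for line in run_steps(LINK2_STEPS, dry_run=True):
        print(line)
```

Call run_steps(LINK2_STEPS, dry_run=False) from the repository root to execute the steps. The curv_privacy.ipynb notebook is left as a manual step.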

Running Experiments for Link 3 (Memorization and Privacy)

First, train the private models. To recreate our experiments, run
```
sh scripts/create_privacy_vs_curve_dp_train_jobs.sh > train_privacy_vs_memorization_dp.sh
```
This generates a script that runs the training code for all the seeds and all the target privacy budgets.
Then run the training:
```
sh train_privacy_vs_memorization_dp.sh
```
Run the following command to obtain the results presented in the paper:
```
python calc_private_memorization.py
```
