Evaluating ensemble performance in long-tailed datasets (Neurips 2023 Heavy Tails Workshop)

ekellbuch/longtail_ensembles

The Effects of Ensembling on Long-Tailed Data

Code for the paper "The Effects of Ensembling on Long-Tailed Data" where we perform a systematic comparison between logit and probability ensembling for a variety of models trained on balanced and imbalanced datasets.

Findings:

  • Adding more ensemble members continues to improve performance on imbalanced datasets.
  • We find no difference between logit and probability ensembles across a variety of balanced datasets.
  • On imbalanced datasets, logit and probability ensembles do differ, and the difference depends on the ensemble's diversity and dependency.
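To make the two ensemble types in the findings above concrete, here is a minimal sketch under the usual definitions: a logit ensemble averages the members' logits and applies softmax once, while a probability ensemble applies softmax to each member and averages the resulting probabilities. The function names and the example logits are hypothetical and are not taken from this repository's code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_ensemble(member_logits):
    """Average the members' logits, then apply softmax once."""
    n_members = len(member_logits)
    n_classes = len(member_logits[0])
    mean_logits = [
        sum(member[c] for member in member_logits) / n_members
        for c in range(n_classes)
    ]
    return softmax(mean_logits)

def probability_ensemble(member_logits):
    """Apply softmax to each member, then average the probabilities."""
    n_members = len(member_logits)
    n_classes = len(member_logits[0])
    member_probs = [softmax(member) for member in member_logits]
    return [
        sum(probs[c] for probs in member_probs) / n_members
        for c in range(n_classes)
    ]

# Hypothetical logits from two ensemble members for one 3-class input.
member_logits = [[4.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
p_logit = logit_ensemble(member_logits)
p_prob = probability_ensemble(member_logits)
print("logit ensemble:      ", [round(p, 3) for p in p_logit])
print("probability ensemble:", [round(p, 3) for p in p_prob])
```

When all members produce identical logits, the two functions return the same distribution; they diverge only when members disagree, as in the example above.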

Fig 1: Comparison between logit and probability ensembles of models trained on CIFAR10-LT

Table 3: Ensembles outperform common approaches to handle long-tails

@inproceedings{
buchanan2023the,
title={The Effects of Ensembling on Long-Tailed Data},
author={E. Kelly Buchanan and Geoff Pleiss and Yixin Wang and John Patrick Cunningham},
booktitle={NeurIPS 2023 Workshop Heavy Tails in Machine Learning},
year={2023}
}

Installation instructions: docs/README.md

Experiments:

  1. Train a resnet32 model on the CIFAR10 dataset
python scripts/run.py --config-name="run_gpu_cifar10"
  2. Train models on the CIFAR10LT dataset across multiple losses
wandb sweep experiments/compare_loss/train_gpu_loss_cifar10.yaml
  3. Train additional models on CIFAR10LT
wandb sweep experiments/compare_loss/train_gpu_loss_cifar10_largeM.yaml

Paper Experiments

| Wandb experiment | Parameters | Comments |
|---|---|---|
| nggmmw4m, 0itowy8a, d4s9wp4v | Train resnet32 and resnet110 models on CIFAR10-LT (IMBALANCECIFAR10) using multiple losses and different seeds. | Models trained with the balanced softmax loss perform best. |
| 9hwaytks, gv4bucon | Train resnet32_cfa and resnet_110 on CIFAR100-LT (IMBALANCECIFAR100Aug) using multiple losses and different seeds. | Models trained with the balanced softmax loss perform best. |
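Since the comments above single out the balanced softmax loss, the following sketch shows one common formulation of it for a single example: the log of each class's training count is added to the corresponding logit before the usual softmax cross-entropy. This is an illustration only. The function name, arguments, and class counts are hypothetical, and the loss implemented in this repository may differ in its details.

```python
import math

def balanced_softmax_loss(logits, label, class_counts):
    """Cross-entropy on logits shifted by log class counts (single example).

    Assumed formulation: loss = -log( n_y * exp(z_y) / sum_j n_j * exp(z_j) ),
    where n_j is the number of training examples in class j.
    """
    adjusted = [z + math.log(n) for z, n in zip(logits, class_counts)]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(adjusted)
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[label]

# Hypothetical two-class problem with a 9:1 imbalance and uninformative logits.
counts = [9, 1]
loss_head = balanced_softmax_loss([0.0, 0.0], 0, counts)
loss_tail = balanced_softmax_loss([0.0, 0.0], 1, counts)
print(f"head-class loss: {loss_head:.4f}, tail-class loss: {loss_tail:.4f}")
```

With equal class counts the shift cancels in the normalization and the loss reduces to standard softmax cross-entropy. With imbalanced counts, an uninformative prediction is penalized more on tail-class examples, which pushes the model to raise tail-class logits during training.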

Reproduce paper tables and figures:

  • Fig: Ensemble size vs ensemble type across multiple losses
python scripts/vis_scripts/plot_results_metric_M.py --config-path="../../results/configs/comparison_baseline_cifar10lt" --config-name="compare_M"
  • Table: Ensemble performance of models trained on CIFAR10-LT and CIFAR100-LT:
python scripts/compare_all_results.py --config-path="../results/configs/comparison_baseline_cifar10lt" --config-name="default"
python scripts/compare_all_results.py --config-path="../results/configs/comparison_baseline_cifar100lt" --config-name="default"
  • Fig: Class ID vs avg. disagreement:
python scripts/vis_scripts/plot_results_pclass.py
  • Fig: Class ID vs diversity/dependency:
python scripts/vis_scripts/plot_results_dkl_diff.py
  • Fig: Performance of logit and probability ensembles on balanced datasets:
python scripts/vis_scripts/plot_single_metric_xy.py --datasets=base --metric=error
