Code for the paper "The Effects of Ensembling on Long-Tailed Data", in which we systematically compare logit and probability ensembling for a variety of models trained on balanced and imbalanced datasets.
- Adding more ensemble members continues to improve performance on imbalanced datasets.
- Logit and probability ensembles show no difference in performance across a variety of balanced datasets.
- On imbalanced datasets, logit and probability ensembles do differ, and the difference depends on the ensemble diversity and dependency.
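For readers new to the two ensemble types compared above, here is a minimal NumPy sketch. It is not the repository's implementation: the function names and the `(members, samples, classes)` array layout are assumptions made for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def logit_ensemble(member_logits):
    """Average the logits over members, then apply softmax.

    member_logits has shape (M, N, C): M members, N samples, C classes
    (a layout assumed for this sketch).
    """
    return softmax(member_logits.mean(axis=0))

def prob_ensemble(member_logits):
    """Apply softmax to each member, then average the probabilities."""
    return softmax(member_logits).mean(axis=0)

# Random logits stand in for the outputs of trained ensemble members.
rng = np.random.default_rng(0)
member_logits = rng.normal(size=(4, 5, 10))
p_logit = logit_ensemble(member_logits)
p_prob = prob_ensemble(member_logits)
```

Both functions return an `(N, C)` array of class probabilities. They coincide when all members produce identical logits, so any gap between them reflects how much the members disagree.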
@inproceedings{
buchanan2023the,
title={The Effects of Ensembling on Long-Tailed Data},
author={E. Kelly Buchanan and Geoff Pleiss and Yixin Wang and John Patrick Cunningham},
booktitle={NeurIPS 2023 Workshop Heavy Tails in Machine Learning},
year={2023}
}
Installation instructions are in docs/README.md.
- Train a resnet32 model on the CIFAR10 dataset
python scripts/run.py --config-name="run_gpu_cifar10"
- Train models on the CIFAR10LT dataset with multiple losses
wandb sweep experiments/compare_loss/train_gpu_loss_cifar10.yaml
- Train additional models on CIFAR10LT
wandb sweep experiments/compare_loss/train_gpu_loss_cifar10_largeM.yaml
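As background on the long-tailed variants used above: a common way to build CIFAR10-LT is to subsample each class so that the class sizes decay exponentially. The sketch below assumes that construction; the function name, the imbalance ratio of 100, and the 5000 images per class are illustrative assumptions, and the dataset code in this repository may differ.

```python
def long_tailed_class_counts(n_max, num_classes, imbalance_ratio):
    """Per-class training counts decaying exponentially from n_max
    down to roughly n_max / imbalance_ratio (hypothetical helper)."""
    counts = []
    for class_idx in range(num_classes):
        # frac runs from 0 (largest class) to 1 (smallest class).
        frac = class_idx / (num_classes - 1)
        counts.append(int(n_max * (1.0 / imbalance_ratio) ** frac))
    return counts

# Assumed settings: 10 classes, 5000 images in the largest class, ratio 100.
counts = long_tailed_class_counts(n_max=5000, num_classes=10, imbalance_ratio=100)
print(counts)
```

The first class keeps all of its images and each later class keeps a geometrically smaller share, which produces the head and tail classes that the experiments compare.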
Wandb experiment | Parameters | Comments |
---|---|---|
nggmmw4m, 0itowy8a, d4s9wp4v | Train resnet32 and resnet110 models on CIFAR10-LT using multiple losses and different seeds (IMBALANCECIFAR10). | Models trained with the balanced softmax loss perform best. |
9hwaytks, gv4bucon | Train resnet32_cfa and resnet_110 models on CIFAR100-LT using multiple losses and different seeds (IMBALANCECIFAR100Aug). | Models trained with the balanced softmax loss perform best. |
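The balanced softmax loss mentioned in the table can be sketched as ordinary cross-entropy applied to logits shifted by the log of the per-class training counts. This is a minimal NumPy illustration under that assumption, not the training code used for these runs; the function name and argument layout are hypothetical.

```python
import numpy as np

def balanced_softmax_loss(logits, labels, class_counts):
    """Cross-entropy on logits shifted by log class counts.

    logits: (N, C) array; labels: (N,) integer array;
    class_counts: (C,) array of per-class training set sizes.
    """
    # Shift each class logit by the log of its training frequency.
    shifted = logits + np.log(class_counts)
    # Numerically stable log-softmax.
    shifted = shifted - shifted.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Mean negative log-likelihood of the true labels.
    return -log_probs[np.arange(len(labels)), labels].mean()
```

With equal class counts the shift is a constant and the loss reduces to standard cross-entropy. With long-tailed counts, tail-class examples incur a larger training loss, which pushes the model to raise its unshifted logits for rare classes. In this sketch the shift is meant for training only, with the unshifted logits used for prediction.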
- Fig: Ensemble size vs ensemble type across multiple losses
python scripts/vis_scripts/plot_results_metric_M.py --config-path="../../results/configs/comparison_baseline_cifar10lt" --config-name="compare_M"
- Table: Ensemble performance of models trained on CIFAR10-LT and CIFAR100-LT:
python scripts/compare_all_results.py --config-path="../results/configs/comparison_baseline_cifar10lt" --config-name="default"
python scripts/compare_all_results.py --config-path="../results/configs/comparison_baseline_cifar100lt" --config-name="default"
- Fig: Class ID vs avg. disagreement:
python scripts/vis_scripts/plot_results_pclass.py
- Fig: Class ID vs diversity/dependency:
python scripts/vis_scripts/plot_results_dkl_diff.py
- Fig: Performance of logit and probability ensembles on balanced datasets:
python scripts/vis_scripts/plot_single_metric_xy.py --datasets=base --metric=error
- Balanced Meta Softmax: github.com/jiawei-ren/BalancedMetaSoftmax-Classification