
Bootstrap uncertainty details #13

Open
e-pet opened this issue Nov 18, 2022 · 1 comment

e-pet commented Nov 18, 2022

Hi!

First of all, thanks for the excellent package, and in particular also for still actively maintaining it! :-)

I have some questions regarding the bootstrapping-based uncertainty quantification. When I call get_calibration_error_uncertainties, it calls bootstrap_uncertainty with the functional get_calibration_error(probs, labels, p, debias=False, mode=mode).

bootstrap_uncertainty will then roughly do this:

    import numpy as np

    plugin = functional(data)  # point estimate on the full dataset
    bootstrap_estimates = []
    for _ in range(num_samples):
        # resample(data) draws len(data) points with replacement
        bootstrap_estimates.append(functional(resample(data)))
    # note the reflection through 2*plugin and the flipped percentiles
    return (2*plugin - np.percentile(bootstrap_estimates, 100 - alpha / 2.0),
            2*plugin - np.percentile(bootstrap_estimates, 50),
            2*plugin - np.percentile(bootstrap_estimates, alpha / 2.0))
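For a concrete sense of the sketch above, here is a minimal, self-contained version with `functional = np.mean` on synthetic data. The variable names and the alpha-in-percent convention are assumptions taken from the snippet, not the package's exact internals:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(5.0, 2.0, size=200)  # synthetic sample, true mean 5.0

num_samples, alpha = 2000, 5.0  # alpha in percent, as in the sketch
plugin = np.mean(data)
bootstrap_estimates = np.array([
    np.mean(rng.choice(data, size=len(data), replace=True))
    for _ in range(num_samples)
])

# Reflecting through 2*plugin flips the roles of the percentiles:
# the HIGH bootstrap percentile yields the LOWER interval endpoint.
lower = 2 * plugin - np.percentile(bootstrap_estimates, 100 - alpha / 2.0)
median = 2 * plugin - np.percentile(bootstrap_estimates, 50)
upper = 2 * plugin - np.percentile(bootstrap_estimates, alpha / 2.0)
```

Because the bootstrap percentiles are ordered and the reflection reverses that order, the three returned values always satisfy `lower <= median <= upper`.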

Questions:

  1. Why is debias=False in the call to get_calibration_error? I would have expected UQ for the debiased (L2) error estimate.
  2. How/why is "2*plugin - median(bootstrap_estimates)" a good estimate of the median? And similarly for the lower/upper quantiles?
  3. In get_calibration_error_uncertainties, it says "When p is not 2 (e.g. for the ECE where p = 1), [the median] can be used as a debiased estimate as well." Why would that be true / what exactly do you mean by it?

I guess what I am really asking is: what's the reasoning behind the approach you chose, and is it described somewhere? :-)

@AnanyaKumar (Member) commented

Just saw this (sorry!)

  1. debias is False because the bootstrap itself does the debiasing.
  2. See https://www.stat.cmu.edu/~larry/=stat705/Lecture20.pdf for more details. While it might look strange, this is the correct and more reliable way to do the bootstrap. I could probably write a couple of pages to explain it in detail, but hopefully Larry's notes give a sense of it. https://stats.stackexchange.com/questions/488217/use-bootstrap-mean-to-remove-bias-from-the-statistic#:~:text=Instead%2C%20by%20comparing%20the%20bootstrap,statistic%20and%20the%20bootstrap%20mean. might also be helpful, but I haven't checked it too carefully for correctness.
  3. See our paper https://arxiv.org/abs/1909.10155 for why naive plugin estimators are biased; the bootstrap can debias such estimates.
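To see why "2*plugin - median(bootstrap)" can act as a debiased estimate, here is a hedged sketch using a deliberately biased plugin statistic (the divide-by-n variance MLE). The function name and setup are illustrative, not the package's code: the bootstrap distribution underestimates the statistic by roughly the same amount the plugin underestimates the truth, so reflecting the bootstrap median through the plugin cancels most of the bias.

```python
import numpy as np

rng = np.random.default_rng(0)

def biased_var(x):
    # Plugin (MLE) variance: divides by n, so it underestimates the
    # true variance by roughly a factor of (n - 1) / n.
    return np.mean((x - np.mean(x)) ** 2)

data = rng.normal(0.0, 1.0, size=30)  # true variance is 1.0

plugin = biased_var(data)
boots = np.array([biased_var(rng.choice(data, size=len(data), replace=True))
                  for _ in range(5000)])

# median(boots) - plugin approximates the bias of the plugin estimator,
# so subtracting it (i.e. 2*plugin - median) removes most of the bias.
debiased = 2 * plugin - np.percentile(boots, 50)
```

Here `debiased` is pushed above `plugin`, toward the unbiased (divide-by-n-1) value, without ever needing a closed-form bias correction.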
