---
title: 'Choosing the p in Lp Loss: Adaptive Rates for Symmetric Mean Estimation'
section: Original Papers
abstract: 'When a univariate distribution is symmetric around its mean, the mean can often be estimated at a rate (sample complexity) much faster than $O(1/\sqrt{n})$. For example, given univariate random variables $Y_1, \ldots, Y_n$ distributed uniformly on $[\theta_0 - c, \theta_0 + c]$, the sample midrange $\frac{Y_{(n)}+Y_{(1)}}{2}$ maximizes the likelihood and has expected error $\mathbb{E}\bigl| \theta_0 - \frac{Y_{(n)}+Y_{(1)}}{2} \bigr| \leq 2c/n$, which is optimal and much smaller than the $O(1/\sqrt{n})$ error rate of the sample mean. The optimal rate depends on the distribution and is generally attained by the maximum likelihood estimator (MLE). However, the MLE requires exact knowledge of the underlying distribution; if the underlying distribution is \emph{unknown}, it is an open question whether an estimator can adapt to the optimal rate. In this paper, we propose an estimator of the symmetric mean $\theta_0$ with the following properties: it requires no knowledge of the underlying distribution; it has a rate no worse than $1/\sqrt{n}$ in all cases (assuming a finite second moment); and, when the underlying distribution is compactly supported, it can attain a rate of $n^{-\frac{1}{\alpha}}$ up to polylog factors, where the rate parameter $\alpha$ can take any value in $(0, 2]$ and depends on the moments of the underlying distribution. Our estimator is formed by minimizing the $L_\gamma$-loss with respect to the data, for a power $\gamma \geq 2$ chosen in a data-driven way by minimizing a criterion motivated by the asymptotic variance. Our approach applies directly to the regression setting where $\theta_0$ is a function of observed features, and it motivates the use of an $L_\gamma$ loss function with a data-driven $\gamma$ in certain settings.'
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: kao24a
month: 0
tex_title: 'Choosing the p in Lp Loss: Adaptive Rates for Symmetric Mean Estimation'
firstpage: 2795
lastpage: 2839
page: 2795-2839
order: 2795
cycles: false
bibtex_author: Kao, Yu-Chun and Xu, Min and Zhang, Cun-Hui
author:
- given: Yu-Chun
  family: Kao
- given: Min
  family: Xu
- given: Cun-Hui
  family: Zhang
date: 2024-06-30
address:
container-title: Proceedings of Thirty Seventh Conference on Learning Theory
volume: '247'
genre: inproceedings
issued:
  date-parts:
  - 2024
  - 6
  - 30
pdf:
extras:
---
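
A minimal sketch of the idea summarized in the abstract: an M-estimator that minimizes the $L_\gamma$ loss, with $\gamma \geq 2$ selected on a grid by a plug-in estimate of the M-estimator's asymptotic variance. The grid, the optimizer, and the plug-in formula used below (the standard sandwich variance for the $L_\gamma$ loss) are illustrative assumptions; the paper's actual selection criterion and its polylog corrections are not reproduced here.

```python
# Illustrative sketch, not the authors' exact procedure: estimate a symmetric
# mean by minimizing the L_gamma loss, with gamma >= 2 chosen from the data by
# minimizing a plug-in estimate of the M-estimator's asymptotic variance.
import numpy as np
from scipy.optimize import minimize_scalar


def lgamma_estimate(y, gamma):
    """Minimize the L_gamma loss (1/n) * sum |y_i - theta|^gamma over theta."""
    obj = lambda theta: np.mean(np.abs(y - theta) ** gamma)
    return minimize_scalar(obj, bounds=(y.min(), y.max()), method="bounded").x


def plugin_variance(y, theta_hat, gamma):
    """Plug-in asymptotic variance of the L_gamma M-estimator (assumed criterion):
    E|Y - theta|^{2g-2} / ((g-1)^2 * (E|Y - theta|^{g-2})^2)."""
    r = np.abs(y - theta_hat)
    num = np.mean(r ** (2 * gamma - 2))
    den = (gamma - 1) ** 2 * np.mean(r ** (gamma - 2)) ** 2
    return num / den


def adaptive_symmetric_mean(y, gammas=np.linspace(2.0, 12.0, 21)):
    """Pick gamma on a grid by the variance criterion; return (estimate, gamma)."""
    best = None
    for g in gammas:
        theta_g = lgamma_estimate(y, g)
        v_g = plugin_variance(y, theta_g, g)
        if best is None or v_g < best[0]:
            best = (v_g, g, theta_g)
    return best[2], best[1]


# Example: Uniform[theta0 - c, theta0 + c]. Larger gamma pushes the estimator
# toward the midrange, whose error decays at rate 1/n rather than 1/sqrt(n).
rng = np.random.default_rng(0)
y = rng.uniform(3.0 - 1.0, 3.0 + 1.0, size=2000)
theta_hat, gamma_hat = adaptive_symmetric_mean(y)
print(theta_hat, gamma_hat, abs(theta_hat - 3.0))
```

On a compactly supported sample such as the uniform example, the variance criterion favors the larger powers on the grid, consistent with the abstract's point that the optimal $\gamma$ (and rate) depends on the moments of the underlying distribution.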