| field | value |
|---|---|
| title | Safe Linear Bandits over Unknown Polytopes |
| section | Original Papers |
| abstract | The safe linear bandit problem (SLB) is an online approach to linear programming with unknown objective and unknown \emph{roundwise} constraints, under stochastic bandit feedback of rewards and safety risks of actions. We study the tradeoffs between efficacy and smooth safety costs of SLBs over polytopes, and the role of aggressive \emph{doubly-optimistic play} in avoiding the strong assumptions made by extant pessimistic-optimistic approaches. We first elucidate an inherent hardness in SLBs due to the lack of knowledge of constraints: there exist 'easy' instances, for which suboptimal extreme points have large 'gaps', but on which SLB methods must still incur |
| layout | inproceedings |
| series | Proceedings of Machine Learning Research |
| publisher | PMLR |
| issn | 2640-3498 |
| id | gangrade24a |
| month | 0 |
| tex_title | Safe Linear Bandits over Unknown Polytopes |
| firstpage | 1755 |
| lastpage | 1795 |
| page | 1755-1795 |
| order | 1755 |
| cycles | false |
| bibtex_author | Gangrade, Aditya and Chen, Tianrui and Saligrama, Venkatesh |
| author | |
| date | 2024-06-30 |
| address | |
| container-title | Proceedings of Thirty Seventh Conference on Learning Theory |
| volume | 247 |
| genre | inproceedings |
| issued | |
| extras | |