2024-06-30-guo24a.md

title section abstract layout series publisher issn id month tex_title firstpage lastpage page order cycles bibtex_author author date address container-title volume genre issued pdf extras
Stochastic Constrained Contextual Bandits via Lyapunov Optimization Based Estimation to Decision Framework
Original Papers
This paper studies stochastic constrained contextual bandits (CCB) under a general realizability condition, where the expected rewards and costs lie within general function classes. We propose LOE2D, a Lyapunov Optimization Based Estimation to Decision framework with online regression oracles for learning the rewards and constraints. LOE2D establishes $\Tilde O(T^{\frac{3}{4}}U^{\frac{1}{4}})$ regret and constraint violation, which can be further refined to $\Tilde O(\min\{\sqrt{TU}/\varepsilon^2, T^{\frac{3}{4}}U^{\frac{1}{4}}\})$ when the Slater condition holds in the underlying offline problem with the Slater “constant” $\varepsilon=\Omega(\sqrt{U/T})$, where $U$ denotes the error bounds of the online regression oracles. These results improve on LagrangeCBwLC in two respects: i) our results hold without any prior information, while LagrangeCBwLC requires knowledge of the Slater constant to design a proper learning rate; ii) our results hold when $\varepsilon=\Omega(\sqrt{U/T})$, while LagrangeCBwLC requires a constant margin $\varepsilon=\Omega(1)$. These improvements stem from two novel techniques: violation-adaptive learning in the E2D module and a multi-step Lyapunov drift analysis for bounding the constraint violation. Experiments further show that LOE2D outperforms the baseline algorithm.
inproceedings
Proceedings of Machine Learning Research
PMLR
2640-3498
guo24a
0
Stochastic Constrained Contextual Bandits via Lyapunov Optimization Based Estimation to Decision Framework
2204
2231
2204-2231
2204
false
Guo, Hengquan and Liu, Xin
given family
Hengquan
Guo
given family
Xin
Liu
2024-06-30
Proceedings of Thirty Seventh Conference on Learning Theory
247
inproceedings
date-parts
2024
6
30
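
The refined rate quoted in the abstract, $\min\{\sqrt{TU}/\varepsilon^2, T^{\frac{3}{4}}U^{\frac{1}{4}}\}$, can be explored numerically. The sketch below is illustrative only and is not taken from the paper: it drops all constants and logarithmic factors hidden by $\Tilde O$, and the function names and the values of `T`, `U`, and `eps` are hypothetical. With those simplifications, the two terms are equal when $\varepsilon = (U/T)^{1/8}$, so the Slater-dependent term is the smaller one only for margins above that threshold.

```python
import math


def general_rate(T: float, U: float) -> float:
    """T^(3/4) * U^(1/4): the rate stated without the Slater condition
    (constants and log factors dropped; illustrative only)."""
    return T ** 0.75 * U ** 0.25


def slater_rate(T: float, U: float, eps: float) -> float:
    """sqrt(T * U) / eps^2: the Slater-dependent term of the refined rate
    (constants and log factors dropped; illustrative only)."""
    return math.sqrt(T * U) / eps ** 2


def refined_rate(T: float, U: float, eps: float) -> float:
    """min of the two terms, mirroring the expression in the abstract."""
    return min(slater_rate(T, U, eps), general_rate(T, U))


def crossover_eps(T: float, U: float) -> float:
    """Margin at which both terms coincide: solving
    sqrt(T*U)/eps^2 = T^(3/4) * U^(1/4) gives eps = (U/T)^(1/8)."""
    return (U / T) ** 0.125


if __name__ == "__main__":
    T, U = 1e6, 100.0  # hypothetical horizon and oracle error bound
    print("crossover eps:", crossover_eps(T, U))
    for eps in (0.1, 0.5):  # hypothetical Slater margins
        print(eps, slater_rate(T, U, eps), general_rate(T, U), refined_rate(T, U, eps))
```

In this simplified form, a margin below the crossover leaves the general $T^{\frac{3}{4}}U^{\frac{1}{4}}$ term as the minimum, while a larger margin makes the $\sqrt{TU}/\varepsilon^2$ term the tighter of the two.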