title

section

abstract

layout

series

publisher

issn

id

month

tex_title

firstpage

lastpage

page

order

cycles

bibtex_author

author

date

address

container-title

volume

genre

issued

pdf

extras

Safe Linear Bandits over Unknown Polytopes

Original Papers

The safe linear bandit problem (SLB) is an online approach to linear programming with unknown objective and unknown \emph{roundwise} constraints, under stochastic bandit feedback of rewards and safety risks of actions. We study the tradeoffs between efficacy and smooth safety costs of SLBs over polytopes, and the role of aggressive {doubly-optimistic play} in avoiding the strong assumptions made by extant pessimistic-optimistic approaches. We first elucidate an inherent hardness in SLBs due the lack of knowledge of constraints: there exist ‘easy’ instances, for which suboptimal extreme points have large ‘gaps’, but on which SLB methods must still incur $\Omega(\sqrt{T})$ regret or safety violations, due to an inability to resolve unknown optima to arbitrary precision. We then analyse a natural doubly-optimistic strategy for the safe linear bandit problem, \textsc{doss}, which uses optimistic estimates of both reward and safety risks to select actions, and show that despite the lack of knowledge of constraints or feasible points, \textsc{doss} simultaneously obtains tight instance-dependent $O(\log^2 T)$ bounds on efficacy regret, and $\widetilde O(\sqrt{T})$ bounds on safety violations, thus attaining near Pareto-optimality. Further, when safety is demanded to a finite precision, violations improve to $O(\log^2 T).$ These results rely on a novel dual analysis of linear bandits: we argue that \textsc{doss} proceeds by activating noisy versions of at least $d$ constraints in each round, which allows us to separately analyse rounds where a ‘poor’ set of constraints is activated, and rounds where ‘good’ sets of constraints are activated. The costs in the former are controlled to $O(\log^2 T)$ by developing new dual notions of gaps, based on global sensitivity analyses of linear programs, that quantify the suboptimality of each such set of constraints. The latter costs are controlled to $O(1)$ by explicitly analysing the solutions of optimistic play.

inproceedings

Proceedings of Machine Learning Research

PMLR

2640-3498

gangrade24a

0

Safe Linear Bandits over Unknown Polytopes

1755

1795

1755-1795

1755

false

Gangrade, Aditya and Chen, Tianrui and Saligrama, Venkatesh

given	family
Aditya	Gangrade

given	family
Tianrui	Chen

given	family
Venkatesh	Saligrama

2024-06-30

Proceedings of Thirty Seventh Conference on Learning Theory

247

inproceedings

date-parts

2024

6

30

https://proceedings.mlr.press/v247/gangrade24a/gangrade24a.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2024-06-30-gangrade24a.md

2024-06-30-gangrade24a.md

Files

2024-06-30-gangrade24a.md

Latest commit

History

2024-06-30-gangrade24a.md

File metadata and controls