Dynamic Regret Minimization for Bandits without Prior Knowledge

Published: 01 January 1970
on channel: Simons Institute

1,950

Chen-Yu Wei (University of Southern California)
https://simons.berkeley.edu/talks/tbd...
Quantifying Uncertainty: Stochastic, Adversarial, and Beyond

To evaluate the performance of a bandit learner in a changing environment, the standard notion of regret is insufficient. Instead, "dynamic regret" is a better measure that can evaluate the learner's ability to track the changes. How to achieve the optimal dynamic regret without prior knowledge on the number of times the environment changes had been open for a long time, and was recently resolved by Auer, Gajane, and Ortner in their COLT 2019 paper. We will discuss their consecutive sampling technique, which is rare in the bandit literature, and see how their idea can be elegantly generalized to a wide range of bandit/RL problems. Finally, we will discuss important open problems that remain in the area.

Watch video Dynamic Regret Minimization for Bandits without Prior Knowledge online without registration, duration hours minute second in high quality. This video was added by user Simons Institute 01 January 1970, don't forget to share it with your friends and acquaintances, it has been viewed on our site 1,950 once and liked it 17 people.