Adaptivity and Confounding in Multi-armed Bandit Experiments

on channel: Simons Institute

Daniel Russo (Columbia University)
https://simons.berkeley.edu/talks/tbd...
Quantifying Uncertainty: Stochastic, Adversarial, and Beyond

Multi-armed bandit algorithms can offer enormous efficiency benefits in problems where learning to make effective decisions requires careful experimentation. They achieve this through adaptivity: early feedback is used to identify competitive parts of the decision space, and future experimentation effort is focused there. Unfortunately, due to this adaptivity, these algorithms risk confounding in problems where nonstationary contexts influence performance. As a result, many practitioners resort to non-adaptive randomized experimentation, which provides robustness but forgoes the efficiency benefits. We develop a new model to study this issue and propose deconfounded Thompson sampling, which involves a simple modification to one of the leading multi-armed bandit algorithms. We argue that this method strikes a delicate balance, allowing one to build in robustness to nonstationarity while, when possible, preserving the efficiency benefits of adaptivity.
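The deconfounded variant itself is not specified in this abstract. As background for the adaptivity the talk discusses, here is a minimal sketch of standard Beta-Bernoulli Thompson sampling, the baseline algorithm the proposed method modifies; the function and parameter names (`thompson_sampling`, `pull_arm`, `horizon`) are illustrative choices, not from the talk:

```python
import random

def thompson_sampling(pull_arm, n_arms, horizon, rng=None):
    """Beta-Bernoulli Thompson sampling.

    Each round: draw a win-rate sample for every arm from its Beta
    posterior, play the arm with the highest sample, then update that
    arm's posterior with the observed 0/1 reward. Adaptivity arises
    because arms that look good are sampled (and thus pulled) more often.
    """
    rng = rng or random.Random()
    successes = [1] * n_arms  # Beta(1, 1) uniform prior for each arm
    failures = [1] * n_arms
    pulls = [0] * n_arms
    for _ in range(horizon):
        # Posterior sampling step: one draw per arm.
        samples = [rng.betavariate(successes[a], failures[a])
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = pull_arm(arm)  # environment returns a 0/1 reward
        pulls[arm] += 1
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls
```

In a stationary environment this concentrates pulls on the best arm; the talk's point is that under nonstationary contexts this same adaptivity can confound the comparison between arms, which motivates the deconfounded modification.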
