Daniel Russo (Columbia University)
https://simons.berkeley.edu/talks/tbd...
Quantifying Uncertainty: Stochastic, Adversarial, and Beyond
Multi-armed bandit algorithms can offer enormous efficiency benefits in problems where learning to make effective decisions requires careful experimentation. They achieve this through adaptivity: early feedback is used to identify competitive parts of the decision space, and future experimentation effort is focused there. Unfortunately, due to this adaptivity, these algorithms risk confounding in problems where nonstationary contexts influence performance. As a result, many practitioners resort to non-adaptive randomized experimentation, providing robustness but forgoing efficiency benefits. We develop a new model to study this issue and propose deconfounded Thompson sampling, which involves a simple modification to one of the leading multi-armed bandit algorithms. We argue that this method strikes a delicate balance, allowing one to build in robustness to nonstationarity while, when possible, preserving the efficiency benefits of adaptivity.
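The abstract does not specify the deconfounding modification itself, but for context, the baseline it modifies is standard Thompson sampling. A minimal Beta-Bernoulli sketch (arm probabilities and horizon are illustrative, not from the talk) shows the adaptivity being described: posterior sampling concentrates pulls on arms that look competitive early on.

```python
import random

def thompson_sampling(true_probs, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling; returns per-arm pull counts."""
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1.0] * k  # Beta(1, 1) prior: 1 + observed successes
    beta = [1.0] * k   # 1 + observed failures
    pulls = [0] * k
    for _ in range(horizon):
        # Sample a plausible success rate per arm from its posterior,
        # then play the arm whose sample is largest.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

counts = thompson_sampling([0.3, 0.5, 0.7], horizon=2000)
```

Because arm selection depends on past rewards, the data this procedure collects is adaptively gathered; that is exactly what exposes it to confounding when a nonstationary context shifts reward distributions mid-experiment, as the abstract above argues.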