Seminars
View all Seminars | Download ICal for this eventSequential learning in a stochastic multi armed bandit framework
Series: Bangalore Theory Seminars
Speaker: Sandeep Juneja, Senior Professor (Former Dean), School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai,
Date/Time: Apr 05 16:00:00
Location: CSA Seminar Hall (Room No. 254, First Floor)
Abstract:
The classic stochastic multi armed bandit framework involves finitely many unknown probability distributions that can be sequentially sampled to generate independent rewards. In this talk we consider two foundational problems: First one corresponds to sampling to minimize the expected regret, or equivalently, to maximize the expected total reward. The second one corresponds to the best arm identification, i.e., identifying the arm with the largest mean, or any other performance measure, using as few samples as possible while providing explicit probabilistically correct selection guarantees.
These problems form the bedrock of algorithms used in web design and advertising, recommendation systems, clinical trials and many other exciting applications. In this talk we review some of the popular algorithms used for these problems emphasizing the intuition underlying the elegant ideas. Technically speaking, these problems have been well studied under the restrictive assumption that arm distributions belong to a single parameter exponential family, that includes distributions such as Bernoulli and Gaussian with known variance. Under these settings, lower bounds on samples needed are developed using ideas from hypothesis testing, and algorithms are proposed that match the lower bound. We propose optimal algorithms that match the lower bounds even to a constant for general probability distributions under minimal restrictions. We further discuss how the proposed methodology leads to near optimal confidence intervals for distribution means. We discuss further enhancements in the presence of offline data that needs to be combined with online data. We further propose some new algorithms in the best arm identification setting that along with minimising sample complexity, are also computationally efficient.
Speaker Website Link
Organizers Note: The talk will be in two halves. We will have the first session from 16.00 to 17.00 followed by a short break for snacks, post which we have the second session.
Microsoft Teams Link:
Link
We are grateful to the Kirani family for generously supporting the theory seminar series
Hosts: Rahul Madhavan, Rameesh Paul, Aditya Subramanian and Aditya Abhay Lonkar