Bandit Algorithms: Fairness, Welfare and Applications in Causal Inference

Series: M.Tech (Research) Colloquium

Speaker: Ayush Sawarni

Date/Time: Feb 27 16:00:00

Location: CSA Seminar Hall (Room No. 254, First Floor)

Faculty Advisor: Prof. Siddharth Barman

Abstract:
We study regret in online learning from a welfarist perspective and explore an application
of bandit algorithms for causal inference. We introduce Nash regret, which measures the
difference between the performance of the optimal action and that of the algorithm in terms
of the Nash social welfare function. By providing bounds on Nash regret, we establish
principled fairness guarantees for online learning algorithms. We investigate different
online learning settings and derive tight bounds on Nash regret. Furthermore, we study
the problem of finding optimal interventions in causal graphs by providing theoretical
guarantees within the context of the causal bandit problem.
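
One standard way to make this precise (stated here as a sketch, with notation assumed rather than quoted from the talk) is to compare the best mean reward against the geometric mean of the expected rewards the algorithm collects:

\[
\mathrm{NR}_T \;=\; \mu^{*} \;-\; \Bigl(\prod_{t=1}^{T} \mathbb{E}\bigl[\mu_{a_t}\bigr]\Bigr)^{1/T},
\]

where μ* is the largest mean reward, a_t is the action chosen in round t, and μ_{a_t} is its mean. By the AM-GM inequality the geometric mean is at most the arithmetic mean, so Nash regret is at least the usual average regret; this is also why lower bounds for average regret carry over to Nash regret.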

In the first part, we focus on the classic multi-armed bandit (MAB) framework and develop
an algorithm that achieves a tight Nash regret bound. Specifically, given a horizon of play T,
our algorithm achieves a Nash regret of O(√((k log T)/T)), where k represents the number of arms
in the MAB instance. The lower bound on average regret applies to Nash regret as well,
making our guarantee essentially tight. Additionally, we propose an anytime algorithm with
a Nash regret guarantee of O(√((k log T)/T) · log T).
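
As a rough, self-contained illustration (a generic UCB-style baseline on an assumed Bernoulli instance, not the algorithm analyzed in the thesis), the Python sketch below runs a standard index policy and reports both the empirical average regret and the empirical Nash regret, using the means of the arms actually played as a single-run proxy for the expectations in the definition above. All arm means and parameters are chosen only for this example.

    import numpy as np

    rng = np.random.default_rng(0)
    mu = np.array([0.9, 0.7, 0.5, 0.3])   # hypothetical Bernoulli arm means (assumed)
    k, T = len(mu), 5000

    counts = np.zeros(k)                   # number of pulls per arm
    sums = np.zeros(k)                     # cumulative reward per arm
    played = np.empty(T)                   # mean of the arm chosen in each round

    for t in range(T):
        if t < k:
            a = t                          # pull each arm once to initialize
        else:                              # generic UCB index, not the thesis's selection rule
            a = int(np.argmax(sums / counts + np.sqrt(2 * np.log(T) / counts)))
        r = rng.binomial(1, mu[a])
        counts[a] += 1
        sums[a] += r
        played[t] = mu[a]

    avg_regret = mu.max() - played.mean()
    nash_regret = mu.max() - np.exp(np.log(played).mean())   # geometric mean of played means
    print(f"average regret ~ {avg_regret:.4f}   Nash regret ~ {nash_regret:.4f}")

Because the geometric mean is dragged down by any round with a very low expected reward, the gap between the two quantities reflects how carefully a policy handles its early exploration.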

In the second part, we study the stochastic linear bandits problem with non-negative,
ν-sub-Poisson rewards. We present an algorithm that achieves a Nash regret bound of
O(√((dν)/T) · log(T|X|)), where X denotes the set of arms in ambient dimension d and T represents
the number of rounds. Furthermore, for linear bandit instances with potentially infinite
arm sets, we derive a Nash regret upper bound of O((d^(5/4) ν^(1/2) log T)/√T). Our algorithm builds
upon the successive elimination method and incorporates novel techniques such as tailored
concentration bounds and sampling via the John ellipsoid in conjunction with the
Kiefer-Wolfowitz optimal design.
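
The Kiefer-Wolfowitz theorem equates G-optimal and D-optimal designs, and such a design is commonly approximated with a Frank-Wolfe (Wynn-type) iteration. The sketch below is that textbook routine, not the thesis's full exploration procedure; the helper name kw_design, the arm set, and all parameters are assumptions made only for illustration.

    import numpy as np

    def kw_design(X, iters=1000, tol=1e-3):
        """Approximate the Kiefer-Wolfowitz (G-/D-optimal) design over the rows of X
        (shape n x d) with the classical Frank-Wolfe / Wynn iteration."""
        n, d = X.shape
        lam = np.full(n, 1.0 / n)                      # start from the uniform design
        for _ in range(iters):
            A = X.T @ (lam[:, None] * X)               # information matrix: sum_i lam_i x_i x_i^T
            g = np.einsum("ij,jk,ik->i", X, np.linalg.inv(A), X)   # g_i = x_i^T A^{-1} x_i
            i = int(np.argmax(g))
            if g[i] <= d * (1.0 + tol):                # at the optimum, max_i g_i = d
                break
            step = (g[i] / d - 1.0) / (g[i] - 1.0)     # standard Wynn step size
            lam = (1.0 - step) * lam
            lam[i] += step
        return lam

    X = np.random.default_rng(1).normal(size=(50, 5))  # assumed finite arm set
    lam = kw_design(X)
    A = X.T @ (lam[:, None] * X)
    print(np.einsum("ij,jk,ik->i", X, np.linalg.inv(A), X).max())  # close to d = 5

Sampling arms according to such a design keeps the worst-case prediction variance over the arm set near d/T after T pulls, which is the role the optimal design typically plays in elimination-style linear bandit algorithms.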

In the third part, we investigate Nash regret in the context of online concave optimization
and the Experts problem, assuming adversarially chosen reward functions. Our algorithm
achieves a Nash regret of O((log N)/T) for the Experts problem, where N is the number of experts.
We provide a lower bound for this setting that is essentially tight with respect to the
upper bound. Additionally, for online concave optimization, we provide a Nash regret
guarantee of O((d log T)/T), where d denotes the ambient dimension.
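
For context, the classical multiplicative-weights (Hedge) update for the adversarial Experts problem is sketched below. It is the standard baseline with the usual learning-rate tuning for average regret; the thesis's algorithm and its Nash regret analysis are not reproduced here, and the reward sequence is assumed purely for illustration.

    import numpy as np

    def hedge(rewards, eta=None):
        """Multiplicative-weights (Hedge) on a T x N matrix of rewards in [0, 1].
        Returns the sequence of probability vectors played."""
        T, N = rewards.shape
        if eta is None:
            eta = np.sqrt(np.log(N) / T)          # standard tuning for average regret
        w = np.ones(N)
        plays = np.empty((T, N))
        for t in range(T):
            plays[t] = w / w.sum()
            w *= np.exp(eta * rewards[t])         # reward (gain) form of the update
        return plays

    R = np.random.default_rng(2).uniform(size=(1000, 10))   # assumed reward sequence
    print(hedge(R)[-1].round(3))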

In the final part of this thesis, we focus on the causal bandit problem, which involves
identifying near-optimal interventions in a causal graph. Previous works have provided
a simple regret bound of Õ(N/√T) for causal graphs with N vertices, constant in-degree,
and Bernoulli random variables. In this work, we present a new approach for exploration using
covering interventions. This allows us to achieve a significant improvement and provide a
tighter simple regret guarantee of Õ(√(N/T)). Furthermore, we extend our algorithm to handle
the most general case of causal graphs with unobservable variables.
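
For reference, simple regret in this pure-exploration setting (notation assumed here, not quoted from the thesis) compares the best intervention with the one returned once the exploration budget of T rounds is spent:

\[
r_T \;=\; \max_{a \in \mathcal{A}} \mu(a) \;-\; \mathbb{E}\bigl[\mu(a_T)\bigr],
\]

where 𝒜 is the set of candidate interventions on the causal graph, μ(a) is the expected reward at the target node under intervention a, and a_T is the intervention output by the algorithm after T rounds of exploration.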