Seminars

View all Seminars  |  Download ICal for this event

Efficient Near-Optimal Algorithm for Online Shortest Paths in Directed Acyclic Graphs with Bandit Feedback Against Adaptive Adversaries

Series: Bangalore Theory Seminars

Speaker: Arnab Maiti, University of Washington

Date/Time: May 28 11:00:00

Location: Online Talk (See Teams link below)

Abstract:
In this paper, we study the online shortest path problem in directed acyclic graphs (DAGs) under bandit feedback against an adaptive adversary. Given a DAG G = (V, E) with a source node $v_s$ and a sink node $v_t$ let $X subseteq {0,1}^{|E|}$ denote the set of all paths from $v_s$ to $v_t$. At each round t, we select a path $x_t in X$ and receive bandit feedback on the loss $langle x_t, y_t rangle in [-1,1]$, where $y_t$ is an adversarially chosen loss vector. Our goal is to minimize regret with respect to the best path in hindsight over T rounds. We propose the first computationally efficient algorithm to achieve a near-minimax optimal regret bound of $widetilde{O}left(sqrt{|E| T log(|X|)}right) with high probability against any adaptive adversary, where $widetilde{O}(cdot)$ hides logarithmic factors in the number of edges $|E|$.

Our algorithm leverages a novel loss estimator and a centroid-based decomposition in a nontrivial manner to attain this regret bound. As an application, we show that our algorithm for DAGs provides state-of-the-art efficient algorithms for $m$-sets, extensive-form games, the Colonel Blotto game, shortest walks in directed graphs, hypercubes, and multi-task multi-armed bandits, achieving improved high-probability regret guarantees in all these settings.

Microsoft teams:

Link


We are grateful to the Kirani family (Link and the Walmart Center for Tech Excellence (Link for generously supporting this seminar series


Hosts: Rameesh Paul, KVN Sreenivas, Rahul Madhavan, Debajyoti Kar, Nirjhar Das