Seminars
On Policy Gradient Methods, Momentum, and Learning under Adversaries: Algorithms and Convergence Analysis
Series: Ph.D. Colloquium
Speaker: Swetha Ganesh, Ph.D. (Engg.) student, Dept. of CSA
Date/Time: Jan 29, 09:00 AM
Location: CSA Auditorium (Room No. 104, Ground Floor)
Faculty Advisor: Prof. Gugan C. M. Thoppe
Abstract:
This thesis presents five contributions to the fields of reinforcement learning, learning under adversaries, and stochastic optimization, organized into three parts: (I) Average-Reward Reinforcement Learning (RL), (II) Learning in Adversarial Settings, and (III) Momentum in Stochastic Optimization.
Part I: Average-Reward Reinforcement Learning
Reinforcement Learning (RL), a subfield of machine learning, aims to train agents to develop optimal strategies for interacting with their environments. Among RL frameworks, the average-reward infinite-horizon formulation is particularly suited for optimizing long-term objectives. However, it remains underexplored compared to the widely studied discounted framework, particularly in the context of Policy Gradient (PG) methods. This thesis addresses key challenges in this setup:
1. Variance-Reduced Policy Gradient for Average-Reward RL: We analyze policy gradient methods with general policy parameterization for average-reward Markov Decision Processes (MDPs). By leveraging variance-reduction techniques (a generic estimator of this kind is sketched after this list), we achieve an expected regret bound of $\tilde{\mathcal{O}}(\sqrt{T})$, substantially improving upon the previous best bound of $\tilde{\mathcal{O}}(T^{3/4})$.
2. Natural Actor-Critic with Multi-Level Monte Carlo: We propose a natural actor-critic algorithm with a linear critic and general parameterized policy. This method achieves the optimal global convergence rate of $\mathcal{O}\left(\frac{1}{\sqrt{T}}\right)$ and eliminates the need for mixing time knowledge through bias reduction using Multi-Level Monte Carlo (MLMC).
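To make contribution 1 concrete, the following is a minimal sketch of a recursive variance-reduced (STORM-style) policy-gradient estimator. It illustrates the general variance-reduction idea rather than the thesis's exact algorithm; the helper grad_fn and the momentum parameter are hypothetical placeholders.

    def vr_gradient_estimate(theta, theta_prev, d_prev, grad_fn, batch, momentum=0.1):
        # Recursive variance-reduced estimate (STORM-style):
        #   d_t = g(theta_t; xi_t) + (1 - a) * (d_{t-1} - g(theta_{t-1}; xi_t)),
        # where grad_fn(theta, batch) returns a stochastic policy-gradient
        # estimate computed from the same batch of trajectories at both points.
        g_curr = grad_fn(theta, batch)
        g_prev = grad_fn(theta_prev, batch)
        return g_curr + (1.0 - momentum) * (d_prev - g_prev)

    # The estimate d_t would then drive a gradient-ascent step on the policy
    # parameters, e.g. theta_next = theta + step_size * d_t.

For contribution 2, the MLMC bias-reduction idea can be sketched as follows. This follows a common MLMC construction from the literature and is only an assumption about the flavour of estimator used; sample_grads is a hypothetical helper returning consecutive stochastic gradient samples along one trajectory as an array.

    import numpy as np

    def mlmc_gradient(theta, sample_grads, j_max=10, rng=None):
        # Multi-Level Monte Carlo gradient estimate: draw a random level J with
        # P(J = j) = 2^{-j}, average 2^J consecutive samples, and combine levels
        # so that the bias from Markovian sampling shrinks without ever plugging
        # in the (unknown) mixing time.
        rng = np.random.default_rng() if rng is None else rng
        j = int(rng.geometric(0.5))
        if j > j_max:                              # truncate rare, very deep levels
            return sample_grads(theta, 1)[0]
        grads = sample_grads(theta, 2 ** j)        # 2^j consecutive samples
        g_one = grads[0]                           # single-sample level
        g_full = grads.mean(axis=0)                # level-j average
        g_half = grads[: 2 ** (j - 1)].mean(axis=0)  # level-(j-1) average
        return g_one + (2 ** j) * (g_full - g_half)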
Part II: Learning in Adversarial Settings
Distributed systems face significant challenges from random failures and adversarial attacks, stemming from hardware malfunctions, corrupted data, or malicious actions. Adversarial attacks, particularly those involving coordinated or system-informed adversaries, can severely disrupt learning processes. This thesis addresses two critical problems in this context:
3. Federated Reinforcement Learning Under Adversaries: We establish the first global sample complexity results for federated RL with general policy parameterization under adversarial settings, achieving optimal-order sample complexity (a standard robust-aggregation building block for such settings is sketched after this list).
4. Heterogeneous and Asynchronous Learning with Adversaries: We develop an algorithm for a distributed linear problem under adversarial settings, with applications to network tomography. Despite heterogeneity and asynchrony, the algorithm guarantees almost sure convergence to the true solution.
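Since the abstract does not spell out the defence mechanisms behind contributions 3 and 4, the snippet below is only a sketch of a standard building block in adversary-resilient distributed learning: a coordinate-wise trimmed-mean aggregator that a server could apply to worker updates before taking a learning step. The names and the bound n_adv on the number of corrupted workers are illustrative assumptions.

    import numpy as np

    def trimmed_mean(updates, n_adv):
        # updates: array of shape (n_workers, dim), one gradient/update per worker.
        # n_adv:   assumed upper bound on the number of adversarial workers.
        # For every coordinate, drop the n_adv largest and n_adv smallest values,
        # then average the rest, so outliers injected by adversaries are discarded.
        sorted_updates = np.sort(updates, axis=0)
        kept = sorted_updates[n_adv : updates.shape[0] - n_adv]
        return kept.mean(axis=0)

    # Example: 10 workers, 2 of them adversarial, on a 3-dimensional problem.
    honest = np.ones((8, 3))
    corrupt = 100.0 * np.ones((2, 3))              # adversaries send huge updates
    print(trimmed_mean(np.vstack([honest, corrupt]), n_adv=2))   # approx. [1. 1. 1.]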
Part III: Momentum in Stochastic Optimization
Stochastic optimization involves maximizing or minimizing functions using noisy gradient estimates, with applications spanning diverse fields. While momentum-based methods such as Heavy Ball (HB) and Nesterov's Accelerated Gradient (NAG) accelerate gradient descent on deterministic quadratic problems, their efficacy in stochastic settings remains unclear. This thesis investigates their performance:
5. Momentum in Stochastic Optimization: We show that Stochastic Heavy Ball (SHB) and Accelerated Stochastic Gradient (ASG) methods do not outperform vanilla Stochastic Gradient Descent (SGD) in terms of sample complexity, even in simple stochastic settings. This is demonstrated by deriving a lower bound on the sample complexity of SHB and ASG and showing that SGD meets this bound.
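The claim in contribution 5 can be explored on a toy problem. The following is a small, self-contained comparison of vanilla SGD and Stochastic Heavy Ball on a one-dimensional noisy quadratic, the kind of simple stochastic setting referred to above. It is an illustration of the setup, not a reproduction of the thesis's lower-bound construction, and the step sizes are arbitrary choices that would normally be tuned per method.

    import numpy as np

    rng = np.random.default_rng(0)

    def noisy_grad(x, sigma=1.0):
        # Gradient of f(x) = 0.5 * x**2 plus additive Gaussian noise.
        return x + sigma * rng.standard_normal()

    T, alpha, beta = 20_000, 0.01, 0.9
    x_sgd, x_shb, v = 5.0, 5.0, 0.0
    for _ in range(T):
        x_sgd -= alpha * noisy_grad(x_sgd)         # vanilla SGD step
        v = beta * v + noisy_grad(x_shb)           # heavy-ball momentum buffer
        x_shb -= alpha * v                         # SHB step
    print(f"|x| after {T} noisy steps: SGD {abs(x_sgd):.3f}, SHB {abs(x_shb):.3f}")

Sweeping the step sizes, the momentum parameter, and the noise level in this toy script gives a feel for the trade-off: momentum changes the effective step size but, for comparable final accuracy, does not reduce the number of noisy gradients needed, which is consistent with the result stated above.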
Microsoft Teams:
https://teams.microsoft.com/l/meetup-join/19%3ameeting_YjQxNDhmMjYtNTc0OS00NzA2LWI2OTMtMzQ3YzBiMzFhMWZl%40thread.v2/0?context=%7b%22Tid%22%3a%226f15cd97-f6a7-41e3-b2c5-ad4193976476%22%2c%22Oid%22%3a%22adc1e56f-56ee-4d24-873f-341c97ae782a%22%7d