Seminars
Monotone and Conservative Policy Iteration Beyond the Tabular Case
Series: CSA Faculty Colloquium
Speaker: Gugan Thoppe, Assistant Professor, CSA, IISc
Date/Time: Oct 03, 16:00
Location: CSA Auditorium (Room No. 104, Ground Floor)
Abstract:
Reinforcement Learning (RL) faces a significant gap between theory and practice. Widely used methods such as DQN, TRPO, PPO, SAC, and TD3 trace back to vanilla and conservative Policy Iteration (PI), but they are run with function approximation, where PI's classic guarantees can fail, causing divergence, oscillations, or convergence to suboptimal policies. To address this gap, I will introduce Reliable Policy Iteration (RPI), a PI variant that retains tabular-style guarantees under function approximation. RPI replaces Bellman-error minimization with a Bellman-constrained evaluation, restoring monotonic improvement of value estimates that provably lower-bound the true return. The limit also partially satisfies the unprojected Bellman equation. Building on a generalized performance-difference lemma, I will also present a conservative variant of RPI that extends conservative PI's safety guarantees to function approximation. Finally, I will share initial model-free experiments in which RPI reduces oscillations and hyperparameter sensitivity while matching or surpassing DQN, DDPG, PPO, and TD3 on classic control tasks. By restoring PI's core guarantees for arbitrary function classes, RPI offers a principled foundation for more reliable, next-generation RL.
This is joint work with Eshwar S.R., Aditya Gopalan, Gal Dalal, and Ananyabrata Barua.
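Background note: the following is a brief sketch of the standard objects the abstract refers to, using textbook definitions; it is not the RPI construction itself, and the symbols are the usual ones from the RL literature. For a policy \pi with per-state reward r^\pi, transition kernel P^\pi, and discount factor \gamma \in [0,1), the Bellman operator is

    (T^\pi V)(s) = r^\pi(s) + \gamma \sum_{s'} P^\pi(s' \mid s)\, V(s'),

and the true value V^\pi is its unique fixed point, V^\pi = T^\pi V^\pi. Tabular PI produces policies \pi_k whose true values satisfy V^{\pi_k} \le V^{\pi_{k+1}} \le V^*, which are the monotone-improvement and lower-bound guarantees mentioned above. The classic performance-difference lemma, which the talk generalizes, relates the returns of two policies through the advantage function A^\pi:

    J(\pi') - J(\pi) = \frac{1}{1-\gamma}\, \mathbb{E}_{s \sim d^{\pi'},\, a \sim \pi'(\cdot \mid s)}\big[ A^\pi(s, a) \big].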
Speaker Bio:
Gugan Thoppe is an Assistant Professor in the Computer Science and Automation (CSA) department at the Indian Institute of Science (IISc). His research interests include reinforcement learning, distributed learning, and stochastic approximation.
Host Faculty: Prof. Sumit Kumar Mandal