Seminars
Monotone and Conservative Policy Iteration Beyond the Tabular Case
Series: CSA Faculty Colloquium
Speaker: Gugan Thoppe, Assistant Professor, CSA, IISc
Date/Time: Oct 03, 16:00
Location: CSA Auditorium (Room No. 104, Ground Floor)
Abstract:
Reinforcement Learning (RL) faces a significant gap between theory and practice. Widely used methods such as DQN, TRPO, PPO, SAC, and TD3 trace back to vanilla and conservative Policy Iteration (PI), but they are run with function approximation, where PI's classic guarantees can fail, causing divergence, oscillations, or convergence to suboptimal policies. To address this gap, I will introduce Reliable Policy Iteration (RPI), a PI variant that retains tabular-style guarantees under function approximation. RPI replaces Bellman-error minimization with a Bellman-constrained evaluation, restoring monotonic improvement of value estimates that provably lower-bound the true return. The limit also partially satisfies the unprojected Bellman equation. Building on a generalized performance-difference lemma, I will also present a conservative variant of RPI that extends conservative PI's safety guarantees to function approximation. Finally, I will share initial model-free experiments in which RPI reduces oscillations and hyperparameter sensitivity while matching or surpassing DQN, DDPG, PPO, and TD3 on classic control tasks. By restoring PI's core guarantees for arbitrary function classes, RPI offers a principled foundation for more reliable, next-generation RL.
This is joint work with Eshwar S.R., Aditya Gopalan, Gal Dalal, and Ananyabrata Barua.
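Background note: the following is a brief sketch of the standard objects the abstract refers to, using textbook definitions; it is not the RPI construction itself, and the symbols are the usual ones from the RL literature. For a policy \pi with per-state reward r^\pi, transition kernel P^\pi, and discount factor \gamma \in [0,1), the Bellman operator is

    (T^\pi V)(s) = r^\pi(s) + \gamma \sum_{s'} P^\pi(s' \mid s)\, V(s'),

and the true value V^\pi is its unique fixed point, V^\pi = T^\pi V^\pi. Tabular PI produces policies \pi_k whose true values satisfy V^{\pi_k} \le V^{\pi_{k+1}} \le V^*, which are the monotone-improvement and lower-bound guarantees mentioned above. The classic performance-difference lemma, which the talk generalizes, relates the returns of two policies through the advantage function A^\pi:

    J(\pi') - J(\pi) = \frac{1}{1-\gamma}\, \mathbb{E}_{s \sim d^{\pi'},\, a \sim \pi'(\cdot \mid s)}\big[ A^\pi(s, a) \big].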
Speaker Bio:
Gugan Thoppe is an Assistant Professor in the Computer Science and Automation (CSA) department at the Indian Institute of Science (IISc). His research interests include reinforcement learning, distributed learning, and stochastic approximation.
Host Faculty: Prof. Sumit Kumar Mandal