Seminars
Value-based RL with function approximation and ε-greedy exploration: a differential inclusion analysis
Series: Bangalore Probability Seminar - https://www.isibang.ac.in/~d.yogesh/BPS.html (Second Talk)
Speaker: Aditya Gopalan (ECE, IISc, Bengaluru)
Date/Time: Aug 22, 15:00
Location: CSA Seminar Hall (Room No. 254, First Floor)
Abstract:
The value-based method of Q-learning with ε-greedy exploration is one of the most widely used Reinforcement Learning (RL) algorithms. While its tabular form converges to the optimal Q-function under mild conditions, the behavior of its function-approximation variant remains mysterious. Sometimes, combining function approximation with ε-greedy exploration appears to speed up learning; at other times, it causes complex behaviors such as (i) instability, (ii) policy oscillation and chattering, (iii) multiple attractors, and (iv) convergence to the worst policy. Accordingly, a formal framework for explaining these phenomena has been a long-standing open problem (Sutton, 1999). In this talk, we shall provide the first pathway, based on differential inclusions, to systematically identify and explain the range of limiting phenomena that an approximate value-based RL method with ε-greedy exploration can exhibit, thereby answering this open question.
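For readers unfamiliar with the setting, the following is a minimal sketch of the algorithm class the talk studies: semi-gradient Q-learning with linear function approximation and ε-greedy exploration. The environment interface (env_reset, env_step), the feature map phi, and all hyperparameters below are illustrative assumptions, not details from the talk or the paper.

import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action; else a greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def approximate_q_learning(env_reset, env_step, phi, n_actions, d,
                           alpha=0.05, gamma=0.9, epsilon=0.1,
                           episodes=500, seed=0):
    """Sketch of linear semi-gradient Q-learning (hypothetical interface).

    phi(s, a) -> np.ndarray of shape (d,): feature map, so Q(s, a) = w @ phi(s, a).
    env_reset() -> initial state; env_step(s, a) -> (next_state, reward, done).
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    for _ in range(episodes):
        s, done = env_reset(), False
        while not done:
            # ε-greedy action selection from the current approximate Q-values.
            q = np.array([w @ phi(s, a) for a in range(n_actions)])
            a = epsilon_greedy(q, epsilon, rng)
            s_next, r, done = env_step(s, a)
            # Semi-gradient Q-learning update: bootstrap on the greedy next action.
            q_next = 0.0 if done else max(w @ phi(s_next, b) for b in range(n_actions))
            td_error = r + gamma * q_next - w @ phi(s, a)
            w += alpha * td_error * phi(s, a)
            s = s_next
    return w

It is precisely this interaction between the ε-greedy action choice (which changes discontinuously with w) and the parameter update that produces the discontinuous dynamics the talk analyzes via differential inclusions.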
This talk is based on our recent work titled ``Approximate Q-learning and SARSA(0) under the ϵ-greedy Policy: a Differential Inclusion Analysis''.
Host Faculty: Prof. Gugan Thoppe and Prof. Aditya Gopalan.