BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//project/author//NONSGML v1.0//EN
CALSCALE:GREGORIAN
BEGIN:VEVENT
DTEND:20220822T120000Z
UID:014e5922b259997ba2400bd7a99cd164-317
DTSTAMP:19700101T120015Z
DESCRIPTION:Value-based RL with function approximation and ε-greedy exploration: a differential inclusion analysis
URL;VALUE=URI:https://www.csa.iisc.ac.in/newweb/event/317/value-based-rl-with-function-approximation-and-i%c2%b5-greedy-exploration-a-differential-inclusion-analysis/
SUMMARY:The value-based method of Q-learning with ε-greedy exploration is one of the most widely used Reinforcement Learning (RL) algorithms. While its tabular form converges to the optimal Q-function under mild conditions, the behavior of its function approximation variant is quite mysterious. Sometimes, the tactic of function approximation with greedy exploration appears to speed up learning. However, at other times, it seems to cause complex behaviors such as i.) instability, ii.) policy oscillation and chattering, iii.) multiple attractors, and iv.) worst policy convergence. Accordingly, a formal recipe to explain these phenomena has been a long-standing open problem (Sutton, 1999). In this talk, we shall provide the first pathway, based on differential inclusions, to systematically identify and explain the range of limiting phenomena that an approximate value-based RL method with greedy exploration can exhibit, thereby answering this open question.

This talk is based on our recent work titled ``Approximate Q-learning and SARSA(0) under the ϵ-greedy Policy: a Differential Inclusion Analysis''.
DTSTART:20220822T120000Z
END:VEVENT
END:VCALENDAR