BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//project/author//NONSGML v1.0//EN
CALSCALE:GREGORIAN
BEGIN:VEVENT
DTEND:20230801T120000Z
UID:62ae13195cfa67626fb144741b917330-486
DTSTAMP:19700101T120010Z
DESCRIPTION:AVERAGE REWARD ACTOR-CRITIC WITH DETERMINISTIC POLICY SEARCH
URL;VALUE=URI:https://www.csa.iisc.ac.in/newweb/event/486/average-reward-actor-critic-with-deterministic-policy-search/
SUMMARY:The average reward criterion is less studied than the discounted reward criterion\, which most existing work in the reinforcement learning literature considers. A few recent works present on-policy average reward actor-critic algorithms\, but off-policy average reward actor-critic remains relatively unexplored. In this work\, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion. Using these theorems\, we also present an Average Reward Off-Policy Deep Deterministic Policy Gradient (ARO-DDPG) algorithm. We first establish asymptotic convergence using an ODE-based method. Subsequently\, we provide a finite-time analysis of the resulting stochastic approximation scheme with a linear function approximator and obtain an $\\epsilon$-optimal stationary policy with a sample complexity of $\\Omega(\\epsilon^{-2.5})$. We evaluate the average reward performance of the proposed ARO-DDPG algorithm and observe better empirical performance than state-of-the-art on-policy average reward actor-critic algorithms on MuJoCo-based environments.
DTSTART:20230801T120000Z
END:VEVENT
END:VCALENDAR