Reinforcement Learning Based Algorithms For Average Cost Markov Decision Processes
Mohammed Shahid Abdulla and Shalabh Bhatnagar

IISc-CSA-TR-2005-6
(May 2005)


Filed on May 25, 2005
Updated on May 25, 2005


This article proposes several two-timescale simulation-based actor-critic algorithms for the solution of infinite-horizon Markov Decision Processes with a finite state space under the average cost criterion. Two of the algorithms are for the compact (non-discrete) action setting, while the rest are for finite action spaces. On the slower timescale, all the algorithms perform a gradient search over the corresponding policy spaces using two different Simultaneous Perturbation Stochastic Approximation (SPSA) gradient estimates. On the faster timescale, the differential cost function corresponding to a given stationary policy is updated and further averaged for enhanced performance. A proof of convergence to a locally optimal policy is presented. Next, a memory-efficient implementation that uses a feature-vector representation of the state space and TD(0) learning along the faster timescale is discussed. The TD(0) algorithm does not follow an on-line sampling of states, but is observed to perform well in our setting. Numerical experiments on rate-based flow control at a bottleneck node, using a continuous-time queueing model, are presented for the proposed algorithms. We show performance comparisons of our algorithms with the two-timescale actor-critic algorithms of Konda and Borkar (1999); our algorithms exhibit more than an order of magnitude better performance.
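
As a rough illustration of the ingredients described above, the following Python sketch combines an SPSA-based actor update on a slower timescale with a TD(0) critic, a linear feature-vector representation, and a running average-cost estimate on a faster timescale. This is only a minimal sketch, not the algorithms of the report: the toy MDP, the softmax policy parameterization, the step-size schedules, and the perturbation size are all illustrative assumptions.

import numpy as np

# Minimal sketch (not the authors' exact algorithm) of a two-timescale
# actor-critic for an average-cost MDP with a finite state space.
# Fast timescale: TD(0) with a linear (here one-hot) critic and a running
# average-cost estimate. Slow timescale: two-sided SPSA gradient estimate
# over a randomized-policy parameterization. MDP, features, step sizes and
# perturbation size are illustrative assumptions.

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2
# Hypothetical random MDP: transition kernel P[s, a] and one-stage cost c[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
c = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Linear features for the differential cost function (one-hot states).
phi = np.eye(n_states)

def policy_probs(theta):
    """Softmax (Boltzmann) randomized policy over actions for each state."""
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def simulate_critic(theta, v, rho, steps, b):
    """Run average-cost TD(0) along the faster timescale under policy theta."""
    s = rng.integers(n_states)
    for _ in range(steps):
        a = rng.choice(n_actions, p=policy_probs(theta)[s])
        s_next = rng.choice(n_states, p=P[s, a])
        cost = c[s, a]
        # Average-cost temporal difference: c(s,a) - rho + V(s') - V(s).
        delta = cost - rho + phi[s_next] @ v - phi[s] @ v
        v = v + b * delta * phi[s]
        rho = rho + b * (cost - rho)   # running estimate of the average cost
        s = s_next
    return v, rho

theta = np.zeros((n_states, n_actions))   # actor parameters
v = np.zeros(n_states)                    # critic weights
rho = 0.0                                 # average-cost estimate
delta_spsa = 0.1                          # SPSA perturbation size (assumed)

for k in range(200):
    a_k = 1.0 / (k + 1)          # slow (actor) step size
    b_k = 1.0 / (k + 1) ** 0.6   # fast (critic) step size
    # Simultaneous +/-1 perturbation of all actor parameters.
    Delta = rng.choice([-1.0, 1.0], size=theta.shape)
    _, rho_plus = simulate_critic(theta + delta_spsa * Delta, v.copy(), rho, 50, b_k)
    _, rho_minus = simulate_critic(theta - delta_spsa * Delta, v.copy(), rho, 50, b_k)
    # Two-sided SPSA estimate of the gradient of the average cost.
    grad = (rho_plus - rho_minus) / (2.0 * delta_spsa * Delta)
    theta = theta - a_k * grad            # gradient descent on the average cost
    v, rho = simulate_critic(theta, v, rho, 50, b_k)

print("estimated average cost:", rho)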


Please bookmark this technical report as http://aditya.csa.iisc.ernet.in/TR/2005/6/.
