
Multi-agent Natural Actor-Critic Reinforcement Learning

Series: Department Seminar

Speaker: Prashant Trivedi

Date/Time: Nov 29, 16:00

Location: Microsoft Teams, Link: Online

Faculty Advisor: Nandyala Hemachandra

Both single-agent and multi-agent actor-critic algorithms are an important class of reinforcement learning algorithms. In this work, we propose three fully decentralized multi-agent natural actor-critic (MAN) algorithms. The agents' objective is to collectively learn a joint policy that maximizes the sum of the agents' averaged long-term returns. In the absence of a central controller, agents communicate information to their neighbors via a time-varying communication network while preserving privacy. The algorithms are decentralized in that each agent picks actions using its local reward information and limited information from other agents. We show the convergence of these algorithms using a stochastic approximation approach; the algorithms use linear function approximation. We use the Fisher information matrix to obtain the natural gradients. The Fisher information matrix captures the curvature of the Kullback-Leibler (KL) divergence between policies at successive iterates. We also show that the gradient of this KL divergence between the policies of successive iterates is proportional to the gradient of the objective function, and our MAN algorithms indeed use this representation of the objective function's gradient. Under certain conditions on the Fisher information matrix, we prove that at each iterate the optimal value obtained via the MAN algorithms can be better than that of the multi-agent actor-critic (MAAC) algorithm, which uses standard gradients.

To validate the usefulness of the proposed algorithms, we present extensive computational experiments. First, we implement all three MAN algorithms on a bi-lane traffic network to reduce the average network congestion. We observe almost a 25% reduction in average congestion for two of the MAN algorithms; the average congestion of the third is on par with the MAAC algorithm. We also consider a generic 15-agent MARL setting, where the performance of the MAN algorithms is again as good as that of the MAAC algorithm. We attribute the better performance of the MAN algorithms to their use of the above representation of the objective function's gradient.
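To give a flavor of the Fisher-matrix step underlying natural gradients, here is a minimal single-agent sketch for a softmax policy over discrete actions with known per-action rewards. This is an illustrative assumption on our part, not the talk's multi-agent, decentralized MAN algorithms: it shows only how the Fisher information matrix preconditions the standard policy gradient.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def natural_gradient_step(theta, rewards, lr=0.1, eps=1e-8):
    """One natural-gradient ascent step on E_pi[r] for a softmax policy.

    Illustrative sketch only; the MAN algorithms of the talk are
    decentralized and multi-agent, which this does not capture.
    """
    pi = softmax(theta)
    # Standard policy gradient: d/dtheta_k E_pi[r] = pi_k (r_k - E_pi[r])
    g = pi * (rewards - pi @ rewards)
    # Fisher information matrix F = E_pi[grad log pi (grad log pi)^T];
    # for softmax, grad_theta log pi(a) = e_a - pi, giving:
    F = np.diag(pi) - np.outer(pi, pi)
    # Natural gradient: F^{-1} g. F is singular for softmax (the all-ones
    # direction is in its null space), so add a small ridge term.
    nat_g = np.linalg.solve(F + eps * np.eye(len(pi)), g)
    return theta + lr * nat_g

theta = np.zeros(3)
rewards = np.array([1.0, 0.0, 0.0])
for _ in range(100):
    theta = natural_gradient_step(theta, rewards)
# probability mass concentrates on the highest-reward action
```

The preconditioning by the Fisher matrix is what makes the update follow the curvature of the KL divergence between successive policies rather than the raw Euclidean geometry of the parameters.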

Speaker Bio:
Prashant is a senior research scholar in Industrial Engineering and Operations Research at IIT Bombay. His research interests include machine learning and related problems, among them interpretable feature subset selection and natural-gradient-based algorithms in multi-agent reinforcement learning.

Host Faculty: Gugan Thoppe