BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//project/author//NONSGML v1.0//EN
CALSCALE:GREGORIAN
BEGIN:VEVENT
DTEND:20211129T120000Z
UID:80622a381441ff6934a0ad1e85df4836-221
DTSTAMP:19700101T120016Z
DESCRIPTION:Multi-agent Natural Actor-Critic Reinforcement Learning
URL;VALUE=URI:https://www.csa.iisc.ac.in/newweb/event/221/multi-agent-natural-actor-critic-reinforcement-learning/
SUMMARY:Both single-agent and multi-agent actor-critic algorithms are an important class of Reinforcement Learning algorithms. In this work, we propose three fully decentralized multi-agent natural actor-critic (MAN) algorithms. The agents' objective is to collectively learn a joint policy that maximizes the sum of the agents' averaged long-term returns. In the absence of a central controller, agents communicate information to their neighbors via a time-varying communication network while preserving privacy. These algorithms are decentralized because each agent picks actions using local reward information and limited information from other agents. We show the convergence of these algorithms using a stochastic approximation approach; the algorithms use linear function approximation. We use the Fisher information matrix to obtain the natural gradients. The Fisher information matrix captures the curvature of the Kullback-Leibler (KL) divergence between policies at successive iterates. We also show that the gradient of this KL divergence between policies at successive iterates is proportional to the objective function's gradient. Our MAN algorithms use this representation of the objective function's gradient. Under certain conditions on the Fisher information matrix, we prove that at each iterate, the optimal value via MAN algorithms can be better than that of the multi-agent actor-critic (MAAC) algorithm using the standard gradients. To validate the usefulness of our proposed algorithms, we present extensive computational experiments. First, we implement all three MAN algorithms on a bi-lane traffic network to reduce the average network congestion. We observe an almost 25% reduction in the average congestion with two of the MAN algorithms; the average congestion with the third MAN algorithm is on par with the MAAC algorithm. We also consider a generic 15-agent MARL setting; the performance of the MAN algorithms is again as good as that of the MAAC algorithm.
We attribute the better performance of the MAN algorithms to their use of the above representation of the objective function.
The event will be held online on Microsoft Teams.
https://teams.microsoft.com/l/meetup-join/19%3ae7d9a7fa7e3f41478d03d76eb63af61b%40thread.tacv2/1626097067191?context=%7b%22Tid%22%3a%226f15cd97-f6a7-41e3-b2c5-ad4193976476%22%2c%22Oid%22%3a%22adc1e56f-56ee-4d24-873f-341c97ae782a%22%7d
DTSTART:20211129T120000Z
END:VEVENT
END:VCALENDAR