A Multi-Policy Reinforcement Learning Framework for Autonomous Navigation
Series: M.Tech (Research) Thesis Defence - ONLINE
Speaker: Mr. Rajarshi Banerjee, M.Tech (Research) student, Dept. of CSA
Date/Time: Oct 08 10:00:00
Location: Microsoft Teams - ON-LINE
Faculty Advisor: Prof. Ambedkar Dukkipati
Reinforcement Learning (RL) is the process of training an agent to take a sequence of actions with the prime objective of maximizing the rewards it obtains from an environment. Deep RL uses the same approach, with a deep neural network parameterizing the policy. Temporal abstraction in RL means learning useful and generalizable skills, which are often necessary for solving complex tasks in environments of practical interest. One such domain is the longstanding problem of autonomous vehicle navigation. In this work, we focus on learning complex skills in such settings, where the agent has to learn a high-level policy by leveraging multiple skills in an environment that presents various challenges.
Multi-policy reinforcement learning algorithms like the Option-Critic framework require an exorbitant amount of time to converge to good policies. Even when they do, the policy over options tends to choose a single sub-policy almost exclusively, rendering the other policies moot. In contrast, our method takes an iterative approach in which each newly learned policy complements the previously learned ones.
To conduct the experiments, we developed a custom simulated 3D navigation environment in which the agent is a vehicle that must learn a policy for avoiding collisions. This is challenging because, in some scenarios, the agent needs to infer abstract meaning from the environment to make sense of it, while learning from a reward signal that becomes increasingly sparse.
In this thesis, we introduce the `Stay Alive` approach, which learns such skills by sequentially adding them to an overall set without using an overarching hierarchical policy; the agent's objective is simply to prolong the episode for as long as possible. The general idea behind our approach comes from the observation that both animals and human beings build new skills on previously acquired ones to better adapt to their respective environments.
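To make the survival objective concrete, here is a minimal sketch of a "stay alive" reward on a toy lane-dodging task: the agent earns +1 for every step survived and the episode ends on collision, so maximizing return is the same as prolonging the episode. Everything in this sketch (the `SurvivalEnv` class, the 3-lane track, `train_skill`) is an illustrative stand-in, not the thesis's 3D environment or its sequential skill-addition machinery.

```python
import random

class SurvivalEnv:
    """Toy track: the agent sits in one of 3 lanes; an obstacle
    respawns in a random lane every step."""
    def __init__(self, seed=0, max_steps=50):
        self.rng = random.Random(seed)
        self.max_steps = max_steps

    def reset(self):
        self.pos, self.t = 1, 0
        self.obstacle = self.rng.randrange(3)
        return (self.pos, self.obstacle)

    def step(self, action):                 # action in {-1, 0, +1}
        self.pos = max(0, min(2, self.pos + action))
        self.t += 1
        crashed = self.pos == self.obstacle
        if not crashed:
            self.obstacle = self.rng.randrange(3)
        reward = 0.0 if crashed else 1.0    # pure survival reward
        done = crashed or self.t >= self.max_steps
        return (self.pos, self.obstacle), reward, done

ACTIONS = (-1, 0, 1)

def train_skill(env, episodes=500, eps=0.1, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning of one skill whose only goal is to stay alive."""
    rng, q = random.Random(seed), {}
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = (rng.choice(ACTIONS) if rng.random() < eps
                 else max(ACTIONS, key=lambda b: q.get((s, b), 0.0)))
            s2, r, done = env.step(a)
            target = r + gamma * (0.0 if done else
                                  max(q.get((s2, b), 0.0) for b in ACTIONS))
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (target - q.get((s, a), 0.0))
            s = s2
    return lambda s: max(ACTIONS, key=lambda b: q.get((s, b), 0.0))

def episode_length(env, policy):
    """How long the policy keeps the episode going -- the quantity
    the stay-alive objective maximizes."""
    s, done, steps = env.reset(), False, 0
    while not done:
        s, _, done = env.step(policy(s))
        steps += 1
    return steps
```

A skill trained this way survives far longer than a random policy. In the thesis's framework, further skills would then be added sequentially to cover the situations where the current set still fails, rather than being arbitrated by a hierarchical policy over options.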
We report results on the navigation environment and the Atari Riverraid environment, comparing against state-of-the-art RL algorithms, and show that our approach outperforms the prior methods.
Microsoft Teams link: