
Algorithms for Challenges to Practical Reinforcement Learning

Series: Ph.D (Engg.) Thesis Defence - ON-LINE

Speaker: Ms. Sindhu P R, Ph.D (Engg.) Student, Dept. of CSA

Date/Time: Mar 24 16:00:00

Location: Microsoft Teams - ON-LINE

Faculty Advisor: Prof. Shalabh Bhatnagar

Reinforcement learning (RL) in real-world applications faces major hurdles, the foremost being the safety of the physical system controlled by the learning agent and the varying environmental conditions in which the autonomous agent operates. An RL agent learns to control a system by exploring the available actions. In some operating states, an exploratory action may drive the system into unsafe operation, creating hazards both for the system and for the humans supervising it. RL algorithms therefore need to respect safety constraints, and must do so with limited available information. Additionally, RL agents typically learn optimal decisions under the assumption of a stationary environment, and this assumption is very restrictive. In many real-world problems, such as traffic signal control and robotic applications, the environment is non-stationary, and in these scenarios RL algorithms yield sub-optimal decisions.
We describe algorithmic solutions to the challenges of safety and non-stationary environmental conditions in RL. To handle safety restrictions and facilitate safe exploration during learning, we propose a sample-efficient learning algorithm based on the cross-entropy method. This algorithm is built on a constrained optimization framework and uses limited information to learn feasible policies. During the learning iterations, exploration is guided in a manner that minimizes safety violations. The goal of the second algorithm is to find a good control policy when the latent model of the environment changes with time. To achieve this, the algorithm uses a change point detection algorithm to monitor changes in the statistics of the environment. The results of this statistical algorithm are used to reset the learning of policies and to control the underlying system efficiently.
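As an illustration of the change-detection-and-reset idea, here is a minimal sketch. The abstract does not name the specific change point detection algorithm used in the thesis, so a two-sided CUSUM test on a scalar reward stream is assumed here as a stand-in. The names `CusumDetector` and `find_change_points`, and the parameters `warmup`, `drift` and `threshold`, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class CusumDetector:
    """Two-sided CUSUM test for a shift in the mean of a scalar stream.

    Assumption: this is a generic stand-in, not the detector from the thesis.
    """
    reference_mean: float   # estimated mean of the stream before any change
    drift: float = 0.5      # slack; deviations smaller than this are ignored
    threshold: float = 5.0  # alarm level for the cumulative sums
    pos_sum: float = 0.0    # accumulated evidence for an upward shift
    neg_sum: float = 0.0    # accumulated evidence for a downward shift

    def update(self, x: float) -> bool:
        """Feed one observation; return True if a change is flagged."""
        deviation = x - self.reference_mean
        self.pos_sum = max(0.0, self.pos_sum + deviation - self.drift)
        self.neg_sum = max(0.0, self.neg_sum - deviation - self.drift)
        return self.pos_sum > self.threshold or self.neg_sum > self.threshold


def find_change_points(rewards, warmup=5, drift=0.5, threshold=5.0):
    """Return the time steps at which a change in the reward mean is flagged.

    The first `warmup` samples (and the first `warmup` samples after each
    alarm) are used only to estimate the current mean. Each alarm marks the
    point where an RL agent would discard or reset its learned policy.
    """
    alarms = []
    detector = None
    buffer = []
    for t, r in enumerate(rewards):
        if detector is None:
            # Estimation phase: collect samples for a fresh reference mean.
            buffer.append(r)
            if len(buffer) == warmup:
                detector = CusumDetector(sum(buffer) / warmup, drift, threshold)
                buffer = []
        elif detector.update(r):
            alarms.append(t)
            detector = None  # restart estimation; policy reset would go here
    return alarms
```

For a reward stream that sits at 1.0 for 20 steps and then jumps to 4.0, `find_change_points([1.0] * 20 + [4.0] * 20)` flags a single change shortly after the jump. Raising `threshold` lowers the false-alarm rate at the cost of a longer detection delay.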
In the second part of the talk, we describe the application of RL to the obstacle avoidance problem for UAV quadrotors. Obstacle avoidance in quadrotor aerial navigation brings additional challenges compared with ground vehicles. Our proposed method uses the relevant temporal information available from the ambient surroundings and adapts attention-based deep Q-networks, combined with generative adversarial networks, to this application.
