Algorithms for Challenges to Practical Reinforcement Learning
Series: Ph.D (Engg.) (Colloquium) - ONLINE
Speaker: Ms. Sindhu P R, Ph.D (Engg.) student, Dept. of CSA
Date/Time: Dec 01 14:30:00
Location: Microsoft Teams: ON-LINE
Faculty Advisor: Prof. Shalabh Bhatnagar
Reinforcement learning (RL) in real-world applications faces major hurdles, foremost among them the safety of the physical system controlled by the learning agent and the varying environmental conditions in which the autonomous agent operates. An RL agent learns to control a system by exploring the available actions. In some operating states, an exploratory action may drive the system into unsafe operation, creating hazards both for the system and for the humans supervising it. RL algorithms therefore need to respect safety constraints, and must do so with limited information. Additionally, RL agents typically learn optimal decisions under the assumption that the environment is stationary. This assumption is very restrictive. In many real-world problems, such as traffic signal control and robotic applications, the environment is non-stationary, and in these scenarios RL algorithms yield sub-optimal decisions.
This talk describes our algorithmic solutions to the challenges of safety and non-stationary environmental conditions in RL. To handle safety restrictions and facilitate safe exploration during learning, we propose a sample-efficient learning algorithm based on the cross-entropy method. The algorithm is built on a constrained optimization framework and uses limited information to learn feasible policies. During the learning iterations, exploration is guided so as to minimize safety violations.
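As a rough illustration of the general idea in the paragraph above, the following is a minimal sketch of a cross-entropy method that ranks sampled candidates feasibility-first. It is not the speaker's algorithm: the function name `constrained_cem`, the Gaussian sampling distribution, the ranking rule, and all parameter values are assumptions made for this example, and the toy objective and constraint stand in for a policy's return and safety cost.

```python
import numpy as np


def constrained_cem(objective, constraint, dim, n_iters=50, pop_size=100,
                    elite_frac=0.2, seed=0):
    """Illustrative cross-entropy method with a feasibility-first ranking.

    objective(x)  -> float to maximize (hypothetical stand-in for return).
    constraint(x) -> float, feasible when <= 0 (hypothetical safety cost).
    """
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(n_iters):
        # Sample candidate parameters from the current Gaussian.
        samples = rng.normal(mean, std, size=(pop_size, dim))
        rewards = np.array([objective(x) for x in samples])
        violations = np.array([max(0.0, constraint(x)) for x in samples])
        # lexsort uses the LAST key as primary: sort by violation (ascending),
        # then break ties by reward (descending).
        order = np.lexsort((-rewards, violations))
        elites = samples[order[:n_elite]]
        # Refit the sampling distribution to the elite set.
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6  # small floor to avoid a zero std
    return mean
```

Under these assumptions, maximizing `-(x - 2)^2` subject to `x <= 1` should move the sampling mean toward the constraint boundary from the feasible side, since infeasible samples are only selected as elites when too few feasible ones are available.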
The goal of the second algorithm is to find a good control policy when the latent model of the environment changes over time. To achieve this, the algorithm uses a change point detection method to monitor changes in the statistics of the environment. The output of this statistical method is used to reset policy learning and to control the underlying system efficiently. Both proposed algorithms are tested numerically on benchmark problems in RL.
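To make the "detect a change, then reset learning" pattern concrete, here is a minimal sketch that monitors a reward stream with a two-sided CUSUM detector and resets a running value estimate when a shift is flagged. The abstract does not say which change point detector or learner is used, so CUSUM, the class `CusumDetector`, the function `run_with_resets`, and all thresholds are assumptions chosen only for illustration.

```python
import numpy as np


class CusumDetector:
    """Two-sided CUSUM on a scalar stream (a generic detector, assumed here)."""

    def __init__(self, threshold=5.0, drift=0.5, warmup=20):
        self.threshold, self.drift, self.warmup = threshold, drift, warmup
        self.reset()

    def reset(self):
        self.n, self.mean = 0, 0.0
        self.pos, self.neg = 0.0, 0.0

    def update(self, x):
        """Return True when a shift in the stream's mean is flagged."""
        if self.n < self.warmup:
            # Estimate the reference mean from the first `warmup` samples.
            self.n += 1
            self.mean += (x - self.mean) / self.n
            return False
        dev = x - self.mean
        # Accumulate deviations beyond the allowed drift in each direction.
        self.pos = max(0.0, self.pos + dev - self.drift)
        self.neg = max(0.0, self.neg - dev - self.drift)
        return self.pos > self.threshold or self.neg > self.threshold


def run_with_resets(rewards, alpha=0.1):
    """Track a running value estimate; reset it whenever a change is flagged."""
    detector = CusumDetector()
    value, change_points = 0.0, []
    for t, r in enumerate(rewards):
        if detector.update(r):
            change_points.append(t)
            value = 0.0       # discard what was learned under the old model
            detector.reset()  # re-estimate the reference mean from new data
        value += alpha * (r - value)
    return change_points, value
```

In a full RL setting the scalar `value` would be replaced by the policy or value function being learned, and the monitored statistic could be rewards or state transitions; those choices are left open here.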
The second part of this talk will focus on the application of RL to the obstacle avoidance problem for quadrotor UAVs. Obstacle avoidance in quadrotor navigation brings additional challenges compared with ground vehicles, because an aerial vehicle must navigate around more types of obstacles: for example, decorative items, furnishings, ceiling fans, sign-boards, and tree branches are all potential obstacles for a quadrotor. Methods of obstacle avoidance developed for ground robots are therefore inadequate for UAV navigation. Our proposed method uses the relevant temporal information available from the ambient surroundings, in the form of a sequence of monocular camera images collected by the quadrotor. The method adapts attention-based deep Q-networks combined with generative adversarial networks to this application. It improves learning efficiency by inferring quadrotor maneuver decisions from this temporal information.
Microsoft Teams Meeting Link: