Seminars

Novel Reinforcement Learning Algorithms and Applications to Hybrid Control Design Problems

Series: M.Tech (Research) Colloquium - ON-LINE

Speaker: Mr. Meet Pradhuman Gandhi, M.Tech (Research) student, Dept. of CSA

Date/Time: Jun 24 15:00:00

Location: Microsoft Teams - ON-LINE

Faculty Advisor: Prof. Shalabh Bhatnagar

Abstract:
The thesis is a compilation of two independent works.
<br>
In the first work, we develop a novel weight assignment procedure, which lets us derive several schedule-based algorithms.<br>
Learning the value function of a given policy from data samples is an important problem in Reinforcement Learning.<br>
TD($\lambda$) is a popular class of algorithms for solving this problem.<br>
However, in TD($\lambda$) the weight assigned to the $n$-step return decreases exponentially with increasing $n$.<br>
Here, we present a $\lambda$-schedule procedure that allows flexibility in weight assignment to the different $n$-step returns.<br>
Based on this procedure, we propose an on-policy algorithm, TD($\lambda$)-schedule, and an off-policy algorithm, TDC($\lambda$)-schedule.<br>
We provide proofs of almost sure convergence for both algorithms under a general Markov noise framework, and we present experimental results in which these algorithms show improved performance.
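To make the exponential weighting concrete, the sketch below computes the weights that the standard $\lambda$-return places on the $n$-step returns of a finite episode: $(1-\lambda)\lambda^{n-1}$ for each truncated return, with the remaining mass on the full return. The function names and the `horizon` parameter are illustrative assumptions. This shows only the textbook TD($\lambda$) weighting, not the $\lambda$-schedule procedure of the thesis, whose details the abstract does not give.

```python
def lambda_return_weights(lam, horizon):
    """Weights the standard lambda-return places on the 1..horizon-step returns.

    The n-step return gets (1 - lam) * lam**(n - 1) for n < horizon; the
    final (full-episode) return gets the remaining mass lam**(horizon - 1),
    so the weights always sum to one.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    weights = [(1.0 - lam) * lam ** (n - 1) for n in range(1, horizon)]
    weights.append(lam ** (horizon - 1))
    return weights


def weighted_return(n_step_returns, weights):
    """Combine n-step returns using an arbitrary weight vector."""
    if len(n_step_returns) != len(weights):
        raise ValueError("need one weight per n-step return")
    return sum(w * g for w, g in zip(weights, n_step_returns))
```

`weighted_return` accepts any weight vector, so a non-exponential weighting (hypothetically, uniform weights over the first few returns) can be substituted for the output of `lambda_return_weights` to see how the target changes.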
<br>
In the second work, we design hybrid control policies for hybrid systems whose mathematical models are unknown.<br>
Our contributions are threefold.<br>
First, we propose a framework for modelling the hybrid control design problem as a single Markov Decision Process (MDP).<br>
This formulation allows off-the-shelf algorithms from the Reinforcement Learning (RL) literature to be applied to the design of optimal control policies.<br>
Second, we model a set of benchmark examples of the hybrid control design problem in the proposed MDP framework.<br>
Third, we adapt the recently proposed Proximal Policy Optimisation (PPO) algorithm to the hybrid action space and apply it to the above set of problems.<br>
It is observed that in each case the algorithm converges and finds the optimal policy.
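The abstract does not say how PPO is adapted to the hybrid action space. As a rough illustration only, one common approach is to factor the policy into a categorical distribution over discrete modes and a Gaussian over the continuous input for the chosen mode, then feed the joint log-probability into the usual PPO clipped surrogate. All names and parameters below are hypothetical and are not taken from the thesis.

```python
import math
import random


def hybrid_log_prob(mode, u, mode_probs, means, stds):
    """Joint log-probability of a discrete mode and a continuous input u."""
    mu, sigma = means[mode], stds[mode]
    log_p_mode = math.log(mode_probs[mode])
    # Log-density of a univariate Gaussian N(mu, sigma^2) evaluated at u.
    log_p_u = (-0.5 * ((u - mu) / sigma) ** 2
               - math.log(sigma)
               - 0.5 * math.log(2.0 * math.pi))
    return log_p_mode + log_p_u


def sample_hybrid_action(mode_probs, means, stds, rng):
    """Sample a mode, then a continuous input conditioned on that mode."""
    mode = rng.choices(range(len(mode_probs)), weights=mode_probs, k=1)[0]
    u = rng.gauss(means[mode], stds[mode])
    return mode, u, hybrid_log_prob(mode, u, mode_probs, means, stds)


def ppo_clipped_term(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two modes, each with its own Gaussian over the continuous input.
    mode, u, logp = sample_hybrid_action([0.3, 0.7], [0.0, 1.0], [1.0, 0.5], rng)
    print(mode, u, logp)
```

Because the joint log-probability is the sum of the discrete and continuous parts, the probability ratio in the surrogate needs no change from the purely discrete or purely continuous case under this factorisation.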
<br>
Microsoft Teams link:<br>
<a href="Link