BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//project/author//NONSGML v1.0//EN
CALSCALE:GREGORIAN
BEGIN:VEVENT
DTEND:20210624T120000Z
UID:898bd5f98b13385a5c853f012ed3de7f-160
DTSTAMP:19700101T120015Z
DESCRIPTION:Novel Reinforcement Learning Algorithms and Applications to Hybrid Control Design Problems
URL;VALUE=URI:https://www.csa.iisc.ac.in/newweb/event/160/novel-reinforcement-learning-algorithms-and-applications-to-hybrid-control-design-problems/
SUMMARY:The thesis is a compilation of two independent works.
&lt;br&gt;
In the first work, we develop a novel weight assignment procedure, which helps us develop several schedule-based algorithms.&lt;br&gt;
Learning the value function of a given policy from the data samples is an important problem in Reinforcement Learning.&lt;br&gt;
TD($lambda$) is a popular class of algorithms to solve this problem.&lt;br&gt;
However, the weight assigned to different $n$-step returns decreases exponentially with increasing $n$ in TD($lambda$).&lt;br&gt;
Here, we present a $lambda$-schedule procedure that allows flexibility in weight assignment to the different $n$-step returns.&lt;br&gt;
Based on this procedure, we propose an on-policy algorithm, TD($lambda$)-schedule, and an off-policy algorithm, TDC($lambda$)-schedule.&lt;br&gt;
We provide proofs of almost sure convergence for both algorithms under a general Markov noise framework and present experimental results in which these algorithms show improved performance.
&lt;br&gt;
In the second work, we design hybrid control policies for hybrid systems whose mathematical models are unknown.&lt;br&gt;
Our contributions are threefold.&lt;br&gt;
First, we propose a framework for modelling the hybrid control design problem as a single Markov Decision Process (MDP).&lt;br&gt;
This result facilitates the application of off-the-shelf algorithms from the Reinforcement Learning (RL) literature to the design of optimal control policies.&lt;br&gt;
Second, we model a set of benchmark examples of hybrid control design problems in the proposed MDP framework.&lt;br&gt;
Third, we adapt the recently proposed Proximal Policy Optimisation (PPO) algorithm for the hybrid action space and apply it to the above set of problems.&lt;br&gt;
It is observed that in each case the algorithm converges and finds the optimal policy.
&lt;br&gt;
Microsoft Teams link:&lt;br&gt;
&lt;a href=&quot;https://teams.microsoft.com/l/meetup-join/19%3ameeting_NGQ2OTk0MWQtZGI5OC00OWQ1LWJmZDUtOTM4MzVhZDllNzVm%40thread.v2/0?context=%7b%22Tid%22%3a%226f15cd97-f6a7-41e3-b2c5-ad4193976476%22%2c%22Oid%22%3a%229d60e185-2600-4b28-b2d5-2d47a54928f3%22%7d&quot;&gt;https://teams.microsoft.com/l/meetup-join/19%3ameeting_NGQ2OTk0MWQtZGI5OC00OWQ1LWJmZDUtOTM4MzVhZDllNzVm%40thread.v2/0?context=%7b%22Tid%22%3a%226f15cd97-f6a7-41e3-b2c5-ad4193976476%22%2c%22Oid%22%3a%229d60e185-2600-4b28-b2d5-2d47a54928f3%22%7d&lt;/a&gt;
DTSTART:20210624T120000Z
END:VEVENT
END:VCALENDAR