SeminarsView all Seminars | Download ICal for this event
Algorithms for Stochastic Optimization, Statistical Estimation and Markov Decision Processes
Series: Ph.D (Engg.) Thesis Defence -ON-LINE
Speaker: Mr. Chandramouli Kamanchi Ph.D (Engg.) student Dept. of CSA
Date/Time: Oct 20 11:00:00
Location: Microsoft Teams - ON-LINE
Faculty Advisor: Prof. Shalabh Bhatnagar
Stochastic approximation deals with the problem of finding zeros of a function expressed as an expectation of a random variable. In this thesis we study and analyze convergence of stochastic approximation algorithms in the context of optimization under uncertainty, statistical estimation and Reinforcement Learning. Moreover, we also explore second order methods in the context of Markov Decision Processes.
First, we consider Stochastic Optimization (SO) problem where we optimize an objective function in the presence of noise. A prominent algorithm in SO namely Random Direction Kiefer-Wolfowitz (RDKW) solves the problem by obtaining noisy gradient estimate by randomly perturbing all the parameters simultaneously. In this thesis, we characterize the class of deterministic perturbation sequences that can be utilized in the RDKW algorithm. Using our characterization, we propose a construction of a deterministic perturbation sequence that has the least possible cycle length among all deterministic perturbations. We establish the convergence of the RDKW algorithm for the generalized class of deterministic perturbations.
In statistical estimation, one of the popular measures of central tendency that provides better representation and interesting insights of the data compared to the other measures like mean and median is the metric mode. In the second part of our thesis, we provide a computationally effective, on-line iterative algorithm that estimates the mode of a unimodal smooth density given only the samples generated from the density. Asymptotic convergence of the proposed algorithm using stochastic approximation techniques is provided. We also prove the stability of the mode estimates by utilizing the concept of regularization.
In the third part of our thesis, we propose Successive Over-Relaxation (SOR) Q-learning. In a discounted reward Markov Decision Process (MDP), the objective is to find the optimal value function. We first derive a modified fixed point iteration for SOR Q-values and utilize stochastic approximation to derive a learning algorithm to compute the optimal value function and an optimal policy. We then prove the almost sure convergence of the SOR Q-learning to SOR Q-values. Finally, through numerical experiments, we demonstrate that SOR Q-learning is faster compared to the standard Q-learning algorithm in the literature.
In the fourth part of our thesis, we explore second-order methods in MDPs. We propose a second order value iteration procedure that is obtained by applying the Newton-Raphson method to the successive relaxation value iteration scheme. We prove the global convergence of our algorithm to the optimal solution asymptotically and show the second order convergence. Through experiments, we show the effectiveness of our proposed approach.
Microsoft Teams Link: