Seminars


Stochastic Optimization and Its Application in Reinforcement Learning

Series: M.Tech (Research) Colloquium

Speaker: Akash Mondal, M.Tech (Research) student, Dept. of CSA

Date/Time: Dec 12, 12:00 PM

Location: CSA Seminar Hall (Room No. 254, First Floor)

Faculty Advisor: Prof. Shalabh Bhatnagar

Abstract:
Numerous engineering fields, including transportation systems, manufacturing, communication networks, healthcare, and finance, frequently encounter optimization problems involving uncertainty. With numerous input variables and no explicit system model available, accurate analytical solutions are often infeasible, making simulation-based optimization a practical alternative. Smoothed functional (SF) algorithms belong to the class of simultaneous perturbation methods that have proven useful for stochastic optimization problems, particularly in high-dimensional parameter spaces. SF methods estimate the gradient of the objective using function measurements at parameters perturbed simultaneously along all component directions. The SF gradient procedure, originally developed by Katkovnik and Kulchitsky, effectively convolves the objective with a smoothing density function; the resulting smoothing of the objective can help the algorithm converge to a global minimum or to a point close to it.

In the first part of the talk, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, where only the noisy samples are observed for any given parameter. Our algorithm employs a gradient estimation scheme with random perturbations formed using the truncated Cauchy distribution on the $\delta$-sphere. We analyze the bias and variance of the proposed gradient estimator. The algorithm is found to be particularly useful when the objective function is non-convex and the parameter dimension is high. Through an asymptotic convergence analysis, we establish that the algorithm converges almost surely to the set of stationary points of the objective function, and we obtain the asymptotic convergence rate. We also show that the algorithm avoids unstable equilibria, implying convergence to local minima. Further, we perform a non-asymptotic convergence analysis; in particular, we establish a non-asymptotic bound for finding an $\epsilon$-stationary point of the non-convex objective function. Finally, we demonstrate numerically through simulations that our algorithm outperforms GSF, SPSA and RDSA by a significant margin over a few non-convex settings, and we further validate its performance over convex (noisy) objectives.

In the second part, we consider the problem of control in the reinforcement learning (RL) setting, where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. We propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the likelihood ratio method to form estimates of the gradient and Hessian of the value function from sample trajectories. The first algorithm requires an exact solution of the cubic-regularized problem in each iteration, while the second employs an efficient gradient descent-based approximation to the cubic-regularized problem. We establish convergence of both algorithms to a second-order stationary point (SOSP) of the value function, which results in avoidance of traps in the form of saddle points. In particular, the sample complexity of our algorithms for finding an $\epsilon$-SOSP is $O(\epsilon^{-3.5})$, a significant improvement over the state-of-the-art sample complexity of $O(\epsilon^{-4.5})$.
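To make the simultaneous-perturbation idea concrete, the following is a minimal sketch of a smoothed-functional gradient estimator driving stochastic gradient descent on a hypothetical noisy quadratic cost. Gaussian perturbations are used here purely for illustration; the algorithm presented in the talk instead draws its perturbations from a truncated Cauchy distribution on the $\delta$-sphere, and that exact construction is not reproduced in this sketch.

```python
# Sketch of a smoothed-functional (SF) gradient estimator with simultaneous
# random perturbations, applied to a hypothetical noisy quadratic objective.
import numpy as np

def noisy_cost(theta, rng):
    """Hypothetical noisy objective: a quadratic plus zero-mean measurement noise."""
    return float(theta @ theta + rng.normal(scale=0.1))

def sf_gradient_estimate(theta, beta, rng):
    """Two-sided SF gradient estimate of the smoothed objective.

    All coordinates are perturbed simultaneously by one random vector u, so
    only two noisy function evaluations are needed per estimate, regardless
    of the parameter dimension.
    """
    u = rng.standard_normal(theta.shape)           # simultaneous perturbation
    f_plus = noisy_cost(theta + beta * u, rng)     # perturbed measurement (+)
    f_minus = noisy_cost(theta - beta * u, rng)    # perturbed measurement (-)
    return (f_plus - f_minus) / (2.0 * beta) * u   # gradient estimate

def run_sgd(dim=50, steps=2000, beta=0.1, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(dim)
    for _ in range(steps):
        theta -= lr * sf_gradient_estimate(theta, beta, rng)
    return theta

if __name__ == "__main__":
    theta = run_sgd()
    # Should be small compared with the random initialisation, since the
    # minimiser of the quadratic is at the origin.
    print("final ||theta|| =", np.linalg.norm(theta))
```

Note that each gradient estimate uses only two noisy function evaluations, independent of the parameter dimension, which is what makes SF-type schemes attractive in high-dimensional settings.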
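For the reinforcement-learning part, a building block of both proposed policy Newton algorithms is the likelihood-ratio (score-function) estimate of the value function's gradient from sample trajectories. The sketch below illustrates that estimate in a deliberately simple, hypothetical setting: a two-armed bandit with a softmax policy and single-step episodes. The likelihood-ratio Hessian estimate and the cubic-regularized Newton step from the talk are not reproduced here.

```python
# Sketch of a likelihood-ratio (score-function) policy gradient estimate,
# the kind of estimate the proposed policy Newton algorithms build on.
import numpy as np

TRUE_MEANS = np.array([1.0, 2.0])   # hypothetical arm rewards (second arm is better)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def sample_episode(theta, rng):
    """Sample one action from the softmax policy and observe a noisy reward."""
    probs = softmax(theta)
    a = rng.choice(len(theta), p=probs)
    r = TRUE_MEANS[a] + rng.normal(scale=0.1)
    return a, r, probs

def lr_gradient_estimate(theta, rng, batch=64):
    """Likelihood-ratio estimate of grad J(theta) = E[ R * grad log pi_theta(a) ]."""
    grad = np.zeros_like(theta)
    for _ in range(batch):
        a, r, probs = sample_episode(theta, rng)
        score = -probs
        score[a] += 1.0                 # grad_theta log softmax(theta)[a]
        grad += r * score
    return grad / batch

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.zeros(2)
    for _ in range(500):
        theta += 0.1 * lr_gradient_estimate(theta, rng)   # plain policy gradient ascent
    print("policy probabilities:", softmax(theta))        # should favour the second arm
```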
Microsoft Teams link: Join on your computer, mobile app or room device
https://teams.microsoft.com/l/meetup-join/19%3ameeting_OGI4YjEzMmUtNGFkZi00YmE5LTgzMzQtYWVmZTI5OGQxNTE3%40thread.v2/0?context=%7b%22Tid%22%3a%226f15cd97-f6a7-41e3-b2c5-ad4193976476%22%2c%22Oid%22%3a%22578cbfbe-b1e9-4f20-91bd-984b656368e5%22%7d
Meeting ID: 488 618 948 051
Passcode: XQantJ
