Publications (All)

Books/Monographs

S.Bhatnagar, H.L.Prasad and L.A.Prashanth, Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods, Lecture Notes in Control and Information Sciences Series, Vol. 434, Springer, ISBN 978-1-4471-4284-3, Edition: 2013, 302 pages.
L.A.Prashanth and S.Bhatnagar, Gradient-based Algorithms for Zeroth Order Optimization, Frontiers and Trends in Optimization, NOW Publishers, 2025 prepublication draft

Book Chapters

A.G.Joseph and S.Bhatnagar, An Incremental Fast Policy Search Using a Single Sample Path, Shankar B., Ghosh K., Mandal D., Ray S., Zhang D., Pal S. (Eds) Pattern Recognition and Machine Intelligence, Lecture Notes in Computer Science, vol 10597. Springer, 2017 online pdf
S.Bhatnagar, V.S.Borkar and Prashanth L.A., Adaptive feature pursuit: Online adaptation of features in reinforcement learning, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (Ed. F. Lewis and D. Liu), IEEE Press Computational Intelligence Series (jointly published by IEEE Press and Wiley), Chapter 23, pp. 517-534, 2013 online pdf.
S.Bhatnagar, Simultaneous perturbation and finite difference methods, Wiley Encyclopedia of Operations Research and Management Science (Ed. J. Cochran), Vol. 7, pp. 4969-4991, Wiley, Hoboken, NJ, 2011 online pdf.
P.Viswanath, M.N.Murty and S.Bhatnagar, Pattern synthesis for non-parametric pattern recognition, Encyclopedia of Data Warehousing and Mining, second edition, Ed. J. Wang, Montclair State University, USA, Published by Idea group inc.,USA, 2008.
V.Sudha, L.Gopal, V.Sridhar and S.Bhatnagar, Fuzzy clustering based Ad recommendation for TV programs, Interactive TV: A Shared Experience, Eds. P.Cesar, K.Chorianopoulos and J.F.Jensen, Springer, pp.175-184, 2007.
P.Viswanath, M.N.Murty and S.Bhatnagar, Pattern synthesis for large-scale pattern recognition, Encyclopedia of Data Warehousing and Mining, Ed. J. Wang, Montclair State University, USA, Published by Idea group inc.,USA, 2005, pp. 902-905.
S.Bhatnagar, M.Fu and S.I.Marcus, Two timescale SPSA algorithms for rate-based ABR flow control, Chapter 27, System Theory: Modeling, Analysis and Control, Ed. T.Djaferis and I.Schick, Kluwer Academic, Cambridge, Massachusetts, pp.367-378, 1999.

Journal Papers

L.Mandal and S.Bhatnagar, Optimizing Successive Over-relaxation Q-learning with Deterministic Perturbation Gradient Search, IEEE Transactions on Artificial Intelligence, Aug 2025 online PDF
S.Bhatnagar and Deepak H.R., Variance Reduced Smoothed Functional REINFORCE Policy Gradient Algorithms, Transactions on Machine Learning Research (TMLR), Aug 2025 online PDF
S.Guin, V.S.Borkar, and S.Bhatnagar, An Actor-Critic Algorithm with Function Approximation for Risk Sensitive Cost Markov Decision Processes, IEEE Transactions on Automatic Control, July 2025 online PDF, arXiv
L.Mandal and S.Bhatnagar, n-Step Temporal Difference Learning with Optimal n, Vol. 179, Article 112449 (9 pages), Automatica, 2025 online PDF, arXiv
S.Pachal, S.Bhatnagar, and L.A.Prashanth, Generalized Simultaneous Perturbation-based Gradient Search with Reduced Estimator Bias, IEEE Transactions on Automatic Control, Vol.70, No.7, pp.4687-4702, 2025 online PDF, arXiv
L.Mandal, C.Lakshminarayanan and S.Bhatnagar, Approximate Linear Programming for Decentralized Policy Iteration in Cooperative Multi-agent Markov Decision Processes, Systems and Control Letters, December 2024 (accepted) online PDF
L.Mandal, D.R.Bharadwaj and S.Bhatnagar, Variance-Reduced Deep Actor-Critic with an Optimally Sub-Sampled Actor Recursion, IEEE Transactions on Artificial Intelligence, Vol. 5, No. 7, pp. 3607-3623, 2024 online PDF
Vivek V.P and S.Bhatnagar, Efficient Energy Management in Smart Grids with Finite Horizon Q-Learning, Sustainable Energy, Grids and Networks, 38:101277, 2024 online PDF
A.Mondal, L.A.Prashanth, and S.Bhatnagar, Truncated Cauchy Random Perturbations for Smoothed Functional-based Stochastic Optimization, Automatica, 162:111528, 2024 arXiv, online PDF
A.Barat, K.J.Prabuchandran, and S.Bhatnagar, Energy Management in a Cooperative Energy Harvesting Wireless Sensor Network, IEEE Communications Letters, Vol. 28, No. 1, pp. 243-247, 2024 arXiv, online PDF
S.Bhatnagar, V.S.Borkar, and S.Guin, Actor-Critic or Critic-Actor? A Tale of Two Time Scales, IEEE Control Systems Letters, Vol.7, pp.2671-2676, 2023 online pdf arXiv
A.Ramaswamy and S.Bhatnagar, Analyzing approximate value iteration algorithms, Mathematics of Operations Research, Vol.47, No.3, pp. 2138-2159, 2022 online pdf arXiv
D.R.Bharadwaj, Chandramouli K., and S.Bhatnagar, A generalized minimax Q-learning algorithm for two-player zero-sum stochastic games, IEEE Transactions on Automatic Control, Vol. 67, No. 9, pp. 4816-4823, 2022 online pdf arXiv
Chandramouli K., D.R.Bharadwaj, and S.Bhatnagar, Generalized Second Order Value Iteration in Markov Decision Processes, IEEE Transactions on Automatic Control, Vol. 67, Issue 8, pp. 4241-4247, 2022 online pdf arXiv
P.Karmakar and S.Bhatnagar, Stochastic approximation with iterate-dependent Markov noise under verifiable conditions in compact state space with the stability of iterates not ensured, IEEE Transactions on Automatic Control, Vol.66, Issue 12, pp. 5941-5954, Dec 2021 online pdf arXiv
A.Ramaswamy, S.Bhatnagar and D.Quevedo, Asynchronous stochastic approximations with asymptotically biased errors and deep multi-agent learning, IEEE Transactions on Automatic Control, Vol. 66, Issue 9, pp. 3969-3983, Sep 2021 online pdf arXiv
K.J.Prabuchandran, S.Penubothula, Chandramouli K., and S.Bhatnagar, Novel first order Bayesian optimization with an application to reinforcement learning, Applied Intelligence, Springer, Vol. 51, pp. 1565-1579, 2021 online pdf
P.Karmakar and S.Bhatnagar, On tight bounds for function approximation error in risk-sensitive reinforcement learning, Systems and Control Letters, Vol. 150, 104899:1-7, April 2021 online pdf
A.Singla, Sindhu P.R., and S.Bhatnagar, Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge, IEEE Transactions on Intelligent Transportation Systems, Vol.22, No.1, pp.107-118, January 2021 online pdf arXiv
V.G.Yaji and S.Bhatnagar, Stochastic Recursive Inclusions in Two Timescales with Non-additive Iterate-dependent Markov Noise, Mathematics of Operations Research, Vol. 45, No.4, pp. 1405-1444, November 2020 online pdf arXiv
Sindhu P.R., Prabuchandran K.J., and S.Bhatnagar, Reinforcement Learning Algorithm for Non-Stationary Environments, Applied Intelligence, Springer, Vol.50, pp.3590-3606, 2020 online pdf arXiv
I.John, Chandramouli K., and S.Bhatnagar, Generalized Speedy Q-learning, IEEE Control Systems Letters, Vol.4, Issue 3, July 2020 online pdf
Prashanth L.A., S.Bhatnagar, N.Bhavsar, M.Fu and S.Marcus, Random directions stochastic approximation with deterministic perturbations, IEEE Transactions on Automatic Control, Vol. 65, Issue 6, pp. 2450-2465, June 2020 online pdf arXiv
V.G.Yaji and S.Bhatnagar, Analysis of Stochastic Approximation Schemes with Set-valued Maps in the Absence of a Stability Guarantee and their Stabilization, IEEE Transactions on Automatic Control, Vol. 65, Issue 3, pp. 1100-1115, March 2020 online pdf arXiv
Chandramouli K., D.R.Bharadwaj and S.Bhatnagar, Successive Over-Relaxation Q-Learning, IEEE Control Systems Letters (L-CSS), Vol. 4, Issue 1, pp. 55-60, Jan 2020 online pdf arXiv
Chandramouli K., D.R.Bharadwaj, Prabuchandran K.J., and S.Bhatnagar, An Online Sample Based Method for Mode Estimation using ODE Analysis of Stochastic Approximation Algorithms, IEEE Control Systems Letters (L-CSS), Vol. 3, Issue 3, pp. 697-702, July 2019 online pdf arXiv
A.Ramaswamy and S.Bhatnagar, Stability of Stochastic Approximations with ‘Controlled Markov’ Noise and Temporal Difference Learning, IEEE Transactions on Automatic Control, Vol. 64, Issue 6, pp. 2614-2620, June 2019 online pdf
A.G.Joseph and S.Bhatnagar, An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method, Machine Learning, Vol. 107, Issue 8–10, pp.1385–1429, 2018 online pdf arXiv
D.R.Bharadwaj, K.J.Prabuchandran, and S.Bhatnagar, Novel sensor scheduling scheme for intruder tracking in energy efficient sensor networks, IEEE Wireless Communications Letters, Vol. 7, Issue 5, pp. 712-715, Oct 2018 online pdf
A.G.Joseph and S.Bhatnagar, An incremental off-policy search in a model-free Markov decision process using a single sample path, Machine Learning, Vol.107, Issue 6, pp. 969–1011, 2018 online pdf arXiv
A.Ramaswamy and S.Bhatnagar, Analysis of Gradient Descent Methods with Non-Diminishing, Bounded Errors, IEEE Transactions on Automatic Control, Vol. 63, Issue 5, pp.1465–1471, 2018 online pdf arXiv
Chandrashekar L., S.Bhatnagar, and C.Szepesvari, A Linearly Relaxed Approximate Linear Program for Markov Decision Processes, IEEE Transactions on Automatic Control, Vol. 63, Issue 4, pp. 1185–1191, 2018 online pdf arXiv
S.Bhatnagar, S.Patel, and Karmeshu, A Stochastic Approximation Approach to Active Queue Management, Telecommunication Systems (Springer), Vol.68, No.1, pp.89–104, 2018 online pdf
V.G.Yaji and S.Bhatnagar, Stochastic Recursive Inclusions with Non-Additive Iterate-Dependent Markov Noise, Stochastics, Vol. 90, No. 3, pp. 330–363, 2018 online pdf arXiv
E.Zhou and S.Bhatnagar, Gradient-based Adaptive Stochastic Search for Simulation Optimization over Continuous Space, INFORMS Journal on Computing, Vol. 30, No. 1, pp. 154–167, 2018 online pdf
P.Karmakar and S.Bhatnagar, Two Time-scale Stochastic Approximation with Controlled Markov noise and Off-policy Temporal Difference Learning, Mathematics of Operations Research, Vol. 43, No.1, pp. 130–151, 2018 online pdf arXiv
Chandrashekar L. and S.Bhatnagar, A Stability Criterion for Two Timescale Stochastic Approximation Schemes, Automatica, Vol.79, pp.108-114, May 2017 online pdf
L.A.Prashanth, S.Bhatnagar, M.Fu, and S.Marcus, Adaptive system optimization using random directions stochastic approximation, IEEE Transactions on Automatic Control, Vol. 62, Issue 5, pp.2223–2238, 2017 online pdf arXiv
A.Ramaswamy and S.Bhatnagar, A generalization of the Borkar-Meyn theorem for stochastic recursive inclusions, Mathematics of Operations Research, Vol. 42, No. 3, pp. 648–661, 2017 online pdf arXiv
A.Ramaswamy and S.Bhatnagar, Stochastic recursive inclusion in two timescales with an application to the Lagrangian dual problem, Stochastics, Vol.88, No.8, pp.1173-1187, 2016 online pdf,arXiv
Lakshmanan K. and S.Bhatnagar, Quasi-Newton smoothed functional algorithms for unconstrained and constrained simulation optimization, Computational Optimization and Applications (Springer), Vol.66, No.3, pp.533-556, 2017 online pdf
Karmeshu, S.Patel, and S.Bhatnagar, Adaptive mean queue size and its rate of change: queue management with random dropping, Telecommunication Systems (Springer), Vol.65, Issue 2, pp.281-295, 2017 online pdf
L.A.Prashanth, H.L.Prasad, S.Bhatnagar and P.Chandra, A constrained optimization perspective on actor critic algorithms and application to network routing, Systems and Control Letters, Vol.92, pp.46-51, 2016 online pdf
Prabuchandran K.J., S.Bhatnagar and V.S.Borkar, Actor Critic Algorithms with Online Feature Adaptation, ACM Transactions on Modeling and Computer Simulation, Vol.26, No.4, pp.24:1-24:26, 2016 online pdf
M.S.Abdulla and S.Bhatnagar, Multi-armed bandits based on a variant of simulated annealing, Indian Journal of Pure and Applied Mathematics (Springer), Special Issue in Honour of Prof.Vivek Borkar's 60th Birthday, Vol.47, Issue 2, pp.195-212, 2016 online pdf
S.Bhatnagar and Lakshmanan K., Multiscale Q-learning with Linear Function Approximation, Discrete Event Dynamic Systems, Vol.26, Issue 3, pp.477-509, 2016 online pdf
V.G.Yaji and S.Bhatnagar, Necessary and sufficient conditions for optimality in constrained general sum stochastic games, Systems and Control Letters, Vol. 85, pp.8-15, 2015 online pdf
Sindhu P.R., Prabuchandran K.J., and S.Bhatnagar, Energy sharing for multiple sensor nodes with finite buffers, IEEE Transactions on Communications, Vol.63, No.5, pp.1811-1823, 2015 online pdf
S.Bhatnagar and Prashanth L.A., Simultaneous Perturbation Newton Algorithms for Simulation Optimization, Journal of Optimization Theory and Applications (Springer), Vol. 164, Issue 2, pp.621-643, 2015 online pdf
Prashanth L.A., H.L.Prasad, N.Desai, S.Bhatnagar and G.Dasgupta, Simultaneous perturbation methods for adaptive labor staffing in service systems, Simulation, Vol. 91, No. 5, pp.432-455, 2015 online pdf
Prashanth L.A, A.Chatterjee and S.Bhatnagar, Two timescale convergent Q-learning for sleep–scheduling in wireless sensor networks, Wireless Networks (Springer), Vol. 20, Issue 8, pp.2589-2604, 2014 online pdf
D.Ghoshdastidar, A.Dukkipati and S.Bhatnagar, Newton based stochastic optimization using q-Gaussian smoothed functional algorithms, Automatica (Elsevier), Vol. 50, No.10, pp.2606-2614, 2014 online pdf
D.Ghoshdastidar, A.Dukkipati and S.Bhatnagar, Smoothed functional algorithms for stochastic optimization using q-Gaussian distributions, ACM Transactions on Modeling and Computer Simulation, Vol. 24, No. 3, pp.17:1–17:26, 2014 online pdf
S. Chakravarty, Sindhu P.R. and S. Bhatnagar, A simulation based algorithm for optimal pricing policy under demand uncertainty, International Transactions in Operational Research (Wiley), Vol.21, Issue 5, pp.737-760, 2014 online pdf
S.Bhatnagar, Smoothed functional algorithms for optimization, Annals of the Indian National Academy of Engineering (INAE), Vol.XI, pp.95-105, April 2014.
S.Bhatnagar, V.S.Borkar and Prabuchandran K.J., Feature search in the Grassmanian in online reinforcement learning, IEEE Journal of Selected Topics in Signal Processing, Vol.7, No.5, pp.746-758, 2013 online pdf
Prabuchandran K.J., S.K.Meena and S.Bhatnagar, Q-learning based energy management policies for a single sensor node with finite buffer, IEEE Wireless Communications Letters, Vol.2, Issue 1, pp.82-85, 2013 online pdf.
H.L.Prasad, L.A.Prashanth, S.Bhatnagar and N.Desai, Adaptive Smoothed Functional Algorithms for Optimal Staffing Levels in Service Systems, Service Science (INFORMS), Vol. 5, Issue 1, pp.29-55, March 2013 online pdf.
L.A.Prashanth and S.Bhatnagar, Threshold tuning using stochastic optimization for graded signal control, IEEE Transactions on Vehicular Technology, Vol. 61, No. 9, pp.3865-3880, November 2012 online pdf.
H.L.Prasad and S.Bhatnagar, General-Sum Stochastic Games: Verifiability Conditions for Nash Equilibria, Automatica, Vol. 48, Issue 11, pp.2923-2930, 2012 online pdf.
K.R.Vemu, S.Bhatnagar and N.Hemachandra, Optimal Multi-layered Congestion Based Pricing Schemes for Enhanced QoS, Computer Networks (Elsevier), Vol.56, Issue 4, pp.1249-1262, March 2012. (DOI: 10.1016/j.comnet.2011.12.004)
S.Bhatnagar and Lakshmanan K., An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes, Journal of Optimization Theory and Applications (Springer), Vol. 153, No. 3, pp.688-708, 2012. (DOI: 10.1007/s10957-012-9989-5)
S.Bhatnagar, V.Mishra and N.Hemachandra, Stochastic algorithms for discrete parameter simulation optimization, IEEE Transactions on Automation Science and Engineering, Vol. 8, Issue 4, pp. 780-793, 2011. (DOI: 10.1109/TASE.2011.2159375)
Karmeshu, S.Bhatnagar and V.Mishra, An optimized SDE model for slotted Aloha, IEEE Transactions on Communications, Vol. 59, No. 6, pp.1502-1508, 2011. (DOI: 10.1109/TCOMM.2011.09.090113)
L.A.Prashanth and S.Bhatnagar, Reinforcement learning with function approximation for traffic signal control, IEEE Transactions on Intelligent Transportation Systems, Vol. 12, No. 2, pp.412-421, 2011. (DOI: 10.1109/TITS.2010.2091408)
S.Bhatnagar, The Borkar-Meyn Theorem for Asynchronous Stochastic Approximations, Systems and Control Letters, Vol. 60, pp. 472-478, 2011. (DOI: 10.1016/j.sysconle.2011.04.002)
S.Bhatnagar and Karmeshu, Monte-Carlo Estimation of Time-Dependent Statistical Characteristics of Random Dynamical Systems, Applied Mathematical Modelling (Elsevier), Vol.35, pp.3063-3079, 2011. (DOI: 10.1016/j.apm.2010.12.024).
S.Bhatnagar, N.Hemachandra and V.Mishra, Stochastic approximation algorithms for constrained optimization via simulation, ACM Transactions on Modeling and Computer Simulation, Vol. 21, Issue 3, pp.15:1-15:22, 2011.
S.Bhatnagar, An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes, Systems and Control Letters, Vol. 59, pp.760-766, 2010 (DOI: 10.1016/j.sysconle.2010.08.013).
G.R.Reddy, S.Bhatnagar, V.Rakesh and V.P.Chaturvedi, An efficient algorithm for scheduling in bluetooth piconets and scatternets, Wireless Networks (Springer), Vol.16, No.7, pp.1799-1816, 2010 (DOI: 10.1007/s11276-009-0229-3).
A.Chakraborty and S.Bhatnagar, Optimized policies for the retransmission probabilities in slotted Aloha, Simulation, Vol.86, No.4, pp.247-261, 2010.
S.Bhatnagar and R.K.Patro, A proof of convergence of the B-RED and P-RED algorithms for random early detection, IEEE Communications Letters, Vol.13, No.10, pp.809-811, 2009.
S.Bhatnagar, R.S.Sutton, M.Ghavamzadeh and M.Lee, Natural Actor-Critic Algorithms, Automatica, Vol.45, Issue 11, pp.2471-2482, 2009.
S.Bhatnagar, Karmeshu and V.Mishra, Optimal Parameter Trajectory Estimation in Parameterized SDEs: An Algorithmic Procedure, ACM Transactions on Modeling and Computer Simulation (TOMACS), Vol. 19, No. 2, pp. 8:1-8:27, 2009.
R.K.Patro and S.Bhatnagar, A Probabilistic Constrained Nonlinear Optimization Framework to Optimize RED parameters, Performance Evaluation, Vol. 66, Issue 2, pp.81-104, 2009.
S.Bhatnagar and M.S.Abdulla, Simulation-based optimization algorithms for finite horizon Markov decision processes, Simulation, Vol. 84, No. 12, pp. 577-600, 2008.
V.Sudha, L.Gopal, S.Bhatnagar and V.Sridhar, A novel Ad-recommendation system for TV programs, Springer/ACM Multimedia Systems Journal, Vol.14, No.2, pp.73-87, 2008.
C. Vignat and S. Bhatnagar, An extension of Wick's Theorem, Statistics and Probability Letters, Vol.78, Issue 15, pp.2404-2407, 2008.
S.Bhatnagar and K.M.Babu, New algorithms of the Q-learning type, Automatica, Vol.44, Issue 4, pp.1111-1119, 2008.
S.Bhatnagar, An adaptive multivariate three-timescale smoothed functional algorithm for simulation optimization, ACM Transactions on Modeling and Computer Simulation, Vol.18, No.1, pp.2:1-2:35, December 2007.
A.Dukkipati, S.Bhatnagar and M.N.Murty, Gelfand-Yaglom-Perez theorem for generalized relative entropy functionals, Information Sciences, Vol.177, pp.5707-5714, 2007.
A.Dukkipati, S.Bhatnagar and M.N.Murty, On measure-theoretic aspects of nonextensive entropy functionals and corresponding maximum entropy prescriptions, Physica A, Vol.384, pp.758-774, 2007.
M.S.Abdulla and S.Bhatnagar, Reinforcement learning based algorithms for average cost Markov decision processes, Discrete Event Dynamic Systems, Vol.17, No.1, pp.23-52, 2007.
S.Bhatnagar, V.S.Borkar and A.Madhukar, A simulation based algorithm for ergodic control of Markov chains conditioned on rare events, Journal of Machine Learning Research, Vol.7, pp.1937-1962, 2006.
R.Vaidya and S.Bhatnagar, Robust optimization of random early detection, Telecommunication Systems, Vol.33, No.4, pp.291-316, 2006.
P.Viswanath, M.N.Murty and S.Bhatnagar, Partition based pattern synthesis technique with efficient algorithms for nearest neighbor classification, Pattern Recognition Letters, 27, pp.1714-1724, 2006.
S.Bhatnagar and J.R.Panigrahi, Actor-critic algorithms for hierarchical Markov decision processes, Automatica, Vol.42, Issue 4, pp.637-644, 2006.
A.Dukkipati, M.N.Murty and S.Bhatnagar, Nonextensive triangle equality and other properties of Tsallis relative-entropy minimization, Physica A, Vol.361, pp.124-138, 2006.
S.Bhatnagar and H.J.Kowshik, A discrete parameter stochastic approximation algorithm for simulation optimization, Simulation: Transactions of the Society for Modeling and Simulation International, Vol.81, No.11, pp.757-772, 2005.
P.Viswanath, M.N.Murty and S.Bhatnagar, Overlap pattern synthesis with an efficient nearest neighbour classifier, Pattern Recognition, Vol.38, pp.1187-1195, 2005.
S.Bhatnagar and I.B.B.Reddy, Optimal threshold policies for admission control in communication networks via discrete parameter stochastic approximation, Telecommunication Systems, Vol.29, No.1, pp.9-31, 2005.
S.Bhatnagar, Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization, ACM Transactions on Modeling and Computer Simulation (TOMACS), Vol.15, No.1, pp.74-107, January 2005.
P.Viswanath, M.N.Murty and S.Bhatnagar, Fusion of multiple approximate nearest neighbor classifiers for fast and efficient classification, Information Fusion, Vol. 5, pp.239-250, 2004.
S.Bhatnagar and S.Kumar, A simultaneous perturbation stochastic approximation based actor-critic algorithm for Markov decision processes, IEEE Transactions on Automatic Control, Vol. 49, Number 4, pp.592-598, April 2004.
S.Bhatnagar and V.S.Borkar, Multiscale chaotic SPSA and smoothed functional algorithms for simulation optimization, Simulation: Transactions of the Society for Modeling and Simulation International, Vol. 79, Issue 10, pp.568-580, 2003.
S.Bhatnagar, M.C.Fu, S.I.Marcus and I-J.Wang, Two timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences, ACM Transactions on Modeling and Computer Simulation, Vol. 13, No. 2, pp.180-209, 2003.
X.-R.Cao, R.Zhiyuan, S.Bhatnagar, M.C.Fu and S.I.Marcus, A time aggregation approach to Markov decision processes, Automatica, Vol. 38, No. 6, pp.929-943, 2002.
S.Bhatnagar, M.C.Fu, S.I.Marcus and P.J.Fard, Optimal structured feedback policies for ABR flow control using two timescale SPSA, IEEE/ACM Transactions on Networking, Vol.9, No.4, pp.479-491, 2001.
S.Bhatnagar, M.C.Fu, S.I.Marcus and S.Bhatnagar, Two timescale algorithms for simulation optimization of hidden Markov models, IIE Transactions (Pritsker special issue on simulation), Vol.3, pp.245-258, 2001.
S.Bhatnagar and V.S.Borkar, A two time scale stochastic approximation scheme for simulation based parametric optimization, Probability in the Engineering and Informational Sciences, Vol.12, pp.519-531, 1998.
V.H.Gupta and S.Bhatnagar, An optimal fuel-injection policy for performance enhancement in internal combustion (I.C.) engines, Sadhana (Indian Academy of Sciences), Vol.22, Part 4, pp.545-552, 1997.
S.Bhatnagar and V.S.Borkar, Multiscale stochastic approximation for parametric optimization of hidden Markov models, Probability in the Engineering and Informational Sciences, Vol.11, pp.509-522, 1997.
S.Bhatnagar and V.S.Borkar, A convex analytic framework for ergodic control of semi-Markov processes, Mathematics of Operations Research, Vol.20, No.4, pp.923-936, 1995.

Preprints Submitted to journals

Our recent papers on arXiv can be found here

Proceedings of International Conferences

J.Joseph, B.Amrutur, and S.Bhatnagar, Gradient-Weighted Feature Back-Projection: A Fast Alternative to Feature Distillation in 3D Gaussian Splatting, SIGGRAPH Asia, Hong Kong, Dec. 15-18, 2025 early arXiv
P.Dutta, M.Ayyoob, S.Bhatnagar, and A.Dukkipati, One Encoder to Rule them All: Representation Learning for Model-free Visual Reinforcement Learning using Fourier Neural Operators, International Conference on Computer Vision (ICCV), Honolulu, Hawaii, Oct.19-23, 2025
P.Panda and S.Bhatnagar, Two-Timescale Critic-Actor for Average Reward MDPs with Function Approximation, AAAI, Philadelphia, USA, Feb 27-March 4, 2025 (accepted) arXiv
S.Salmalge and S.Bhatnagar, Reinforcement Learning Algorithms with Graph Convolution Networks for Traffic Signal Control, EAI Intelligent Systems Transport Conference, University of Pisa, Pisa, Italy, December 4-6, 2024.
A.Srivastava, S.Bhatnagar, M.N.Murty and A.Raman J., Learning dynamic representations in large language models for evolving data streams, International Conference on Pattern Recognition (ICPR), Kolkata, December 2024.
P.Panda and S.Bhatnagar, Finite Time Analysis of Constrained Actor Critic and Constrained Natural Actor Critic Algorithms, Uncertainty in Artificial Intelligence (UAI), Barcelona, Spain, July 17-19, 2024 arXiv
M.Maniyar, Prashanth L.A., A.Mondal and S.Bhatnagar, A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning, 27th International Conference on Artificial Intelligence and Statistics (AISTATS), Valencia, Spain, May 2-4, 2024 (accepted) arXiv
Vivek V.P, D.R.Bharadwaj and S.Bhatnagar, Dynamic Energy Management in Competing Microgrids using Reinforcement Learning, Conference on Innovative Smart Grid Technologies, North America (ISGT NA 2024), Washington D.C., Feb 19-24, 2024 online PDF
S.Bhatnagar, The Reinforce Policy Gradient Algorithm Revisited, Indian Control Conference (IEEE), Hyderabad, Dec 18-20, 2023 (accepted) arXiv
S.Guin and S.Bhatnagar, A Policy Gradient Approach for Finite Horizon Constrained Markov Decision Processes, IEEE Conference on Decision and Control, Singapore, December 13-15, 2023 (to appear) arXiv
S.Bhatnagar, V.S.Borkar, and S.Guin, Actor-Critic or Critic-Actor? A Tale of Two Time Scales, IEEE Conference on Decision and Control (Proceedings in IEEE Control Systems Letters), Singapore, December 13-15, 2023 arXiv
H.Karumanchi, D.R.Bharadwaj, K.J.Prabuchandran, and S.Bhatnagar, Autonomous UAV navigation in complex environments using human feedback, 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN 2023), Busan, South Korea, Aug 28-31, 2023
N.Saxena, S.Khasthagir, Shishir N.Y., and S.Bhatnagar, Off-Policy Average Reward Actor-Critic with Deterministic Policy Search, ICML 2023, Honolulu, Hawaii, July 23-29, 2023 (accepted) arXiv
S.Bhatnagar and L.A.Prashanth, Generalized simultaneous perturbation stochastic approximation with reduced estimator bias, 57th Annual Conference on Information Sciences and Systems (CISS), Invited Paper, Baltimore, Maryland, March 22-24, 2023
A.K.Jayant and S.Bhatnagar, Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm, NeurIPS 2022, New Orleans, Louisiana, USA, Nov 28 to Dec 04, 2022 arXiv
R.Deb, M.Gandhi, and S.Bhatnagar, Schedule Based Temporal Difference Algorithms, 58th Annual Allerton Conference on Communication, Control, and Computing (IEEE), Monticello, Illinois, USA, Sep 27 to 30, 2022 Online PDF (Invited Paper)
Sindhu P.R., Prabuchandran K.J., S. Ganguly, and S.Bhatnagar, Data Efficient Safe Reinforcement Learning, IEEE International Conference on Systems, Man, and Cybernetics (SMC), Prague, Czech Republic, October 9-12, 2022
D.R.Bharadwaj, P.Jain, Prabuchandran K.J., and S.Bhatnagar, Neural network compatible off-policy natural actor-critic algorithm, International Joint Conference on Neural Networks (IJCNN), Padova, Italy, July 18-23, 2022 arXiv (Best Student Paper Award)
U.A.Mishra, S.R.Samineni, P.Goel, C.Kunjeti, H.Lodha, A.Singh, A.Sagi, S.Bhatnagar and S.Kolathaya, Dynamic mirror descent based model predictive control for accelerating robot learning, IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, May 23-27, 2022 arXiv
R.Deb and S.Bhatnagar, Gradient Temporal Difference with Momentum: Stability and Convergence, AAAI Conference on Artificial Intelligence, Vancouver, Feb 22 - Mar 01, 2022 arXiv
Priya S. and S.Bhatnagar, Robust traffic signal timing control using multiagent twin delayed deep deterministic policy gradients, 14th International Conference on Agents and Artificial Intelligence (ICAART), Online, Feb 3-5, 2022
P.Parnika, D.R.Bharadwaj, D.S.K.Reddy and S.Bhatnagar, Attention Actor-Critic algorithm for Multi-Agent Constrained Co-operative Reinforcement Learning, AAMAS (Extended Abstract), Virtual Event, May 3-7, 2021
K.Paigwar, L.Krishna, S.Tirumala, N.Khetan, A.Sagi, A.Joglekar, S.Bhatnagar, A.Ghosal, B.Amrutur, and S.Kolathaya, Robust Quadrupedal Locomotion on Sloped Terrains: A Linear Policy Approach, Conference on Robot Learning (CoRL), Virtual Event, November 16-18, 2020
S.Tirumala, S.G.Venkatesh, K.Paigwar, A.V.Sagi, A.Joglekar, S.Bhatnagar, B.Amrutur and S.N.Y.Kolathaya, Learning Stable Manoeuvres for Quadruped Robots from Expert Demonstrations, 29th IEEE International Conference on Robot and Human Interactive Communication (Ro-Man), Naples, Italy, Aug.31-Sep.04, 2020
S.Nayak, C.A.Ekbote, A.P.S.Chauhan, D.R.Bharadwaj, P.Ray, A.Sikdar, D.S.K.Reddy, and S.Bhatnagar, Stochastic Game Framework for Efficient Energy Management in Microgrid Networks, IEEE PES Innovative Smart Grid Technologies Conference, The Hague, Netherlands, Oct. 25-28, 2020 arXiv
Sindhu P.R., S. Rao, and S.Bhatnagar, Learning-Based Resource Allocation in Industrial IoT Systems, IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, London, UK, Aug. 31 - Sep. 3, 2020
I.John and S.Bhatnagar, Deep Reinforcement Learning with Successive Over-Relaxation and its Application in Auto-scaling Cloud Resources, International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, July 19-24, 2020
D.R.Bharadwaj, Chandramouli K., and S.Bhatnagar, A convergent off-policy temporal difference algorithm, European Conference on Artificial Intelligence (ECAI), Santiago de Compostela, Spain, June 8-12, 2020
I.John, R.Karumanchi and S.Bhatnagar, Predictive and prescriptive analytics for performance optimization: framework and a case study on a large-scale enterprise system, IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, Florida, Dec 16-19, 2019
A.Dharmavaram, M.Riemer and S.Bhatnagar, Hierarchical average reward policy gradient algorithms, AAAI-20 Student Abstract and Poster Program, (to appear in Proceedings of AAAI 2020), New York, Feb 7-12, 2020
A.G.Joseph and S.Bhatnagar, An incremental algorithm for estimating extreme quantiles, Indian Control Conference (IEEE), pp. 286-291, Hyderabad, Dec. 18-20, 2019 online pdf
A.G.Joseph and S.Bhatnagar, An Adaptive and Incremental Approach to Quantile Estimation, IEEE Conference on Decision and Control, Nice, France, Dec 11-13, 2019 (to appear in Proceedings of CDC - to be published in IEEE XPlore)
Chandramouli K., D.R.Bharadwaj, Prabuchandran K.J., and S.Bhatnagar, An Online Sample Based Method for Mode Estimation using ODE Analysis of Stochastic Approximation Algorithms, IEEE Conference on Decision and Control, Nice, France, Dec 11-13, 2019 (to appear in IEEE Control Systems Letters) online pdf arXiv
S.Kolathaya, D.Dholakiya, S.Bhatnagar, A.Singla, S.Bhattacharya, A.Ghosal, B.Amrutur, A.Singh, A.Joglekar, A.V.Sagi, S.Shetty, and A.Gunalan, Trajectory Based Deep Policy Search for Quadrupedal Walking, 28th IEEE International Conference on Robot and Human Interactive Communication (Ro-Man 2019), New Delhi, Oct 14-18, 2019
S.Bhattacharya, A.Singla, A.Singh, D.Dholakiya, S.Bhatnagar, B.Amrutur, A.Ghosal, and S.Kolathaya, Learning Active Spine Behaviors for Dynamic and Efficient Locomotion in Quadruped Robots, 28th IEEE International Conference on Robot and Human Interactive Communication (Ro-Man 2019), New Delhi, Oct 14-18, 2019 arXiv
A.G.Joseph and S.Bhatnagar, Stochastic approximation trackers for model-based search, 57th Annual Allerton Conference on Communication, Control and Computing, Urbana, Illinois, Sep 24-27, 2019
A.Singla, S.Bhattacharya, D.Dholakiya, S.Bhatnagar, A.Ghosal, B.Amrutur and S.Kolathaya, Realizing Learned Quadruped Locomotion Behaviors through Kinematic Motion Primitives, IEEE International Conference on Robotics and Automation (ICRA), Montreal, 2019 (Accepted) arXiv
D.R.Bharadwaj, D.S.K.Reddy, K.J.Prabuchandran and S.Bhatnagar, Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning, 18th International Conference on Autonomous Agents and Multiagent Systems, Montreal, pp.1931-1933, 2019 online pdf
D.Dholakiya, S.Bhattacharya, A.Gunalan, A.Singla, S.Bhatnagar, B.Amrutur, S.Kolathaya and A.Ghosal, Design, Development and Experimental Realization of a Quadrupedal Research Platform: Stoch, 5th International Conference on Control, Automation and Robotics (ICCAR), Beijing, April 19-22, 2019 (Accepted) arXiv
Indu John and S.Bhatnagar, Efficient budget allocation and task assignment in crowdsourcing, CoDS-COMAD, pp.318-321, Kolkata, Jan 3-5, 2019 (Special Mention Award in the Young Researchers’ Symposium to Indu John) online pdf
A.G.Joseph and S.Bhatnagar, An Adaptive Sampling Algorithm for Policy Evaluation, Fifth IEEE Indian Control Conference (ICC), IIT Delhi, pp.2-9, Jan 9-11, 2019 online pdf
N.Karanjkar, M.Desai and S.Bhatnagar, A simulation-based technique for continuous-space embedding of discrete-parameter queueing systems, European Simulation and Modeling Conference, Ghent, Belgium, Oct 24-26, 2018 (accepted)
D.R.Bharadwaj, D.S.K.Reddy, K.Narayanam and S.Bhatnagar, A unified decision making framework for supply and demand management in microgrid networks, IEEE SmartGridComm, Aalborg, Denmark, Oct 29-Nov 1, 2018 online pdf
Chandramouli K., Prabuchandran K.J., D.S.K.Reddy and S.Bhatnagar, Generalized Deterministic Perturbations For Stochastic Gradient Search, IEEE Conference on Decision and Control, pp. 5734-5739, Fontainebleau, Miami Beach, FL, USA, December 17-19, 2018 online pdf
S.Kumar, Sindhu P.R., Chandrashekar L., P.Parihar, K.Gopinath and S.Bhatnagar, Scalable Performance Tuning of Hadoop MapReduce: A Noisy Gradient Approach, IEEE Cloud, Honolulu, Hawaii, June 25-30, 2017
A.G.Joseph and S.Bhatnagar, A Model based Search Method for Prediction in Model-free Markov Decision Process, Proceedings of International Joint Conference on Neural Networks (IJCNN), Anchorage, Alaska, May 14-19, 2017
A.G.Joseph and S.Bhatnagar, Bounds for Off-policy Prediction in Reinforcement Learning, Proceedings of International Joint Conference on Neural Networks (IJCNN), Anchorage, Alaska, May 14-19, 2017
D.Saikoti Reddy, L.A.Prashanth, and S.Bhatnagar, Improved Hessian estimation for adaptive random directions stochastic approximation, Proceedings of IEEE Conference on Decision and Control (CDC), Las Vegas, NV, Dec 12-14, 2016
A.G.Joseph and S.Bhatnagar, Revisiting the Cross Entropy Method with Applications in Stochastic Global Optimization and RL (Full Paper), Proceedings of European Conference on Artificial Intelligence (ECAI), The Hague, Netherlands, Aug.29-Sep.02, 2016
R.K.Maity, Chandrashekar L., Sindhu P.R., and S.Bhatnagar, Shaping Proto-Value Functions using Rewards (Short Paper), Proceedings of European Conference on Artificial Intelligence (ECAI), The Hague, Netherlands, Aug.29-Sep.02, 2016
A.G.Joseph and S.Bhatnagar, A Randomized Algorithm for Continuous Optimization, Proceedings of Winter Simulation Conference (WSC), Arlington, Virginia, USA, Dec. 11-14, 2016
B.N.Ranganath and S.Bhatnagar, Scalable Focussed Entity Resolution, Proceedings of the International Joint Conference on Neural Networks (IJCNN), IEEE Press, Vancouver, Canada, July 25-29, 2016
A.G.Joseph and S.Bhatnagar, A stochastic approximation algorithm for the problem of quantile estimation, Proceedings of 22nd International Conference on Neural Information Processing (ICONIP), Istanbul, Turkey, Nov.9-12, 2015 (to appear)
Prasad H.L., Prashanth L.A., and S.Bhatnagar, Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games, Proceedings of 14th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Istanbul, Turkey, pp.1371-1379, May 4-8, 2015
Chandrashekar L. and S.Bhatnagar, A Generalized Reduced Linear Program for Markov Decision Processes, Proceedings of Association for the Advancement of Artificial Intelligence (AAAI), Austin, Texas, USA, Jan 25-30, 2015 (to appear)
M.S.Abdulla and S.Bhatnagar, A Transitions-only algorithm for Compact Action Set Markov Decision Processes, Proceedings of Indian Control Conference (ICC), IEEE, Chennai, Jan 5-7, 2015 (to appear)
M.S.Abdulla and S.Bhatnagar, Stochastic multi-armed bandit algorithms based on simulated annealing, Proceedings of Indian Control Conference (ICC), IEEE, Chennai, Jan 5-7, 2015 (to appear)
H.Yao, C.Szepesvari, R.Sutton, J.Modayil and S.Bhatnagar, Universal option models, Advances in Neural Information Processing Systems (NIPS), pp.990-998, Dec. 8-11, 2014, Montreal, Canada
Prabuchandran, K.J., Hemanth Kumar A.N. and S.Bhatnagar, Multi-agent reinforcement learning for traffic signal control, Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems, pp.2529-2534, Qingdao, China, Oct. 9-11, 2014
Chandrashekar L., A.Dubey, S.Bhatnagar and C.Balamurugan, A Markov decision process framework for predictable job completion times on crowdsourcing platforms, Proceedings of HCOMP, pp.34-35, Pittsburgh, Nov. 2-4, 2014
Chandrashekar L. and S.Bhatnagar, Max-plus methods for optimal control and zero-sum games, Proceedings of the IEEE Conference on Decision and Control, Los Angeles, CA, Dec. 15-17, 2014 (to appear)
Prabuchandran K.J., S.Bhatnagar and V.S.Borkar, An actor-critic algorithm based on Grassmanian search, Proceedings of the IEEE Conference on Decision and Control, Los Angeles, CA, Dec. 15-17, 2014 (to appear)
E.Zhou, S.Bhatnagar and X.Chen, Simulation optimization via gradient-based stochastic search, Proceedings of the Winter Simulation Conference, pp.3869-3879, Savannah, GA, Dec. 7-10, 2014
Prashanth L.A., A. Chatterjee and S.Bhatnagar, Adaptive sleep-wake control using reinforcement learning in sensor networks, Proceedings of International Conference on Communication Systems and Networks (COMSNETS), IEEE, pp.1-8, Jan 6-10, 2014, Bangalore online pdf.
Prashanth L.A., Prasad H.L., N.Desai, S.Bhatnagar, Mechanisms for Hostile Agents with Capacity Constraints, Proceedings of Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS2013), Ito, Jonker, Gini, and Shehory (eds.), pp. 659-666, Saint Paul, Minnesota, May 6-10, ISBN: 978-1-4503-1993-5, 2013 online pdf.
K.Lakshmanan and S.Bhatnagar, A Novel Q-learning Algorithm with Function Approximation for Constrained Markov Decision Processes, Proceedings of the Fiftieth Annual Allerton Conference on Communication, Control and Computing (IEEE Press), UIUC, Illinois, pp.400-405, ISBN: 978-1-4673-4537-8, 2012.
D.Ghoshdastidar, A.Dukkipati and S.Bhatnagar, q-Gaussian based smoothed functional algorithms for stochastic optimization, Proceedings of IEEE International Symposium on Information Theory (ISIT’2012), pp. 1059-1063, E-ISBN: 978-1-4673-2578-3, July 1-6, 2012.
Prashanth L.A., H.L.Prasad, N.Desai, S.Bhatnagar and G.Dasgupta, Stochastic optimization for adaptive labor staffing in service systems, Proceedings of 9th International Conference on Service Oriented Computing (ICSOC), Cyprus, Dec 5-8, 2011, Published in Service Oriented Computing, LNCS, Vol. 7084, pp.487-494, 2011.
Prashanth L.A. and S.Bhatnagar, Reinforcement Learning with Average Cost for Adaptive Control of Traffic Lights at Intersections, Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC), Washington, DC, pp. 1640-1645 (ISBN: 978-1-4577-2198-4), October 5-7, 2011.
K.Lakshmanan and S.Bhatnagar, Smoothed functional and Quasi-Newton algorithms for routing in multi-stage queueing network with constraints, Proceedings of ICDCIT (Distributed Computing and Internet Technology, Lecture Notes in Computer Science, Vol. 6536/2011, pp.175-186, DOI: 10.1007/978-3-642-19056-8_12), Feb.9-12, 2011, Bhubaneswar, India.
H.R.Maei, C.Szepesvari, S.Bhatnagar and R.S.Sutton, Toward Off-Policy Learning Control with Function Approximation, Proceedings of ICML, 2010.
L.A.Prashanth and S.Bhatnagar, Control of traffic lights at junctions using reinforcement learning, Proceedings of Workshop on Computer Aided Transportation Planning and Traffic Engineering, pp.129-138, Dec.7-11, 2009, Bangalore.
H.Yao, R.S.Sutton, S.Bhatnagar and C.Szepesvari, Multi-Step Dyna Planning for Policy Evaluation and Control, Proceedings of NIPS, 2009.
H.R.Maei, C.Szepesvari, S.Bhatnagar, D.Precup and R.S.Sutton, Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, Proceedings of NIPS, 2009.
H.Yao, S.Bhatnagar and C.Szepesvari, LMS-2: Towards an algorithm that is as cheap as LMS and almost as efficient as RLS, Proceedings of IEEE Conference on Decision and Control, Shanghai, 2009.
H.Yao, R.Sutton, S.Bhatnagar, D.Diao and C.Szepesvari, Dyna(k): A multi-step Dyna planning, Proceedings of the ICML/UAI/COLT Workshop on Abstraction in Reinforcement Learning, Montreal, 2009.
H.Yao, S.Bhatnagar and C.Szepesvari, Temporal difference learning by direct preconditioning, Multidisciplinary Symposium on Reinforcement Learning (MSRL), Montreal, 2009
R.S.Sutton, H.R.Maei, D.Precup, S.Bhatnagar, D.Silver, C.Szepesvari and E.Wiewiora, Fast gradient-descent methods for temporal-difference learning with linear function approximation, Proceedings of the International Conference on Machine Learning (ICML), Montreal, 2009.
G.R.Reddy and S.Bhatnagar, An efficient and optimized bluetooth scheduling algorithm for scatternets, Proceedings of IEEE Advanced Networks and Telecommunication Systems (ANTS) Conference, Mumbai, 2008.
S.R.Kolavali and S.Bhatnagar, Ant colony optimization algorithms for shortest path problems, Proceedings of Second Workshop on Network Control and Optimization (NET-COOP) (Published in NET-COOP 2008, Eds. E. Altman and A. Chaintreau, LNCS 5425, pp.37-44, Springer, 2008), September 8-10, 2008, Paris, France.
V.Sudha, S.Bhatnagar, S.V.Basavaraja and V.Sridhar, SPSA based feature relevance estimation for video retrieval, Proceedings of IEEE International Workshop on Multimedia Signal Processing (MMSP), Queensland, Australia, 2008.
R.Patro and S.Bhatnagar, An optimal RIO with statistical delay assurances, Proceedings of National Conference on Communications (NCC), Mumbai, February 2-3, 2008.
V.P.Chaturvedi, V.Rakesh and S.Bhatnagar, An efficient and optimized bluetooth scheduling algorithm for piconets, Proceedings of International Conference on Distributed Computing and Internet Technology (Published in Distributed Computing and Internet Technology, Eds.T.Janowski and H.Mohanty, LNCS 4882, pp.135-145, Springer, 2007), December 17-20, 2007, Bangalore, India.
K.R.Vemu, S.Bhatnagar and N.Hemachandra, An optimal weighted-average congestion based pricing scheme for enhanced QoS, Proceedings of International Conference on Distributed Computing and Internet Technology (Published in Distributed Computing and Internet Technology, Eds.T.Janowski and H.Mohanty, LNCS 4882, pp.135-145, Springer, 2007), December 17-20, 2007, Bangalore, India.
M.S.Abdulla and S.Bhatnagar, Network flow-control using asynchronous stochastic approximation, Proceedings of IEEE Conference on Decision and Control, New Orleans, USA, December 12-14, 2007.
K.R.Vemu, S.Bhatnagar and N.Hemachandra, Link route pricing for enhanced QoS, Proceedings of IEEE Conference on Decision and Control, New Orleans, USA, December 12-14, 2007.
V.Mishra, S.Bhatnagar and N.Hemachandra, Discrete Parameter Simulation Optimization Algorithms with Applications to Admission Control with Dependent Service Times, Proceedings of IEEE Conference on Decision and Control, New Orleans, USA, December 12-14, 2007.
S.Bhatnagar, A RED algorithm for the Internet, Proceedings of National Conference on Information Technology: Present Practices and Challenges, Aug.31-Sep.1, 2007, New Delhi, India.
S.Bhatnagar, R.S.Sutton, M.Ghavamzadeh and M.Lee, Incremental update natural-gradient actor-critic algorithms, Proceedings of Neural Information Processing Systems (NIPS), Vancouver, Canada, December 3-6, 2007.
M.S.Abdulla and S.Bhatnagar, Solving MDPs using two-timescale simulated annealing with multiplicative weights, Proceedings of American Control Conference (ACC), New York, July 11-13, 2007.
M.S.Abdulla and S.Bhatnagar, Parametrized actor-critic algorithms for finite-horizon MDPs, Proceedings of American Control Conference (ACC), New York, July 11-13, 2007.
V.Sudha, L.Gopal, V.Sridhar and S.Bhatnagar, Fuzzy clustering based Ad recommendation for TV programs, Proceedings of fifth European Conference, EuroITV, Amsterdam, Netherlands, 2007.
S.Bhatnagar and K.M.Babu, Two-Timescale Q-Learning Algorithms with an Application to Routing in Networks, Proceedings of International Conference on Advances in Control and Optimization of Dynamical Systems (ACODS), Bangalore, 2007.
Diksha Sharma and S.Bhatnagar, Optimal parameterized policies for resource allocation in communication networks, Proceedings of IEEE International Conference on Signal and Image Processing, Hubli, Karnataka, 2006.
V.L.Raju Chinthalapati and S.Bhatnagar, A simultaneous deterministic perturbation actor-critic algorithm with an application to optimal mortgage refinancing, Proceedings of IEEE Conference on Decision and Control, San Diego, CA, USA, 2006.
S.Bhatnagar and M.S.Abdulla, An actor-critic algorithm for finite horizon Markov decision processes, Proceedings of IEEE Conference on Decision and Control, San Diego, CA, USA, 2006.
R.Patro and S.Bhatnagar, A four-timescale algorithm for constrained stochastic optimization of RED, Proceedings of IEEE Conference on Decision and Control, San Diego, CA, USA, 2006.
M.S.Abdulla and S.Bhatnagar, SPSA with measurement reuse, Proceedings of Winter Simulation Conference, Monterey, CA, USA, 2006.
Diksha Sharma, S.Bhatnagar and S.Chakraborty, An Algorithm for Dynamic Optimal Bandwidth Allocation in Communication Networks, Proceedings of Fifth Asia Pacific International Symposium on Information Technology (APIS5), pp.489-492, Hangzhou, China, 2006.
M.S.Abdulla and S.Bhatnagar, Solution of MDPs using simulation based value iteration, Proceedings of Second IFIP Conference on Artificial Intelligence Applications and Innovations, pp.749-759, Beijing, China, 2005.
A.Dukkipati, M.N.Murty and S.Bhatnagar, Properties of Kullback-Leibler cross-entropy minimization in nonextensive framework, Proceedings of IEEE International Symposium on Information Theory, pp.2374-2378, Adelaide, Australia, 2005.
A.Dukkipati, M.N.Murty and S.Bhatnagar, Information theoretic justification of Boltzmann selection and its generalization to Tsallis case, Proceedings of IEEE Congress on Evolutionary Computation, pp.1667-1674, Vol.2, Edinburgh, U.K., 2005.
S.Bhatnagar and S.Kumar, A reinforcement learning based algorithm for Markov decision processes, Proceedings of International Conference on Intelligent Sensing and Information Processing (ICISIP), pp.199-204, Chennai, India, 2005.
P.Viswanath, M.N.Murty and S.Bhatnagar, A pattern synthesis technique to reduce the curse of dimensionality effect, Proceedings of International Conference on Knowledge Based Computer Systems (KBCS), pp.219-228, Hyderabad, India, 2004.
R.Vaidya and S.Bhatnagar, Correlation based optimization of random early detection, Proceedings of IEEE INDICON, pp.47-51, Kharagpur, India, 2004.
R.Vaidya and S.Bhatnagar, Optimized RIO for diffserv networks, Proceedings of Information and Computer Science (ICICS), pp.227-240, Dhahran, Saudi Arabia, 2004.
J.R.Panigrahi and S.Bhatnagar, Hierarchical decision making in semi-conductor fabs using multi-time scale Markov decision processes, Proceedings of IEEE Conference on Decision and Control (CDC), pp.4387-4392, Vol.4, Paradise Island, Nassau, Bahamas, 2004.
P.Viswanath, M.N.Murty and S.Bhatnagar, A pattern synthesis technique with an efficient nearest neighbor classifier for binary pattern recognition, Proceedings of International Conference on Pattern Recognition (ICPR), pp.416-419, Vol.4, Cambridge, U.K., 2004.
A.Dukkipati, M.N.Murty and S.Bhatnagar, Cauchy annealing schedule: an annealing schedule for Boltzmann selection scheme in evolutionary algorithms, Proceedings of IEEE Congress on Evolutionary Computation, pp.55-62, Vol.1, Portland, Oregon, USA, 2004.
A.Dukkipati, M.N.Murty and S.Bhatnagar, Quotient evolutionary space: abstraction of evolutionary process w.r.t macroscopic properties, Proceedings of IEEE Congress on Evolutionary Computation, pp.846-853, Vol.2, Canberra, Australia, 2003.
P.Viswanath, M.N.Murty and S.Bhatnagar, Synthetic patterns for nearest neighbour classifier design, Proceedings of Knowledge Based Computer Systems (KBCS), pp.323-332, Mumbai, India, 2002.
P.Viswanath, M.N.Murty and S.Bhatnagar, An efficient classifier: using a compact tree structure and novel pattern synthesis, Proceedings of HPC Asia Conference, pp.395-398, Bangalore, India, 2002.
S.Bhatnagar, E.Fernandez-Gaucherand, M.C.Fu, Y.He and S.I.Marcus, A Markov decision process model for capacity expansion and allocation, Proceedings of 38th IEEE Conference on Decision and Control, pp.1156-1161, Phoenix, Arizona, 1999.
S.Bhatnagar, M.C.Fu and S.I.Marcus, Two timescale SPSA algorithms for rate-based ABR flow control, Advances in System Theory Symposium (in honour of Sanjoy K.Mitter on his 65th birthday), Cambridge, USA, October, 1999.
S.Bhatnagar, M.C.Fu and S.I.Marcus, Rate based ABR flow control using two timescale SPSA, Proceedings of SPIE Conference on Performance and Control of Network Systems III, pp.142-149, Boston, September, 1999.
S.Bhatnagar, M.C.Fu, S.I.Marcus, and Y.He, Markov decision processes for semiconductor fab-level decision making, Proceedings of IFAC 14th Triennial World Congress, Beijing, China, pp.145-150, 1999.
S.Bhatnagar and V.Sharma, Optimal control of a feedback queue via stochastic approximation, Proceedings of IEEE Globecom98, Sydney, Australia, November, 1998.
S.Bhatnagar and V.S.Borkar, Infinitesimal perturbation analysis: an overview and recent trends, Proceedings of the sixth IEEE symposium on intelligent systems, Bangalore, 1997.
S.Bhatnagar and V.H.Gupta, Using stochastic control to save fuel in automobiles, Proceedings of the fifth IEEE symposium on intelligent systems, Bangalore, pp.55-61, Nov.1996.

Technical Reports

S.Bhatnagar, R.S.Sutton, M.Ghavamzadeh and M.Lee, Natural Actor-Critic Algorithms, Technical Report TR09-10, Department of Computing Science, University of Alberta, Canada, 2009.
H.L.Prasad, S.Bhatnagar and N.Hemachandra, A computational procedure for general-sum stochastic games, Technical Report IISc-CSA-TR-2009-5, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India, 2009.
K.R.Vemu, S.Bhatnagar and N.Hemachandra, Link-route pricing for optimal QoS, Technical Report IISc-CSA-TR-2007-8, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India, 2007.
M.S.Abdulla and S.Bhatnagar, Reinforcement learning based algorithms for average cost Markov decision processes, Technical Report IISc-CSA-TR-2005-6, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India, 2005.
S.Bhatnagar, A simulation based algorithm for Markov decision processes, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India, 2002.
Y.He, S.Bhatnagar, M.C.Fu and S.I.Marcus, Approximate policy iteration for semiconductor fab-level decision making - a case study, Institute for Systems Research, University of Maryland, URL: http://www.isr.umd.edu/TechReports/ISR/2000/TR_2000-49/, 2000.
S.Bhatnagar, M.C.Fu, S.I.Marcus and S.Bhatnagar, Randomized difference two-timescale simultaneous perturbation stochastic approximation algorithms for simulation optimization of hidden Markov models, Institute for Systems Research, University of Maryland, URL: http://www.isr.umd.edu/TechReports/ISR/2000/TR_2000-13/, 2000.
S.Bhatnagar, M.C.Fu and S.I.Marcus, Optimal multilevel policies for ABR flow control using two timescale SPSA, Institute for Systems Research, University of Maryland, URL: http://www.isr.umd.edu/TechReports/ISR/1999/TR_99-18/, 1999.