Publications (All)


Books/Monographs


  1. S.Bhatnagar, H.L.Prasad and L.A.Prashanth, Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods, Lecture Notes in Control and Information Sciences Series, Vol. 434, Springer, ISBN 978-1-4471-4284-3, Edition: 2013, 302 pages.


Book Chapters


  1. A.G.Joseph and S.Bhatnagar, An Incremental Fast Policy Search Using a Single Sample Path, Shankar B., Ghosh K., Mandal D., Ray S., Zhang D., Pal S. (Eds) Pattern Recognition and Machine Intelligence, Lecture Notes in Computer Science, vol 10597. Springer, 2017 online pdf

  2. S.Bhatnagar, V.S.Borkar and Prashanth L.A., Adaptive feature pursuit: Online adaptation of features in reinforcement learning, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (Ed. F. Lewis and D. Liu), IEEE Press Computational Intelligence Series (jointly published by IEEE Press and Wiley), Chapter 23, pp. 517-534, 2013 online pdf.

  3. S.Bhatnagar, Simultaneous perturbation and finite difference methods, Wiley Encyclopedia of Operations Research and Management Science (Ed. J. Cochran), Vol. 7, pp. 4969-4991, Wiley, Hoboken, NJ, 2011 online pdf.

  4. P.Viswanath, M.N.Murty and S.Bhatnagar, Pattern synthesis for non-parametric pattern recognition, Encyclopedia of Data Warehousing and Mining, second edition, Ed. J. Wang, Montclair State University, USA, Published by Idea group inc.,USA, 2008.

  5. V.Sudha, L.Gopal, V.Sridhar and S.Bhatnagar, Fuzzy clustering based Ad recommendation for TV programs, Interactive TV: A Shared Experience, Eds. P.Cesar, K.Chorianopoulos and J.F.Jensen, Springer, pp.175-184, 2007.

  6. P.Viswanath, M.N.Murty and S.Bhatnagar, Pattern synthesis for large-scale pattern recognition, Encyclopedia of Data Warehousing and Mining, Ed. J. Wang, Montclair State University, USA, Published by Idea group inc.,USA, 2005, pp. 902-905.

  7. S.Bhatnagar, M.Fu and S.I.Marcus, Two timescale SPSA algorithms for rate-based ABR flow control, Chapter 27, System Theory: Modeling, Analysis and Control, Ed. T.Djaferis and I.Schick, Kluwer Academic, Cambridge, Massachusetts, pp.367-378, 1999.


Journal Papers


  1. A.Ramaswamy and S.Bhatnagar, Analyzing approximate value iteration algorithms, Mathematics of Operations Research, Vol.47, No.3, pp. 2138-2159, 2022 online pdf arXiv

  2. D.R.Bharadwaj, Chandramouli K., and S.Bhatnagar, A generalized minimax Q-learning algorithm for two-player zero-sum stochastic games, IEEE Transactions on Automatic Control, Vol. 67, No. 9, pp. 4816-4823, 2022 online pdf arXiv

  3. Chandramouli K., D.R.Bharadwaj, and S.Bhatnagar, Generalized Second Order Value Iteration in Markov Decision Processes, IEEE Transactions on Automatic Control, Vol. 67, Issue 8, pp. 4241-4247, 2022 online pdf arXiv

  4. P.Karmakar and S.Bhatnagar, Stochastic approximation with iterate-dependent Markov noise under verifiable conditions in compact state space with the stability of iterates not ensured, IEEE Transactions on Automatic Control, Vol.66, Issue 12, pp. 5941-5954, Dec 2021 online pdf arXiv

  5. A.Ramaswamy, S.Bhatnagar and D.Quevedo, Asynchronous stochastic approximations with asymptotically biased errors and deep multi-agent learning, IEEE Transactions on Automatic Control, Vol. 66, Issue 9, pp. 3969-3983, Sep 2021 online pdf arXiv

  6. K.J.Prabuchandran, S.Penubothula, Chandramouli K., and S.Bhatnagar, Novel first order Bayesian optimization with an application to reinforcement learning, Applied Intelligence, Springer, Vol. 51, pp. 1565-1579, 2021 online pdf

  7. P.Karmakar and S.Bhatnagar, On tight bounds for function approximation error in risk-sensitive reinforcement learning, Systems and Control Letters, Vol. 150, 104899:1-7, April 2021 online pdf

  8. A.Singla, Sindhu P.R., and S.Bhatnagar, Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge, IEEE Transactions on Intelligent Transportation Systems, Vol.22, No.1, pp.107-118, January 2021 online pdf arXiv

  9. V.G.Yaji and S.Bhatnagar, Stochastic Recursive Inclusions in Two Timescales with Non-additive Iterate-dependent Markov Noise, Mathematics of Operations Research, Vol. 45, No.4, pp. 1405-1444, November 2020 online pdf arXiv

  10. Sindhu P.R., Prabuchandran K.J., and S.Bhatnagar, Reinforcement Learning Algorithm for Non-Stationary Environments, Applied Intelligence, Springer, Vol.50, pp.3590-3606, 2020 online pdf arXiv

  11. I.John, Chandramouli K., and S.Bhatnagar, Generalized Speedy Q-learning, IEEE Control Systems Letters, Vol.4, Issue 3, July 2020 online pdf

  12. Prashanth L.A., S.Bhatnagar, N.Bhavsar, M.Fu and S.Marcus, Random directions stochastic approximation with deterministic perturbations, IEEE Transactions on Automatic Control, Vol. 65, Issue 6, pp. 2450-2465, June 2020 online pdf arXiv

  13. V.G.Yaji and S.Bhatnagar, Analysis of Stochastic Approximation Schemes with Set-valued Maps in the Absence of a Stability Guarantee and their Stabilization, IEEE Transactions on Automatic Control, Vol. 65, Issue 3, pp. 1100-1115, March 2020 online pdf arXiv

  14. Chandramouli K., D.R.Bharadwaj and S.Bhatnagar, Successive Over-Relaxation Q-Learning, IEEE Control Systems Letters (L-CSS), Vol. 4, Issue 1, pp. 55-60, Jan 2020 online pdf arXiv

  15. Chandramouli K., D.R.Bharadwaj, Prabuchandran K.J., and S.Bhatnagar, An Online Sample Based Method for Mode Estimation using ODE Analysis of Stochastic Approximation Algorithms, IEEE Control Systems Letters (L-CSS), Vol. 3, Issue 3, pp. 697-702, July 2019 online pdf arXiv

  16. A.Ramaswamy and S.Bhatnagar, Stability of Stochastic Approximations with ‘Controlled Markov’ Noise and Temporal Difference Learning, IEEE Transactions on Automatic Control, Vol. 64, Issue 6, pp. 2614-2620, June 2019 online pdf

  17. A.G.Joseph and S.Bhatnagar, An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method, Machine Learning, Vol. 107, Issue 8–10, pp.1385–1429, 2018 online pdf arXiv

  18. D.R.Bharadwaj, K.J.Prabuchandran, and S.Bhatnagar, Novel sensor scheduling scheme for intruder tracking in energy efficient sensor networks, IEEE Wireless Communications Letters, Vol. 7, Issue 5, pp. 712-715, Oct 2018 online pdf

  19. A.G.Joseph and S.Bhatnagar, An incremental off-policy search in a model-free Markov decision process using a single sample path, Machine Learning, Vol.107, Issue 6, pp. 969–1011, 2018 online pdf arXiv

  20. A.Ramaswamy and S.Bhatnagar, Analysis of Gradient Descent Methods with Non-Diminishing, Bounded Errors, IEEE Transactions on Automatic Control, Vol. 63, Issue 5, pp.1465–1471, 2018 online pdf arXiv

  21. Chandrashekar L., S.Bhatnagar, and C.Szepesvari, A Linearly Relaxed Approximate Linear Program for Markov Decision Processes, IEEE Transactions on Automatic Control, Vol. 63, Issue 4, pp. 1185–1191, 2018 online pdf arXiv

  22. S.Bhatnagar, S.Patel, and Karmeshu, A Stochastic Approximation Approach to Active Queue Management, Telecommunication Systems (Springer), Vol.68, No.1, pp.89–104, 2018 online pdf

  23. V.G.Yaji and S.Bhatnagar, Stochastic Recursive Inclusions with Non-Additive Iterate-Dependent Markov Noise, Stochastics, Vol. 90, No. 3, pp. 330–363, 2018 online pdf arXiv

  24. E.Zhou and S.Bhatnagar, Gradient-based Adaptive Stochastic Search for Simulation Optimization over Continuous Space, INFORMS Journal on Computing, Vol. 30, No. 1, pp. 154–167, 2018 online pdf

  25. P.Karmakar and S.Bhatnagar, Two Time-scale Stochastic Approximation with Controlled Markov noise and Off-policy Temporal Difference Learning, Mathematics of Operations Research, Vol. 43, No.1, pp. 130–151, 2018 online pdf arXiv

  26. Chandrashekar L. and S.Bhatnagar, A Stability Criterion for Two Timescale Stochastic Approximation Schemes, Automatica, Vol.79, pp.108-114, May 2017 online pdf

  27. L.A.Prashanth, S.Bhatnagar, M.Fu, and S.Marcus, Adaptive system optimization using random directions stochastic approximation, IEEE Transactions on Automatic Control, Vol. 62, Issue 5, pp.2223–2238, 2017 online pdf arXiv

  28. A.Ramaswamy and S.Bhatnagar, A generalization of the Borkar-Meyn theorem for stochastic recursive inclusions, Mathematics of Operations Research, Vol. 42, No. 3, pp. 648–661, 2017 online pdf arXiv

  29. A.Ramaswamy and S.Bhatnagar, Stochastic recursive inclusion in two timescales with an application to the Lagrangian dual problem, Stochastics, Vol.88, No.8, pp.1173-1187, 2016 online pdf arXiv

  30. Lakshmanan K. and S.Bhatnagar, Quasi-Newton smoothed functional algorithms for unconstrained and constrained simulation optimization, Computational Optimization and Applications (Springer), Vol.66, No.3, pp.533-556, 2017 online pdf

  31. Karmeshu, S.Patel, and S.Bhatnagar, Adaptive mean queue size and its rate of change: queue management with random dropping, Telecommunication Systems (Springer), Vol.65, Issue 2, pp.281-295, 2017 online pdf

  32. L.A.Prashanth, H.L.Prasad, S.Bhatnagar and P.Chandra, A constrained optimization perspective on actor critic algorithms and application to network routing, Systems and Control Letters, Vol.92, pp.46-51, 2016 online pdf

  33. Prabuchandran K.J., S.Bhatnagar and V.S.Borkar, Actor Critic Algorithms with Online Feature Adaptation, ACM Transactions on Modeling and Computer Simulation, Vol.26, No.4, pp.24:1-24:26, 2016 online pdf

  34. M.S.Abdulla and S.Bhatnagar, Multi-armed bandits based on a variant of simulated annealing, Indian Journal of Pure and Applied Mathematics (Springer), Special Issue in Honour of Prof.Vivek Borkar's 60th Birthday, Vol.47, Issue 2, pp.195-212, 2016 online pdf

  35. S.Bhatnagar and Lakshmanan K., Multiscale Q-learning with Linear Function Approximation, Discrete Event Dynamic Systems, Vol.26, Issue 3, pp.477-509, 2016 online pdf

  36. V.G.Yaji and S.Bhatnagar, Necessary and sufficient conditions for optimality in constrained general sum stochastic games, Systems and Control Letters, Vol. 85, pp.8-15, 2015 online pdf

  37. Sindhu P.R., Prabuchandran K.J., and S.Bhatnagar, Energy sharing for multiple sensor nodes with finite buffers, IEEE Transactions on Communications, Vol.63, No.5, pp.1811-1823, 2015 online pdf

  38. S.Bhatnagar and Prashanth L.A., Simultaneous Perturbation Newton Algorithms for Simulation Optimization, Journal of Optimization Theory and Applications (Springer), Vol. 164, Issue 2, pp.621-643, 2015 online pdf

  39. Prashanth L.A., H.L.Prasad, N.Desai, S.Bhatnagar and G.Dasgupta, Simultaneous perturbation methods for adaptive labor staffing in service systems, Simulation, Vol. 91, No. 5, pp.432-455, 2015 online pdf

  40. Prashanth L.A., A.Chatterjee and S.Bhatnagar, Two timescale convergent Q-learning for sleep–scheduling in wireless sensor networks, Wireless Networks (Springer), Vol. 20, Issue 8, pp.2589-2604, 2014 online pdf

  41. D.Ghoshdastidar, A.Dukkipati and S.Bhatnagar, Newton based stochastic optimization using q-Gaussian smoothed functional algorithms, Automatica (Elsevier), Vol. 50, No.10, pp.2606-2614, 2014 online pdf

  42. D.Ghoshdastidar, A.Dukkipati and S.Bhatnagar, Smoothed functional algorithms for stochastic optimization using q-Gaussian distributions, ACM Transactions on Modeling and Computer Simulation, Vol. 24, No. 3, pp.17:1–17:26, 2014 online pdf

  43. S. Chakravarty, Sindhu P.R. and S. Bhatnagar, A simulation based algorithm for optimal pricing policy under demand uncertainty, International Transactions in Operational Research (Wiley), Vol.21, Issue 5, pp.737-760, 2014 online pdf

  44. S.Bhatnagar, Smoothed functional algorithms for optimization, Annals of the Indian National Academy of Engineering (INAE), Vol.XI, pp.95-105, April 2014.

  45. S.Bhatnagar, V.S.Borkar and Prabuchandran K.J., Feature search in the Grassmanian in online reinforcement learning, IEEE Journal of Selected Topics in Signal Processing, Vol.7, No.5, pp.746-758, 2013 online pdf

  46. Prabuchandran K.J., S.K.Meena and S.Bhatnagar, Q-learning based energy management policies for a single sensor node with finite buffer, IEEE Wireless Communications Letters, Vol.2, Issue 1, pp.82-85, 2013 online pdf.

  47. H.L.Prasad, L.A.Prashanth, S.Bhatnagar and N.Desai, Adaptive Smoothed Functional Algorithms for Optimal Staffing Levels in Service Systems, Service Science (INFORMS), Vol. 5, Issue 1, pp.29-55, March 2013 online pdf.

  48. L.A.Prashanth and S.Bhatnagar, Threshold tuning using stochastic optimization for graded signal control, IEEE Transactions on Vehicular Technology, Vol. 61, No. 9, pp.3865-3880, November 2012 online pdf.

  49. H.L.Prasad and S.Bhatnagar, General-Sum Stochastic Games: Verifiability Conditions for Nash Equilibria, Automatica, Vol. 48, Issue 11, pp.2923-2930, 2012 online pdf.

  50. K.R.Vemu, S.Bhatnagar and N.Hemachandra, Optimal Multi-layered Congestion Based Pricing Schemes for Enhanced QoS, Computer Networks (Elsevier), Vol.56, Issue 4, pp.1249-1262, March 2012. (DOI: 10.1016/j.comnet.2011.12.004)

  51. S.Bhatnagar and Lakshmanan K., An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes, Journal of Optimization Theory and Applications (Springer), Vol. 153, No. 3, pp.688-708, 2012. (DOI: 10.1007/s10957-012-9989-5)

  52. S.Bhatnagar, V.Mishra and N.Hemachandra, Stochastic algorithms for discrete parameter simulation optimization, IEEE Transactions on Automation Science and Engineering, Vol. 8, Issue 4, pp. 780-793, 2011. (DOI: 10.1109/TASE.2011.2159375)

  53. Karmeshu, S.Bhatnagar and V.Mishra, An optimized SDE model for slotted Aloha, IEEE Transactions on Communications, Vol. 59, No. 6, pp.1502-1508, 2011. (DOI: 10.1109/TCOMM.2011.09.090113)

  54. L.A.Prashanth and S.Bhatnagar, Reinforcement learning with function approximation for traffic signal control, IEEE Transactions on Intelligent Transportation Systems, Vol. 12, No. 2, pp.412-421, 2011. (DOI: 10.1109/TITS.2010.2091408)

  55. S.Bhatnagar, The Borkar-Meyn Theorem for Asynchronous Stochastic Approximations, Systems and Control Letters, Vol. 60, pp. 472-478, 2011. (DOI: 10.1016/j.sysconle.2011.04.002)

  56. S.Bhatnagar and Karmeshu, Monte-Carlo Estimation of Time-Dependent Statistical Characteristics of Random Dynamical Systems, Applied Mathematical Modelling (Elsevier), Vol.35, pp.3063-3079, 2011. (DOI: 10.1016/j.apm.2010.12.024).

  57. S.Bhatnagar, N.Hemachandra and V.Mishra, Stochastic approximation algorithms for constrained optimization via simulation, ACM Transactions on Modeling and Computer Simulation, Vol. 21, Issue 3, pp.15:1-15:22, 2011.

  58. S.Bhatnagar, An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes, Systems and Control Letters, Vol. 59, pp.760-766, 2010 (DOI: 10.1016/j.sysconle.2010.08.013).

  59. G.R.Reddy, S.Bhatnagar, V.Rakesh and V.P.Chaturvedi, An efficient algorithm for scheduling in bluetooth piconets and scatternets, Wireless Networks (Springer), Vol.16, No.7, pp.1799-1816, 2010 (DOI: 10.1007/s11276-009-0229-3).

  60. A.Chakraborty and S.Bhatnagar, Optimized policies for the retransmission probabilities in slotted Aloha, Simulation, Vol.86, No.4, pp.247-261, 2010.

  61. S.Bhatnagar and R.K.Patro, A proof of convergence of the B-RED and P-RED algorithms for random early detection, IEEE Communications Letters, Vol.13, No.10, pp.809-811, 2009.

  62. S.Bhatnagar, R.S.Sutton, M.Ghavamzadeh and M.Lee, Natural Actor-Critic Algorithms, Automatica, Vol.45, Issue 11, pp.2471-2482, 2009.

  63. S.Bhatnagar, Karmeshu and V.Mishra, Optimal Parameter Trajectory Estimation in Parameterized SDEs: An Algorithmic Procedure, ACM Transactions on Modeling and Computer Simulation (TOMACS), Vol. 19, No. 2, pp. 8:1-8:27, 2009.

  64. R.K.Patro and S.Bhatnagar, A Probabilistic Constrained Nonlinear Optimization Framework to Optimize RED parameters, Performance Evaluation, Vol. 66, Issue 2, pp.81-104, 2009.

  65. S.Bhatnagar and M.S.Abdulla, Simulation-based optimization algorithms for finite horizon Markov decision processes, Simulation, Vol. 84, No. 12, pp. 577-600, 2008.

  66. V.Sudha, L.Gopal, S.Bhatnagar and V.Sridhar, A novel Ad-recommendation system for TV programs, Springer/ACM Multimedia Systems Journal, Vol.14, No.2, pp.73-87, 2008.

  67. C. Vignat and S. Bhatnagar, An extension of Wick's Theorem, Statistics and Probability Letters, Vol.78, Issue 15, pp.2404-2407, 2008.

  68. S.Bhatnagar and K.M.Babu, New algorithms of the Q-learning type, Automatica, Vol.44, Issue 4, pp.1111-1119, 2008.

  69. S.Bhatnagar, An adaptive multivariate three-timescale smoothed functional algorithm for simulation optimization, ACM Transactions on Modeling and Computer Simulation, Vol.18, No.1, pp.2:1-2:35, December 2007.

  70. A.Dukkipati, S.Bhatnagar and M.N.Murty, Gelfand-Yaglom-Perez theorem for generalized relative entropy functionals, Information Sciences, Vol.177, pp.5707-5714, 2007.

  71. A.Dukkipati, S.Bhatnagar and M.N.Murty, On measure-theoretic aspects of nonextensive entropy functionals and corresponding maximum entropy prescriptions, Physica A, Vol.384, pp.758-774, 2007.

  72. M.S.Abdulla and S.Bhatnagar, Reinforcement learning based algorithms for average cost Markov decision processes, Discrete Event Dynamic Systems, Vol.17, No.1, pp.23-52, 2007.

  73. S.Bhatnagar, V.S.Borkar and A.Madhukar, A simulation based algorithm for ergodic control of Markov chains conditioned on rare events, Journal of Machine Learning Research, Vol.7, pp.1937-1962, 2006.

  74. R.Vaidya and S.Bhatnagar, Robust optimization of random early detection, Telecommunication Systems, Vol.33, No.4, pp.291-316, 2006.

  75. P.Viswanath, M.N.Murty and S.Bhatnagar, Partition based pattern synthesis technique with efficient algorithms for nearest neighbor classification, Pattern Recognition Letters, 27, pp.1714-1724, 2006.

  76. S.Bhatnagar and J.R.Panigrahi, Actor-critic algorithms for hierarchical Markov decision processes, Automatica, Vol.42, Issue 4, pp.637-644, 2006.

  77. A.Dukkipati, M.N.Murty and S.Bhatnagar, Nonextensive triangle equality and other properties of Tsallis relative-entropy minimization, Physica A, Vol.361, pp.124-138, 2006.

  78. S.Bhatnagar and H.J.Kowshik, A discrete parameter stochastic approximation algorithm for simulation optimization, Simulation: Transactions of the Society for Modeling and Simulation International, Vol.81, No.11, pp.757-772, 2005.

  79. P.Viswanath, M.N.Murty and S.Bhatnagar, Overlap pattern synthesis with an efficient nearest neighbour classifier, Pattern Recognition, Vol.38, pp.1187-1195, 2005.

  80. S.Bhatnagar and I.B.B.Reddy, Optimal threshold policies for admission control in communication networks via discrete parameter stochastic approximation, Telecommunication Systems, Vol.29, No.1, pp.9-31, 2005.

  81. S.Bhatnagar, Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization, ACM Transactions on Modeling and Computer Simulation (TOMACS), Vol.15, No.1, pp.74-107, January 2005.

  82. P.Viswanath, M.N.Murty and S.Bhatnagar, Fusion of multiple approximate nearest neighbor classifiers for fast and efficient classification, Information Fusion, Vol. 5, pp.239-250, 2004.

  83. S.Bhatnagar and S.Kumar, A simultaneous perturbation stochastic approximation based actor-critic algorithm for Markov decision processes, IEEE Transactions on Automatic Control, Vol. 49, Number 4, pp.592-598, April 2004.

  84. S.Bhatnagar and V.S.Borkar, Multiscale chaotic SPSA and smoothed functional algorithms for simulation optimization, Simulation: Transactions of the Society for Modeling and Simulation International, Vol. 79, Issue 10, pp.568-580, 2003.

  85. S.Bhatnagar, M.C.Fu, S.I.Marcus and I-J.Wang, Two timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences, ACM Transactions on Modeling and Computer Simulation, Vol. 13, No. 2, pp.180-209, 2003.

  86. X.-R.Cao, R.Zhiyuan, S.Bhatnagar, M.C.Fu and S.I.Marcus, A time aggregation approach to Markov decision processes, Automatica, Vol. 38, No. 6, pp.929-943, 2002.

  87. S.Bhatnagar, M.C.Fu, S.I.Marcus and P.J.Fard, Optimal structured feedback policies for ABR flow control using two timescale SPSA, IEEE/ACM Transactions on Networking, Vol.9, No.4, pp.479-491, 2001.

  88. S.Bhatnagar, M.C.Fu, S.I.Marcus and S.Bhatnagar, Two timescale algorithms for simulation optimization of hidden Markov models, IIE Transactions (Pritsker special issue on simulation), Vol.3, pp.245-258, 2001.

  89. S.Bhatnagar and V.S.Borkar, A two time scale stochastic approximation scheme for simulation based parametric optimization, Probability in the Engineering and Informational Sciences, Vol.12, pp.519-531, 1998.

  90. V.H.Gupta and S.Bhatnagar, An optimal fuel-injection policy for performance enhancement in internal combustion (I.C.) engines, Sadhana (Indian Academy of Sciences), Vol.22, Part 4, pp.545-552, 1997.

  91. S.Bhatnagar and V.S.Borkar, Multiscale stochastic approximation for parametric optimization of hidden Markov models, Probability in the Engineering and Informational Sciences, Vol.11, pp.509-522, 1997.

  92. S.Bhatnagar and V.S.Borkar, A convex analytic framework for ergodic control of semi-Markov processes, Mathematics of Operations Research, Vol.20, No.4, pp.923-936, 1995.


Preprints Submitted to journals


Our recent papers on arXiv can be found here


Proceedings of International Conferences

  1. S.Bhatnagar and L.A.Prashanth, Generalized simultaneous perturbation stochastic approximation with reduced estimator bias, 57th Annual Conference on Information Sciences and Systems (CISS), Invited Paper, Baltimore, Maryland, March 22-24, 2023

  2. A.K.Jayant and S.Bhatnagar, Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm, NeurIPS 2022, New Orleans, Louisiana, USA, Nov 28 to Dec 04, 2022 arXiv

  3. R.Deb, M.Gandhi, and S.Bhatnagar, Schedule Based Temporal Difference Algorithms, 58th Annual Allerton Conference on Communication, Control, and Computing (IEEE), Monticello, Illinois, USA, Sep 27 to 30, 2022 Online PDF (Invited Paper)

  4. Sindhu P.R., Prabuchandran K.J., S. Ganguly, and S.Bhatnagar, Data Efficient Safe Reinforcement Learning, IEEE International Conference on Systems, Man, and Cybernetics (SMC), Prague, Czech Republic, October 9-12, 2022

  5. D.R.Bharadwaj, P.Jain, Prabuchandran K.J., and S.Bhatnagar, Neural network compatible off-policy natural actor-critic algorithm, International Joint Conference on Neural Networks (IJCNN), Padova, Italy, July 18-23, 2022 arXiv (Best Student Paper Award)

  6. U.A.Mishra, S.R.Samineni, P.Goel, C.Kunjeti, H.Lodha, A.Singh, A.Sagi, S.Bhatnagar and S.Kolathaya, Dynamic mirror descent based model predictive control for accelerating robot learning, IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, May 23-27, 2022 arXiv

  7. R.Deb and S.Bhatnagar, Gradient Temporal Difference with Momentum: Stability and Convergence, AAAI Conference on Artificial Intelligence, Vancouver, Feb 22 - Mar 01, 2022 arXiv

  8. Priya S. and S.Bhatnagar, Robust traffic signal timing control using multiagent twin delayed deep deterministic policy gradients, 14th International Conference on Agents and Artificial Intelligence (ICAART), Online, Feb 3-5, 2022

  9. P.Parnika, D.R.Bharadwaj, D.S.K.Reddy and S.Bhatnagar, Attention Actor-Critic algorithm for Multi-Agent Constrained Co-operative Reinforcement Learning, AAMAS (Extended Abstract), Virtual Event, May 3-7, 2021

  10. K.Paigwar, L.Krishna, S.Tirumala, N.Khetan, A.Sagi, A.Joglekar, S.Bhatnagar, A.Ghosal, B.Amrutur, and S.Kolathaya, Robust Quadrupedal Locomotion on Sloped Terrains: A Linear Policy Approach, Conference on Robot Learning (CoRL), Virtual Event, November 16-18, 2020

  11. S.Tirumala, S.G.Venkatesh, K.Paigwar, A.V.Sagi, A.Joglekar, S.Bhatnagar, B.Amrutur and S.N.Y.Kolathaya, Learning Stable Manoeuvres for Quadruped Robots from Expert Demonstrations, 29th IEEE International Conference on Robot and Human Interactive Communication (Ro-Man), Naples, Italy, Aug.31-Sep.04, 2020

  12. S.Nayak, C.A.Ekbote, A.P.S.Chauhan, D.R.Bharadwaj, P.Ray, A.Sikdar, D.S.K.Reddy, and S.Bhatnagar, Stochastic Game Framework for Efficient Energy Management in Microgrid Networks, IEEE PES Innovative Smart Grid Technologies Conference, The Hague, Netherlands, Oct. 25-28, 2020 arXiv

  13. Sindhu P.R., S. Rao, and S.Bhatnagar, Learning-Based Resource Allocation in Industrial IoT Systems, IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, London, UK, Aug. 31 - Sep. 3, 2020

  14. I.John and S.Bhatnagar, Deep Reinforcement Learning with Successive Over-Relaxation and its Application in Auto-scaling Cloud Resources, International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, July 19-24, 2020

  15. D.R.Bharadwaj, Chandramouli K., and S.Bhatnagar, A convergent off-policy temporal difference algorithm, European Conference on Artificial Intelligence (ECAI), Santiago de Compostela, Spain, June 8-12, 2020

  16. I.John, R.Karumanchi and S.Bhatnagar, Predictive and prescriptive analytics for performance optimization: framework and a case study on a large-scale enterprise system, IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, Florida, Dec 16-19, 2019

  17. A.Dharmavaram, M.Riemer and S.Bhatnagar, Hierarchical average reward policy gradient algorithms, AAAI-20 Student Abstract and Poster Program, (to appear in Proceedings of AAAI 2020), New York, Feb 7-12, 2020

  18. A.G.Joseph and S.Bhatnagar, An incremental algorithm for estimating extreme quantiles, Indian Control Conference (IEEE), pp. 286-291, Hyderabad, Dec. 18-20, 2019 online pdf

  19. A.G.Joseph and S.Bhatnagar, An Adaptive and Incremental Approach to Quantile Estimation, IEEE Conference on Decision and Control, Nice, France, Dec 11-13, 2019 (to appear in Proceedings of CDC - to be published in IEEE XPlore)

  20. Chandramouli K., D.R.Bharadwaj, Prabuchandran K.J., and S.Bhatnagar, An Online Sample Based Method for Mode Estimation using ODE Analysis of Stochastic Approximation Algorithms, IEEE Conference on Decision and Control, Nice, France, Dec 11-13, 2019 (to appear in IEEE Control Systems Letters) online pdf arXiv

  21. S.Kolathaya, D.Dholakiya, S.Bhatnagar, A.Singla, S.Bhattacharya, A.Ghosal, B.Amrutur, A.Singh, A.Joglekar, A.V.Sagi, S.Shetty, and A.Gunalan, Trajectory Based Deep Policy Search for Quadrupedal Walking, 28th IEEE International Conference on Robot and Human Interactive Communication (Ro-Man 2019), New Delhi, Oct 14-18, 2019

  22. S.Bhattacharya, A.Singla, A.Singh, D.Dholakiya, S.Bhatnagar, B.Amrutur, A.Ghosal, and S.Kolathaya, Learning Active Spine Behaviors for Dynamic and Efficient Locomotion in Quadruped Robots, 28th IEEE International Conference on Robot and Human Interactive Communication (Ro-Man 2019), New Delhi, Oct 14-18, 2019 arXiv

  23. A.G.Joseph and S.Bhatnagar, Stochastic approximation trackers for model-based search, 57th Annual Allerton Conference on Communication, Control and Computing, Urbana, Illinois, Sep 24-27, 2019

  24. A.Singla, S.Bhattacharya, D.Dholakiya, S.Bhatnagar, A.Ghosal, B.Amrutur and S.Kolathaya, Realizing Learned Quadruped Locomotion Behaviors through Kinematic Motion Primitives, IEEE International Conference on Robotics and Automation (ICRA), Montreal, 2019 (Accepted) arXiv

  25. D.R.Bharadwaj, D.S.K.Reddy, K.J.Prabuchandran and S.Bhatnagar, Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning, 18th International Conference on Autonomous Agents and Multiagent Systems, Montreal, pp.1931-1933, 2019 online pdf

  26. D.Dholakia, S.Bhattacharya, A.Gunalan, A.Singla, S.Bhatnagar, B.Amrutur, S.Kolathaya and A.Ghosal, Design, Development and Experimental Realization of a Quadrupedal Research Platform: Stoch, 5th International Conference on Control, Automation and Robotics (ICCAR), Beijing, April 19-22, 2019 (Accepted) arXiv

  27. Indu John and S.Bhatnagar, Efficient budget allocation and task assignment in crowdsourcing, CoDS-COMAD, pp.318-321, Kolkata, Jan 3-5, 2019 (Special Mention Award in the Young Researchers’ Symposium to Indu John) online pdf

  28. A.G.Joseph and S.Bhatnagar, An Adaptive Sampling Algorithm for Policy Evaluation, Fifth IEEE Indian Control Conference (ICC), IIT Delhi, pp.2-9, Jan 9-11, 2019 online pdf

  29. N.Karanjkar, M.Desai and S.Bhatnagar, A simulation-based technique for continuous-space embedding of discrete-parameter queueing systems, European Simulation and Modeling Conference, Ghent, Belgium, Oct 24-26, 2018 (accepted)

  30. D.R.Bharadwaj, D.S.K.Reddy, K.Narayanam and S.Bhatnagar, A unified decision making framework for supply and demand management in microgrid networks, IEEE SmartGridComm, Aalborg, Denmark, Oct 29-Nov 1, 2018 online pdf

  31. Chandramouli K., Prabuchandran K.J., D.S.K.Reddy and S.Bhatnagar, Generalized Deterministic Perturbations For Stochastic Gradient Search, IEEE Conference on Decision and Control, pp. 5734-5739, Fontainebleau, Miami Beach, FL, USA, December 17-19, 2018, online pdf

  32. S.Kumar, Sindhu P.R., Chandrashekar L., P.Parihar, K.Gopinath and S.Bhatnagar, Scalable Performance Tuning of Hadoop MapReduce: A Noisy Gradient Approach, IEEE Cloud, Honolulu, Hawaii, June 25-30, 2017

  33. A.G.Joseph and S.Bhatnagar, A Model based Search Method for Prediction in Model-free Markov Decision Process, Proceedings of International Joint Conference on Neural Networks (IJCNN), Anchorage, Alaska, May 14-19, 2017

  34. A.G.Joseph and S.Bhatnagar, Bounds for Off-policy Prediction in Reinforcement Learning, Proceedings of International Joint Conference on Neural Networks (IJCNN), Anchorage, Alaska, May 14-19, 2017

  35. D.Saikoti Reddy, L.A.Prashanth, and S.Bhatnagar, Improved Hessian estimation for adaptive random directions stochastic approximation, Proceedings of IEEE Conference on Decision and Control (CDC), Las Vegas, NV, Dec 12-14, 2016

  36. A.G.Joseph and S.Bhatnagar, Revisiting the Cross Entropy Method with Applications in Stochastic Global Optimization and RL (Full Paper), Proceedings of European Conference on Artificial Intelligence (ECAI), The Hague, Netherlands, Aug.29-Sep.02, 2016

  37. R.K.Maity, Chandrashekar L., Sindhu P.R., and S.Bhatnagar, Shaping Proto-Value Functions using Rewards (Short Paper), Proceedings of European Conference on Artificial Intelligence (ECAI), The Hague, Netherlands, Aug.29-Sep.02, 2016

  38. A.G.Joseph and S.Bhatnagar, A Randomized Algorithm for Continuous Optimization, Proceedings of Winter Simulation Conference (WSC), Arlington, Virginia, USA, Dec. 11-14, 2016

  39. B.N.Ranganath and S.Bhatnagar, Scalable Focussed Entity Resolution, Proceedings of the International Joint Conference on Neural Networks (IJCNN), IEEE Press, Vancouver, Canada, July 25-29, 2016

  40. A.G.Joseph and S.Bhatnagar, A stochastic approximation algorithm for the problem of quantile estimation, Proceedings of 22nd International Conference on Neural Information Processing (ICONIP), Istanbul, Turkey, Nov.9-12, 2015 (to appear)

  41. Prasad H.L., Prashanth L.A., and S.Bhatnagar, Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games, Proceedings of 14th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Istanbul, Turkey, pp.1371-1379, May 4-8, 2015

  42. Chandrashekar L. and S.Bhatnagar, A Generalized Reduced Linear Program for Markov Decision Processes, Proceedings of Association for the Advancement of Artificial Intelligence (AAAI), Austin, Texas, USA, Jan 25-30, 2015 (to appear)

  43. M.S.Abdulla and S.Bhatnagar, A Transitions-only algorithm for Compact Action Set Markov Decision Processes, Proceedings of Indian Control Conference (ICC), IEEE, Chennai, Jan 5-7, 2015 (to appear)

  44. M.S.Abdulla and S.Bhatnagar, Stochastic multi-armed bandit algorithms based on simulated annealing, Proceedings of Indian Control Conference (ICC), IEEE, Chennai, Jan 5-7, 2015 (to appear)

  45. H.Yao, C.Szepesvari, R.Sutton, J.Modayil and S.Bhatnagar, Universal option models, Advances in Neural Information Processing Systems (NIPS), pp.990-998, Dec. 8-11, 2014, Montreal, Canada

  46. Prabuchandran K.J., Hemanth Kumar A.N. and S.Bhatnagar, Multi-agent reinforcement learning for traffic signal control, Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems, pp.2529-2534, Qingdao, China, Oct. 9-11, 2014

  47. Chandrashekar L., A.Dubey, S.Bhatnagar and C.Balamurugan, A Markov decision process framework for predictable job completion times on crowdsourcing platforms, Proceedings of HCOMP, pp.34-35, Pittsburgh, Nov. 2-4, 2014

  48. Chandrashekar L. and S.Bhatnagar, Max-plus methods for optimal control and zero-sum games, Proceedings of the IEEE Conference on Decision and Control, Los Angeles, CA, Dec. 15-17, 2014 (to appear)

  49. Prabuchandran K.J., S.Bhatnagar and V.S.Borkar, An actor-critic algorithm based on Grassmanian search, Proceedings of the IEEE Conference on Decision and Control, Los Angeles, CA, Dec. 15-17, 2014 (to appear)

  50. E.Zhou, S.Bhatnagar and X.Chen, Simulation optimization via gradient-based stochastic search, Proceedings of the Winter Simulation Conference, pp.3869-3879, Savannah, GA, Dec. 7-10, 2014

  51. Prashanth L.A., A. Chatterjee and S.Bhatnagar, Adaptive sleep-wake control using reinforcement learning in sensor networks, Proceedings of International Conference on Communication Systems and Networks (COMSNETS), IEEE, pp.1-8, Jan 6-10, 2014, Bangalore online pdf.

  52. Prashanth L.A., Prasad H.L., N.Desai, S.Bhatnagar, Mechanisms for Hostile Agents with Capacity Constraints, Proceedings of Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS2013), Ito, Jonker, Gini, and Shehory (eds.), pp. 659-666, Saint Paul, Minnesota, May 6-10, ISBN: 978-1-4503-1993-5, 2013 online pdf.

  53. K.Lakshmanan and S.Bhatnagar, A Novel Q-learning Algorithm with Function Approximation for Constrained Markov Decision Processes, Proceedings of the Fiftieth Annual Allerton Conference on Communication, Control and Computing (IEEE Press), UIUC, Illinois, pp.400-405, ISBN: 978-1-4673-4537-8, 2012.

  54. D.Ghoshdastidar, A.Dukkipati and S.Bhatnagar, q-Gaussian based smoothed functional algorithms for stochastic optimization, Proceedings of IEEE International Symposium on Information Theory (ISIT’2012), pp. 1059-1063, E-ISBN: 978-1-4673-2578-3, July 1-6, 2012.

  55. Prashanth L.A., H.L.Prasad, N.Desai, S.Bhatnagar and G.Dasgupta, Stochastic optimization for adaptive labor staffing in service systems, Proceedings of 9th International Conference on Service Oriented Computing (ICSOC), Cyprus, Dec 5-8, 2011, Published in Service Oriented Computing, LNCS, Vol. 7084, pp.487-494, 2011.

  56. Prashanth L.A. and S.Bhatnagar, Reinforcement Learning with Average Cost for Adaptive Control of Traffic Lights at Intersections, Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC), Washington, DC, pp. 1640-1645 (ISBN: 978-1-4577-2198-4), October 5-7, 2011.

  57. K.Lakshmanan and S.Bhatnagar, Smoothed functional and Quasi-Newton algorithms for routing in multi-stage queueing network with constraints, Proceedings of ICDCIT (Distributed Computing and Internet Technology, Lecture Notes in Computer Science, Vol. 6536/2011, pp.175-186, DOI: 10.1007/978-3-642-19056-8_12), Feb.9-12, 2011, Bhubaneswar, India.

  58. H.R.Maei, C.Szepesvari, S.Bhatnagar and R.S.Sutton, Toward Off-Policy Learning Control with Function Approximation, Proceedings of ICML, 2010.

  59. L.A.Prashanth and S.Bhatnagar, Control of traffic lights at junctions using reinforcement learning, Proceedings of Workshop on Computer Aided Transportation Planning and Traffic Engineering, pp.129-138, Dec.7-11, 2009, Bangalore, 2009.

  60. H.Yao, R.S.Sutton, S.Bhatnagar and C.Szepesvari, Multi-Step Dyna Planning for Policy Evaluation and Control, Proceedings of NIPS, 2009.

  61. H.R.Maei, C.Szepesvari, S.Bhatnagar, D.Precup and R.S.Sutton, Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, Proceedings of NIPS, 2009.

  62. H.Yao, S.Bhatnagar and C.Szepesvari, LMS-2: Towards an algorithm that is as cheap as LMS and almost as efficient as RLS, Proceedings of IEEE Conference on Decision and Control, Shanghai, 2009.

  63. H.Yao, R.Sutton, S.Bhatnagar, D.Diao and C.Szepesvari, Dyna(k): A multi-step Dyna planning, Proceedings of the ICML/UAI/COLT Workshop on Abstraction in Reinforcement Learning, Montreal, 2009.

  64. H.Yao, S.Bhatnagar and C.Szepesvari, Temporal difference learning by direct preconditioning, Multidisciplinary Symposium on Reinforcement Learning (MSRL), Montreal, 2009

  65. R.S.Sutton, H.R.Maei, D.Precup, S.Bhatnagar, D.Silver, C.Szepesvari and E.Wiewiora, Fast gradient-descent methods for temporal-difference learning with linear function approximation, Proceedings of the International Conference on Machine Learning (ICML), Montreal, 2009.

  66. G.R.Reddy and S.Bhatnagar, An efficient and optimized bluetooth scheduling algorithm for scatternets, Proceedings of IEEE Advanced Networks and Telecommunication Systems (ANTS) Conference, Mumbai, 2008.

  67. S.R.Kolavali and S.Bhatnagar, Ant colony optimization algorithms for shortest path problems, Proceedings of Second Workshop on Network Control and Optimization (NET-COOP) (Published in NET-COOP 2008, Eds. E. Altman and A. Chaintreau, LNCS 5425, pp.37-44, Springer, 2008), September 8-10, 2008, Paris, France.

  68. V.Sudha, S.Bhatnagar, S.V.Basavaraja and V.Sridhar, SPSA based feature relevance estimation for video retrieval, Proceedings of IEEE International Workshop on Multimedia Signal Processing (MMSP), Queensland, Australia, 2008.

  69. R.Patro and S.Bhatnagar, An optimal RIO with statistical delay assurances, Proceedings of National Conference on Communications (NCC), Mumbai, February 2-3, 2008.

  70. V.P.Chaturvedi, V.Rakesh and S.Bhatnagar, An efficient and optimized bluetooth scheduling algorithm for piconets, Proceedings of International Conference on Distributed Computing and Internet Technology (Published in Distributed Computing and Internet Technology, Eds.T.Janowski and H.Mohanty, LNCS 4882, pp.135-145, Springer, 2007), December 17-20, 2007, Bangalore, India.

  71. K.R.Vemu, S.Bhatnagar and N.Hemachandra, An optimal weighted-average congestion based pricing scheme for enhanced QoS, Proceedings of International Conference on Distributed Computing and Internet Technology (Published in Distributed Computing and Internet Technology, Eds.T.Janowski and H.Mohanty, LNCS 4882, pp.135-145, Springer, 2007), December 17-20, 2007, Bangalore, India.

  72. M.S.Abdulla and S.Bhatnagar, Network flow-control using asynchronous stochastic approximation, Proceedings of IEEE Conference on Decision and Control, New Orleans, USA, December 12-14, 2007.

  73. K.R.Vemu, S.Bhatnagar and N.Hemachandra, Link route pricing for enhanced QoS, Proceedings of IEEE Conference on Decision and Control, New Orleans, USA, December 12-14, 2007.

  74. V.Mishra, S.Bhatnagar and N.Hemachandra, Discrete Parameter Simulation Optimization Algorithms with Applications to Admission Control with Dependent Service Times, Proceedings of IEEE Conference on Decision and Control, New Orleans, USA, December 12-14, 2007.

  75. S.Bhatnagar, A RED algorithm for the Internet, Proceedings of National Conference on Information Technology: Present Practices and Challenges, Aug.31-Sep.1, 2007, New Delhi, India.

  76. S.Bhatnagar, R.S.Sutton, M.Ghavamzadeh and M.Lee, Incremental update natural-gradient actor-critic algorithms, Proceedings of Neural Information Processing Systems (NIPS), Vancouver, Canada, December 3-6, 2007.

  77. M.S.Abdulla and S.Bhatnagar, Solving MDPs using two-timescale simulated annealing with multiplicative weights, Proceedings of American Control Conference (ACC), New York, July 11-13, 2007.

  78. M.S.Abdulla and S.Bhatnagar, Parametrized actor-critic algorithms for finite-horizon MDPs, Proceedings of American Control Conference (ACC), New York, July 11-13, 2007.

  79. V.Sudha, L.Gopal, V.Sridhar and S.Bhatnagar, Fuzzy clustering based Ad recommendation for TV programs, Proceedings of fifth European Conference, EuroITV, Amsterdam, Netherlands, 2007.

  80. S.Bhatnagar and K.M.Babu, Two-Timescale Q-Learning Algorithms with an Application to Routing in Networks, Proceedings of International Conference on Advances in Control and Optimization of Dynamical Systems (ACODS), Bangalore, 2007.

  81. Diksha Sharma and S.Bhatnagar, Optimal parameterized policies for resource allocation in communication networks, Proceedings of IEEE International Conference on Signal and Image Processing, Hubli, Karnataka, 2006.

  82. V.L.Raju Chinthalapati and S.Bhatnagar, A simultaneous deterministic perturbation actor-critic algorithm with an application to optimal mortgage refinancing, Proceedings of IEEE Conference on Decision and Control, San Diego, CA, USA, 2006.

  83. S.Bhatnagar and M.S.Abdulla, An actor-critic algorithm for finite horizon Markov decision processes, Proceedings of IEEE Conference on Decision and Control, San Diego, CA, USA, 2006.

  84. R.Patro and S.Bhatnagar, A four-timescale algorithm for constrained stochastic optimization of RED, Proceedings of IEEE Conference on Decision and Control, San Diego, CA, USA, 2006.

  85. M.S.Abdulla and S.Bhatnagar, SPSA with measurement reuse, Proceedings of Winter Simulation Conference, Monterey, CA, USA, 2006.

  86. Diksha Sharma, S.Bhatnagar and S.Chakraborty, An Algorithm for Dynamic Optimal Bandwidth Allocation in Communication Networks, Proceedings of Fifth Asia Pacific International Symposium on Information Technology (APIS5), pp.489-492, Hangzhou, China, 2006.

  87. M.S.Abdulla and S.Bhatnagar, Solution of MDPs using simulation based value iteration, Proceedings of Second IFIP Conference on Artificial Intelligence Applications and Innovations, pp.749-759, Beijing, China, 2005.

  88. A.Dukkipati, M.N.Murty and S.Bhatnagar, Properties of Kullback-Leibler cross-entropy minimization in nonextensive framework, Proceedings of IEEE International Symposium on Information Theory, pp.2374-2378, Adelaide, Australia, 2005.

  89. A.Dukkipati, M.N.Murty and S.Bhatnagar, Information theoretic justification of Boltzmann selection and its generalization to Tsallis case, Proceedings of IEEE Congress on Evolutionary Computation, pp.1667-1674, Vol.2, Edinburgh, U.K., 2005.

  90. S.Bhatnagar and S.Kumar, A reinforcement learning based algorithm for Markov decision processes, Proceedings of International Conference on Intelligent Sensing and Information Processing (ICISIP), pp.199-204, Chennai, India, 2005.

  91. P.Viswanath, M.N.Murty and S.Bhatnagar, A pattern synthesis technique to reduce the curse of dimensionality effect, Proceedings of International Conference on Knowledge Based Computer Systems (KBCS), pp.219-228, Hyderabad, India, 2004.

  92. R.Vaidya and S.Bhatnagar, Correlation based optimization of random early detection, Proceedings of IEEE INDICON, pp.47-51, Kharagpur, India, 2004.

  93. R.Vaidya and S.Bhatnagar, Optimized RIO for diffserv networks, Proceedings of Information and Computer Science (ICICS), pp.227-240, Dhahran, Saudi Arabia, 2004.

  94. J.R.Panigrahi and S.Bhatnagar, Hierarchical decision making in semi-conductor fabs using multi-time scale Markov decision processes, Proceedings of IEEE Conference on Decision and Control (CDC), pp.4387-4392, Vol.4, Paradise Island, Nassau, Bahamas, 2004.

  95. P.Viswanath, M.N.Murty and S.Bhatnagar, A pattern synthesis technique with an efficient nearest neighbor classifier for binary pattern recognition, Proceedings of International Conference on Pattern Recognition (ICPR), pp.416-419, Vol.4, Cambridge, U.K., 2004.

  96. A.Dukkipati, M.N.Murty and S.Bhatnagar, Cauchy annealing schedule: an annealing schedule for Boltzmann selection scheme in evolutionary algorithms, Proceedings of IEEE Congress on Evolutionary Computation, pp.55-62, Vol.1, Portland, Oregon, USA, 2004.

  97. A.Dukkipati, M.N.Murty and S.Bhatnagar, Quotient evolutionary space: abstraction of evolutionary process w.r.t macroscopic properties, Proceedings of IEEE Congress on Evolutionary Computation, pp.846-853, Vol.2, Canberra, Australia, 2003.

  98. P.Viswanath, M.N.Murty and S.Bhatnagar, Synthetic patterns for nearest neighbour classifier design, Proceedings of Knowledge Based Computer Systems (KBCS), pp.323-332, Mumbai, India, 2002.

  99. P.Viswanath, M.N.Murty and S.Bhatnagar, An efficient classifier: using a compact tree structure and novel pattern synthesis, Proceedings of HPC Asia Conference, pp.395-398, Bangalore, India, 2002.

  100. S.Bhatnagar, E.Fernandez-Gaucherand, M.C.Fu, Y.He and S.I.Marcus, A Markov decision process model for capacity expansion and allocation, Proceedings of 38th IEEE Conference on Decision and Control, pp.1156-1161, Phoenix, Arizona, 1999.

  101. S.Bhatnagar, M.C.Fu and S.I.Marcus, Two timescale SPSA algorithms for rate-based ABR flow control, Advances in System Theory Symposium (in honour of Sanjoy K.Mitter on his 65th birthday), Cambridge, USA, October, 1999.

  102. S.Bhatnagar, M.C.Fu and S.I.Marcus, Rate based ABR flow control using two timescale SPSA, Proceedings of SPIE Conference on Performance and Control of Network Systems III, pp.142-149, Boston, September, 1999.

  103. S.Bhatnagar, M.C.Fu, S.I.Marcus, and Y.He, Markov decision processes for semiconductor fab-level decision making, Proceedings of IFAC 14th Triennial World Congress, Beijing, China, pp.145-150, 1999.

  104. S.Bhatnagar and V.Sharma, Optimal control of a feedback queue via stochastic approximation, Proceedings of IEEE Globecom98, Sydney, Australia, November, 1998.

  105. S.Bhatnagar and V.S.Borkar, Infinitesimal perturbation analysis: an overview and recent trends, Proceedings of the sixth IEEE symposium on intelligent systems, Bangalore, 1997.

  106. S.Bhatnagar and V.H.Gupta, Using stochastic control to save fuel in automobiles, Proceedings of the fifth IEEE symposium on intelligent systems, Bangalore, pp.55-61, Nov.1996.


Technical Reports


  1. S.Bhatnagar, R.S.Sutton, M.Ghavamzadeh and M.Lee, Natural Actor-Critic Algorithms, Technical Report TR09-10, Department of Computing Science, University of Alberta, Canada, 2009.

  2. H.L.Prasad, S.Bhatnagar and N.Hemachandra, A computational procedure for general-sum stochastic games, Technical Report IISc-CSA-TR-2009-5, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India, 2009.

  3. K.R.Vemu, S.Bhatnagar and N.Hemachandra, Link-route pricing for optimal QoS, Technical Report IISc-CSA-TR-2007-8, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India, 2007.

  4. M.S.Abdulla and S.Bhatnagar, Reinforcement learning based algorithms for average cost Markov decision processes, Technical Report IISc-CSA-TR-2005-6, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India, 2005.

  5. S.Bhatnagar, A simulation based algorithm for Markov decision processes, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India, 2002.

  6. Y.He, S.Bhatnagar, M.C.Fu and S.I.Marcus, Approximate policy iteration for semiconductor fab-level decision making - a case study, Institute for Systems Research, University of Maryland, URL: http://www.isr.umd.edu/TechReports/ISR/2000/TR_2000-49/, 2000.

  7. S.Bhatnagar, M.C.Fu, S.I.Marcus and S.Bhatnagar, Randomized difference two-timescale simultaneous perturbation stochastic approximation algorithms for simulation optimization of hidden Markov models, Institute for Systems Research, University of Maryland, URL: http://www.isr.umd.edu/TechReports/ISR/2000/TR_2000-13/, 2000.

  8. S.Bhatnagar, M.C.Fu and S.I.Marcus, Optimal multilevel policies for ABR flow control using two timescale SPSA, Institute for Systems Research, University of Maryland, URL: http://www.isr.umd.edu/TechReports/ISR/1999/TR_99-18/, 1999.