Textbook
I have been working on a textbook on reinforcement learning since I taught the new Introduction to Reinforcement Learning course in Spring 2021. The current version is from 2021; I am hoping to release a major update in 2025.
- A.M. Farahmand, Lecture Notes on Reinforcement Learning, 2021.
The textbook is introductory in the sense that it does not assume prior exposure to reinforcement learning. It is not, however, merely a collection of algorithms, nor does it only provide high-level intuition. Instead, it tries to build the mathematical intuition behind many important ideas and concepts often encountered in RL. We prove many basic, and sometimes not so basic, results in RL. If the proof of a result is too complicated, we prove a simplified version of it.
If you are a university instructor and wish to use slides for your own course, please contact me.
Papers
This list is regularly updated. You can also check my Google Scholar page.
2024
- Amin Rakhsha, Mete Kemertas, Mohammad Ghavamzadeh, and A.M. Farahmand, “Maximum Entropy Model Correction in Reinforcement Learning,” International Conference on Learning Representations (ICLR), 2024. (PDF; arXiv; OpenReview)
- Mark Bedaywi, Amin Rakhsha, A.M. Farahmand, “PID Accelerated Temporal Difference Algorithms,” Reinforcement Learning Conference (RLC), 2024. (PDF; arXiv)
- Claas Voelcker, Tyler Kastner, Igor Gilitschenski, A.M. Farahmand, “When does Self-Prediction Help? Understanding Auxiliary Tasks in Reinforcement Learning,” Reinforcement Learning Conference (RLC), 2024. (PDF; arXiv)
- Marcel Hussing, Claas Voelcker, Igor Gilitschenski, A.M. Farahmand, and Eric Eaton, “Dissecting Deep RL with High Update Ratios: Combatting Value Overestimation and Divergence,” Reinforcement Learning Conference (RLC), 2024. (PDF; arXiv)
- Avery Ma, A.M. Farahmand, Yangchen Pan, Philip Torr, Jindong Gu, “Improving Adversarial Transferability via Model Alignment,” European Conference on Computer Vision (ECCV), 2024. (PDF; arXiv; Code)
2023
- Tyler Kastner, Murat A. Erdogdu, and A.M. Farahmand, “Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning,” Neural Information Processing Systems (NeurIPS), 2023. (PDF; OpenReview)
- Avery Ma, Yangchen Pan, and A.M. Farahmand, “Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods,” Transactions on Machine Learning Research (TMLR), 2023. (PDF; OpenReview) [Featured certificate: ~3.5% of accepted papers]
- Claas A. Voelcker, Arash Ahmadian, Romina Abachi, Igor Gilitschenski, and A.M. Farahmand, “λ-AC: Learning latent decision-aware models for reinforcement learning in continuous state-spaces,” 2023. (arXiv)
- Mete Kemertas, Allan Jepson, and A.M. Farahmand, “Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients”, 2023. (arXiv)
2022
- Amin Rakhsha, Andrew Wang, Mohammad Ghavamzadeh, and A.M. Farahmand, “Operator Splitting Value Iteration,” Neural Information Processing Systems (NeurIPS), 2022. (PDF; OpenReview)
- Claas A. Voelcker, Victor Liao, Animesh Garg, and A.M. Farahmand, “Value Gradient Weighted Model-Based Reinforcement Learning,” International Conference on Learning Representations (ICLR), 2022. (PDF; OpenReview)
- Guiliang Liu, Ashutosh Adhikari, A.M. Farahmand, and Pascal Poupart, “Learning Object-Oriented Dynamics for Planning from Text,” International Conference on Learning Representations (ICLR), 2022. (PDF; OpenReview; GitHub)
- Jincheng Mei, Yangchen Pan, A.M. Farahmand, Martha White, Hengshuai Yao, Mohsen Rohani, and Jun Luo, “Understanding and Mitigating the Limitations of Prioritized Replay,” Conference on Uncertainty in Artificial Intelligence (UAI), 2022. (PDF)
2021
- A.M. Farahmand and Mohammad Ghavamzadeh, “PID Accelerated Value Iteration Algorithm,” International Conference on Machine Learning (ICML), 2021. (PDF; Extended Version PDF)
- Erfan Pirmorad, Faraz Khoshbakhtian, Farnam Mansouri, A.M. Farahmand, “Deep Reinforcement Learning for Online Control of Stochastic Partial Differential Equations,” NeurIPS Workshop on the Symbiosis of Deep Learning and Differential Equations, 2021. (PDF; OpenReview)
2020
- Yangchen Pan, Jincheng Mei, A.M. Farahmand, “Frequency-based Search-Control in Dyna,” International Conference on Learning Representations (ICLR), 2020. (PDF)
- Yangchen Pan, Ehsan Imani, A.M. Farahmand, Martha White, “An Implicit Function Learning Approach for Regression,” Neural Information Processing Systems (NeurIPS), 2020. (PDF on arXiv)
- Romina Abachi, Mohammad Ghavamzadeh, A.M. Farahmand, “Policy-Aware Model Learning for Policy-Gradient Methods,” 2020. (arXiv)
- Avery Ma, Fartash Faghri, Nicolas Papernot, A.M. Farahmand, “SOAR: Second-Order Adversarial Regularization,” 2020. (arXiv)
- Rodrigo Toro Icarte, Richard Valenzano, Toryn Klassen, Phillip Christoffersen, A.M. Farahmand, Sheila McIlraith, “The Act of Remembering: A Study in Partially Observable Reinforcement Learning,” 2020. (arXiv)
2019
- A.M. Farahmand, “Value Function in Frequency Domain and the Characteristic Value Iteration Algorithm,” In the Proceedings of Advances in Neural Information Processing Systems (NeurIPS), 2019. (PDF; Extended Version PDF)
- Mohamed Akrout, A.M. Farahmand, Tory Jarmain, Latif Abid, “Improving Skin Condition Classification with a Visual Symptom Checker Trained using Reinforcement Learning,” International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2019. (arXiv)
- Yangchen Pan, Hengshuai Yao, A.M. Farahmand, and Martha White, “Hill Climbing on Value Estimates for Search-control in Dyna,” International Joint Conference on Artificial Intelligence (IJCAI), 2019. (PDF)
- Marc T. Law, Jake Snell, A.M. Farahmand, Raquel Urtasun, and Richard S. Zemel, “Dimensionality Reduction for Representing the Knowledge of Probabilistic Models,” International Conference on Learning Representations (ICLR), 2019. (PDF) (OpenReview)
- Mouhacine Benosman, A.M. Farahmand, Meng Xia, “Learning-Based Iterative Modular Adaptive Control for Nonlinear Systems,” International Journal of Adaptive Control and Signal Processing, Vol. 33, No. 2, pp. 335–355, 2019. (Publisher’s Version)
2018
- A.M. Farahmand, “Iterative Value-Aware Model Learning,” Neural Information Processing Systems (NeurIPS), 2018. (PDF; Extended Version PDF)
- Yangchen Pan, A.M. Farahmand, Martha White, Saleh Nabi, Piyush Grover, and Daniel Nikovski, “Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control,” International Conference on Machine Learning (ICML), 2018. (PDF; Extended Version PDF)
2017
- A. M. Farahmand, Sepideh Pourazarm, and Daniel Nikovski, “Random Projection Filter Bank for Time Series Data,” In the Proceedings of Advances in Neural Information Processing Systems (NeurIPS), December 2017. (PDF; Extended Version PDF; NeurIPS website)
- A. M. Farahmand, André M.S. Barreto, and Daniel Nikovski, “Value-Aware Loss Function for Model-based Reinforcement Learning,” The 20th International Conference on Artificial Intelligence and Statistics (AISTATS), April 2017. (PDF; Extended Version PDF) Note: An extended abstract version of this paper appeared at EWRL 2016.
- A. M. Farahmand, Saleh Nabi, and Daniel Nikovski, “Deep Reinforcement Learning for Partial Differential Equation Control,” American Control Conference (ACC), May 2017 (PDF)
- Sepideh Pourazarm, A. M. Farahmand, and Daniel Nikovski, “Fault Detection and Prognosis of Time Series Data with Random Projection Filter Bank,” In the Proceedings of the Annual Conference of the Prognostics and Health Management Society (PHM), October 2017. (PDF; PHM Version) [This is a mostly empirical version of the Random Projection Filter Bank work, geared towards fault detection and prognosis applications. See the NeurIPS paper for the more theoretical version, with proofs.]
2016
- A. M. Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, Shie Mannor, “Regularized Policy Iteration with Nonparametric Function Spaces,” Journal of Machine Learning Research (JMLR), Vol. 17, No. 139, 2016. (PDF) (JMLR page)
- A. M. Farahmand, Saleh Nabi, Piyush Grover, and Daniel Nikovski, “Learning to Control Partial Differential Equations: Regularized Fitted Q-Iteration Approach,” IEEE Conference on Decision and Control (CDC), December 2016. (PDF) (IEEE page)
- A. M. Farahmand, Daniel Nikovski, Yuji Igarashi, and Hiroki Konaka, “Truncated Approximate Dynamic Programming with Task-Dependent Terminal Value,” The 30th AAAI Conference on Artificial Intelligence (AAAI), February 2016. (PDF)
- Mouhacine Benosman, A. M. Farahmand, and Meng Xia, “Learning-based Modular Indirect Adaptive Control for a Class of Nonlinear Systems,” American Control Conference (ACC), 2016. (PDF) (IEEE page)
- A. M. Farahmand, Andre M.S. Barreto, and Daniel Nikovski, “Value-Aware Loss Function for Model Learning in Reinforcement Learning,” The 13th European Workshop on Reinforcement Learning (EWRL), December 2016. (PDF) (Also see: AISTATS 2017)
2015
- A. M. Farahmand, Doina Precup, André M.S. Barreto, Mohammad Ghavamzadeh, “Classification-based Approximate Policy Iteration,” IEEE Transactions on Automatic Control, Vol. 60, No. 11, 2015 (preprint PDF; IEEE Version).
- De-An Huang, A. M. Farahmand, Kris M. Kitani, and J. Andrew Bagnell, “Approximate MaxEnt Inverse Optimal Control and its Application for Mental Simulation of Human Interactions,” In the Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI), Jan. 2015. (PDF; Extended Version PDF)
- De-An Huang, A. M. Farahmand, Kris M. Kitani, and J. Andrew Bagnell, “Approximate MaxEnt Inverse Optimal Control,” The 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), June 2015. (PDF)
- J. Andrew Bagnell and A. M. Farahmand, “Learning Positive Functions in a Hilbert Space,” NIPS Workshop on Optimization for Machine Learning, December 2015. (PDF) (Presentation: Poster) [Update in 2022: We considered only a subset of Kernel SoS functions in our representer theorem.]
2014
- Philip Bachman, A. M. Farahmand, and Doina Precup, “Sample-based Approximate Regularization,” International Conference on Machine Learning (ICML), 2014. (PDF; Extended Version PDF; Code on GitHub)
- A. M. Farahmand, Doina Precup, André M.S. Barreto, Mohammad Ghavamzadeh, “Classification-based Approximate Policy Iteration: Experiments and Extended Discussions,” 2014. (copy on arXiv or here, which is slightly more up to date) [Note: A shorter version of this paper was published in IEEE Transactions on Automatic Control, 2015 (preprint PDF; IEEE Version).]
2013
- Beomjoon Kim, A. M. Farahmand, Joelle Pineau, and Doina Precup, “Learning from Limited Demonstrations,” In the Proceedings of Advances in Neural Information Processing Systems (NeurIPS-26), 2013. (PDF; Supplementary material)
- Mahdi Milani Fard, Yuri Grinberg, A. M. Farahmand, Joelle Pineau, Doina Precup, “Bellman Error Based Feature Generation using Random Projections on Sparse Spaces,” In the Proceedings of Advances in Neural Information Processing Systems (NeurIPS-26), 2013. (PDF; Supplementary material)
- Beomjoon Kim, A. M. Farahmand, Joelle Pineau, and Doina Precup, “Approximate Policy Iteration with Demonstration Data,” The 1st Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2013. (PDF)
- A. M. Farahmand, Doina Precup, André M.S. Barreto, and Mohammad Ghavamzadeh, “CAPI: Generalized Classification-based Approximate Policy Iteration,” The 1st Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2013. (PDF)
2012
- A. M. Farahmand and Csaba Szepesvári, “Regularized Least-Squares Regression: Learning from a β-mixing Sequence,” Journal of Statistical Planning and Inference (JSPI), Volume 142, Issue 2, February 2012. (Preprint – PDF; JSPI’s version)
- A. M. Farahmand and Doina Precup, “Value Pursuit Iteration,” In the Proceedings of Advances in Neural Information Processing Systems (NeurIPS-25), 2012. (PDF; Extended Version PDF)
- A. M. Farahmand, Doina Precup, and Mohammad Ghavamzadeh, “Generalized Classification-based Approximate Policy Iteration,” Tenth European Workshop on Reinforcement Learning (EWRL 2012), Edinburgh, Scotland, June 2012. (PDF)
2011
- A. M. Farahmand, “Action-Gap Phenomenon in Reinforcement Learning,” In the Proceedings of Advances in Neural Information Processing Systems (NeurIPS-24), 2011. (PDF)
- A. M. Farahmand and Csaba Szepesvári, “Model Selection in Reinforcement Learning,” Machine Learning Journal, Vol. 85, No. 3, Springer, 2011. (PDF; MLJ’s version)
- A. M. Farahmand and Csaba Szepesvári, “BErMin: A Model Selection Algorithm for Reinforcement Learning Problems,” NIPS Workshop on New Frontiers in Model Order Selection, December 2011. (PDF; video recording of the presentation) [This is a four-page summary of our Machine Learning Journal paper.]
2010
- A. M. Farahmand, Remi Munos, Csaba Szepesvári, “Error Propagation for Approximate Policy and Value Iteration,” Advances in Neural Information Processing Systems (NeurIPS-23), 2010. (PDF; Extended Version PDF) [Note: The version on the NeurIPS website has a minor mistake in the definition of concentrability coefficient for the AVI case. This was corrected about a year later. Thanks to Bruno Scherrer for pointing this out.]
- A. M. Farahmand, Majid Nili Ahmadabadi, Babak N. Araabi, Caro Lucas, “Interaction of Culture-based Learning and Cooperative Co-evolution and its Application to Automatic Behavior-based System Design,” IEEE Transactions on Evolutionary Computation, Vol. 14, No. 1, pp. 23-57, 2010. (Preprint – PDF; IEEE’s version)
- Azad Shademan, A. M. Farahmand, and Martin Jägersand, “Robust Jacobian Estimation for Uncalibrated Visual Servoing,” In the Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Anchorage, Alaska, USA, May 2010. (PDF)
2009
- A. M. Farahmand, Azad Shademan, Martin Jägersand, and Csaba Szepesvári, “Model-based and Model-free Reinforcement Learning for Visual Servoing,” In the Proceedings of the International Conference on Robotics and Automation (ICRA), Kobe, Japan, May 2009. (PDF; IEEE’s version) (Presentation: PDF)
- A. M. Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, and Shie Mannor, “Regularized Fitted Q-Iteration for Planning in Continuous-Space Markovian Decision Problems,” In the Proceedings of the American Control Conference (ACC), St. Louis, Missouri, USA, June 2009. (PDF; IEEE’s version)
- Azad Shademan, A. M. Farahmand, Martin Jägersand, “Towards Learning Robotic Reaching and Pointing: An Uncalibrated Visual Servoing Approach,” Sixth Canadian Conference on Computer and Robot Vision (CRV), Kelowna, British Columbia, Canada, 2009.
- A. M. Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, and Shie Mannor, “Regularization in Reinforcement Learning,” Multidisciplinary Symposium on Reinforcement Learning (MSRL-2009), Montreal, QC, Canada, June 2009. (PDF)
- A. M. Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, and Shie Mannor, “Robot Learning with Regularized Reinforcement Learning,” Workshop on Regression in Robotics: Approaches and Applications, Robotics: Science and Systems Conference (RSS-2009), Seattle, WA, June 2009. (PDF)
2008
- A. M. Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, and Shie Mannor, “Regularized Policy Iteration,” Twenty-Second Annual Conference on Advances in Neural Information Processing Systems (NeurIPS-2008), Vancouver, Canada, December 2008. (24% acceptance rate) (PDF)
- A. M. Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, and Shie Mannor, “Regularized Fitted Q-Iteration: Application to Bounded Resource Planning,” in Recent Advances in Reinforcement Learning, 8th European Workshop, EWRL 2008, Revised and Selected Papers, Springer, LNCS 5323, pp. 55—68, 2008. (PDF)
- A. M. Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, and Shie Mannor, “Regularized Policy Iteration,” Eighth European Workshop on Reinforcement Learning (EWRL 2008), Villeneuve d’Ascq, France, July 2008.
2007
- A. M. Farahmand, Csaba Szepesvári, and Jean-Yves Audibert, “Manifold-Adaptive Dimension Estimation,” International Conference on Machine Learning (ICML), 2007. (PDF; ACM’s version) (Presentation given at the conference: PDF, VideoLecture’s recorded presentation)
- A. M. Farahmand, Azad Shademan, and Martin Jagersand, “Global Visual-Motor Estimation for Uncalibrated Visual Servoing,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2007. (PDF; IEEE’s version)
- A. M. Farahmand, Csaba Szepesvári, and Jean-Yves Audibert, “Towards Manifold-Adaptive Learning,” NIPS Workshop on Topology Learning, Whistler, Canada, 2007 (PDF)
2006
- A. M. Farahmand, Majid Nili Ahmadabadi, Caro Lucas, and Babak N. Araabi, “Hybrid Behavior Co-evolution and Structure Learning in Behavior-based Systems,” In the Proceedings of IEEE Congress on Evolutionary Computation (CEC), Vancouver, Canada, 2006. (Chosen as the best presentation of the “Evolving Learning Systems” technical session) (PDF) (Presentation: PDF, PPT)
- A. M. Farahmand and Mohammad Javad Yazdanpanah, “Channel Assignment using Chaotic Simulated Annealing Enhanced Hopfield Neural Network,” In the Proceedings of the International Joint Conference on Neural Networks (IJCNN), Vancouver, Canada, 2006. (PDF) (Presentation: PDF, PPT)
- Mohammad G. Azar, Majid Nili Ahmadabadi, A. M. Farahmand, and Babak N. Araabi, “Learning to Coordinate Behaviors in Soft Behavior-based Systems using Reinforcement Learning,” International Joint Conference on Neural Networks (IJCNN), Vancouver, Canada, 2006.
2005
- A. M. Farahmand and M. J. Yazdanpanah, “Locally Optimal Takagi-Sugeno Fuzzy Controllers,” Proceedings of the 44th IEEE Conference on Decision and Control (CDC) and the European Control Conference (ECC), pp. 4095-4099, Seville, Spain, December 2005. (PDF) (Presentation: PDF, PPT)
- M. J. Yazdanpanah, E. Madanian, and A. M. Farahmand, “Channel Assignment in Cellular Communications using a New Modification on Hopfield Networks,” Iranian Journal of Science and Technology, Transaction B: Engineering, Vol. 29, No. B4, 2005.
- A. M. Farahmand and Majid Nili Ahmadabadi, “The Effect of Reinforcement Signal Error in Reinforcement Learning,” Computer Society of Iran Computer Conference (CSICC), 2005 (in Persian).
2004
- A. M. Farahmand, Majid Nili Ahmadabadi, and Babak N. Araabi, “Behavior Hierarchy Learning in a Behavior-based System using Reinforcement Learning,” Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2004), Sendai, Japan, 2004. (PDF) (Presentation: PDF, PPT)
- A. M. Farahmand and Caro Lucas, “Fuzzy Neural Network Implementation of Q(λ) for Mobile Robots,” WSEAS Transaction on Systems, Issue 1, Vol. 3, Jan. 2004. (PDF)
2001
- A. M. Farahmand, Roxana Akhbari, and Maryam Tajvidi, “Evolving Hidden Markov Models,” 4th Iranian Student Conference on Electrical Engineering (ISCEE), 2001 (in Persian). (PDF)
2000
- A. M. Farahmand and Amir Emad Mirmirani, “Distributed Genetic Algorithms,” 3rd Iranian Student Conference on Electrical Engineering (ISCEE), 2000 (in Persian).
Theses and Dissertation
- PhD Dissertation: Regularization in Reinforcement Learning, Department of Computing Science, University of Alberta, September 2011. (Supervisors: Csaba Szepesvári and Martin Jägersand – Examining Committee: Peter Bartlett, Michael Bowling, Alexander Melnikov, Dale Schuurmans, Richard S. Sutton) (My version, which is slightly updated; U of A’s link)
- MS Thesis: Learning and Evolution in Hierarchical Behavior-based Systems, M.S. Thesis, University of Tehran, 2005 (in Persian) (PDF). (Advisors: Majid Nili Ahmadabadi, Babak N. Araabi, Caro Lucas) (Examining Committee: Babak Moshiri and Alireza Fatehi)
- BS Thesis: Calculating Resonant Frequencies of a Metallic Cavity using Finite Element Method, BSEE Thesis, K. N. Toosi University of Technology, 2002 (In Persian) (PDF). (Advisor: Mohammad-Sadegh Abrishamian) (Examining Committee: Manouchehr Kamyab and Mohsen Aboutorab)
Old Technical Reports (Selected)
- A. M. Farahmand, Majid Nili Ahmadabadi, and Babak N. Araabi, “Behavior and Hierarchy Development in Behavior-based Systems using Reinforcement Learning,” Technical Report, 2005.
- A. M. Farahmand, Caro Lucas, and Babak N. Araabi, “Chaos Control Survey,” a Technical Report for my Seminar Course, University of Tehran, 2004 (in Persian). (PDF)
- A. M. Farahmand and Mohammad Javad Yazdanpanah, “A Class of Nonlinear Controllers for Synchronization of Chaotic Semipassive Systems,” Technical Report, University of Tehran, 2003. (PDF)
- A. M. Farahmand, Ramin Pashai, and Ezatollah Geranpayeh, “Effect of Metallic Electrode and Buffer Layer on Dielectric Waveguides,” Technical Report of my internship period at Iran Telecommunication Research Center (ITRC), 2001.
- A. M. Farahmand, “On Chaotic Models of Population – A Survey,” 1999 (In Persian).
- A. M. Farahmand, “Data Compression Methods,” 1997 (in Persian). (This is my first technical report. I wrote it while I was still in high school. It is not of particular technical value, but it was an achievement at the time.)