Publications
PhD Dissertation
Regularization in Reinforcement Learning, Department of Computing Science, University of Alberta, September 2011. (Supervisors: Csaba Szepesvári and Martin Jägersand; Examining Committee: Peter Bartlett, Michael Bowling, Alexander Melnikov, Dale Schuurmans, Richard S. Sutton) (My version, which is slightly updated; U of A’s link)
Refereed Publications
2017
‣A. M. Farahmand, Saleh Nabi, and Daniel Nikovski, “Deep Reinforcement Learning for Partial Differential Equation Control,” American Control Conference (ACC), May 2017. (PDF)
2016
‣A. M. Farahmand, Saleh Nabi, Piyush Grover, and Daniel Nikovski, “Learning to Control Partial Differential Equations: Regularized Fitted Q-Iteration Approach,” IEEE Conference on Decision and Control (CDC), December 2016. (PDF) (IEEE page)
‣A. M. Farahmand, Daniel Nikovski, Yuji Igarashi, and Hiroki Konaka, “Truncated Approximate Dynamic Programming with Task-Dependent Terminal Value,” The 30th AAAI Conference on Artificial Intelligence (AAAI), February 2016. (PDF)
‣A. M. Farahmand, Andre M.S. Barreto, and Daniel Nikovski, "Value-Aware Loss Function for Model Learning in Reinforcement Learning," The 13th European Workshop on Reinforcement Learning (EWRL), December 2016. (PDF) (Also see: AISTATS 2017)
2015
‣A. M. Farahmand, Doina Precup, André M.S. Barreto, Mohammad Ghavamzadeh, “Classification-based Approximate Policy Iteration,” IEEE Transactions on Automatic Control, Vol. 60, No. 11, 2015 (preprint PDF; IEEE Version).
‣Note: This paper has an extended version with additional discussions and experiments, which might be easier to read: Copy on arXiv or here, which is slightly more up to date.
‣De-An Huang, A. M. Farahmand, Kris M. Kitani, and J. Andrew Bagnell, “Approximate MaxEnt Inverse Optimal Control and its Application for Mental Simulation of Human Interactions,” In the Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI), Jan. 2015. (PDF; Extended Version PDF)
‣De-An Huang, A. M. Farahmand, Kris M. Kitani, and J. Andrew Bagnell, “Approximate MaxEnt Inverse Optimal Control,” The 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), June 2015. (PDF)
‣J. Andrew Bagnell and A. M. Farahmand, “Learning Positive Functions in a Hilbert Space,” NIPS Workshop on Optimization for Machine Learning, December 2015. (PDF) (Presentation: Poster)
2014
‣Philip Bachman, A. M. Farahmand, and Doina Precup, “Sample-based Approximate Regularization,” International Conference on Machine Learning (ICML), 2014. (PDF; Extended Version PDF; Code on GitHub)
‣A. M. Farahmand, Doina Precup, André M.S. Barreto, Mohammad Ghavamzadeh, “Classification-based Approximate Policy Iteration: Experiments and Extended Discussions,” 2014 (copy on arXiv or here, which is slightly more up to date).
‣Note: A shorter version of this paper is published at IEEE Transactions on Automatic Control, 2015 (preprint PDF; IEEE Version).
2013
‣Beomjoon Kim, A. M. Farahmand, Joelle Pineau, and Doina Precup, “Learning from Limited Demonstrations,” In the Proceedings of Advances in Neural Information Processing Systems (NIPS-26), 2013. (PDF; Supplementary material)
‣Mahdi Milani Fard, Yuri Grinberg, A. M. Farahmand, Joelle Pineau, Doina Precup, “Bellman Error Based Feature Generation using Random Projections on Sparse Spaces,” In the Proceedings of Advances in Neural Information Processing Systems (NIPS-26), 2013. (PDF; Supplementary material)
‣Beomjoon Kim, A. M. Farahmand, Joelle Pineau, and Doina Precup, “Approximate Policy Iteration with Demonstration Data,” The 1st Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2013. (PDF)
‣A. M. Farahmand, Doina Precup, André M.S. Barreto, and Mohammad Ghavamzadeh, “CAPI: Generalized Classification-based Approximate Policy Iteration,” The 1st Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2013. (PDF)
2012
‣A. M. Farahmand and Csaba Szepesvári, “Regularized Least-Squares Regression: Learning from a β-mixing Sequence,” Journal of Statistical Planning and Inference (JSPI), Volume 142, Issue 2, February 2012. (Preprint - PDF; JSPI’s version)
‣A. M. Farahmand and Doina Precup, “Value Pursuit Iteration,” In the Proceedings of Advances in Neural Information Processing Systems (NIPS-25), 2012. (PDF; Extended Version PDF)
‣A. M. Farahmand, Doina Precup, and Mohammad Ghavamzadeh, “Generalized Classification-based Approximate Policy Iteration,” Tenth European Workshop on Reinforcement Learning (EWRL 2012), Edinburgh, Scotland, June 2012. (PDF)
2011
‣A. M. Farahmand, “Action-Gap Phenomenon in Reinforcement Learning,” In the Proceedings of Advances in Neural Information Processing Systems (NIPS-24), 2011. (PDF)
‣A. M. Farahmand and Csaba Szepesvári, “Model Selection in Reinforcement Learning,” Machine Learning Journal, Vol. 85, No. 3, Springer, 2011. (PDF; MLJ’s version)
‣A. M. Farahmand and Csaba Szepesvári, “BErMin: A Model Selection Algorithm for Reinforcement Learning Problems,” NIPS Workshop on New Frontiers in Model Order Selection, December 2011. (PDF; video recording of the presentation) [This is a four-page summary of our Machine Learning Journal paper]
2010
‣A. M. Farahmand, Remi Munos, Csaba Szepesvári, “Error Propagation for Approximate Policy and Value Iteration,” Advances in Neural Information Processing Systems (NIPS-23), 2010. (PDF; Extended Version PDF)
‣A. M. Farahmand, Majid Nili Ahmadabadi, Babak N. Araabi, Caro Lucas, “Interaction of Culture-based Learning and Cooperative Co-evolution and its Application to Automatic Behavior-based System Design,” IEEE Transactions on Evolutionary Computation, Vol. 14, No. 1, pp. 23-57, 2010. (Preprint - PDF; IEEE’s version)
‣Azad Shademan, A. M. Farahmand, and Martin Jägersand, “Robust Jacobian Estimation for Uncalibrated Visual Servoing,” Accepted for publication in the Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Anchorage, Alaska, USA, May 2010. (PDF)
2009
‣A. M. Farahmand, Azad Shademan, Martin Jägersand, and Csaba Szepesvári, “Model-based and Model-free Reinforcement Learning for Visual Servoing,” In the Proceedings of the International Conference on Robotics and Automation (ICRA), Kobe, Japan, May 2009. (PDF; IEEE’s version) (Presentation: PDF)
‣A. M. Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, and Shie Mannor, “Regularized Fitted Q-Iteration for Planning in Continuous-Space Markovian Decision Problems,” In the Proceedings of the American Control Conference (ACC), St. Louis, Missouri, USA, June 2009. (PDF; IEEE’s version)
‣Azad Shademan, A. M. Farahmand, Martin Jägersand, “Towards Learning Robotic Reaching and Pointing: An Uncalibrated Visual Servoing Approach,” Sixth Canadian Conference on Computer and Robot Vision (CRV), Kelowna, British Columbia, Canada, 2009.
‣A. M. Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, and Shie Mannor, “Regularization in Reinforcement Learning,” Multidisciplinary Symposium on Reinforcement Learning (MSRL-2009), Montreal, QC, Canada, June 2009. (PDF)
‣A. M. Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, and Shie Mannor, “Robot Learning with Regularized Reinforcement Learning,” Workshop on Regression in Robotics: Approaches and Applications, Robotics: Science and Systems Conference (RSS-2009), Seattle, WA, June 2009. (PDF)
2008
‣A. M. Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, and Shie Mannor, “Regularized Policy Iteration,” Twenty-Second Annual Conference on Advances in Neural Information Processing Systems (NIPS-2008), Vancouver, Canada, December 2008. (24% acceptance rate) (PDF)
‣A. M. Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, and Shie Mannor, “Regularized Fitted Q-Iteration: Application to Bounded Resource Planning,” in Recent Advances in Reinforcement Learning, 8th European Workshop, EWRL 2008, Revised and Selected Papers, Springer, LNCS 5323, pp. 55-68, 2008. (PDF)
‣A. M. Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, and Shie Mannor, “Regularized Policy Iteration,” Eighth European Workshop on Reinforcement Learning (EWRL 2008), Villeneuve d'Ascq, France, July 2008.
2007
‣A. M. Farahmand, Csaba Szepesvári, and Jean-Yves Audibert, "Manifold-Adaptive Dimension Estimation," International Conference on Machine Learning (ICML), 2007. (PDF; ACM’s version) (Presentation given at the conference: PDF, VideoLecture’s recorded presentation)
‣A. M. Farahmand, Azad Shademan, and Martin Jägersand, "Global Visual-Motor Estimation for Uncalibrated Visual Servoing," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2007. (PDF; IEEE’s version)
‣A. M. Farahmand, Csaba Szepesvári, and Jean-Yves Audibert, "Towards Manifold-Adaptive Learning," NIPS Workshop on Topology Learning, Whistler, Canada, 2007 (PDF)
2006
‣A. M. Farahmand, Majid Nili Ahmadabadi, Caro Lucas, and Babak N. Araabi, “Hybrid Behavior Co-evolution and Structure Learning in Behavior-based Systems,” In the Proceedings of IEEE Congress on Evolutionary Computation (CEC), Vancouver, Canada, 2006. (Chosen as the best presentation of the “Evolving Learning Systems” technical session) (PDF) (Presentation: PDF, PPT)
‣A. M. Farahmand and Mohammad Javad Yazdanpanah, “Channel Assignment using Chaotic Simulated Annealing Enhanced Hopfield Neural Network,” In the Proceedings of International Joint Conference on Neural Networks (IJCNN), Vancouver, Canada, 2006. (PDF) (Presentation: PDF, PPT)
‣Mohammad G. Azar, Majid Nili Ahmadabadi, A. M. Farahmand, and Babak N. Araabi, “Learning to Coordinate Behaviors in Soft Behavior-based Systems using Reinforcement Learning,” International Joint Conference on Neural Networks (IJCNN), Vancouver, Canada, 2006.
2005
‣A. M. Farahmand and M. J. Yazdanpanah, “Locally Optimal Takagi-Sugeno Fuzzy Controllers,” Proceedings of the 44th IEEE Conference on Decision and Control (CDC) and the European Control Conference (ECC), pp. 4095-4099, Seville, Spain, December 2005. (PDF) (Presentation: PDF, PPT)
‣M. J. Yazdanpanah, E. Madanian, and A. M. Farahmand, “Channel Assignment in Cellular Communications using a New Modification on Hopfield Networks,” Iranian Journal of Science and Technology, Transaction B: Engineering, Vol. 29, No. B4, 2005.
‣A. M. Farahmand and Majid Nili Ahmadabadi, "The Effect of Reinforcement Signal Error in Reinforcement Learning," Computer Society of Iran Computer Conference (CSICC), 2005 (in Persian).
2004
‣A. M. Farahmand, Majid Nili Ahmadabadi, and Babak N. Araabi, "Behavior Hierarchy Learning in a Behavior-based System using Reinforcement Learning," Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2004), Sendai, Japan, 2004. (PDF) (Presentation: PDF, PPT)
‣A. M. Farahmand and Caro Lucas, "Fuzzy Neural Network Implementation of Q(λ) for Mobile Robots," WSEAS Transactions on Systems, Issue 1, Vol. 3, Jan. 2004.
2001
‣A. M. Farahmand, Roxana Akhbari, and Maryam Tajvidi, "Evolving Hidden Markov Models," 4th Iranian Student Conference on Electrical Engineering (ISCEE), 2001 (in Persian).
2000
‣A. M. Farahmand and Amir Emad Mirmirani, "Distributed Genetic Algorithms," 3rd Iranian Student Conference on Electrical Engineering (ISCEE), 2000 (in Persian).
Thesis
‣PhD Dissertation: Regularization in Reinforcement Learning, Department of Computing Science, University of Alberta, September 2011. (Supervisors: Csaba Szepesvári and Martin Jägersand; Examining Committee: Peter Bartlett, Michael Bowling, Alexander Melnikov, Dale Schuurmans, Richard S. Sutton) (My version, which is slightly updated; U of A’s link)
‣MS Thesis: Learning and Evolution in Hierarchical Behavior-based Systems, M.S. Thesis, University of Tehran, 2005 (in Persian) (PDF). (Advisors: Majid Nili Ahmadabadi, Babak N. Araabi, Caro Lucas) (Examining Committee: Babak Moshiri and Alireza Fatehi)
‣BS Thesis: Calculating Resonant Frequencies of a Metallic Cavity using Finite Element Method, BSEE Thesis, K. N. Toosi University of Technology, 2002 (in Persian) (PDF). (Advisor: Mohammad-Sadegh Abrishamian) (Examining Committee: Manouchehr Kamyab and Mohsen Aboutorab)
Submitted/Working papers
‣A. M. Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, and Shie Mannor, “Regularized Policy Iteration for Nonparametric Function Spaces,” Submitted to JMLR, January 2013.
‣A. M. Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, and Shie Mannor, “Regularized Fitted Q-Iteration Algorithm,” Working paper.
‣A. M. Farahmand, Csaba Szepesvári, and Jean-Yves Audibert, “Nearest Neighborhood Methods for Manifold-Adaptive Dimension Estimation and Regression,” Working paper. [We never ended up submitting this anywhere, even though it was almost ready to go. We may post it as a technical report somewhere.]
Technical Reports (Selected)
‣A. M. Farahmand, Majid Nili Ahmadabadi, and Babak N. Araabi, “Behavior and Hierarchy Development in Behavior-based Systems using Reinforcement Learning,” Technical Report, 2005.
‣A. M. Farahmand, Caro Lucas, and Babak N. Araabi, "Chaos Control Survey," a Technical Report for my Seminar Course, University of Tehran, 2004 (in Persian/Farsi). (PDF)
‣A. M. Farahmand and Mohammad Javad Yazdanpanah, "A Class of Nonlinear Controllers for Synchronization of Chaotic Semipassive Systems," Technical Report, University of Tehran, 2003.
‣A. M. Farahmand, Ramin Pashai, and Ezatollah Geranpayeh, "Effect of Metallic Electrode and Buffer Layer on Dielectric Waveguides," Technical Report of my internship period at Iran Telecommunication Research Center (ITRC), 2001.
‣A. M. Farahmand, "On Chaotic Models of Population - A Survey," 1999 (In Persian/Farsi).
‣A. M. Farahmand, "Data Compression Methods," 1997 (in Persian/Farsi). (This is my first technical report. I wrote it when I was still in high school. It is not of particular technical value, but it was still an achievement at that time.)