| Algorithm | Authors | Publication | Code | Classification | Features | Detailed |
| --- | --- | --- | --- | --- | --- | --- |
| Markov Decision Processes (MDP) | Puterman, M.L. | John Wiley & Sons 2014 | / | Model | / | TBD |
| Temporal Difference (TD) Learning | Tesauro, G. | Communications of the ACM 1995 | / | Cornerstone | TBD | TBD |
| Q-Learning | Watkins, C.J. et al. | Machine Learning 1992 | / | / | Q Table | TBD |
| Deep Q-Networks (DQN) | Mnih, V. et al. | Nature 2015 | PyTorch | Q Networks | Deep network + Q-learning | TBD |
| Deep Deterministic Policy Gradient (DDPG) | Lillicrap, T.P. et al. | arXiv 2015 | TBD | AC | Continuous control | TBD |
| Trust Region Policy Optimization (TRPO) | Schulman, J. et al. | ICML 2015 | TBD | Policy | TBD | TBD |
| Prioritized Experience Replay (PER) | Schaul, T. et al. | arXiv 2015 | TBD | Replay | TBD | TBD |
| Deep Recurrent Q-Network (DRQN) | Hausknecht, M. et al. | AAAI 2015 | TBD | Q Networks | TBD | TBD |
| Monte-Carlo Tree Search (MCTS) | Silver, D. et al. | Nature 2016 | TBD | TBD | TBD | TBD |
| Double DQN | Van Hasselt, H. et al. | AAAI 2016 | TBD | Q Networks | TBD | TBD |
| Dueling DQN | Wang, Z. et al. | ICML 2016 | TBD | Q Networks | TBD | TBD |
| Asynchronous Advantage Actor-Critic (A3C) | Mnih, V. et al. | ICML 2016 | TBD | AC | TBD | TBD |
| Noisy Networks | Fortunato, M. et al. | arXiv 2017 | TBD | Exploration | TBD | TBD |
| Hindsight Experience Replay (HER) | Andrychowicz, M. et al. | NeurIPS 2017 | TBD | Replay | TBD | TBD |
| Soft Q-Learning (SQL) | Haarnoja, T. et al. | ICML 2017 | TBD | TBD | TBD | TBD |
| Distributional DQN | Bellemare, M.G. et al. | ICML 2017 | TBD | Q Networks | TBD | TBD |
| Proximal Policy Optimization (PPO) | Schulman, J. et al. | arXiv 2017 | TBD | Policy | TBD | TBD |
| Multi-Agent DDPG (MADDPG) | Lowe, R. et al. | NeurIPS 2017 | TBD | MADRL | TBD | TBD |
| FeUdal Networks | Vezhnevets, A.S. et al. | ICML 2017 | TBD | HRL | TBD | TBD |
| Twin Delayed DDPG (TD3) | Fujimoto, S. et al. | ICML 2018 | TBD | AC | TBD | TBD |
| Soft Actor-Critic (SAC) | Haarnoja, T. et al. | ICML 2018 | TBD | AC | TBD | TBD |
| ----------- | ----------- | ----------- | TBD | TBD | TBD | TBD |
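
The Q-Learning entry above lists a Q table as its defining feature, and its code column is still empty. As a rough illustration only (not this repository's implementation), a tabular Q-learning loop might look like the sketch below. The `q_learning` function, the `chain_step` toy environment, and all hyperparameter values are assumptions made for this example.

```python
import random


def q_learning(n_states, n_actions, step, episodes=500, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy.

    `step(state, action)` is a hypothetical environment callback that
    returns `(next_state, reward, done)`. Every episode starts in state 0.
    """
    rng = random.Random(seed)
    # The Q table: one row per state, one column per action, all zeros.
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                action = rng.randrange(n_actions)
            else:
                # Exploit: pick a greedy action, breaking ties at random.
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(n_actions) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target: r + gamma * max_a' Q(s', a'),
            # with no bootstrap term on terminal transitions.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def chain_step(state, action):
    """Hypothetical 5-state corridor used only for this example.

    Action 1 moves right, action 0 moves left (state 0 is a wall).
    Reaching state 4 yields reward 1 and ends the episode.
    """
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done


q_table = q_learning(n_states=5, n_actions=2, step=chain_step)
# Greedy action for each non-terminal state, read off the learned table.
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
print(greedy_policy)
```

The deep variants listed in the table (DQN and its successors) replace the explicit table `q` with a neural network that approximates the same action values.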