
A Genealogy of Reinforcement Learning

Posted at 2018-10-23

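I wanted to organize reinforcement learning well enough to dig into it properly, so for each algorithm I asked:

  • when it appeared
  • who proposed it
  • (what kind of problem the algorithm solved)
  • what its abbreviation stands for
  • who its parent is

and compiled the answers into a genealogy. (I may have mixed some things up.)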

Figure: Genealogy of the major reinforcement learning algorithms (the number in the upper left is the year of birth)

TD(λ)(Sutton, 1984;1988)

  • Temporal Differences
  • Sutton, Richard S. "Learning to predict by the methods of temporal differences." Machine learning 3.1 (1988): 9-44.
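
As a rough illustration of what temporal-difference learning with eligibility traces looks like, here is a minimal tabular sketch. The function name `td_lambda_episode`, the dictionary-based value table, and the choice of accumulating traces are assumptions made for this example, not details taken from the paper.

```python
def td_lambda_episode(values, episode, alpha=0.1, gamma=1.0, lam=0.8):
    """Update state values in place from one episode (hypothetical helper).

    values:  dict mapping state -> estimated value
    episode: list of (state, reward, next_state, done) transitions
    """
    # Accumulating eligibility traces, reset at the start of each episode.
    traces = {s: 0.0 for s in values}
    for state, reward, next_state, done in episode:
        # One-step TD error; terminal states contribute no bootstrap value.
        target = reward + (0.0 if done else gamma * values[next_state])
        td_error = target - values[state]
        traces[state] += 1.0
        # Every state is updated in proportion to its trace, then traces decay.
        for s in values:
            values[s] += alpha * td_error * traces[s]
            traces[s] *= gamma * lam
    return values
```

With `lam=0` this reduces to one-step TD(0); values of `lam` closer to 1 spread each TD error further back along the visited states.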

Q-learning(Watkins, 1989)

  • Watkins, Christopher JCH, and Peter Dayan. "Q-learning." Machine learning 8.3-4 (1992): 279-292.
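
The core of Q-learning is a single update that bootstraps from the greedy action in the next state. The sketch below assumes a tabular setting; the function name `q_learning_update` and the `(state, action)`-keyed table are illustrative choices.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning update and return the new Q(s, a)."""
    # Off-policy target: the best action value in the next state,
    # regardless of which action the behavior policy takes there.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Example table: unseen (state, action) pairs default to 0.0.
q_table = defaultdict(float)
```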

REINFORCE(Williams, 1992)

  • REward Increment = Nonnegative Factor × Offset Reinforcement × Characteristic Eligibility
  • Williams, Ronald J. "Simple statistical gradient-following algorithms for connectionist reinforcement learning." Machine learning 8.3-4 (1992): 229-256.

SARSA(Rummery, 1994)
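
SARSA is the on-policy counterpart of Q-learning: it bootstraps from the action actually chosen in the next state rather than from the greedy one. A minimal tabular sketch follows; the function name and the plain-dict table are assumptions for illustration.

```python
def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular SARSA update and return the new Q(s, a)."""
    # On-policy target: uses the value of the action the agent will actually
    # take next, which is where the name (S, A, R, S', A') comes from.
    next_value = 0.0 if done else q.get((next_state, next_action), 0.0)
    current = q.get((state, action), 0.0)
    q[(state, action)] = current + alpha * (reward + gamma * next_value - current)
    return q[(state, action)]
```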

CEM(Rubinstein, 1997)

AL(Baird III, 1999)

  • Advantage Learning
  • Baird III, Leemon C. Reinforcement learning through gradient descent. No. CMU-CS-99-132. CARNEGIE-MELLON UNIV PITTSBURGH PA DEPT OF COMPUTER SCIENCE, 1999.

AC(Sutton et al., 2000)

  • Actor-Critic
  • Sutton, Richard S., et al. "Policy gradient methods for reinforcement learning with function approximation." Advances in neural information processing systems. 2000.

NFQ(Riedmiller, 2005)

  • Neural Fitted Q Iteration
  • Riedmiller, Martin. "Neural fitted Q iteration–first experiences with a data efficient neural reinforcement learning method." European Conference on Machine Learning. Springer, Berlin, Heidelberg, 2005.

REPS(Peters et al., 2010)

  • Relative Entropy Policy Search
  • Peters, Jan, Katharina Mülling, and Yasemin Altun. "Relative Entropy Policy Search." AAAI. 2010.

DQN(Mnih et al., 2013)

DPG(Silver et al., 2014)

  • Deterministic Policy Gradient
  • Silver, David, et al. "Deterministic policy gradient algorithms." ICML. 2014.

DDQN(van Hasselt et al., 2015)
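
The difference between the DQN and Double DQN targets can be sketched in a few lines. The function names and the plain-list representation of next-state Q-values are assumptions; a real agent would compute these from its online and target networks.

```python
def dqn_target(reward, gamma, target_q_next, done):
    """Standard DQN target: the target network both selects and evaluates."""
    return reward if done else reward + gamma * max(target_q_next)


def double_dqn_target(reward, gamma, online_q_next, target_q_next, done):
    """Double DQN target: the online network selects, the target network evaluates."""
    if done:
        return reward
    # Index of the action the online network considers best in the next state.
    best_action = max(range(len(online_q_next)), key=online_q_next.__getitem__)
    return reward + gamma * target_q_next[best_action]
```

Decoupling selection from evaluation is what reduces the overestimation that comes from taking a max over noisy estimates.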

Dueling DQN(Wang et al., 2015)

  • An evolved version of Double DQN
  • Wang, Ziyu, et al. "Dueling network architectures for deep reinforcement learning." arXiv preprint arXiv:1511.06581 (2015).
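
The dueling architecture estimates a state value and per-action advantages in separate streams and then combines them into Q-values. The sketch below shows only that combination step, using the mean-subtracted form; the function name and scalar/list inputs are illustrative stand-ins for network outputs.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V + (A - mean(A))."""
    # Subtracting the mean advantage makes the V / A decomposition identifiable.
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]
```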

PAL(Bellemare et al., 2015)

  • Persistent Advantage Learning
  • Bellemare, Marc G., et al. "Increasing the action gap: New operators for reinforcement learning." arXiv preprint arXiv:1512.04860 (2015).

TRPO(Schulman et al., 2015)

  • Trust Region Policy Optimization
  • Schulman, John, et al. "Trust region policy optimization." International Conference on Machine Learning. 2015.

DDPG(Lillicrap et al., 2015)

  • Deep DPG
  • Lillicrap, Timothy P., et al. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).

PER(Schaul et al., 2015)
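
Prioritized Experience Replay samples transitions in proportion to a priority (typically derived from the magnitude of the TD error) and corrects the resulting bias with importance-sampling weights. Below is a minimal sketch of the proportional variant; the function names and default exponents are assumptions for illustration, and an efficient implementation would use a sum-tree rather than plain lists.

```python
def per_probabilities(priorities, alpha=0.6):
    """Sampling probabilities P(i) = p_i^alpha / sum_k p_k^alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probabilities, beta=0.4):
    """Importance-sampling weights (N * P(i))^(-beta), normalized by the max."""
    n = len(probabilities)
    weights = [(n * p) ** (-beta) for p in probabilities]
    max_weight = max(weights)
    return [w / max_weight for w in weights]
```

Setting `alpha=0` recovers uniform sampling, and `beta=1` fully compensates for the non-uniform sampling probabilities.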

Gorila(Nair et al., 2015)

  • General Reinforcement Learning Architecture
  • Nair, Arun, et al. "Massively parallel methods for deep reinforcement learning." arXiv preprint arXiv:1507.04296 (2015).

NAF(Gu et al., 2016)

  • Normalized Advantage Function
  • Extends Double DQN so that it can also solve problems with continuous action spaces
  • Gu, Shixiang, et al. "Continuous deep q-learning with model-based acceleration." International Conference on Machine Learning. 2016.

A3C(Mnih et al., 2016)

A2C(Mnih et al., 2016)

UNREAL(Jaderberg et al., 2016)

  • UNsupervised REinforcement and Auxiliary Learning
  • Jaderberg, Max, et al. "Reinforcement learning with unsupervised auxiliary tasks." arXiv preprint arXiv:1611.05397 (2016).

GAIL(Ho et al., 2016)

  • Generative Adversarial Imitation Learning
  • Ho, Jonathan, and Stefano Ermon. "Generative adversarial imitation learning." Advances in Neural Information Processing Systems. 2016.

ACER(Wang et al., 2016)

  • Actor-Critic with Experience Replay
  • Wang, Ziyu, et al. "Sample efficient actor-critic with experience replay." arXiv preprint arXiv:1611.01224 (2016).

HER(Andrychowicz et al., 2017)

C51(Bellemare et al., 2017)

  • A special case of Rainbow
  • A Distributional Perspective on Reinforcement Learning
  • Bellemare, Marc G., Will Dabney, and Rémi Munos. "A distributional perspective on reinforcement learning." arXiv preprint arXiv:1707.06887 (2017).

Rainbow (Hessel et al., 2017)

[Images: figure1, table1, figure2, table2, figure3, table4]

PPO(Schulman et al., 2017)

[Images: from Song et al., 2018]
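
PPO is best known for its clipped surrogate objective, which limits how far a policy update can move the probability ratio away from 1. The following is a minimal sketch of that objective over a batch; the function name and list inputs are illustrative, and a real implementation would compute ratios and advantages from a policy network and differentiate through the result.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.

    ratios:     new_policy_prob / old_policy_prob for each sampled action
    advantages: advantage estimate for each sampled action
    """
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the minimum makes the objective a pessimistic lower bound.
        terms.append(min(ratio * advantage, clipped_ratio * advantage))
    return sum(terms) / len(terms)
```

This quantity is maximized, so the clipping removes any incentive to push the ratio beyond the `[1 - epsilon, 1 + epsilon]` interval in the direction that would otherwise increase the objective.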

ACKTR(Wu et al., 2017)

  • Actor Critic using Kronecker-Factored Trust Region
  • Wu, Yuhuai, et al. "Scalable trust-region method for deep reinforcement learning using kronecker-factored approximation." Advances in neural information processing systems. 2017.

SDQN(Metz et al., 2017)

  • Sequential DQN
  • Metz, Luke, et al. "Discrete sequential prediction of continuous actions for deep RL." arXiv preprint arXiv:1705.05035 (2017).

QR-DQN(Dabney et al., 2017)

  • Quantile Regression DQN
  • Dabney, Will, et al. "Distributional reinforcement learning with quantile regression." arXiv preprint arXiv:1710.10044 (2017).

IQN(Dabney et al., 2018)

  • Implicit Quantile Network
  • Dabney, Will, et al. "Implicit Quantile Networks for Distributional Reinforcement Learning." arXiv preprint arXiv:1806.06923 (2018).

Ape-X(Horgan et al., 2018)
