

[Review] Transition of DRL


Introduction

Agenda

Transition

  1. Value functions are defined on top of the MDP/POMDP framework.
  2. Model-based: Dynamic Programming, by Richard Bellman in the 1950s.
  3. Model-free: TD Learning (Temporal Difference Learning), by Sutton, 1988.
  4. SARSA/Q-learning: Q-learning was introduced by Watkins (1989; convergence proof by Watkins and Dayan, 1992), and SARSA by Rummery and Niranjan (1994), with the name "SARSA" coined by Sutton (1996).
  5. Multi-step bootstrapping: the trace-decay parameter $\lambda$ extends TD and Q-learning to TD($\lambda$) and Q($\lambda$).
  6. Evolution from tabular methods to large or continuous state spaces, which requires function approximation.
  7. Linear function approximation, e.g. SARSA with linear function approximation.
  8. Neural function approximation with RL, by Bertsekas and Tsitsiklis, 1996.
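
To make items 4 and 7 of the list above concrete, here is a minimal sketch of the one-step update rules in Python with NumPy. It is not taken from the original post. The function names, the array layout `Q[state, action]`, and the default values of `alpha` (step size) and `gamma` (discount factor) are all assumptions chosen for illustration.

```python
import numpy as np


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """Tabular SARSA (on-policy): bootstrap from the action actually taken next."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Tabular Q-learning (off-policy): bootstrap from the greedy next action."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q


def linear_sarsa_update(w, phi_sa, r, phi_next, alpha=0.1, gamma=0.99):
    """Semi-gradient SARSA with a linear model Q(s, a) = w . phi(s, a).

    phi_sa and phi_next are hypothetical feature vectors for (s, a)
    and (s_next, a_next); how they are built is left open here.
    """
    td_error = r + gamma * (w @ phi_next) - (w @ phi_sa)
    return w + alpha * td_error * phi_sa
```

The only difference between the two tabular rules is the bootstrap term: SARSA uses `Q[s_next, a_next]`, while Q-learning uses the maximum over next actions. The linear version replaces the table lookup with a dot product and scales the TD error by the feature vector, which is the gradient of the linear model with respect to `w`.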