[Review] Transition of DRL #DRL

Introduction

Value functions are defined on top of the MDP/POMDP framework.
Model-based: Dynamic Programming, introduced by Richard Bellman in the 1950s.
Model-free: TD Learning (Temporal Difference Learning), introduced by Sutton, 1988.
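Q-learning: introduced by Watkins (1989), with a convergence proof by Watkins and Dayan in 1992. SARSA: introduced by Rummery and Niranjan (1994) and named by Sutton (1996).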
Multi-step bootstrapping: introduction of the trace-decay parameter $\lambda$ in TD and Q-learning, giving TD($\lambda$) and Q($\lambda$).
Evolution from tabular methods to large or continuous state spaces, which requires function approximation.
Linear function approximation, e.g. SARSA with linear function approximation.
Neural function approximation with RL, by Bertsekas and Tsitsiklis, 1996.
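
To make the step from tabular methods to function approximation concrete, here is a minimal sketch of the one-step updates behind the methods listed above: tabular Q-learning, tabular SARSA, and semi-gradient SARSA with a linear value function. The function names, the feature vectors `phi_sa` and `phi_next`, and the default values of `alpha` and `gamma` are illustrative assumptions, not taken from any of the cited works.

```python
import numpy as np


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy TD update: bootstrap from the greedy action in s_next."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy TD update: bootstrap from the action actually taken in s_next."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q


def linear_sarsa_update(w, phi_sa, r, phi_next, alpha=0.1, gamma=0.99):
    """Semi-gradient SARSA with q(s, a) approximated by the dot product w . phi(s, a)."""
    td_error = r + gamma * (w @ phi_next) - (w @ phi_sa)
    # The gradient of a linear q with respect to w is the feature vector itself.
    return w + alpha * td_error * phi_sa
```

The only difference between the two tabular updates is the bootstrap term: Q-learning uses the maximum over next actions, while SARSA uses the value of the next action actually chosen. The linear version replaces the table lookup with a dot product, so the same TD error updates a weight vector instead of a single table cell.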