Introduction
Agenda
Transition
- Value functions are built on the MDP/POMDP framework.
- Model-based: Dynamic Programming, introduced by Richard Bellman in the 1950s.
- Model-free: TD (Temporal-Difference) learning, introduced by Sutton in 1988.
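- SARSA and Q-learning: Q-learning introduced by Watkins (1989), with a convergence proof by Watkins and Dayan (1992); SARSA introduced by Rummery and Niranjan (1994).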
- Multi-step bootstrapping: the $\lambda$ parameter (eligibility traces) in TD and Q-learning, e.g., TD($\lambda$).
- Evolution from tabular methods to function approximation, which is needed for environments with large or continuous state spaces.
- Linear function approximation, e.g., SARSA with linear function approximation.
- Neural function approximation in RL: Neuro-Dynamic Programming by Bertsekas and Tsitsiklis, 1996.
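The model-free methods learn from sampled transitions instead of a known model. The sketch below is an illustration under stated assumptions: a hypothetical 4-state chain exposed only through a `step` function, tabular action values, and ε-greedy exploration. The names `step`, `eps_greedy`, and `td_control` are hypothetical. Q-learning and SARSA share the same TD update and differ only in the bootstrap target.

```python
import random

# Hypothetical 4-state chain: action 0 moves left, action 1 moves right,
# and moving into state 3 gives reward 1 and ends the episode.
N_STATES, TERMINAL = 4, 3


def step(s, a):
    """Sample one transition; the agent only sees these samples, never the model."""
    s2 = max(s - 1, 0) if a == 0 else s + 1
    r = 1.0 if s2 == TERMINAL else 0.0
    return s2, r, s2 == TERMINAL


def eps_greedy(Q, s, eps, rng):
    """Random action with probability eps (or when tied), otherwise the greedy action."""
    if rng.random() < eps or Q[s][0] == Q[s][1]:
        return rng.randrange(2)
    return 0 if Q[s][0] > Q[s][1] else 1


def td_control(method="q", episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    if method not in ("q", "sarsa"):
        raise ValueError("method must be 'q' or 'sarsa'")
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        a = eps_greedy(Q, s, eps, rng)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = eps_greedy(Q, s2, eps, rng)  # next action from the behavior policy
            if done:
                target = r  # no bootstrapping from the terminal state
            elif method == "q":
                target = r + gamma * max(Q[s2])  # Q-learning: greedy next action (off-policy)
            else:
                target = r + gamma * Q[s2][a2]  # SARSA: action actually taken next (on-policy)
            Q[s][a] += alpha * (target - Q[s][a])  # TD update toward the bootstrapped target
            s, a = s2, a2
    return Q
```

After training, both variants should prefer "right" in states 0 to 2. The design choice to isolate is the target: $r + \gamma \max_{a'} Q(s', a')$ makes Q-learning off-policy, while $r + \gamma Q(s', a')$ for the action the ε-greedy policy actually picks makes SARSA on-policy.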
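For SARSA with linear function approximation, the table is replaced by $\hat{q}(s, a; \mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s, a)$, and the weights follow the semi-gradient update $\mathbf{w} \leftarrow \mathbf{w} + \alpha\,[r + \gamma\,\hat{q}(s', a'; \mathbf{w}) - \hat{q}(s, a; \mathbf{w})]\,\mathbf{x}(s, a)$. The following is a minimal sketch on a hypothetical 4-state chain. The polynomial feature map `features` (a $[1, s/2, (s/2)^2]$ block per action) and the names `q_value` and `linear_sarsa` are assumptions made for illustration. Real tasks would use task-specific features such as tile coding or radial basis functions.

```python
import random

# Hypothetical 4-state chain: action 0 moves left, action 1 moves right,
# and moving into state 3 gives reward 1 and ends the episode.
TERMINAL = 3
N_FEATURES = 6  # one 3-feature block per action


def step(s, a):
    s2 = max(s - 1, 0) if a == 0 else s + 1
    return s2, (1.0 if s2 == TERMINAL else 0.0), s2 == TERMINAL


def features(s, a):
    """Hypothetical feature map x(s, a): [1, z, z^2] with z = s/2, placed in the block of action a."""
    z = s / 2.0
    block = [1.0, z, z * z]
    return block + [0.0, 0.0, 0.0] if a == 0 else [0.0, 0.0, 0.0] + block


def q_value(w, s, a):
    """Linear action-value estimate q(s, a; w) = w . x(s, a)."""
    return sum(wi * xi for wi, xi in zip(w, features(s, a)))


def eps_greedy(w, s, eps, rng):
    q0, q1 = q_value(w, s, 0), q_value(w, s, 1)
    if rng.random() < eps or q0 == q1:
        return rng.randrange(2)  # explore, or break ties randomly
    return 0 if q0 > q1 else 1


def linear_sarsa(episodes=2000, alpha=0.05, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    w = [0.0] * N_FEATURES
    for _ in range(episodes):
        s = 0
        a = eps_greedy(w, s, eps, rng)
        done = False
        while not done:
            s2, r, done = step(s, a)
            if done:
                target = r  # terminal: no bootstrap term
            else:
                a2 = eps_greedy(w, s2, eps, rng)
                target = r + gamma * q_value(w, s2, a2)  # on-policy SARSA target
            delta = target - q_value(w, s, a)
            x = features(s, a)
            # Semi-gradient step: the gradient of w . x(s, a) with respect to w is x(s, a).
            w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
            if not done:
                s, a = s2, a2
    return w
```

Because the gradient of $\mathbf{w}^\top \mathbf{x}$ is just $\mathbf{x}$, each update only changes the weights in the chosen action's block. With one-hot state-action features, the same code would reduce to tabular SARSA.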