Paper Detail
Author: Gerald Tesauro
Published Year: 1995
Link: https://courses.cs.washington.edu/courses/cse590hk/01sp/Readings/tesauro95cacm.pdf
What is TD-Gammon?
TD-Gammon is an algorithm combining a neural network (a multi-layer perceptron) with TD-Learning, and it reached a level of play rivaling human champions.
The model used a single hidden layer and the TD($\lambda$) algorithm.
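The core TD($\lambda$) update can be sketched as below. For clarity this uses a linear value function instead of TD-Gammon's MLP (there, the per-weight gradient would come from backpropagation); the feature vectors and hyperparameter values are illustrative, not the paper's.

```python
import numpy as np

def td_lambda_update(w, e, phi_t, phi_next, reward,
                     alpha=0.1, gamma=1.0, lam=0.7):
    """One TD(lambda) step with eligibility traces.

    w: weight vector, e: eligibility traces,
    phi_t / phi_next: feature vectors of current and next state.
    Returns the updated (w, e).
    """
    v_t = w @ phi_t                        # value estimate of current state
    v_next = w @ phi_next                  # value estimate of next state
    delta = reward + gamma * v_next - v_t  # TD error
    e = gamma * lam * e + phi_t            # decay traces, add current gradient
    w = w + alpha * delta * e              # credit all recent states via traces
    return w, e

# Toy usage: one-hot state features, a single rewarded transition.
w = np.zeros(4)
e = np.zeros(4)
phi_t = np.array([1.0, 0.0, 0.0, 0.0])
phi_next = np.array([0.0, 1.0, 0.0, 0.0])
w, e = td_lambda_update(w, e, phi_t, phi_next, reward=1.0)
```

The eligibility trace `e` is what lets a delayed reward propagate back to states visited many moves earlier, which is exactly the "delayed reward" problem noted below.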
What problems does he address?
- Delayed Reward
- Limits on the scale of the learning algorithm (either look-up tables or linear function approximation)
What is the context of this research?
- Development of SL algorithms (Decision Trees, SVMs, and so on)
- Invention of TD-Learning
Because of these two developments, TD-Gammon was able to overcome the aforementioned issues.
He also looked back at his earlier system, NeuroGammon, and adapted its handcrafted feature engineering to this algorithm.
Model
Input: a raw feature encoding of the backgammon board position
Output: estimated probability of each of the four possible game outcomes
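The network shape described above can be sketched as a single-hidden-layer forward pass. The 198 input units and four outputs follow the paper's description; the hidden size (40 here), random weights, and board encoding are illustrative stand-ins.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 198, 40, 4     # hidden size is an assumption here
W1 = rng.normal(0, 0.1, (n_hidden, n_in))
W2 = rng.normal(0, 0.1, (n_out, n_hidden))

# Stand-in board encoding (the real encoding maps checker counts per point).
board = rng.integers(0, 2, n_in).astype(float)

hidden = sigmoid(W1 @ board)
outcome_probs = sigmoid(W2 @ hidden)   # one estimate per possible outcome
```

At play time, the program would run this forward pass on the position resulting from each legal move and pick the move with the best estimated value.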
Any further discussion?
Why did TD-Gammon work?
https://pdfs.semanticscholar.org/c1ec/5116d71176aaca80b1df944c01e82cc35212.pdf
According to this paper, the authors were able to replicate TD-Gammon's result using a neural network trained by hill climbing; they did not use TD-learning, or indeed any reinforcement learning, at all.
They therefore claimed that the success of Tesauro's TD-Gammon had more to do with the stochasticity of the game itself: each player rolls the dice and places their stones in turn, and this inherent randomness may be what actually helped the agent learn.
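The alternative described above can be sketched as a simple champion-vs-challenger hill climb. `play_match` here is a hypothetical stand-in with a toy fitness so the sketch runs; the real version would evaluate weights by playing full backgammon games in self-play.

```python
import random

def hill_climb(n_weights=8, steps=100, noise=0.05, seed=0):
    """Champion-vs-mutant hill climbing, no gradients or TD errors."""
    rng = random.Random(seed)
    champion = [0.0] * n_weights

    def play_match(a, b):
        # Hypothetical evaluator: a toy quadratic fitness stands in for
        # the outcome of backgammon self-play games between a and b.
        fitness = lambda w: -sum((x - 1.0) ** 2 for x in w)
        return fitness(a) >= fitness(b)

    for _ in range(steps):
        # Mutate the champion to get a challenger.
        challenger = [x + rng.gauss(0, noise) for x in champion]
        if play_match(challenger, champion):
            # Move the champion partway toward the winner rather than
            # replacing it outright, which damps evaluation noise.
            champion = [0.95 * c + 0.05 * ch
                        for c, ch in zip(champion, challenger)]
    return champion

final = hill_climb()
```

The point of the comparison is that this procedure, like self-play TD-learning, still benefits from the dice-driven randomness of the game, which forces broad exploration of the state space.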