Paper Detail
Author: Gerald Tesauro
Published Year: 1995
Link: https://courses.cs.washington.edu/courses/cse590hk/01sp/Readings/tesauro95cacm.pdf
What is TD-Gammon?
TD-Gammon is an algorithm combining a neural network (a multi-layer perceptron) with TD-Learning, and it reached a level of play rivaling human champions.
The model used a single hidden layer and the TD($\lambda$) algorithm.
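The core TD($\lambda$) update can be sketched as below. For clarity this uses a linear value function instead of TD-Gammon's MLP (there, the per-weight gradient would come from backpropagation); the feature vectors and hyperparameter values are illustrative, not the paper's.

```python
import numpy as np

def td_lambda_update(w, e, phi_t, phi_next, reward,
                     alpha=0.1, gamma=1.0, lam=0.7):
    """One TD(lambda) step with eligibility traces.

    w: weight vector, e: eligibility traces,
    phi_t / phi_next: feature vectors of current and next state.
    Returns the updated (w, e).
    """
    v_t = w @ phi_t                        # value estimate of current state
    v_next = w @ phi_next                  # value estimate of next state
    delta = reward + gamma * v_next - v_t  # TD error
    e = gamma * lam * e + phi_t            # decay traces, add current gradient
    w = w + alpha * delta * e              # credit all recent states via traces
    return w, e

# Toy usage: one-hot state features, a single rewarded transition.
w = np.zeros(4)
e = np.zeros(4)
phi_t = np.array([1.0, 0.0, 0.0, 0.0])
phi_next = np.array([0.0, 1.0, 0.0, 0.0])
w, e = td_lambda_update(w, e, phi_t, phi_next, reward=1.0)
```

The eligibility trace `e` is what lets a delayed reward propagate back to states visited many moves earlier, which is exactly the "delayed reward" problem noted below.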
What problems does he address?
- Delayed Reward
- Limits on the scale of the learning algorithm (either look-up tables or linear function approximation)
What is the context of this research?
- Development of SL algorithms (Decision Trees, SVMs, and so on)
- Invention of TD-Learning
Because of these two developments, TD-Gammon was able to overcome the aforementioned issues.
He also looked back at his earlier system, NeuroGammon, and adapted its handcrafted feature engineering to this algorithm.
Model
Input: a raw feature encoding of the backgammon board position
Output: estimated probability of each of the four possible game outcomes
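The network shape described above can be sketched as a single-hidden-layer forward pass. The 198 input units and four outputs follow the paper's description; the hidden size (40 here), random weights, and board encoding are illustrative stand-ins.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 198, 40, 4     # hidden size is an assumption here
W1 = rng.normal(0, 0.1, (n_hidden, n_in))
W2 = rng.normal(0, 0.1, (n_out, n_hidden))

# Stand-in board encoding (the real encoding maps checker counts per point).
board = rng.integers(0, 2, n_in).astype(float)

hidden = sigmoid(W1 @ board)
outcome_probs = sigmoid(W2 @ hidden)   # one estimate per possible outcome
```

At play time, the program would run this forward pass on the position resulting from each legal move and pick the move with the best estimated value.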
Any further discussion?
Why did TD-Gammon work?
https://pdfs.semanticscholar.org/c1ec/5116d71176aaca80b1df944c01e82cc35212.pdf
According to this paper, the authors were able to replicate TD-Gammon's result using a neural network trained by hill climbing; they did not use TD-learning, or indeed any reinforcement learning, at all.
They therefore claimed that the success of Tesauro's TD-Gammon had more to do with the stochasticity of the game itself: each player rolls the dice and places their stones in turn, and this inherent randomness may be what actually helped the agent learn.
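The alternative described above can be sketched as a simple champion-vs-challenger hill climb. `play_match` here is a hypothetical stand-in with a toy fitness so the sketch runs; the real version would evaluate weights by playing full backgammon games in self-play.

```python
import random

def hill_climb(n_weights=8, steps=100, noise=0.05, seed=0):
    """Champion-vs-mutant hill climbing, no gradients or TD errors."""
    rng = random.Random(seed)
    champion = [0.0] * n_weights

    def play_match(a, b):
        # Hypothetical evaluator: a toy quadratic fitness stands in for
        # the outcome of backgammon self-play games between a and b.
        fitness = lambda w: -sum((x - 1.0) ** 2 for x in w)
        return fitness(a) >= fitness(b)

    for _ in range(steps):
        # Mutate the champion to get a challenger.
        challenger = [x + rng.gauss(0, noise) for x in champion]
        if play_match(challenger, champion):
            # Move the champion partway toward the winner rather than
            # replacing it outright, which damps evaluation noise.
            champion = [0.95 * c + 0.05 * ch
                        for c, ch in zip(champion, challenger)]
    return champion

final = hill_climb()
```

The point of the comparison is that this procedure, like self-play TD-learning, still benefits from the dice-driven randomness of the game, which forces broad exploration of the state space.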