Details of the paper
Title: Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method
Published Year: 2005
Author: Martin Riedmiller
Link: http://ml.informatik.uni-freiburg.de/former/_media/publications/rieecml05.pdf
Introduction
This paper introduced a novel approach that combines classical Q-learning with a multi-layer perceptron (MLP), and it is one of the origins of later developments in Deep Reinforcement Learning.
At its core, the method is function approximation: an MLP is used to represent the Q-function efficiently.
The author first summarises the advantages and disadvantages of this approach as reported in earlier work [Tes92, Lin92, Rie00].
- Advantage: generalisation. Because an MLP generalises globally across the state space, it can represent the Q-function more compactly than local approximation approaches.
- Disadvantage: danger of divergence in learning. Because every weight update changes the value estimates globally, convergence of the learning process is not assured.
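For reference, the classical on-line Q-learning update that NFQ starts from can be written in the cost-minimisation form used in the paper (c is the immediate transition cost and γ the discount factor):

$$ Q_{k+1}(s, a) = (1 - \alpha)\, Q_k(s, a) + \alpha \left( c(s, a) + \gamma \min_b Q_k(s', b) \right) $$

In NFQ, Q is not a table but the output of an MLP that takes the state-action pair as input, and the on-line update above is replaced by batch supervised training (see the sketch in the Algorithm section).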
Basic Concept
The basic idea underlying NFQ is the following: instead of updating the neural value function on-line (which leads to the problems described in the previous section), the update is performed off-line, considering an entire set of transition experiences. Experiences are collected in triples of the form (s, a, s') by interacting with the (real or simulated) system. Here, s is the original state, a is the chosen action and s' is the resulting state. The set of experiences is called the sample set D.
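As a minimal sketch of this collection step (the environment interface, function names and random-exploration scheme are assumptions for illustration, not the paper's code):

```python
import random

def collect_transitions(env, actions, num_episodes, max_steps=100):
    """Build the sample set D of (s, a, s') triples by interacting with a
    (real or simulated) system exposing reset() and step(action) -> next_state."""
    D = []
    for _ in range(num_episodes):
        s = env.reset()
        for _ in range(max_steps):
            a = random.choice(actions)   # e.g. random exploration
            s_next = env.step(a)
            D.append((s, a, s_next))
            s = s_next
    return D
```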
Algorithm
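The main loop of NFQ alternates between generating a pattern set from D (target = immediate cost plus γ times the minimal Q-value of the successor state) and training the MLP on that set (with Rprop in the paper). Below is a rough Python sketch of this loop; scikit-learn's MLPRegressor stands in for the Rprop-trained network, and the cost function, discount factor and action set are placeholders:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def nfq(D, actions, c, gamma=0.95, n_iterations=20):
    """Neural Fitted Q Iteration (sketch).

    D       -- list of (s, a, s_next) transition triples (the sample set)
    actions -- finite list of discrete actions
    c       -- immediate cost function c(s, a, s_next)
    """
    # MLP over the joint (state, action) input; a stand-in for the
    # Rprop-trained multi-layer perceptron used in the paper.
    q_net = MLPRegressor(hidden_layer_sizes=(5, 5), max_iter=2000)
    fitted = False

    def min_q(s_next):
        # min over actions of the current Q estimate for the successor state
        x = np.array([np.append(s_next, b) for b in actions])
        return float(np.min(q_net.predict(x)))

    for _ in range(n_iterations):
        inputs, targets = [], []
        for (s, a, s_next) in D:
            cost = c(s, a, s_next)
            # target: immediate cost plus discounted minimal cost-to-go
            target = cost + gamma * min_q(s_next) if fitted else cost
            inputs.append(np.append(s, a))
            targets.append(target)
        # off-line, batch supervised training on the whole pattern set
        q_net.fit(np.array(inputs), np.array(targets))
        fitted = True
    return q_net
```

Because the targets are recomputed against the latest network before each supervised fit, this is batch (fitted) value iteration rather than on-line bootstrapping, which is what gives NFQ its stability and data efficiency.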

Benchmarking
- avoidance control task: keep the system somewhere within the 'valid' region of state space. Pole balancing is typically defined as such a problem, where the task is to avoid that the pole crashes or the cart hits the boundary of the track.
- reaching a goal: the system has to reach a certain area in state space. As soon as it gets there, the task is immediately finished. Mountaincar is typically defined as getting the cart to a certain position up the hill.
- regulator problem: the system has to reach a certain region in state space and has to be actively kept there by the controller. This corresponds to the problems typically tackled with methods of classical control theory.
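To make the distinction concrete, here is one possible way to encode these three classes as immediate cost functions for NFQ (unit cost on failure, zero cost in the goal/target region, and a small transition cost elsewhere); the structure is typical for NFQ-style setups, but the concrete predicates and values are illustrative assumptions:

```python
C_TRANS = 0.01   # small cost for a regular transition (illustrative value)

def cost_avoidance(s_next, failed):
    # avoidance control: any non-failure state is acceptable,
    # a forbidden state (pole crashed / cart off the track) costs 1
    return 1.0 if failed(s_next) else C_TRANS

def cost_goal_reaching(s_next, in_goal_region):
    # reaching a goal: zero cost once the goal region is entered (episode ends)
    return 0.0 if in_goal_region(s_next) else C_TRANS

def cost_regulator(s_next, in_target_region):
    # regulator problem: zero cost only while the controller keeps the
    # system inside the target region; leaving it costs C_TRANS again
    return 0.0 if in_target_region(s_next) else C_TRANS
```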
Empirical Results
- Pole Balancing Task
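As an illustration of how the sketches above could be exercised on a pole-balancing task (this uses the modern gymnasium CartPole-v1 environment as a convenient stand-in; it is not the simulated plant, parameters or results reported in the paper):

```python
import random
import numpy as np
import gymnasium as gym

env = gym.make("CartPole-v1")
actions = [0, 1]                       # push cart left / right

def cartpole_cost(s, a, s_next):
    # failure when the cart position or pole angle leaves the valid range
    x, _, theta, _ = s_next
    return 1.0 if (abs(x) > 2.4 or abs(theta) > 0.2095) else 0.01

# collect random-exploration transitions in the same (s, a, s') triple format
D = []
for _ in range(50):
    s, _ = env.reset()
    done = False
    while not done:
        a = random.choice(actions)
        s_next, _, terminated, truncated, _ = env.step(a)
        D.append((s, a, s_next))
        s, done = s_next, terminated or truncated

q_net = nfq(D, actions, c=cartpole_cost)   # NFQ sketch from the Algorithm section

def greedy_action(s):
    # resulting controller: pick the action with minimal predicted cost-to-go
    q = q_net.predict(np.array([np.append(s, b) for b in actions]))
    return actions[int(np.argmin(q))]
```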


Conclusion
The author concludes that by reusing stored transition experiences for off-line batch updates (experience replay), the divergence issues discussed above can be avoided and data-efficient training can be achieved.
References
[BM95] J. A. Boyan and A. W. Moore. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems 7. Morgan Kaufmann, 1995.
[EPG05] D. Ernst, P. Geurts, and L. Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6:503–556, 2005.
[Gor95] G. J. Gordon. Stable function approximation in dynamic programming. In A. Prieditis and S. Russell, editors, Proceedings of the ICML, San Francisco, CA, 1995.
[Lin92] L.-J. Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8:293–321, 1992.
[LP03] M. Lagoudakis and R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4:1107–1149, 2003.
[RB93] M. Riedmiller and H. Braun. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In H. Ruspini, editor, Proceedings of the IEEE International Conference on Neural Networks (ICNN), pages 586–591, San Francisco, 1993.
[Rie00] M. Riedmiller. Concepts and facilities of a neural reinforcement learning control architecture for technical process control. Journal of Neural Computing and Application, 8:323–338, 2000.
[SB98] R. S. Sutton and A. G. Barto. Reinforcement Learning. MIT Press, Cambridge, MA, 1998.
[Tes92] G. Tesauro. Practical issues in temporal difference learning. Machine Learning, 8:257–277, 1992.