
[Review] Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method

Posted at 2018-06-11

Details of the paper

Title: Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method
Published Year: 2005
Author: Martin Riedmiller
Link: http://ml.informatik.uni-freiburg.de/former/_media/publications/rieecml05.pdf

Introduction

This paper introduced a novel approach that combines classical Q-learning with a multi-layer perceptron, and it is one of the origins of later developments in the domain of Deep Reinforcement Learning.
Essentially, it is function approximation: a multi-layer perceptron (MLP) is used to represent the Q function efficiently.
The author first clarifies the advantages and disadvantages of this approach, as reported by existing research [Tes92, Lin92, Rie00]:

  • Advantage - Generalisation: since an MLP generalises globally, it can represent the Q function more efficiently than locally-oriented approaches.
  • Disadvantage - Danger of divergence in learning: because every update affects the approximation globally across the observations, convergence of learning is no longer assured.
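To make the setting concrete, here is a minimal sketch of such an MLP Q-function approximator in Python/NumPy. The architecture (one tanh hidden layer) and the joint (state, action) input encoding are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

class MLPQFunction:
    """A minimal MLP approximator for Q(s, a).
    Layer sizes and activations are illustrative assumptions,
    not the exact architecture used in the paper."""

    def __init__(self, state_dim, action_dim, hidden=20, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = state_dim + action_dim
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, state, action):
        # Map the joint (state, action) input through the hidden
        # layer to a single scalar Q value.
        x = np.concatenate([state, action])
        h = np.tanh(x @ self.W1 + self.b1)
        return (h @ self.W2 + self.b2).item()
```

Because the weights are shared across the whole input space, an update at one point changes the estimate everywhere, which is exactly the global-generalisation property (and the divergence risk) described above.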

Basic Concept

The basic idea underlying NFQ is the following: Instead of updating the neural value function on-line (which leads to the problems described in the previous section), the update is performed off-line considering an entire set of transition experiences. Experiences are collected in triples of the form (s, a, s') by interacting with the (real or simulated) system. Here, s is the original state, a is the chosen action and s' is the resulting state. The set of experiences is called the sample set D.
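As a concrete reading of this, the following sketch collects the sample set D by interacting with an environment. The env object with reset() and step(a), and the random exploration policy, are hypothetical assumptions for illustration:

```python
import random

def collect_transitions(env, actions, num_episodes=100, max_steps=200):
    """Collect experience triples (s, a, s') into the sample set D.
    Assumes a hypothetical env with reset() -> s and step(a) -> (s', done);
    the random exploration policy is an illustrative choice."""
    D = []
    for _ in range(num_episodes):
        s = env.reset()
        for _ in range(max_steps):
            a = random.choice(actions)     # exploratory action
            s_next, done = env.step(a)
            D.append((s, a, s_next))       # store the transition triple
            if done:
                break
            s = s_next
    return D
```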

Algorithm

(Figure: screenshot of the NFQ main loop pseudocode from the paper.)
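In words, the loop alternates between (1) building a pattern set of (input, target) pairs from all transitions in D and (2) fitting the MLP on that set off-line. A rough Python rendering of this loop follows; q_net, train_supervised, and cost_fn are placeholders I introduce (the paper trains the network with Rprop, which is hidden behind train_supervised here):

```python
def nfq_main(D, q_net, train_supervised, cost_fn, actions,
             gamma=0.95, n_iterations=100):
    """Sketch of the NFQ main loop: repeated batch fitting of the Q network.
    `train_supervised` stands in for the paper's Rprop-based batch training
    and is not implemented here; `cost_fn` is the immediate cost c(s, a, s')."""
    for k in range(n_iterations):
        inputs, targets = [], []
        for (s, a, s_next) in D:
            # Target: immediate cost plus the discounted minimum
            # Q value over all successor actions.
            target = cost_fn(s, a, s_next) + gamma * min(
                q_net(s_next, b) for b in actions)
            inputs.append((s, a))
            targets.append(target)
        # Fit the MLP off-line on the entire pattern set.
        q_net = train_supervised(q_net, inputs, targets)
    return q_net
```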

Benchmarking

  • avoidance control task - keep the system somewhere within the 'valid' region
    of state space. Pole balancing is typically defined as such a problem, where
    the task is to avoid that the pole crashes or the cart hits the boundary of
    the track.
  • reaching a goal - the system has to reach a certain area in state space. As soon
    as it gets there, the task is immediately finished. Mountaincar is typically
    defined as getting the cart to a certain position up the hill.
  • regulator problem - the system has to reach a certain region in state space
    and has to be actively kept there by the controller. This corresponds to the
    problems typically tackled with methods of classical control theory.
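One way to see what distinguishes these three classes is how the immediate cost c(s, a, s') would be shaped in each case. The predicates and constants below are illustrative assumptions, not the paper's exact settings:

```python
def make_cost_fn(in_goal, is_forbidden, step_cost=0.01):
    """Illustrative immediate-cost function covering the three task types.
    `in_goal` and `is_forbidden` are hypothetical state predicates;
    the constants are assumptions, not the paper's exact values."""
    def cost(s, a, s_next):
        if is_forbidden(s_next):   # avoidance control: crash, leave the track
            return 1.0
        if in_goal(s_next):        # goal reaching / regulator target region
            return 0.0
        return step_cost           # small per-step cost pushes toward the goal
    return cost
```

For a pure avoidance task only the forbidden-state cost matters; for goal reaching the episode ends as soon as in_goal holds; for a regulator problem the controller keeps paying step_cost whenever the system leaves the target region.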

Empirical Results

  • Pole Balancing Task (figure: result screenshot from the paper)
  • The Mountain Car Benchmark (figure: result screenshot from the paper)

Conclusion

The author concludes that by utilising experience replay, the issues discussed above can be avoided and efficient training can be achieved.

References

[BM95] J. Boyan and A. Moore. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems 7. Morgan Kaufmann, 1995.
[EPG05] D. Ernst, P. Geurts, and L. Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6:503–556, 2005.
[Gor95] G. J. Gordon. Stable function approximation in dynamic programming. In A. Prieditis and S. Russell, editors, Proceedings of the ICML, San Francisco, CA, 1995.
[Lin92] L.-J. Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8:293–321, 1992.
[LP03] M. Lagoudakis and R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4:1107–1149, 2003.
[RB93] M. Riedmiller and H. Braun. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In H. Ruspini, editor, Proceedings of the IEEE International Conference on Neural Networks (ICNN), pages 586–591, San Francisco, 1993.
[Rie00] M. Riedmiller. Concepts and facilities of a neural reinforcement learning control architecture for technical process control. Journal of Neural Computing and Application, 8:323–338, 2000.
[SB98] R. S. Sutton and A. G. Barto. Reinforcement Learning. MIT Press, Cambridge, MA, 1998.
[Tes92] G. Tesauro. Practical issues in temporal difference learning. Machine Learning, 8:257–277, 1992.
