Tips for debugging in (D)RL

Posted at 2019-11-03

Introduction

This article was not really meant to be public, so I haven't polished the writing. Please kindly ignore any grammatical mistakes if they exist...

Hyper-param Optimisation

I'm still somewhat confused by this myself, but my approach is to use a decaying learning rate and watch the loss curves to see when they begin to converge. In this example, loss_critic only starts decreasing when lr_critic (the critic's learning rate) is 2e-3, so I probably need to increase it.
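
To make this concrete, here is a minimal sketch of that workflow, assuming a PyTorch critic trained with Adam; the network, the placeholder batches, the decay factor, and the TensorBoard tags are all illustrative, not my actual training code.

```python
# Minimal sketch: decay the critic's learning rate and log both the loss and the
# current lr to TensorBoard, so you can see at which lr_critic the loss starts to drop.
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

critic = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(critic.parameters(), lr=1e-2)                    # initial lr_critic
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)    # decay each step
writer = SummaryWriter()

for step in range(10_000):
    states = torch.randn(32, 4)      # placeholder batch; use real transitions in practice
    targets = torch.randn(32, 1)     # placeholder TD targets
    loss_critic = nn.functional.mse_loss(critic(states), targets)

    optimizer.zero_grad()
    loss_critic.backward()
    optimizer.step()
    scheduler.step()

    # Watching these two curves side by side in TensorBoard shows the lr range
    # where the critic loss actually begins to converge.
    writer.add_scalar("loss_critic", loss_critic.item(), step)
    writer.add_scalar("lr_critic", scheduler.get_last_lr()[0], step)
```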

Debugging RL algorithms in general

  • Start with the simplest environment available
    • As this guy summarised on GitHub, for continuous action space problems it is better to start with Pendulum, while for discrete action spaces, CartPole.
    • Always make sure to play with a random agent at the beginning of your journey of implementing an algorithm, to familiarise yourself with the environment (a minimal sketch appears after the quoted comment below).
    • Once your algorithm can solve those tasks, you may move on to more difficult ones.
  • Feature engineering
    • As people have discussed, we may need to scale the observations (e.g., raw pixels) / states (e.g., sensor signals) to fit in [0, 1]. But as far as I've read in some codebases, e.g., dopamine / baselines / tf-agents, they don't scale the observations in DQN; rather, they seem to prefer scaling the states over the observations. Of course, reward scaling does matter, so don't miss it like this guy:

I don't think I would agree. I think there's always a trick or a bug. In my particular case I'm working on at the moment what turned out to be the game-changer (and as of tonight made my RL agents actually learn something :)) was rescaling the reward from [-1, 1] to [0, 1] as suggested in this seemingly unrelated post and, admittedly, several of the pointers mentioned above. Thanks again to everyone that contributed!
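
To tie the points above together (playing with a random agent, and scaling observations/rewards), here is a minimal sketch. It assumes the pre-0.26 gym step/reset API that was current when this was written, with Pong-v4 as an arbitrary Atari example; the wrapper names ScaleObservation and RescaleReward are just illustrative, not taken from any of the codebases mentioned above.

```python
# Minimal sketch: wrap an env so observations land in [0, 1] and rewards are
# rescaled from [-1, 1] to [0, 1], then drive it with a random agent as a sanity check.
import gym
import numpy as np


class ScaleObservation(gym.ObservationWrapper):
    """Scale raw pixel observations from [0, 255] to [0, 1]."""

    def observation(self, obs):
        return np.asarray(obs, dtype=np.float32) / 255.0


class RescaleReward(gym.RewardWrapper):
    """Map rewards from [-1, 1] to [0, 1], as in the quoted comment above."""

    def reward(self, reward):
        return (np.clip(reward, -1.0, 1.0) + 1.0) / 2.0


env = RescaleReward(ScaleObservation(gym.make("Pong-v4")))

# Random agent: a cheap way to check that the env, the wrappers and the
# observation/reward ranges behave as expected before any learning code runs.
obs = env.reset()
for _ in range(1000):
    obs, reward, done, info = env.step(env.action_space.sample())
    assert 0.0 <= obs.min() and obs.max() <= 1.0
    assert 0.0 <= reward <= 1.0
    if done:
        obs = env.reset()
```

Doing the scaling inside wrappers keeps the agent code untouched; scaling at the network input is the other common choice.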

  • Think a lot, experiment less.
    • I once went through an endless trial-and-error loop of debugging, repeating similar experiments without devoting much time to the thought process. I would literally start an experiment, wait some 10-20 minutes while doing other tasks, check TensorBoard, then stop the experiment to change some other part. After reading this great article, I started following his work-log style: deeply analysing the result of each experiment and training for longer. It turns out this makes my thinking much clearer and reduces the time needed to find a bug or to understand a hyper-parameter's influence on the agent's behaviour.
  • Train for longer.
    • As mentioned just above, while I was in that thought process I accidentally found that an experiment which I'd thought wouldn't go well actually turned out better than before, simply because I waited a bit longer than in the previous run. So, as John Schulman mentioned, train for longer than stated in the original paper.

Why is the PyTorch implementation working but the TensorFlow one isn't??

References
