Tensorforce 0.5.4 Update Information

Update notes for Tensorforce 0.5.4


Github: https://github.com/tensorforce/tensorforce
Release notes: https://github.com/tensorforce/tensorforce/releases/tag/0.5.4
Docs: https://tensorforce.readthedocs.io/en/latest/index.html


  • DQN/DuelingDQN/DPG argument memory now required to be specified explicitly, plus update_frequency default changed
  • Removed (temporarily) conv1d/conv2d_transpose layers due to TensorFlow gradient problems
  • Agent, Environment and Runner can now be imported via from tensorforce import ...
  • New generic reshape layer available as reshape
  • Support for batched version of Agent.act and Agent.observe
  • Support for parallelized remote environments based on Python's multiprocessing and socket (replacing tensorforce/contrib/socket_remote_env/ and tensorforce/environments/environment_process_wrapper.py), available via Environment.create(...), Runner(...) and run.py
  • Removed ParallelRunner and merged functionality with Runner
  • Changed run.py arguments
  • Changed independent mode for Agent.act: additional argument internals and corresponding return value, initial internals via Agent.initial_internals(), Agent.reset() not required anymore
  • Removed deterministic argument for Agent.act unless independent mode
  • Added format argument to save/load/restore with supported formats tensorflow, numpy and hdf5
  • Changed save argument append_timestep to append with default None (instead of 'timesteps')
  • Added get_variable and assign_variable agent functions
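To make the independent-mode change in the list above concrete, here is a minimal sketch of an evaluation loop. It uses only the calls named in the release notes (Agent.initial_internals() and Agent.act with the internals, independent and deterministic arguments) plus Environment.reset() and Environment.execute(). The helper name run_independent_episode is hypothetical, and the exact argument names should be checked against the installed version.

```python
def run_independent_episode(agent, environment, deterministic=True):
    """Run one episode in independent mode and return the summed reward.

    Assumption: `agent` and `environment` follow the Tensorforce 0.5.4
    Agent/Environment interfaces; this helper itself is hypothetical.
    """
    states = environment.reset()
    # Internal states (e.g. RNN states) are now passed around explicitly
    # instead of being tracked inside the agent.
    internals = agent.initial_internals()
    terminal = False
    total_reward = 0.0
    while not terminal:
        # Independent mode returns the updated internals with the actions,
        # and no matching Agent.observe() call is made.
        actions, internals = agent.act(
            states=states,
            internals=internals,
            independent=True,
            deterministic=deterministic,
        )
        states, terminal, reward = environment.execute(actions=actions)
        total_reward += reward
    return total_reward
```

Because the internals are threaded through the loop by the caller, the Agent.reset() call that was previously needed before such a loop is no longer required, as the release notes state.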



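Previously, if the replay memory capacity was not specified for DQN/DuelingDQN/DPG, Agent.create set it to batch_size + max_episode_timesteps. With this update, passing memory as an argument is mandatory. Note that memory must still be at least batch_size + max_episode_timesteps.
The release notes also say "update_frequency default changed", but I could not work out what this refers to. As far as I can tell, it has not changed.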
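The memory constraint described above can be checked before creating the agent. The following is a minimal sketch under the assumption that the lower bound is exactly batch_size + max_episode_timesteps, as stated in this article; the helper name dqn_agent_kwargs is hypothetical and not part of Tensorforce.

```python
def dqn_agent_kwargs(batch_size, max_episode_timesteps, memory=None):
    """Build keyword arguments for a DQN agent with an explicit memory size.

    Hypothetical helper: it only assembles a dict and does not import
    Tensorforce. The lower bound follows the rule quoted in this article.
    """
    min_memory = batch_size + max_episode_timesteps
    if memory is None:
        # Reproduce the old implicit default explicitly.
        memory = min_memory
    if memory < min_memory:
        raise ValueError(
            "memory=%d is too small; need at least %d" % (memory, min_memory)
        )
    return dict(
        agent='dqn',
        memory=memory,
        batch_size=batch_size,
        max_episode_timesteps=max_episode_timesteps,
    )


kwargs = dqn_agent_kwargs(batch_size=32, max_episode_timesteps=500)
print(kwargs['memory'])  # 532
```

The resulting dictionary is meant to be passed on to Agent.create together with an environment or explicit states/actions specifications.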




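Until now there was only one way to save an agent (called the TensorFlow saver in the current version). You can now choose between three formats: TensorFlow saver, numpy and HDF5. Note that the numpy and HDF5 formats save only the neural network weights.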
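As a rough illustration of the new format argument, the sketch below wraps the save call and rejects anything other than the three formats named in the release notes. The wrapper save_agent and the constant SUPPORTED_FORMATS are hypothetical; only the format and append arguments come from the release notes, and directory is assumed to be accepted by the save function as in earlier versions.

```python
# Formats listed in the 0.5.4 release notes.
SUPPORTED_FORMATS = ('tensorflow', 'numpy', 'hdf5')


def save_agent(agent, directory, fmt='numpy'):
    """Save `agent` to `directory` in one of the supported formats.

    Hypothetical wrapper; `agent` is assumed to expose a Tensorforce-style
    save(directory=..., format=..., append=...) method.
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            "unsupported format %r, expected one of %r" % (fmt, SUPPORTED_FORMATS)
        )
    # append=None is the new default (previously 'timesteps'), so nothing
    # is appended to the file name.
    return agent.save(directory=directory, format=fmt, append=None)
```

Since numpy and HDF5 store only the weights, the tensorflow format is probably the safer choice when the saved agent is meant to continue training later.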


RL features

  • State modeling: e.g. sequence inputs
  • Action modeling: e.g. constrained or state-dependent actions
  • Auxiliary losses
  • Prioritized replay
  • Generative memory


  • Direct Future Prediction
  • Categorical DQN, Rainbow
  • Generative Adversarial Imitation Learning
