Unity ML Agent on Google Colab

Posted at 2018-06-23

Introduction

Recently I have been looking for an efficient way to test RL agents from my local machine, and I found a great environment for it: Google Colaboratory.
Basically, it is Google's research-oriented notebook service, and surprisingly they provide free access to a K80 GPU..... Who could turn down such a precious offer......
Anyway, since I wanted to try DQN, I did some research on Google Colab's connectivity, because I wanted to connect the agent on my local machine to Colab's plentiful GPU compute.
So in this article, I will explain how to connect your local machine to Google Colab and also how to try out the Unity ml-agents toolkit!
Oh, by the way, there is no particular reason for choosing ml-agents; I just thought playing a 3D game with an intelligent agent would be more fun.

Official Doc

Guess what, they have already made an incredibly good official doc, so please just follow it!
Keep in mind that the default runtime is a hosted machine on Google's servers, so double-check the runtime setting every time you open Google Colab!
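
For reference, here is a minimal sketch of the local-runtime setup from that doc, assuming Jupyter is already installed on your machine (port 8888 is just an example):

# Install and enable the extension that lets Colab talk to a local Jupyter server
pip install jupyter_http_over_ws
jupyter serverextension enable --py jupyter_http_over_ws

# Start a local Jupyter server that accepts connections from Colab
jupyter notebook \
  --NotebookApp.allow_origin='https://colab.research.google.com' \
  --port=8888 \
  --NotebookApp.port_retries=0

Then, in Colab, choose "Connect to local runtime" and paste the backend URL (including the token) that the Jupyter server prints.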

Unity ml agent

Needless to say, Unity is originally a game engine and one of the best platforms for developing quality 3D/2D games!
But these days they are putting much more effort into the joint domain of deep learning and games,
and this ml-agents toolkit is one of their masterpieces!

official git repo

So, let's walk through the procedure for using ml-agents.

  1. Get Unity.
  2. Clone the official repo (a combined command sketch follows after this list).
  3. Jump into the README.
  4. When you reach the "Setting up ML-Agents within Unity" section, the screenshot in the doc did not match my case... so this is how it looks for me.
    Screen Shot 2018-06-24 at 0.24.31.png

  5. Also, the doc says to create a new project within the unity-environment folder at the beginning, but as I went through it, the distributed Assets folder was not detected properly, so I transferred the TensorFlowSharp plugin manually. If this happens to you, please do the same!

  6. Then proceed further, and after you actually run the Python script python3 learn.py --run-id=<run-identifier> --train, you will see the agents learning.

  7. Now it's time to make use of the precious free golden egg from Google, so open Google Colab connected to your local runtime, as we prepared before.

  8. Then hit this command: !python3 learn.py --run-id=<run-identifier> --train .... lol, that's it.
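
Putting the pieces together, here is a rough sketch of the whole flow as commands. The repo URL is the official one, but the directory containing learn.py may differ between ml-agents versions, so treat the path below as an assumption:

# Clone the official ml-agents repo and move to the Python code (assumed to live in python/)
git clone https://github.com/Unity-Technologies/ml-agents.git
cd ml-agents/python

# Train locally: press Play in the Unity Editor when the script asks for it
python3 learn.py --run-id=<run-identifier> --train

# In a Colab cell connected to your local runtime, the same command just needs a leading "!"
!python3 learn.py --run-id=<run-identifier> --train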

Warning

One thing I noticed is that Google Colab's stop action fires right away and does not wait for the model-saving code (around line 282 of trainer_controller.py in the unitytrainers directory).
So, first of all, please modify the code like this:

trainer_controller.py
            except KeyboardInterrupt:
                # Save the model first, so an interrupt cannot skip the save step
                self._save_model(sess, steps=global_step, saver=saver)
                print('--------------------------Now saving model-------------------------')
                if self.train_model:
                    self.logger.info("Learning was interrupted. Please wait while the graph is generated.")

This way, _save_model will be executed before anything else.
And one more thing: I don't know why, but you have to terminate the run not from the Colab side but from the Unity side.
By "Unity side" I mean you just press the Stop button to stop playing.
Then you can see a log like this!




                    ▄▄▄▓▓▓▓
               ╓▓▓▓▓▓▓█▓▓▓▓▓
          ,▄▄▄m▀▀▀'  ,▓▓▓▀▓▓▄                           ▓▓▓  ▓▓▌
        ▄▓▓▓▀'      ▄▓▓▀  ▓▓▓      ▄▄     ▄▄ ,▄▄ ▄▄▄▄   ,▄▄ ▄▓▓▌▄ ▄▄▄    ,▄▄
      ▄▓▓▓▀        ▄▓▓▀   ▐▓▓▌     ▓▓▌   ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌  ╒▓▓▌
    ▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓      ▓▀      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌   ▐▓▓▄ ▓▓▌
    ▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄     ▓▓      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌    ▐▓▓▐▓▓
      ^█▓▓▓        ▀▓▓▄   ▐▓▓▌     ▓▓▓▓▄▓▓▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▓▄    ▓▓▓▓`
        '▀▓▓▓▄      ^▓▓▓  ▓▓▓       └▀▀▀▀ ▀▀ ^▀▀    `▀▀ `▀▀   '▀▀    ▐▓▓▌
           ▀▀▀▀▓▄▄▄   ▓▓▓▓▓▓,                                      ▓▓▓▓▀
               `▀█▓▓▓▓▓▓▓▓▓▌
                    ¬`▀▀▀█▓


INFO:unityagents:{'--curriculum': 'None',
 '--docker-target-name': 'Empty',
 '--help': False,
 '--keep-checkpoints': '5',
 '--lesson': '0',
 '--load': False,
 '--no-graphics': False,
 '--run-id': 'trial_02',
 '--save-freq': '50000',
 '--seed': '-1',
 '--slow': False,
 '--train': True,
 '--worker-id': '0',
 '<env>': None}
INFO:unityagents:Start training by pressing the Play button in the Unity Editor.
INFO:unityagents:
'Ball3DAcademy' started successfully!
Unity Academy name: Ball3DAcademy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :

Unity brain name: Ball3DBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 2
        Vector Action descriptions: , 
2018-06-24 00:20:11.408557: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA
INFO:unityagents:Hyperparameters for the PPO Trainer of brain Ball3DBrain: 
    batch_size: 1200
    beta:   0.001
    buffer_size:    12000
    epsilon:    0.2
    gamma:  0.995
    hidden_units:   128
    lambd:  0.95
    learning_rate:  0.0003
    max_steps:  5.0e4
    normalize:  True
    num_epoch:  3
    num_layers: 2
    time_horizon:   1000
    sequence_length:    64
    summary_freq:   1000
    use_recurrent:  False
    graph_scope:    
    summary_path:   ./summaries/trial_02
    memory_size:    256
    use_curiosity:  False
    curiosity_strength: 0.01
    curiosity_enc_size: 128
INFO:unityagents: Ball3DBrain: Step: 1000. Mean Reward: 1.608. Std of Reward: 0.732.
INFO:unityagents: Ball3DBrain: Step: 2000. Mean Reward: 1.663. Std of Reward: 0.837.
INFO:unityagents: Ball3DBrain: Step: 3000. Mean Reward: 1.820. Std of Reward: 1.029.
INFO:unityagents: Ball3DBrain: Step: 4000. Mean Reward: 2.217. Std of Reward: 1.536.
INFO:unityagents: Ball3DBrain: Step: 5000. Mean Reward: 2.665. Std of Reward: 1.846.
INFO:unityagents: Ball3DBrain: Step: 6000. Mean Reward: 3.862. Std of Reward: 3.062.
INFO:unityagents: Ball3DBrain: Step: 7000. Mean Reward: 6.509. Std of Reward: 6.689.
INFO:unityagents: Ball3DBrain: Step: 8000. Mean Reward: 10.443. Std of Reward: 11.650.
INFO:unityagents: Ball3DBrain: Step: 9000. Mean Reward: 23.630. Std of Reward: 21.395.
INFO:unityagents:Saved Model
--------------------------Now saving model-------------------------
INFO:unityagents:Learning was interrupted. Please wait while the graph is generated.
INFO:unityagents:List of nodes to export :
INFO:unityagents:   action
INFO:unityagents:   value_estimate
INFO:unityagents:   action_probs
INFO:tensorflow:Restoring parameters from ./models/trial_02/model-9780.cptk
INFO:tensorflow:Froze 16 variables.
Converted 16 variables to const ops.

my notebook

Thank you!!
