
[Use a free GPU in seconds] Setting up TensorFlow (Keras) / PyTorch / Chainer on Colaboratory

Last updated at 2019-11-16 / Posted at 2018-01-21

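2019/5/11 PR: This material, along with more, has been published as the book 図解速習DEEP LEARNING, released on May 11, 2019. Part of the content is also available in [2019年5月版] 機械学習・深層学習を学び、トレンドを追うためのリンク150選 - Qiita.

2019/1/11 Since this article was first published in January 2018, the situation has changed: Keras has been integrated into TensorFlow itself, and Chainer is now provided by default. I have therefore substantially expanded the article and added a note on the TensorFlow 2.0 Preview.

2019/1/31 Added that PyTorch is now preinstalled, and added the Colab versions of the PyTorch and TensorFlow tutorials.

2019/3/9 I have set up an experimental Slack workspace for exchanging information about Colaboratory. Please register and join via the link.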

TL;DR

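Google Colab now offers a GPU environment for free
- K80, usable for up to 12 hours continuously
It saves both money and setup time
- No need to rent and configure your own GPU instance in the cloud
- No need to build a GPU PC
As shown below, the major frameworks covered here run with no preparation
Ideal for running small hands-on sessions
Practical tips are collected in 【秒速で無料GPUを使う】深層学習実践Tips on Colaboratory - Qiita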

Introduction

A wonderful tweet came across my timeline: Google Colab lets you use a K80-class GPU for free, for up to 12 hours per session.

Ummmm, Colab now lets you use GPUs to accelerate your notebooks? In the cloud? For free? 😍🤓😍🤓😍🤓 Step-by-step how to: https://t.co/CENGVweaTy
— Rachael Tatman (@rctatman) January 19, 2018

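You don't have to rent a GPU instance in the cloud, and you don't have to build a GPU PC up front. You save both money and setup time.

This dramatically lowers the barrier for beginners. Wonderful. So here is a light tutorial.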

Prerequisites

Allocated resources

What's the hardware spec for Google Colaboratory? - Stack Overflow

Based on this and other sources, the environment Colaboratory provides is as follows (updated 2018-10-16):

n1-highmem-2 instance
Ubuntu 18.04
2vCPU @ 2.2GHz
13GB RAM
Storage: 40GB (no GPU / TPU), 360GB (with GPU)
GPU: NVIDIA Tesla K80, 12GB
Stops after 90 minutes of idle time
Maximum continuous use is 12 hours
Maximum notebook size is 20MB

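My rough impressions:

You can bring a reasonably large dataset onto the disk
RAM is limited, so be careful how much you hold in memory as Python arrays
- If an epoch uses a large dataset, load it batch by batch, for example
More than enough for working through tutorials

That is the overall picture.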

!cat /etc/issue

Ubuntu 18.04.1 LTS \n \l

!df -h (example with GPU)

Filesystem      Size  Used Avail Use% Mounted on
overlay         359G  9.4G  331G   3% /
...

!free -h

              total        used        free      shared  buff/cache   available
Mem:            12G        391M        6.6G        828K        5.7G         12G
Swap:            0B          0B          0B

!cat /proc/cpuinfo (example with GPU)

processor	: 0
...
model name	: Intel(R) Xeon(R) CPU @ 2.20GHz
...
cpu MHz		: 2200.000
cache size	: 56320 KB
...

processor	: 1
...
model name	: Intel(R) Xeon(R) CPU @ 2.20GHz
...
cpu MHz		: 2200.000
cache size	: 56320 KB
...

!cat /proc/driver/nvidia/gpus/0000:00:04.0/information (example with GPU)

Model: 		 Tesla K80
IRQ:   		 33
...

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 14142945018355836735, name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 358416384
 locality {
   bus_id: 1
 }
 incarnation: 7915847976140889213
 physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7"]

!nvcc -V (checking the CUDA version, with GPU)

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148

!nvidia-smi (checking the assigned GPU and driver, with GPU)

Thu Jan 31 xx:xx:xx 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   30C    P8    26W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
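The CPU, memory, and disk checks above use shell commands. As a rough Python equivalent, the sketch below gathers the same figures with the standard library only. The helper names `read_meminfo_kib` and `summarize_resources` are hypothetical, and reading `/proc/meminfo` assumes a Linux runtime such as the one Colaboratory provides.

```python
import os
import shutil


def read_meminfo_kib(path="/proc/meminfo"):
    """Return {field: KiB} parsed from /proc/meminfo, or {} if unavailable."""
    info = {}
    try:
        with open(path) as f:
            for line in f:
                key, _, rest = line.partition(":")
                parts = rest.split()
                if parts and parts[0].isdigit():
                    info[key] = int(parts[0])
    except OSError:
        pass  # Not on Linux, or the file is not readable.
    return info


def summarize_resources(root="/"):
    """Collect CPU count, disk size, and total RAM into one dict."""
    disk = shutil.disk_usage(root)
    mem = read_meminfo_kib()
    mem_total_gb = None
    if "MemTotal" in mem:
        mem_total_gb = round(mem["MemTotal"] * 1024 / 1e9, 1)
    return {
        "cpu_count": os.cpu_count(),
        "disk_total_gb": round(disk.total / 1e9, 1),
        "disk_free_gb": round(disk.free / 1e9, 1),
        "mem_total_gb": mem_total_gb,
    }


print(summarize_resources())
```

GPU details are not covered here; `nvidia-smi` remains the simplest way to see those.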

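How to try it

Open Google Colab

Create a new notebook

Select File > New Python 3 notebook.

Assign a GPU

From the menu at the top of the screen, choose Runtime > Change runtime type to open Notebook settings
Select GPU as the hardware accelerator and click Save

Verify that the GPU is assigned correctly

Click [+] Code to add a code cell
Enter the following in the cell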

import tensorflow as tf
tf.test.gpu_device_name()

If the following is printed, the GPU is assigned correctly (apparently this can fail with an error when backend GPUs are in short supply)

'/device:GPU:0'
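`tf.test.gpu_device_name()` returns an empty string when no GPU is visible, so the check can be wrapped to print a clearer message. The sketch below is an illustration under that assumption; `gpu_status` is a hypothetical helper, and the import is guarded so the snippet also runs where TensorFlow is not installed.

```python
def gpu_status(device_name):
    """Turn the string from tf.test.gpu_device_name() into a readable status."""
    if device_name:
        return "GPU assigned: " + device_name
    return "No GPU assigned; check Runtime > Change runtime type"


try:
    import tensorflow as tf
    print(gpu_status(tf.test.gpu_device_name()))
except ImportError:
    # Outside Colaboratory, TensorFlow may simply not be installed.
    print("TensorFlow is not installed in this environment")
```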

TensorFlow

Environment setup

TensorFlow 1.x is ready to use with no preparation.
A preview of 2.0 is now available. If you are interested, try it via TensorFlow 2.0 Previewを最速で試す on Colaboratory - Qiita.

Compare CPU/GPU performance

Run the code below, which compares a convolution on the CPU and on the GPU. As before, add a code cell, paste in the code, and run it.

Quoted from https://www.kaggle.com/getting-started/47096#post271139

import tensorflow as tf
import timeit

# See https://www.tensorflow.org/tutorials/using_gpu#allowing_gpu_memory_growth
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.device('/cpu:0'):
  random_image_cpu = tf.random_normal((100, 100, 100, 3))
  net_cpu = tf.layers.conv2d(random_image_cpu, 32, 7)
  net_cpu = tf.reduce_sum(net_cpu)

with tf.device('/gpu:0'):
  random_image_gpu = tf.random_normal((100, 100, 100, 3))
  net_gpu = tf.layers.conv2d(random_image_gpu, 32, 7)
  net_gpu = tf.reduce_sum(net_gpu)

sess = tf.Session(config=config)

# Test execution once to detect errors early.
try:
  sess.run(tf.global_variables_initializer())
except tf.errors.InvalidArgumentError:
  print(
      '\n\nThis error most likely means that this notebook is not '
      'configured to use a GPU.  Change this in Notebook Settings via the '
      'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
  raise

def cpu():
  sess.run(net_cpu)

def gpu():
  sess.run(net_gpu)

# Runs the op several times.
print('Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images '
      '(batch x height x width x channel). Sum of ten runs.')
print('CPU (s):')
cpu_time = timeit.timeit('cpu()', number=10, setup="from __main__ import cpu")
print(cpu_time)
print('GPU (s):')
gpu_time = timeit.timeit('gpu()', number=10, setup="from __main__ import gpu")
print(gpu_time)
print('GPU speedup over CPU: {}x'.format(int(cpu_time/gpu_time)))

sess.close()

You get results like the following. The GPU was about 9 times faster.

Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images (batch x height x width x channel). Sum of ten runs.
CPU (s):
8.584801190000007
GPU (s):
0.8974968620000254
GPU speedup over CPU: 9x
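The benchmark prints the speedup through `int()`, which truncates, so the reported 9x understates the ratio slightly. The sketch below recomputes it from the two timings above and adds a generic timing helper. `time_total` and `speedup` are hypothetical names introduced for illustration, and the warm-up run is my own addition rather than part of the quoted script.

```python
import timeit


def time_total(fn, number=10, warmup=1):
    """Total seconds for `number` calls of fn, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # Warm-up absorbs one-time costs such as lazy initialization.
    return timeit.timeit(fn, number=number)


def speedup(slow_seconds, fast_seconds):
    """Ratio of the slower total time to the faster one, untruncated."""
    return slow_seconds / fast_seconds


# Timings copied from the output above.
cpu_time = 8.584801190000007
gpu_time = 0.8974968620000254
ratio = speedup(cpu_time, gpu_time)
print("GPU speedup over CPU: {:.2f}x (int() gives {}x)".format(ratio, int(ratio)))
```

With TensorFlow available, `time_total(cpu)` and `time_total(gpu)` could replace the two `timeit.timeit` calls in the quoted script.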

Tutorials

In the TensorFlow Tutorials, pressing the Run in Google Colab button on each page lets you run and try it right away.

Try various examples

Try out a variety of examples from 【即スマホで試せる】55の深層学習実装 on Google Seedbank - 画像分類から翻訳、音楽生成まで - Qiita.

Keras

Environment setup

Keras is now officially a module included in TensorFlow, so it no longer needs to be installed separately.

from tensorflow import keras

Try Fashion MNIST classification

You can try image classification with Keras from Seedbank: "Fashion MNIST with tf.keras".

PyTorch

Environment setup

Since January 2019, PyTorch has been ready to use with no preparation.

~~pip install the required libraries.~~
~~However, the installation method can differ by OS/Python/CUDA version. After checking your Python/CUDA versions, go to PyTorch and select~~

~~PyTorch build: (the version you want)~~
~~Stable OS: Linux~~
~~Package: Pip~~

~~then look up the install command that matches your Python/CUDA versions.~~

Check the version and whether the GPU is enabled

Run the following.

import torch
torch.cuda.is_available()

If True is returned, the GPU is usable from PyTorch.

print(torch.__version__)

The preinstalled version, such as 1.0.0, is displayed.
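A common follow-up to this check is to choose the device once and fall back to the CPU when no GPU is assigned. The sketch below shows that pattern; `pick_device` is a hypothetical helper, and the import is guarded so the snippet also runs where PyTorch is not installed.

```python
def pick_device(cuda_available):
    """Return the device string to use, falling back to the CPU."""
    return "cuda" if cuda_available else "cpu"


try:
    import torch
    device = torch.device(pick_device(torch.cuda.is_available()))
    # Tensors created with device=... live on the GPU when one is assigned.
    x = torch.ones(2, 3, device=device)
    print(torch.__version__, x.device)
except ImportError:
    print("PyTorch is not installed in this environment")
```

Code written this way runs unchanged whether or not the runtime has a GPU assigned.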

Run sample code (CIFAR10 CNN Classifier)

I ran the code from Training a classifier — PyTorch Tutorials 1.1.0 documentation step by step, using 【詳細（？）】pytorch入門　〜CIFAR10をCNNする〜 - Qiita as a reference.

I confirmed that it trains correctly.

[1,  2000] loss: 2.220
[1,  4000] loss: 1.863
[1,  6000] loss: 1.697
[1,  8000] loss: 1.588
[1, 10000] loss: 1.509
[1, 12000] loss: 1.466
[2,  2000] loss: 1.375
[2,  4000] loss: 1.357
[2,  6000] loss: 1.339
[2,  8000] loss: 1.306
[2, 10000] loss: 1.306
[2, 12000] loss: 1.288
Finished Training
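One way to make "it trains correctly" concrete is to parse the log above and check that the running loss does not increase. The sketch below does only that, with hypothetical helper names. A falling training loss is a sanity check and not evidence of good accuracy on test data.

```python
import re

# Log lines copied from the output above.
LOG = """\
[1,  2000] loss: 2.220
[1,  4000] loss: 1.863
[1,  6000] loss: 1.697
[1,  8000] loss: 1.588
[1, 10000] loss: 1.509
[1, 12000] loss: 1.466
[2,  2000] loss: 1.375
[2,  4000] loss: 1.357
[2,  6000] loss: 1.339
[2,  8000] loss: 1.306
[2, 10000] loss: 1.306
[2, 12000] loss: 1.288
"""

PATTERN = re.compile(r"\[(\d+),\s*(\d+)\] loss: ([\d.]+)")


def parse_losses(log_text):
    """Return (epoch, step, loss) tuples for every matching log line."""
    return [(int(e), int(s), float(v)) for e, s, v in PATTERN.findall(log_text)]


def is_non_increasing(values):
    """True when each value is no larger than the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


records = parse_losses(LOG)
losses = [loss for _, _, loss in records]
print(len(records), losses[0], losses[-1], is_non_increasing(losses))
```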

Tutorials

PyTorch has extensive tutorials on its official site.
There is also a handy repository that ports them to Colaboratory (param087/Pytorch-tutorial-on-Google-colab: PyTorch Tutorial on google colaboratory.).

Chainer

Environment setup

Since December 2018, Chainer and its backends CuPy and iDeep have been ready to use with no preparation.

import chainer
chainer.print_runtime_info()

It returns the following.

Platform: Linux-4.14.79+-x86_64-with-Ubuntu-18.04-bionic
Chainer: 5.0.0
NumPy: 1.14.6
CuPy:
  CuPy Version          : 5.0.0
  CUDA Root             : /usr/local/cuda
  CUDA Build Version    : 9020
  CUDA Driver Version   : 9020
  CUDA Runtime Version  : 9020
  cuDNN Build Version   : 7201
  cuDNN Version         : 7201
  NCCL Build Version    : 2213
iDeep: 2.0.0.post3
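The CUDA, cuDNN, and NCCL versions in this output are integer-coded. The sketch below decodes them, assuming the usual encodings: CUDA uses 1000*major + 10*minor, and cuDNN 7.x uses 1000*major + 100*minor + patch. Treating the NCCL value the same way as cuDNN is an assumption that I believe holds for NCCL releases of this era. The function names are hypothetical.

```python
def decode_cuda_version(code):
    """CUDA: 1000*major + 10*minor, so 9020 becomes '9.2'."""
    return "{}.{}".format(code // 1000, (code % 1000) // 10)


def decode_three_part_version(code):
    """cuDNN 7.x style: 1000*major + 100*minor + patch, so 7201 becomes '7.2.1'.

    Assumed to apply to the NCCL value shown above as well.
    """
    return "{}.{}.{}".format(code // 1000, (code % 1000) // 100, code % 100)


print("CUDA :", decode_cuda_version(9020))
print("cuDNN:", decode_three_part_version(7201))
print("NCCL :", decode_three_part_version(2213))
```

The decoded CUDA version matches the "release 9.2" reported by nvcc earlier in the article.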