
Automatically switching between TPU/GPU/CPU for tensorflow.keras models

Posted at 2020-02-04

Introduction

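This article summarizes how to write a tensorflow.keras program that reads the hardware information (mainly the Colaboratory runtime type) and automatically switches between TPU and GPU (and CPU, for completeness). ~~(Commenting lines in and out by hand was getting tedious.)~~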

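It is largely a condensed version of the Keras-and-TPU MNIST tutorial published on the official site (Google Cloud, Cloud TPU Docs). If you are already comfortable with tensorflow.keras, the original source may be easier to follow.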

Notes and caveats

- The code was tested on Colaboratory.
- It probably also works with TensorFlow 2.
- The code targets tensorflow.keras (not pure Keras or plain TensorFlow).
- The text, especially the explanations, may contain mistakes.

Code that automatically switches between TPU/GPU/CPU

# Get hardware information
import tensorflow as tf

try:
  tpu = tf.distribute.cluster_resolver.TPUClusterResolver() # TPU detection
except ValueError:
  tpu = None
  gpus = tf.config.experimental.list_logical_devices("GPU")

if tpu:
  tf.tpu.experimental.initialize_tpu_system(tpu)
  strategy = tf.distribute.experimental.TPUStrategy(tpu, steps_per_run=128) # Going back and forth between TPU and host is expensive. Better to run 128 batches on the TPU before reporting back.
  print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
elif len(gpus) > 1:
  strategy = tf.distribute.MirroredStrategy([gpu.name for gpu in gpus])
  print('Running on multiple GPUs ', [gpu.name for gpu in gpus])
elif len(gpus) == 1:
  strategy = tf.distribute.get_strategy() # default strategy that works on CPU and single GPU
  print('Running on single GPU ', gpus[0].name)
else:
  strategy = tf.distribute.get_strategy() # default strategy that works on CPU and single GPU
  print('Running on CPU')
print("Number of accelerators: ", strategy.num_replicas_in_sync)


# Wrap model creation, model loading, and model compilation in `with strategy.scope()`
from tensorflow import keras
with strategy.scope():
    model = make_model()  # make_model() stands for your own model-building function

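Without thinking too hard about it, doing the following two things makes the program switch between TPU, GPU, and CPU according to the hardware.

- Copy and paste everything from import tensorflow as tf through print("Number of accelerators: ", ...).
- Put the model definition or model loading code inside the with strategy.scope(): block.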
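As a concrete illustration of the second point, here is a minimal sketch of what make_model() could look like. The function body, the layer sizes, and the MNIST-style 28x28 input shape are assumptions made for this example and do not come from the original code. To keep the block self-contained, the strategy is taken from tf.distribute.get_strategy(), which is what the selection code above produces on CPU or a single GPU.

```python
import tensorflow as tf

# Assumption: CPU or single-GPU environment, so the default strategy is used.
# In a real program, `strategy` comes from the hardware-selection code above.
strategy = tf.distribute.get_strategy()


def make_model():
    """Hypothetical stand-in for your own model-building function."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),            # MNIST-sized input (assumed)
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    # Compilation also happens inside strategy.scope(), because make_model()
    # is called from within the scope below.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Variables created inside the scope are placed according to the strategy.
with strategy.scope():
    model = make_model()

num_params = model.count_params()
print("Trainable parameters:", num_params)
```

Calls such as model.fit() can then be written the same way regardless of which device was selected.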

A rough explanation

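Priority
- As the program shows, devices are selected in the order TPU > multi GPU > single GPU > CPU.
tf.distribute.cluster_resolver.TPUClusterResolver()
- Gets the TPU hardware information. In an environment where no TPU is available it raises an error, which the code catches as ValueError.
tf.config.experimental.list_logical_devices("GPU")
- Gets the GPU hardware information. The return value is a list. Passing "CPU" instead returns the CPU information.
if tpu: and below
- Defines the strategy for each device type. The TPU and multi-GPU cases are special, and it is easiest to just remember them as "this is how you write it".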
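tf.distribute.get_strategy()
- Returns the strategy that is currently in scope; outside any strategy.scope() that is TensorFlow's default strategy. It does not inspect the hardware itself.
- Under the default strategy, TensorFlow's ordinary device placement decides where ops run: on the GPU when the installed TensorFlow build supports GPU and a GPU is visible, and on the CPU otherwise.
- If you have a GPU but it is not being used, review your CUDA installation (see: GPU support).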
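The priority order described above can be checked without any accelerator by pulling the branching out into a plain function. This is only a sketch: choose_backend is a hypothetical helper written for this example, not part of TensorFlow, and it mirrors the if/elif chain of the selection code rather than replacing it.

```python
def choose_backend(tpu_available: bool, num_gpus: int) -> str:
    """Return the branch the selection logic would take.

    Hypothetical helper for illustration only. It reproduces the
    TPU > multi GPU > single GPU > CPU priority of the article's code.
    """
    if tpu_available:
        return "TPU"
    if num_gpus > 1:
        return "multi GPU"
    if num_gpus == 1:
        return "single GPU"
    return "CPU"


# A TPU wins even when GPUs are also present.
print(choose_backend(tpu_available=True, num_gpus=2))
print(choose_backend(tpu_available=False, num_gpus=2))
print(choose_backend(tpu_available=False, num_gpus=1))
print(choose_backend(tpu_available=False, num_gpus=0))
```

In the real code, the two arguments correspond to whether TPUClusterResolver() succeeded and to len(gpus).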

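Conclusion

This article explained how to automatically switch between TPU, GPU, and CPU in tensorflow.keras according to the available hardware.
Make good use of TPUs with tensorflow.keras, and enjoy a better deep learning life.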
