Overview
When run with its default settings, the GPU build of TensorFlow reserves nearly all of the memory on every GPU.
import tensorflow as tf
import six
# Just create a tf.Session and wait for keyboard input
tf.Session()
six.moves.input()
$ python test_gpu.py
Fri Oct 28 15:21:14 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.44 Driver Version: 367.44 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 0000:02:00.0 Off | N/A |
| 22% 55C P8 20W / 250W | 11603MiB / 12206MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX TIT... Off | 0000:03:00.0 Off | N/A |
| 22% 57C P8 20W / 250W | 11603MiB / 12206MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX TIT... Off | 0000:83:00.0 Off | N/A |
| 22% 53C P8 21W / 250W | 11603MiB / 12206MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX TIT... Off | 0000:84:00.0 Off | N/A |
| 22% 48C P8 18W / 250W | 11601MiB / 12206MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 22830 C python 11599MiB |
| 1 22830 C python 11599MiB |
| 2 22830 C python 11599MiB |
| 3 22830 C python 11597MiB |
+-----------------------------------------------------------------------------+
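On a shared machine, or when a single GPU is enough, this default is inconvenient. This article describes, as far as I was able to find out, how to restrict which GPUs are used and how to avoid reserving all of the memory.
Tested environment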
- tensorflow(GPU)==0.11rc1
Restricting devices
Restricting with CUDA
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-var
CUDA provides the environment variable CUDA_VISIBLE_DEVICES, which restricts the GPUs a program can use.
# Use GPU 0 only
$ CUDA_VISIBLE_DEVICES=0 python test_gpu.py
...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 26601 C python 11599MiB |
+-----------------------------------------------------------------------------+
The environment variable can also select multiple GPUs or change the order of the GPU IDs.
# Use GPUs 0 and 3
$ CUDA_VISIBLE_DEVICES=0,3 python test_gpu.py
...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 1: N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:84:00.0)
# Swap the order; note that the order of the pci bus ids changes
$ CUDA_VISIBLE_DEVICES=3,0 python test_gpu.py
...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 1: N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:84:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
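The renumbering visible in the logs above can be sketched in plain Python. The helper name `logical_to_physical` is hypothetical, and this is only a simplified model that assumes every entry is a valid integer GPU ID; it does not call CUDA.

```python
def logical_to_physical(cuda_visible_devices):
    """Map the device ordinal seen by the process to the physical GPU ID.

    Simplified model: assumes a comma-separated list of valid integer IDs.
    """
    ids = [int(tok) for tok in cuda_visible_devices.split(",") if tok.strip()]
    return {logical: physical for logical, physical in enumerate(ids)}


print(logical_to_physical("0,3"))  # {0: 0, 1: 3}
print(logical_to_physical("3,0"))  # {0: 3, 1: 0}
```

With `3,0`, the process sees physical GPU 3 as device 0, which matches the swapped pci bus ids in the log.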
# Disable GPUs: set the value to empty or to an invalid GPU ID (e.g. -1 or 5)
$ CUDA_VISIBLE_DEVICES= python test_gpu.py
...
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_NO_DEVICE
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
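The same restriction can also be applied from inside a Python script by setting the environment variable before TensorFlow (and therefore CUDA) is initialized. The following is a minimal sketch: `restrict_gpus` is a hypothetical helper name, and the TensorFlow import is left as a comment so that the snippet runs on its own.

```python
import os


def restrict_gpus(gpu_ids):
    """Set CUDA_VISIBLE_DEVICES from a list of physical GPU IDs.

    Must be called before CUDA is first initialized in this process.
    An empty list produces an empty value, which hides every GPU.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


restrict_gpus([0, 3])
# import tensorflow as tf  # import only after the variable has been set
print(os.environ["CUDA_VISIBLE_DEVICES"])  # 0,3
```

Setting the variable after a session has already been created in the same process should not be expected to have any effect.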
Using tf.ConfigProto.gpu_options.visible_device_list
You can also restrict the GPUs by adding GPU settings to the tf.ConfigProto passed when tf.Session is initialized.
https://github.com/tensorflow/tensorflow/blob/r0.11/tensorflow/core/protobuf/config.proto
tf.GPUOptions.visible_device_list, added in 0.11, takes a comma-separated list of GPU IDs in the same way as CUDA_VISIBLE_DEVICES and restricts which GPUs can be used.
import tensorflow as tf
import six
config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(
        visible_device_list="0"
    )
)
tf.Session(config=config)
six.moves.input()
$ python test_gpu_gpuopt_0.py
...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
import tensorflow as tf
import six
config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(
        visible_device_list="0,3"
    )
)
tf.Session(config=config)
six.moves.input()
$ python test_gpu_gpuopt_03.py
...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 3: N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 3, name: GeForce GTX TITAN X, pci bus id: 0000:84:00.0)
However, this method does not seem to be able to disable the GPUs entirely.
- visible_device_list="-1": fails with an error at startup
- visible_device_list="": uses all GPUs
- visible_device_list=None: uses all GPUs
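Given the behavior listed above (an empty string silently selects all GPUs, and a negative ID fails at startup), it may be safer to build the string with a small helper. This is only a sketch: `make_visible_device_list` and `make_config` are hypothetical names, and `make_config` assumes the `tf.ConfigProto` / `tf.GPUOptions` API of the TensorFlow version used in this article. TensorFlow is imported lazily so the string helper can be used on its own.

```python
def make_visible_device_list(gpu_ids):
    """Build a visible_device_list string such as "0,3".

    Rejects an empty list (which would select all GPUs) and negative IDs
    (which fail at session startup), following the observations above.
    """
    ids = list(gpu_ids)
    if not ids:
        raise ValueError("an empty list would select all GPUs")
    if any(i < 0 for i in ids):
        raise ValueError("negative GPU IDs are not accepted")
    return ",".join(str(i) for i in ids)


def make_config(gpu_ids):
    """Return a tf.ConfigProto limited to gpu_ids (assumes the TF 0.11-style API)."""
    import tensorflow as tf  # imported lazily; only needed when called

    return tf.ConfigProto(
        gpu_options=tf.GPUOptions(
            visible_device_list=make_visible_device_list(gpu_ids)
        )
    )


print(make_visible_device_list([0, 3]))  # 0,3
```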
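Using tf.ConfigProto.device_count
You can also restrict the GPUs by setting device_count in tf.ConfigProto.
However, with this method TensorFlow still initializes the GPUs, and about 100 MiB of memory is used on each GPU.
To make it clear where the tensor is placed, run the script with device placement logging enabled.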
import tensorflow as tf
config = tf.ConfigProto(
    device_count={"GPU": 0},  # set the number of GPUs to 0
    log_device_placement=True
)
sess = tf.Session(config=config)
# Explicitly request the GPU
with tf.device("/gpu:0"):
    x = tf.constant(0)
sess.run(x)
$ python test_gpu_device_count_0.py
...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 3: N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX TITAN X, pci bus id: 0000:83:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GeForce GTX TITAN X, pci bus id: 0000:84:00.0)
Device mapping: no known devices.
I tensorflow/core/common_runtime/direct_session.cc:252] Device mapping:
Traceback (most recent call last):
...
InvalidArgumentError (see above for traceback): Cannot assign a device to node 'Const': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
[[Node: Const = Const[dtype=DT_INT32, value=Tensor<type: int32 shape: [] values: 0>, _device="/device:GPU:0"]()]]
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 15335 C python 106MiB |
| 1 15335 C python 106MiB |
| 2 15335 C python 106MiB |
| 3 15335 C python 106MiB |
+-----------------------------------------------------------------------------+
TensorFlow detects and initializes all four GPUs, but reports that GPU:0 cannot be used.
Incidentally, with allow_soft_placement=True the output is as follows:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX TITAN X, pci bus id: 0000:83:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GeForce GTX TITAN X, pci bus id: 0000:84:00.0)
Device mapping: no known devices.
I tensorflow/core/common_runtime/direct_session.cc:252] Device mapping:
Const: /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:819] Const: /job:localhost/replica:0/task:0/cpu:0
The constant is placed on the CPU.
Next, change test_gpu_device_count_0.py to use device_count={"GPU": 1} and run it (file name: test_gpu_device_count_1.py).
$ python test_gpu_device_count_1.py
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX TITAN X, pci bus id: 0000:83:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GeForce GTX TITAN X, pci bus id: 0000:84:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0
I tensorflow/core/common_runtime/direct_session.cc:252] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0
Const: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:819] Const: /job:localhost/replica:0/task:0/gpu:0
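This time the constant is placed on the GPU as expected.
With this method, it seems that the devices are still initialized and the restriction is only applied when ops are assigned to devices, though I have not confirmed this.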
Restricting memory
Limiting the maximum amount of memory
This can be changed by setting tf.ConfigProto.gpu_options.per_process_gpu_memory_fraction.
import tensorflow as tf
config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(
        per_process_gpu_memory_fraction=0.5  # up to 50% of the total
    )
)
sess = tf.Session(config=config)
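As a rough worked example, the cap implied by the fraction can be estimated from the total memory that nvidia-smi reports. This is arithmetic only, under the assumption that the fraction applies to each GPU's total memory; the amount TensorFlow actually reserves may differ somewhat. `memory_cap_mib` is a hypothetical helper name.

```python
def memory_cap_mib(total_mib, fraction):
    """Estimate the per-GPU memory cap in MiB for a given fraction."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return total_mib * fraction


# 12206 MiB is the total shown by nvidia-smi for each GPU in this article.
print(memory_cap_mib(12206, 0.5))  # 6103.0
```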
Allocating memory only when it is needed instead of up front
This can be changed by setting tf.ConfigProto.gpu_options.allow_growth.
import tensorflow as tf
config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(
        allow_growth=True  # True: allocate as needed, False: reserve everything up front
    )
)
sess = tf.Session(config=config)
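Both settings are fields of the same tf.GPUOptions message, so they can be set together: memory grows on demand but stays under a cap. The sketch below illustrates that combination. `gpu_option_kwargs` and `make_session` are hypothetical names, `make_session` assumes the TensorFlow 0.11-style API used in this article, and TensorFlow is imported lazily so the argument helper can be checked without a GPU.

```python
def gpu_option_kwargs(fraction=None, allow_growth=True):
    """Build keyword arguments for tf.GPUOptions.

    fraction: optional cap in (0, 1]; omitted from the result when None.
    allow_growth: allocate GPU memory on demand when True.
    """
    kwargs = {"allow_growth": bool(allow_growth)}
    if fraction is not None:
        if not 0.0 < fraction <= 1.0:
            raise ValueError("fraction must be in (0, 1]")
        kwargs["per_process_gpu_memory_fraction"] = float(fraction)
    return kwargs


def make_session(fraction=None, allow_growth=True):
    """Create a tf.Session with the combined GPU memory options."""
    import tensorflow as tf  # imported lazily; only needed when called

    config = tf.ConfigProto(
        gpu_options=tf.GPUOptions(**gpu_option_kwargs(fraction, allow_growth))
    )
    return tf.Session(config=config)


print(gpu_option_kwargs(fraction=0.5))
```

On a shared machine, `make_session(fraction=0.5)` would be one way to leave room for other users while still allocating lazily.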