Restricting and Disabling GPUs in TensorFlow

Overview

If you run the GPU build of TensorFlow with its default settings, it allocates nearly all of the memory on every GPU.

test_gpu.py
import tensorflow as tf
import six
# Just create a tf.Session and wait for keyboard input
tf.Session()
six.moves.input()
$ python test_gpu.py
nvidia-smi
Fri Oct 28 15:21:14 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:02:00.0     Off |                  N/A |
| 22%   55C    P8    20W / 250W |  11603MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:03:00.0     Off |                  N/A |
| 22%   57C    P8    20W / 250W |  11603MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  Off  | 0000:83:00.0     Off |                  N/A |
| 22%   53C    P8    21W / 250W |  11603MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX TIT...  Off  | 0000:84:00.0     Off |                  N/A |
| 22%   48C    P8    18W / 250W |  11601MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     22830    C   python                                       11599MiB |
|    1     22830    C   python                                       11599MiB |
|    2     22830    C   python                                       11599MiB |
|    3     22830    C   python                                       11597MiB |
+-----------------------------------------------------------------------------+

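On a shared machine, or when a single GPU is enough, this default is inconvenient. This post summarizes what I found about restricting which GPUs are used and about not allocating all memory up front.

Tested environment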

  • tensorflow(GPU)==0.11rc1

Restricting devices

Restricting with CUDA

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-var
CUDA provides the environment variable CUDA_VISIBLE_DEVICES, which restricts the GPUs a program is allowed to use.

# Use only GPU 0
$ CUDA_VISIBLE_DEVICES=0 python test_gpu.py
...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
nvidia-smi
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     26601    C   python                                       11599MiB |
+-----------------------------------------------------------------------------+

The environment variable can also select several GPUs or change the order in which they are numbered.

# Use GPUs 0 and 3
$ CUDA_VISIBLE_DEVICES=0,3 python test_gpu.py
...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 1:   N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:84:00.0)
# Swap the order; note that the pci bus ids now appear in the opposite order
$ CUDA_VISIBLE_DEVICES=3,0 python test_gpu.py
...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 1:   N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:84:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
# Disable the GPUs: set the value to empty or to an invalid GPU ID (e.g. -1 or 5)
$ CUDA_VISIBLE_DEVICES= python test_gpu.py
...
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_NO_DEVICE
nvidia-smi
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
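
The same variable can also be set from inside Python, provided this happens before TensorFlow initializes CUDA (in practice, before `import tensorflow`). The following is a minimal sketch under that assumption; `restrict_gpus` is a hypothetical helper name, not part of TensorFlow or CUDA.

```python
import os


def restrict_gpus(gpu_ids):
    """Expose only the given GPU IDs to CUDA; an empty list hides every GPU.

    Hypothetical helper: it only sets CUDA_VISIBLE_DEVICES, so it must be
    called before TensorFlow (or any other CUDA library) is initialized.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


restrict_gpus([0, 3])      # intended to match CUDA_VISIBLE_DEVICES=0,3 on the command line
# import tensorflow as tf  # import TensorFlow only after this point
```

Setting the variable in the launching shell, as in the commands above, remains the safer choice when other imported modules might touch CUDA first.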

Using tf.ConfigProto.gpu_options.visible_device_list

You can also restrict GPUs by adding GPU settings to the tf.ConfigProto passed when the tf.Session is created.
https://github.com/tensorflow/tensorflow/blob/r0.11/tensorflow/core/protobuf/config.proto

tf.GPUOptions.visible_device_list, added in 0.11, takes a comma-separated list of GPU IDs in the same way as CUDA_VISIBLE_DEVICES and restricts the GPUs that can be used.

test_gpu_gpuopt_0.py
import tensorflow as tf
import six

config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(
        visible_device_list="0"
    )
)
tf.Session(config=config)
six.moves.input()
$ python test_gpu_gpuopt_0.py
...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
test_gpu_gpuopt_03.py
import tensorflow as tf
import six

config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(
        visible_device_list="0,3"
    )
)
tf.Session(config=config)
six.moves.input()
$ python test_gpu_gpuopt_03.py
...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 3:   N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 3, name: GeForce GTX TITAN X, pci bus id: 0000:84:00.0)                                                                                                          

However, it does not seem possible to disable the GPUs with this method.
- visible_device_list="-1": fails with an error at startup
- visible_device_list="": all GPUs are used
- visible_device_list=None: all GPUs are used
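
Because an empty string silently selects every GPU and "-1" fails at startup, it can help to build the string from a list of IDs and reject those two cases early. This is only a sketch: `visible_device_list` and `make_session` are hypothetical helper names, and the TensorFlow calls are the 0.11-style ones used in the scripts above.

```python
def visible_device_list(gpu_ids):
    """Format GPU IDs for tf.GPUOptions.visible_device_list, e.g. [0, 3] -> "0,3"."""
    ids = list(gpu_ids)
    if not ids:
        # An empty string would select all GPUs rather than none.
        raise ValueError("empty GPU list: this option cannot disable GPUs")
    if any(i < 0 for i in ids):
        # Negative IDs such as -1 were observed to fail at session startup.
        raise ValueError("GPU IDs must be non-negative")
    return ",".join(str(i) for i in ids)


def make_session(gpu_ids):
    """Create a tf.Session that sees only gpu_ids (TF 0.11-style API)."""
    import tensorflow as tf  # imported lazily so the formatter works without TF

    config = tf.ConfigProto(
        gpu_options=tf.GPUOptions(
            visible_device_list=visible_device_list(gpu_ids)
        )
    )
    return tf.Session(config=config)
```

To run without any GPU, the CUDA_VISIBLE_DEVICES approach described earlier is still needed.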

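Using tf.ConfigProto.device_count

You can also restrict devices by setting device_count in tf.ConfigProto.
Note, however, that TensorFlow still initializes the GPUs with this method, so roughly 100 MiB of memory is used on each GPU.

To make it easy to see where memory is allocated, let's run it with device placement logging enabled.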

test_gpu_device_count_0.py
import tensorflow as tf
config = tf.ConfigProto(
    device_count={"GPU": 0},  # set the number of GPUs to 0
    log_device_placement=True
)

sess = tf.Session(config=config)

# Explicitly request a GPU
with tf.device("/gpu:0"):
    x = tf.constant(0)

sess.run(x)
$ python test_gpu_device_count_0.py
...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 3:   N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX TITAN X, pci bus id: 0000:83:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GeForce GTX TITAN X, pci bus id: 0000:84:00.0)
Device mapping: no known devices.
I tensorflow/core/common_runtime/direct_session.cc:252] Device mapping:
Traceback (most recent call last):
...
InvalidArgumentError (see above for traceback): Cannot assign a device to node 'Const': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
     [[Node: Const = Const[dtype=DT_INT32, value=Tensor<type: int32 shape: [] values: 0>, _device="/device:GPU:0"]()]]
nvidia-smi
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     15335    C   python                                         106MiB |
|    1     15335    C   python                                         106MiB |
|    2     15335    C   python                                         106MiB |
|    3     15335    C   python                                         106MiB |
+-----------------------------------------------------------------------------+

TensorFlow recognized and initialized all four GPUs, yet it reports that GPU:0 is not available.
For reference, this is what happens with allow_soft_placement=True:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX TITAN X, pci bus id: 0000:83:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GeForce GTX TITAN X, pci bus id: 0000:84:00.0)
Device mapping: no known devices.
I tensorflow/core/common_runtime/direct_session.cc:252] Device mapping:

Const: /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:819] Const: /job:localhost/replica:0/task:0/cpu:0

The constant is placed on the CPU.

Next, change test_gpu_device_count_0.py to use device_count={"GPU":1} and run it (file name: test_gpu_device_count_1.py).

$ python test_gpu_device_count_1.py
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX TITAN X, pci bus id: 0000:83:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GeForce GTX TITAN X, pci bus id: 0000:84:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0
I tensorflow/core/common_runtime/direct_session.cc:252] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0

Const: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:819] Const: /job:localhost/replica:0/task:0/gpu:0

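This time the constant is placed on the GPU as expected.
It looks as though this method still acquires every device and only applies the limit when ops are assigned to devices in the graph.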

Restricting memory

https://www.tensorflow.org/versions/r0.11/how_tos/using_gpu/index.html

Limiting the maximum amount of memory

This can be changed by setting tf.ConfigProto.gpu_options.per_process_gpu_memory_fraction.

config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(
        per_process_gpu_memory_fraction=0.5  # use at most 50% of the GPU memory
    )
)
sess = tf.Session(config=config)
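
Since the option takes a fraction rather than an absolute size, a target given in MiB has to be divided by the card's total memory. The sketch below shows the arithmetic with the 12206 MiB total reported by nvidia-smi above; `memory_fraction` is a hypothetical helper and the 4096 MiB target is only an example value.

```python
def memory_fraction(limit_mib, total_mib):
    """Return limit_mib as a fraction of total_mib, capped at 1.0."""
    if limit_mib <= 0 or total_mib <= 0:
        raise ValueError("sizes must be positive")
    return min(1.0, float(limit_mib) / total_mib)


# 4096 MiB out of the 12206 MiB shown by nvidia-smi
fraction = memory_fraction(4096, 12206)
print(round(fraction, 3))  # 0.336
```

The resulting value can be passed as per_process_gpu_memory_fraction in place of the 0.5 used above.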

Allocating memory on demand instead of up front

This can be changed by setting tf.ConfigProto.gpu_options.allow_growth.

config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(
        allow_growth=True  # True: allocate as needed, False: allocate everything up front
    )
)
sess = tf.Session(config=config)