
TensorFlow with Singularity 3.7.2 on the Supercomputer ITO Front End

Posted at 2021-03-09

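Singularity is a container platform that can use Docker images. On the supercomputer ITO, Singularity 3.7.2 is readily available.
These are my notes on setting up Singularity on the bare-metal nodes of the ITO basic front end and trying TensorFlow there, following the guide. GPUs are available only on the bare-metal front-end nodes or in batch jobs on ITO-B.
One caveat: if you have set up a Python environment with Anaconda or Miniconda, you need to disable it. Even the base environment caused failures. Commenting out the settings that conda wrote to ~/.bashrc and logging in again should be enough.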
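A quick way to confirm that conda is really disabled after logging in again is to look for the traces it normally leaves in the shell environment. The following is a minimal sketch, not part of the original procedure: the helper name find_conda_traces is hypothetical, and it assumes that an active conda environment sets CONDA_PREFIX or CONDA_DEFAULT_ENV and puts a directory containing "conda" on PATH.

```python
import os

# Variables that `conda activate` sets (assumption: their presence means
# a conda environment, including base, is active in this shell).
CONDA_ENV_VARS = ("CONDA_PREFIX", "CONDA_DEFAULT_ENV")


def find_conda_traces(environ, path_entries):
    """Return human-readable hints that conda is still active."""
    traces = [f"{name}={environ[name]}" for name in CONDA_ENV_VARS if name in environ]
    # anaconda3/ and miniconda3/ directories both contain the string "conda".
    traces += [f"PATH entry: {entry}" for entry in path_entries if "conda" in entry.lower()]
    return traces


path_entries = os.environ.get("PATH", "").split(os.pathsep)
traces = find_conda_traces(os.environ, path_entries)
if traces:
    print("conda still appears to be active:")
    for trace in traces:
        print("  " + trace)
else:
    print("No conda settings detected.")
```

Run it with the system Python on the front end before loading Singularity; an empty result only means that none of these particular traces were found.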

Setting up Singularity

Run the following.

$ module load singularity/3.7.2

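Do not load the vendor-provided CUDA module here, because the CUDA bundled in the container must be used. I got stuck on this without realizing it, until the help desk at the Kyushu University center pointed it out.
Also, on the bare-metal nodes there is no need to raise the virtual memory limit with the ulimit command, because the default is ulimit -v unlimited.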
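The virtual memory limit can also be checked from Python with the standard resource module, which reads the same limit that ulimit -v reports. This is a small sketch for verification only; the helper name describe_limit is made up for this example.

```python
import resource


def describe_limit(value):
    """Format an rlimit value the way `ulimit -v` does (KiB or 'unlimited')."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return str(value // 1024)  # RLIMIT_AS is in bytes; ulimit -v reports KiB


# RLIMIT_AS is the address-space (virtual memory) limit of this process.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print("virtual memory soft limit:", describe_limit(soft))
print("virtual memory hard limit:", describe_limit(hard))
```

If the soft limit is not reported as unlimited, the node is probably not configured like the bare-metal front end described here.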

Pulling the TensorFlow 1.12.0 Docker image

Following the guide, pull the Docker image.

$ singularity pull docker://tensorflow/tensorflow:1.12.0-gpu-py3
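The pull writes a .sif file into the current directory, and its name is needed later for singularity exec. Judging from the two images used in this article, the name follows the pattern <image>_<tag>.sif. The helper below is a sketch of that naming rule only; default_sif_name is a hypothetical function, and the fallback to "latest" for an untagged URI is an assumption.

```python
def default_sif_name(docker_uri):
    """Guess the file name `singularity pull` writes for a docker:// URI.

    Assumption: <last path component>_<tag>.sif, with "latest" when the
    URI carries no tag.
    """
    prefix = "docker://"
    if not docker_uri.startswith(prefix):
        raise ValueError("expected a docker:// URI")
    # Only the last path component can carry the tag, e.g. "tensorflow:2.2.2-gpu".
    last_component = docker_uri[len(prefix):].rsplit("/", 1)[-1]
    name, _, tag = last_component.partition(":")
    return f"{name}_{tag or 'latest'}.sif"


print(default_sif_name("docker://tensorflow/tensorflow:1.12.0-gpu-py3"))
```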

Next, test the GPU. Prepare the following code as gpu_test.py. The platform module is used to check the Python version.

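# gpu_test.py
import platform

import tensorflow as tf

print(platform.python_version())  # Python version inside the container
print('Hello World!')
# The return value is discarded; the call initializes the GPU so that
# TensorFlow logs the devices it finds.
tf.test.gpu_device_name()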

Running this code gives the following output, so it appears to have succeeded.

$ singularity exec --nv tensorflow_1.12.0-gpu-py3.sif python gpu_test.py
3.5.2
Hello World!
2021-03-09 13:59:12.240360: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2021-03-09 13:59:12.505886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Quadro P4000 major: 6 minor: 1 memoryClockRate(GHz): 1.48
pciBusID: 0000:37:00.0
totalMemory: 7.93GiB freeMemory: 7.83GiB
2021-03-09 13:59:12.505990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2021-03-09 13:59:15.212543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-09 13:59:15.212599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2021-03-09 13:59:15.212631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2021-03-09 13:59:15.212755: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 7558 MB memory) -> physical GPU (device: 0, name: Quadro P4000, pci bus id: 0000:37:00.0, compute capability: 6.1)
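When the same test is run against several images, it can help to build the command line in one place. The sketch below assembles the invocation used above as an argument list. build_exec_command is a hypothetical helper, not something Singularity provides, and it only prints the command rather than executing it.

```python
import shlex


def build_exec_command(image, script, use_gpu=True):
    """Return the argv list for running a Python script inside a SIF image."""
    command = ["singularity", "exec"]
    if use_gpu:
        command.append("--nv")  # the GPU option used in this article
    command += [image, "python", script]
    return command


command = build_exec_command("tensorflow_1.12.0-gpu-py3.sif", "gpu_test.py")
print(shlex.join(command))
# To actually run it on the front end: subprocess.run(command, check=True)
```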

As mentioned above, if you load the vendor-provided cuda/9.0 with module load cuda/9.0, the vendor's CUDA takes precedence and the run fails with an error because the GPU is not recognized.
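Since this mistake is easy to make, a small check before launching the container can help. The sketch below looks for a loaded CUDA module and for CUDA directories on the library path. It assumes the module command records loaded modules in the LOADEDMODULES environment variable (as Environment Modules and Lmod do) and that a host CUDA install shows up in LD_LIBRARY_PATH; find_host_cuda is a hypothetical helper name.

```python
import os


def find_host_cuda(environ):
    """Return loaded modules and library dirs that look like a host CUDA install."""
    modules = [m for m in environ.get("LOADEDMODULES", "").split(":") if m]
    lib_dirs = [d for d in environ.get("LD_LIBRARY_PATH", "").split(":") if d]
    return {
        "modules": [m for m in modules if m.lower().startswith("cuda")],
        "lib_dirs": [d for d in lib_dirs if "cuda" in d.lower()],
    }


found = find_host_cuda(os.environ)
if found["modules"] or found["lib_dirs"]:
    print("Host CUDA seems to be loaded:", found)
else:
    print("No host CUDA module detected.")
```

If a module is reported, module unload (or a fresh login) before running singularity exec should clear it.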

Pulling the TensorFlow 2.2.2 Docker image

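Other TensorFlow versions can be pulled by consulting the TensorFlow tags on Docker Hub. Here we pull tensorflow/tensorflow:2.2.2-gpu. Note that latest-gpu, the newest version as of March 9, 2021, failed with a runtime error, apparently because of a CUDA problem.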

$ singularity pull docker://tensorflow/tensorflow:2.2.2-gpu

Running the earlier gpu_test.py appears to succeed, as shown in the following output.

$ singularity exec --nv tensorflow_2.2.2-gpu.sif python gpu_test.py
3.6.9
Hello World!
2021-03-09 23:47:06.890323: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2021-03-09 23:47:06.923866: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2300000000 Hz
2021-03-09 23:47:06.932807: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fdc18000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-03-09 23:47:06.932858: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-03-09 23:47:06.940054: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-03-09 23:47:07.119720: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fd868000b20 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-03-09 23:47:07.119759: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Quadro P4000, Compute Capability 6.1
2021-03-09 23:47:07.122192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:37:00.0 name: Quadro P4000 computeCapability: 6.1
coreClock: 1.48GHz coreCount: 14 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 226.62GiB/s
2021-03-09 23:47:07.126753: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-03-09 23:47:07.790855: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-03-09 23:47:07.814671: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-03-09 23:47:07.826531: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-03-09 23:47:07.878114: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-03-09 23:47:07.889282: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-03-09 23:47:08.842475: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-03-09 23:47:08.843382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2021-03-09 23:47:08.845420: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-03-09 23:47:08.846250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-09 23:47:08.846277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
2021-03-09 23:47:08.846295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
2021-03-09 23:47:08.849832: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:0 with 7538 MB memory) -> physical GPU (device: 0, name: Quadro P4000, pci bus id: 0000:37:00.0, compute capability: 6.1)
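In both runs, the line that confirms success is the final "Created TensorFlow device" message. The sketch below extracts the device details from such log text so the result can be checked without reading the whole log. The regular expression is written against the two log lines shown in this article and may not match other TensorFlow versions; parse_created_devices is a hypothetical helper.

```python
import re

# Matches the "Created TensorFlow device" line as it appears in the logs above.
DEVICE_PATTERN = re.compile(
    r"Created TensorFlow device \((?P<device>/device:GPU:\d+) with (?P<memory_mb>\d+) MB memory\)"
    r" -> physical GPU \(device: \d+, name: (?P<name>[^,]+),"
    r" pci bus id: (?P<pci>[^,]+), compute capability: (?P<capability>[\d.]+)\)"
)


def parse_created_devices(log_text):
    """Return one dict per GPU that TensorFlow reported creating."""
    devices = []
    for match in DEVICE_PATTERN.finditer(log_text):
        info = match.groupdict()
        info["memory_mb"] = int(info["memory_mb"])
        devices.append(info)
    return devices


sample = (
    "Created TensorFlow device (/device:GPU:0 with 7538 MB memory) -> physical GPU "
    "(device: 0, name: Quadro P4000, pci bus id: 0000:37:00.0, compute capability: 6.1)"
)
for device in parse_created_devices(sample):
    print(device["device"], device["name"], device["memory_mb"], "MB")
```

An empty list means no such line was found in the text, which on this system would suggest that the GPU was not picked up.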

