tensorflowを動かすのに手こずった話。
結論としては、CUDA10.2だけでは動かず、10.1を入れる必要がある。(※tensorflow2.2.0の場合)
事前準備
NVIDIAのページから、ドライバ、CUDAの最新バージョン(10.2)を入れておく
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 207... Off | 00000000:01:00.0 On | N/A |
+-------------------------------+----------------------+----------------------+
tensorflow gpuを使うための事前準備
https://www.tensorflow.org/install/gpu?hl=ja
を参考に以下をCUDA10.2に合わせたパッケージをインストール。バージョンは2020年5月10日時点で最新の物を入れる
$ sudo apt-get install --no-install-recommends \
libcudnn7=7.6.5.32-1+cuda10.2 \
libcudnn7-dev=7.6.5.32-1+cuda10.2
※すでにNVIDIAドライバの440とCUDA10.2はインストール済みだった為、飛ばした。
tensorflowのインストール
$ python -m pip install -U pip # pipを最新にしておく
$ pip install tensorflow
$ pip install tf-nightly # 以上推奨の2つ
$ pip install tensorflow-gpu # 以下とりあえず入れておく
$ pip install tensorflow-addons
# バージョンの確認
$ pip list |grep tensor
tensorboard 2.2.1
tensorboard-plugin-wit 1.6.0.post3
tensorflow 2.2.0
tensorflow-addons 0.9.1
tensorflow-estimator 2.2.0
tensorflow-gpu 2.2.0
無事に入った様子
tensorflowでGPUが動作するかの確認
$ python
Python 3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2020-05-12 22:03:50.049513: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-05-12 22:03:50.095310: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3000000000 Hz
2020-05-12 22:03:50.097049: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fc6ec000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-12 22:03:50.097116: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-05-12 22:03:50.109698: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-05-12 22:03:50.217541: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-12 22:03:50.217838: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4703970 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-12 22:03:50.217852: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2070 SUPER, Compute Capability 7.5
2020-05-12 22:03:50.218622: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-12 22:03:50.218835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.77GHz coreCount: 40 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-05-12 22:03:50.218998: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:
2020-05-12 22:03:50.244848: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-12 22:03:50.263385: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-12 22:03:50.267797: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-12 22:03:50.304564: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-12 22:03:50.311052: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-12 22:03:50.378673: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-12 22:03:50.378768: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-05-12 22:03:50.378827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-12 22:03:50.378855: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-05-12 22:03:50.378904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
False
失敗…Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file
とあるので、どうやらcudaの10.1を入れる必要があるっぽい
cuda10.1を入れて再チャレンジ
$sudo apt-get install --no-install-recommends cuda-10-1
インストール処理内容
$ python
Python 3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
中略
TRUE
特にlibcudnn7
のパッケージをダウングレードする必要なく動いた。