Running TensorFlow on a GPU with NVIDIA driver 440 + CUDA 10.2: where I got stuck

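This is the story of the trouble I had getting TensorFlow to run.
The short version: CUDA 10.2 alone is not enough; you also need to install CUDA 10.1. (Note: this applies to tensorflow 2.2.0.)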

Preparation

Install the NVIDIA driver and the latest CUDA version (10.2) from NVIDIA's site.

$ nvidia-smi  
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 207...  Off  | 00000000:01:00.0  On |                  N/A |

+-------------------------------+----------------------+----------------------+

Preparing to use TensorFlow with a GPU

Using
https://www.tensorflow.org/install/gpu?hl=ja
as a reference, install the packages below, choosing the builds that match CUDA 10.2. The versions are the latest available as of May 10, 2020.

$ sudo apt-get install --no-install-recommends \
        libcudnn7=7.6.5.32-1+cuda10.2  \
        libcudnn7-dev=7.6.5.32-1+cuda10.2

Note: NVIDIA driver 440 and CUDA 10.2 were already installed, so I skipped those steps.

Installing TensorFlow

$ python -m pip install -U pip # upgrade pip first
$ pip install tensorflow
$ pip install tf-nightly # the two above are the recommended packages
$ pip install tensorflow-gpu # the ones below were installed just in case
$ pip install tensorflow-addons
# check the installed versions
$ pip list | grep tensor
tensorboard             2.2.1
tensorboard-plugin-wit  1.6.0.post3
tensorflow              2.2.0
tensorflow-addons       0.9.1
tensorflow-estimator    2.2.0
tensorflow-gpu          2.2.0

Everything appears to have installed without problems.

Checking whether TensorFlow can use the GPU

$ python
Python 3.6.9 (default, Apr 18 2020, 01:56:04) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2020-05-12 22:03:50.049513: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-05-12 22:03:50.095310: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3000000000 Hz
2020-05-12 22:03:50.097049: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fc6ec000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-12 22:03:50.097116: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-05-12 22:03:50.109698: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-05-12 22:03:50.217541: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-12 22:03:50.217838: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4703970 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-12 22:03:50.217852: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2070 SUPER, Compute Capability 7.5
2020-05-12 22:03:50.218622: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-12 22:03:50.218835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.77GHz coreCount: 40 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-05-12 22:03:50.218998: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:
2020-05-12 22:03:50.244848: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-12 22:03:50.263385: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-12 22:03:50.267797: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-12 22:03:50.304564: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-12 22:03:50.311052: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-12 22:03:50.378673: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-12 22:03:50.378768: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-05-12 22:03:50.378827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-12 22:03:50.378855: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-05-12 22:03:50.378904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
False

It failed. The log says "Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file", so it looks like CUDA 10.1 needs to be installed.
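The log is long, and the one line that matters is easy to miss among the informational messages. As a rough sketch (not part of the original steps), the missing library names can be pulled out of a saved log with a regular expression. The function name `missing_libraries` and the `sample` text are made up for illustration; the pattern is based only on the warning format shown in the log above.

```python
import re

# Matches the dso_loader warning format seen in the log above.
MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return library names TensorFlow failed to load, in order, without duplicates."""
    seen = []
    for name in MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


# Two lines shortened from the log above (hypothetical sample input).
sample = (
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:55] "
    "Could not load dynamic library 'libcudart.so.10.1'; dlerror: "
    "libcudart.so.10.1: cannot open shared object file\n"
    "I tensorflow/stream_executor/platform/default/dso_loader.cc:44] "
    "Successfully opened dynamic library libcublas.so.10\n"
)

print(missing_libraries(sample))  # ['libcudart.so.10.1']
```

Only `libcudart.so.10.1` is reported for this log; the other libraries (`libcublas.so.10`, `libcudnn.so.7`, and so on) were opened successfully.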

Installing CUDA 10.1 and trying again

$ sudo apt-get install --no-install-recommends cuda-10-1
(installation output omitted)
$ python
Python 3.6.9 (default, Apr 18 2020, 01:56:04) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
(output omitted)
True

It worked without having to downgrade the libcudnn7 packages.
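The check can also be scripted. The sketch below is one possible approach, not the method used in this article: it first asks the dynamic loader whether `libcudart.so.10.1` can be opened, then lists GPUs with `tf.config.list_physical_devices('GPU')`, the replacement that the deprecation warning in the earlier log recommends. The helper names `can_load` and `visible_gpus` are made up for illustration, and the script assumes a Linux system where the CUDA libraries are on the loader's search path.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library is not found on the loader's search path.
        return False


def visible_gpus():
    """Return TensorFlow's list of physical GPUs, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Non-deprecated replacement for tf.test.is_gpu_available().
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    print("libcudart.so.10.1 loadable:", can_load("libcudart.so.10.1"))
    print("GPUs seen by TensorFlow:", visible_gpus())
```

If the first line reports that the library is not loadable, TensorFlow 2.2.0 will skip GPU registration just as in the failed run, so the loader check is a quick way to confirm the fix before starting Python with TensorFlow.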
