18
19

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 3 years have passed since last update.

tensorflow-gpuをRTX3070で動かすのは大変だった

Last updated at Posted at 2021-02-13

概略

TensorFlow公式ページ(https://www.tensorflow.org/install/gpu) に従ってGPU版を導入するもうまく動作しなかった。

ファイル名の書き換えで動作することを確認。

RTX30シリーズで頻発している様子。
今回の対応はかなり応急処置的なので改善待ちですね・・・。

環境

OS
Windows 10 Home(64bit)
バージョン : 20H2

CPU
Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz 2.90 GHz

GPU
GeForce RTX 3070
Driver version : 461.40
CUDA Version : 11.2

言語とライブラリ
conda : 4.9.2
Python : 3.8.5
tensorflow-gpu : 2.4.1

解決までの経緯

指定の要件で動かない

公式のソフトウェア要件(https://www.tensorflow.org/install/gpu#software_requirements) の通りにセットアップして学習を始めると以下のメッセージが

2021-02-13 17:10:23.436049: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-02-13 17:10:24.817916: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-13 17:10:24.818847: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-02-13 17:10:24.850803: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.725GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s

~~ 中略 ~~

2021-02-13 17:10:24.892311: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2021-02-13 17:10:24.895383: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-02-13 17:10:24.895962: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-02-13 17:10:24.908415: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-02-13 17:10:24.909790: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-13 17:10:24.910559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-13 17:10:24.910934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]    

2021-02-13 17:10:24.911055: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-13 17:10:25.019189: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/100
15/15 [==============================] - 2s 80ms/step - loss: 1.1186 - accuracy: 0.3416
Epoch 2/100
15/15 [==============================] - 1s 80ms/step - loss: 1.0781 - accuracy: 0.3804

~~ 以下略 ~~

最終的に学習は始まってるし一見できている風なんですが、

ここ

Skipping registering GPU devices...

「GPUでやるの飛ばすね。。。」とおっしゃっています。
なので、CPUで動いています。なぜ?

'cusolver64_10.dll'が読み込めない

さっきのメッセージをよく読むと

Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found

「'cusolver64_10.dll'がないよ」とのこと。
実際CUDAの入っているフォルダを確認してもそんなものはない。

調べてみると公式のGitHubリポジトリでもissueが上がっていました。
(https://github.com/tensorflow/tensorflow/issues/44291)
そこで見つけた解決策がこちら

訳「'cusolver64_11.dll'を'cusolver64_10.dll'に名前変更したら解決しましたよ」

それありなの?(笑)

ありでした

2021-02-13 18:02:35.671492: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll

はい、読み込めてますね。
(かなり胡散臭い対応ですが)これでとりあえずは動くみたいです。

ちなみにGPUだとさっきのCPUで動かした場合の約10倍の速度で学習しています。早い!

# GPU
Epoch 1/100
15/15 [==============================] - 0s 10ms/step - loss: 1.1017 - accuracy: 0.3842
Epoch 2/100
15/15 [==============================] - 0s 8ms/step - loss: 1.0474 - accuracy: 0.4595  

# CPU
Epoch 1/100
15/15 [==============================] - 2s 80ms/step - loss: 1.1186 - accuracy: 0.3416
Epoch 2/100
15/15 [==============================] - 1s 80ms/step - loss: 1.0781 - accuracy: 0.3804

参考

GPU サポート  |  TensorFlow
https://www.tensorflow.org/install/gpu

cuda - tensorflow-gpu の導入がうまくいきません [RTX 3070] - スタック・オーバーフロー
https://ja.stackoverflow.com/questions/72013/tensorflow-gpu-%E3%81%AE%E5%B0%8E%E5%85%A5%E3%81%8C%E3%81%86%E3%81%BE%E3%81%8F%E3%81%84%E3%81%8D%E3%81%BE%E3%81%9B%E3%82%93-rtx-3070

nvidia cudaセットアップ - Qiita
https://qiita.com/mailstop/items/a84ac6b4eac8d12ba488

tensorflow-nightly-gpu looking for cusolver64_10.dll on a cuDNN 11.1 installation · Issue #44291 · tensorflow/tensorflow
https://github.com/tensorflow/tensorflow/issues/44291

RTX 3090 and Tensorflow for Windows 10 - step by step : tensorflow
https://www.reddit.com/r/tensorflow/comments/jsalkw/rtx_3090_and_tensorflow_for_windows_10_step_by/

18
19
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
18
19

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?