
What to do when the GPU is not recognized during machine learning

Posted at 2020-05-03

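During machine learning, running a heavy process (an RNN followed immediately by a ConfusionMatrix) makes Chrome crash. Setting aside whether it is really that heavy, I check whether the GPU is actually working.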

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

2020-05-03 08:31:40.027079: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
↑↑↑
Apparently the message above can be ignored (it is logged at the "I", informational, level rather than as an error).
https://teratail.com/questions/193036

2020-05-03 08:31:40.090341: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2020-05-03 08:31:40.090757: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55ebdc78e290 executing computations on platform Host. Devices:
2020-05-03 08:31:40.090787: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 1732731571361142608
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 2910256701880528088
physical_device_desc: "device: XLA_CPU device"
]
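
The device list above can be narrowed down to GPU entries only, which makes the check less dependent on reading the raw output. This is a minimal sketch: the helper names gpu_device_names and list_tensorflow_gpus and the stand-in sample_devices are hypothetical and not part of TensorFlow; only device_lib.list_local_devices() comes from the code above.

```python
from types import SimpleNamespace


def gpu_device_names(devices):
    """Return the names of all entries whose device_type is exactly "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_tensorflow_gpus():
    """Ask TensorFlow for its local devices and keep only the GPUs.

    TensorFlow is imported lazily so gpu_device_names() works without it.
    """
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())


# Stand-in objects mirroring the two devices in the output above (assumption:
# only the name and device_type fields matter for this check).
sample_devices = [
    SimpleNamespace(name="/device:CPU:0", device_type="CPU"),
    SimpleNamespace(name="/device:XLA_CPU:0", device_type="XLA_CPU"),
]
print(gpu_device_names(sample_devices))  # prints [] because there is no GPU entry
```

On a machine with TensorFlow installed, calling list_tensorflow_gpus() returning an empty list would correspond to the situation described here.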

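Lots of output, but no mention of a GPU.
So I checked the versions of CUDA, cuDNN, and so on.
I opened cudnn.h in a text editor:
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
Apparently 7.4, rather than the installed 7.6, is the appropriate version.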
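
Instead of opening the header by hand, the three version macros can be read with a short script. This is a sketch, assuming the header uses the plain "#define NAME number" form shown above; parse_cudnn_version is a hypothetical helper name, and the sample text simply repeats the values found in this post.

```python
import re


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from cuDNN header text, or None if a macro is missing."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match only lines that define this exact macro with a numeric value.
        match = re.search(
            r"^\s*#\s*define\s+" + macro + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Sample text repeating the macro values quoted in this post.
sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
"""
print(parse_cudnn_version(sample))  # (7, 6, 5)
```

To use it on a real installation, pass the contents of the header, for example open("/usr/local/cuda/include/cudnn.h").read(). From cuDNN 8 onward the macros live in cudnn_version.h instead, so that file would be the one to read.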

I replaced it following the steps below (from the cuDNN site).
Navigate to your directory containing the cuDNN Tar file.
Unzip the cuDNN package.
$ tar -xzvf cudnn-x.x-linux-x64-v8.x.x.x.tgz
Copy the following files into the CUDA Toolkit directory, and change the file permissions.
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/include/cudnn_version.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
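
After copying, it may help to confirm that the files actually landed in the CUDA Toolkit directory and are readable. The following is a rough sketch, assuming the /usr/local/cuda layout used in the commands above; check_cudnn_files is a hypothetical helper name.

```python
import glob
import os


def check_cudnn_files(cuda_home="/usr/local/cuda"):
    """Map each expected cuDNN header and each found libcudnn* file to its readability."""
    headers = [
        os.path.join(cuda_home, "include", "cudnn.h"),
        # cudnn_version.h ships with cuDNN 8 and later, so it is expected to be
        # absent for a 7.x install.
        os.path.join(cuda_home, "include", "cudnn_version.h"),
    ]
    libraries = sorted(glob.glob(os.path.join(cuda_home, "lib64", "libcudnn*")))
    # os.access returns False both for missing files and for unreadable ones.
    return {path: os.access(path, os.R_OK) for path in headers + libraries}


for path, readable in check_cudnn_files().items():
    print("readable             " if readable else "missing or unreadable", path)
```

If no libcudnn* entries are printed at all, the library copy step did not reach lib64.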

Hmm, no change.
I found someone struggling with similar symptoms (CUDA 10.1 and 10.0 may be mixed together).
https://blog.logicky.com/2019/03/09/095753
For reference.

Afterwards
Later I found an article suggesting that the command below may install everything needed to use the GPU by itself, including tensorflow-gpu, cudatoolkit, and cuDNN.
https://codelabo.com/posts/20200229081221

conda install -c anaconda keras-gpu
I ran it just in case.

For now, I am shelving the problem of the GPU not working.
Training runs and is stable, so I will deal with it if it becomes a critical problem.

Current state and open issues

nvidia-smi shows CUDA 10.1, while the Toolkit is 10.0.
cuDNN 7.6 and 7.4 are mixed (probably not a problem).
Is Driver 435 OK?
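
One point worth noting about the first item: the CUDA version in the nvidia-smi header is the highest CUDA version the installed driver supports, not the version of the installed Toolkit, so 10.1 there next to a 10.0 Toolkit is not a conflict in itself. The sketch below compares the two. The function names are hypothetical, and the two sample strings are illustrative text built from the versions mentioned in this post, not captured output from this machine.

```python
import re


def parse_driver_cuda(nvidia_smi_text):
    """Return (driver_version, cuda_version) from the nvidia-smi header, or None."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", nvidia_smi_text
    )
    return match.groups() if match else None


def parse_toolkit_cuda(nvcc_text):
    """Return the Toolkit release (e.g. "10.0") from `nvcc --version` text, or None."""
    match = re.search(r"release\s+([\d.]+)", nvcc_text)
    return match.group(1) if match else None


def driver_supports_toolkit(driver_cuda, toolkit_cuda):
    """True if the driver's maximum CUDA version is at least the Toolkit version."""
    def as_tuple(version):
        return tuple(int(part) for part in version.split("."))
    return as_tuple(driver_cuda) >= as_tuple(toolkit_cuda)


# Illustrative sample text only (assumption: driver 435.21, Toolkit build V10.0.130).
smi_sample = "| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |"
nvcc_sample = "Cuda compilation tools, release 10.0, V10.0.130"

driver_version, driver_cuda = parse_driver_cuda(smi_sample)
toolkit_cuda = parse_toolkit_cuda(nvcc_sample)
print(driver_cuda, toolkit_cuda, driver_supports_toolkit(driver_cuda, toolkit_cuda))
# prints: 10.1 10.0 True
```

In practice the two input strings would come from running nvidia-smi and nvcc --version, for example through subprocess.run with capture_output=True.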

Sites I referred to

[Check in two lines of code whether TensorFlow recognizes the GPU](https://thr3a.hatenablog.com/entry/20180113/1515820265)

An article for people who are really struggling because the GPU is not recognized during machine learning.
