
Dealing with the CUDA error "Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error" in deep learning and similar work

Last updated at 2020-03-08, posted at 2019-11-18

Purpose

When you try to train a deep learning model on a GPU,
you may get an error like the following.

2019-11-18 04:16:42.405806: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error

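Searching the web for the cause of this error
does not turn up much useful information.
It may simply be that I cannot follow the English and Chinese sources well enough.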

Here is what I have been able to work out myself.

Environment, for reference

tensorflow           1.14.0
tensorflow-estimator 1.14.0
tensorflow-gpu       1.14.0
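
A version listing like the one above can be collected from Python itself. The following is a minimal sketch, assuming Python 3.8 or later (where `importlib.metadata` is part of the standard library); the helper name `installed_versions` is made up for this illustration and is not part of TensorFlow.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}.

    Hypothetical helper for illustration only.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["tensorflow", "tensorflow-estimator", "tensorflow-gpu"]
    for name, version in installed_versions(wanted).items():
        print(f"{name:<20} {version or 'not installed'}")
```

Packages that are not installed are reported as such instead of raising an exception, so the script can be run on any machine.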

Dealing with the error

The TensorFlow version may also play a role,
but in at least one case
I have confirmed that this error appears
when the machine is simply short of memory
(host memory used by the CPU, not GPU memory).

If you can reduce
host (CPU) memory usage,
please give that a try.
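
How to reduce host memory depends on the training script, and the article does not say which change helped. One common approach, shown here purely as an assumption, is to feed samples in small batches from a generator instead of loading the whole dataset into a list first. The names `iter_batches` and `fake_samples` are hypothetical and stand in for whatever loading code the script actually uses.

```python
def iter_batches(samples, batch_size):
    """Yield lists of at most batch_size items from any iterable, lazily.

    Only one batch is held in memory at a time.
    """
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


def fake_samples(n):
    """Hypothetical stand-in for reading samples from disk one at a time."""
    for i in range(n):
        yield i


if __name__ == "__main__":
    for batch in iter_batches(fake_samples(5), batch_size=2):
        print(batch)
```

Because both functions are generators, host memory use is bounded by the batch size rather than by the dataset size.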

For what it is worth,
I have no idea what this error actually means.
(I suspect it is not the kind of error that can be understood.)

Another error (completely unresolved)

An error like the following can also occur.
To begin with, I cannot make much sense of the error message.
Searching the web did not turn up anything helpful either.

Error excerpt

RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 4.00 GiB total capacity; 2.90 GiB already allocated; 30.80 MiB free; 9.54 MiB cached)

Full output

D:\_mish1\Mish-master\Mish-master\Examples and Benchmarks>python _res50_1.py
Files already downloaded and verified
Files already downloaded and verified
Traceback (most recent call last):
  File "_res50_1.py", line 329, in <module>
    logps = model.forward(inputs)
  File "_res50_1.py", line 242, in forward
    x = self.conv2(x)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "_res50_1.py", line 208, in forward
    return f_mish(self.split_transforms(x) + self.shortcut(x))
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\batchnorm.py", line 81, in forward
    exponential_average_factor, self.eps)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\torch\nn\functional.py", line 1656, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 4.00 GiB total capacity; 2.90 GiB already allocated; 30.80 MiB free; 9.54 MiB cached)
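
The numbers in this message can be read directly: the request was for 64.00 MiB while only 30.80 MiB was free on a GPU with 4.00 GiB in total. The sketch below pulls those figures out of the message text so they can be compared. The function `parse_oom_message` is a hypothetical helper written for this message format only; it is not a PyTorch API, and other PyTorch versions may word the message differently.

```python
import re

MESSAGE = (
    "CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 4.00 GiB total "
    "capacity; 2.90 GiB already allocated; 30.80 MiB free; 9.54 MiB cached)"
)

UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(message):
    """Extract the sizes named in the message, converted to MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (KiB|MiB|GiB)",
        "total": r"([\d.]+) (KiB|MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (KiB|MiB|GiB) already allocated",
        "free": r"([\d.]+) (KiB|MiB|GiB) free",
        "cached": r"([\d.]+) (KiB|MiB|GiB) cached",
    }
    sizes = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            sizes[key] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return sizes


if __name__ == "__main__":
    sizes = parse_oom_message(MESSAGE)
    for key, mib in sizes.items():
        print(f"{key:<10} {mib:10.2f} MiB")
    if sizes["requested"] > sizes["free"]:
        print("The requested block is larger than the free GPU memory.")
```

In other words, the failure happened because the forward pass needed more GPU memory than was left. Lowering the batch size is a commonly suggested way to reduce GPU memory use, but whether it resolves this particular case was not verified in this article.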

Summary

I would be glad if this helps someone solve their problem.

Going forward

Comments are welcome.
I will study them.
