More than 3 years have passed since last update.

Deep learning: I caught a GPU soft error in the act!


I caught a GPU soft error in the act!

When running deep learning workloads, you may sometimes use an NVIDIA GeForce GPU.
GeForce cards do not have ECC memory.
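
As a hedged way to check this on a given machine, the sketch below asks `nvidia-smi` for each GPU's name and current ECC mode. It assumes `nvidia-smi` is on the PATH; the helper names `parse_ecc_report` and `query_ecc_modes` are made up for illustration. A card without ECC support is expected to report a not-available value rather than Enabled or Disabled.

```python
import shutil
import subprocess


def parse_ecc_report(csv_text):
    """Parse 'name, ecc.mode.current' CSV lines into (name, ecc_mode) tuples."""
    report = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a GPU name containing commas stays intact.
        name, _, mode = line.rpartition(",")
        report.append((name.strip(), mode.strip()))
    return report


def query_ecc_modes():
    """Return ECC modes reported by nvidia-smi, or None if it cannot be run."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,ecc.mode.current", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return parse_ecc_report(result.stdout)


if __name__ == "__main__":
    modes = query_ecc_modes()
    if modes is None:
        print("nvidia-smi not available")
    else:
        for name, mode in modes:
            print(f"{name}: ECC mode = {mode}")
```

Keeping the parsing separate from the subprocess call means the parsing can be exercised on a machine that has no GPU at all.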

There it is: a soft error.

↓ The chance that this really is one may be only about 1%, though... What do you think?

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
MemoryError
Traceback (most recent call last):
  File "train_cifar10.py", line 213, in <module>
    trainloss = train(epoch)
  File "train_cifar10.py", line 144, in train
    for batch_idx, (inputs, targets) in enumerate(trainloader):
  File "C:\Users\sXYZZ\AppData\Roaming\Python\Python37\site-packages\torch\utils\data\dataloader.py", line 291, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\sXYZZ\AppData\Roaming\Python\Python37\site-packages\torch\utils\data\dataloader.py", line 737, in __init__
    w.start()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
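
One hedged way to read a traceback like the one above is to list which source file each frame comes from. The sketch below does that with a regular expression. The names `summarize_frames` and `innermost_frame_is_host_multiprocessing` are hypothetical, and the `HOST_MP_FILES` set is an assumption chosen to match the standard-library `multiprocessing` files that appear above. In this traceback the innermost frames are in `spawn.py` and `reduction.py`, that is, host-side process startup for the DataLoader workers, which by itself neither confirms nor rules out a GPU memory error.

```python
import ntpath
import re

# Matches the 'File "<path>", line <n>' part of a Python traceback frame.
FRAME_RE = re.compile(r'File "([^"]*)", line (\d+)')

# Assumption: file names taken from the multiprocessing frames in the traceback above.
HOST_MP_FILES = {
    "spawn.py",
    "process.py",
    "context.py",
    "popen_spawn_win32.py",
    "reduction.py",
}


def summarize_frames(traceback_text):
    """Return the base file name of every frame, outermost first."""
    # ntpath handles Windows-style backslash paths on any platform.
    return [ntpath.basename(m.group(1)) for m in FRAME_RE.finditer(traceback_text)]


def innermost_frame_is_host_multiprocessing(traceback_text):
    """True if the last frame listed is one of the assumed multiprocessing files."""
    files = summarize_frames(traceback_text)
    return bool(files) and files[-1] in HOST_MP_FILES
```

For example, passing the text of the second traceback to `summarize_frames` would list `train_cifar10.py`, `dataloader.py`, and then the `multiprocessing` files in call order.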

Comments

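If deep learning is being run in an environment where errors like this can occur, that feels quite wrong to me.
What is the actual situation?
If the rate were something like once, if at all, across the combined lifetimes of 10,000 people, I would not worry about it, but...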
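
Without ECC, one general-purpose way to notice silent corruption is to run the same deterministic computation more than once and compare the results bit for bit. The following is a minimal sketch of that idea, not something taken from the training script above: `checksum`, `run_redundantly`, and `example_workload` are hypothetical names, and the workload is a CPU stand-in for a GPU computation. The comparison is only meaningful if the computation is deterministic, which is not true of every GPU kernel.

```python
import hashlib
import struct


def checksum(values):
    """SHA-256 over the exact 64-bit patterns of a sequence of floats."""
    digest = hashlib.sha256()
    for v in values:
        digest.update(struct.pack("<d", v))
    return digest.hexdigest()


def run_redundantly(compute, runs=2):
    """Call compute() `runs` times; return (first_result, all_runs_agreed)."""
    results = [compute() for _ in range(runs)]
    digests = {checksum(r) for r in results}
    return results[0], len(digests) == 1


def example_workload():
    # CPU stand-in (assumption): a real check would run the GPU computation here.
    return [i * 0.5 + 1.0 for i in range(1000)]


if __name__ == "__main__":
    result, agreed = run_redundantly(example_workload)
    print("runs agreed" if agreed else "MISMATCH: possible silent corruption")
```

A mismatch only says that two runs differed. It does not identify whether the cause was a memory bit flip, a nondeterministic kernel, or something else.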

Summary

If you have any information on this, please share it.
