Dealing with the TensorFlow error "tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found." in deep learning

Purpose

When training a deep learning model on a GPU,
you may run into an error like the one below.

 ※ As a premise, this happens even with proven code, e.g. code obtained from GitHub.

tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.

Here it is with the surrounding output:

2019-09-21 17:13:14.228372: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 13 Chunks of size 921600 totalling 11.43MiB
2019-09-21 17:13:14.231275: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 1707008 totalling 1.63MiB
2019-09-21 17:13:14.233812: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 2 Chunks of size 1843200 totalling 3.52MiB
2019-09-21 17:13:14.237542: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 3145728 totalling 3.00MiB
2019-09-21 17:13:14.240348: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 10 Chunks of size 3686400 totalling 35.16MiB
2019-09-21 17:13:14.242938: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 3791360 totalling 3.62MiB
2019-09-21 17:13:14.245915: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 2 Chunks of size 4194304 totalling 8.00MiB
2019-09-21 17:13:14.248532: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 5487616 totalling 5.23MiB
2019-09-21 17:13:14.252219: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 3 Chunks of size 16777216 totalling 48.00MiB
2019-09-21 17:13:14.254865: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 2 Chunks of size 20971520 totalling 40.00MiB
2019-09-21 17:13:14.257753: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 17 Chunks of size 41943040 totalling 680.00MiB
2019-09-21 17:13:14.260766: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 50331648 totalling 48.00MiB
2019-09-21 17:13:14.263354: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 74894336 totalling 71.42MiB
2019-09-21 17:13:14.266044: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 23 Chunks of size 83886080 totalling 1.80GiB
2019-09-21 17:13:14.270016: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 146800640 totalling 140.00MiB
2019-09-21 17:13:14.272684: I tensorflow/core/common_runtime/bfc_allocator.cc:816] Sum Total of in-use chunks: 2.87GiB
2019-09-21 17:13:14.275331: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 3146173440 memory_limit_: 3146173644 available bytes: 204 curr_region_allocation_bytes_: 4294967296
2019-09-21 17:13:14.279840: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
Limit:                  3146173644
InUse:                  3086549760
MaxInUse:               3086550272
NumAllocs:                     835
MaxAllocSize:           1363542016

2019-09-21 17:13:14.286537: W tensorflow/core/common_runtime/bfc_allocator.cc:319] ********************x******************************************************************************x
2019-09-21 17:13:14.290629: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at transpose_op.cc:199 : Resource exhausted: OOM when allocating tensor with shape[128,32,16,160] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "error_analysis_cifar_finish.py", line 341, in <module>
    train(7)
  File "error_analysis_cifar_finish.py", line 325, in train
    callbacks=[scheduler, cb, hist], epochs=600) # 300 epochs in the cosine decay case
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\training.py", line 1433, in fit_generator
    steps_name='steps_per_epoch')
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\training_generator.py", line 264, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\training.py", line 1175, in train_on_batch
    outputs = self.train_function(ins)  # pylint: disable=not-callable
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\client\session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[128,160,32,16] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node conv2d_15/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

         [[loss/mul/_561]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[128,160,32,16] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node conv2d_15/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.
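
As the Hint lines in the log suggest, TensorFlow can list the allocated tensors at the moment of the OOM, which helps pin down the layer that blows up. Below is a minimal sketch for the graph-mode tf.keras setup seen in this traceback (TF 1.x); the tiny model, optimizer, and loss are placeholder assumptions, not the actual ones from the script:

import tensorflow as tf

# Tiny stand-in model; the real model comes from the training script.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Ask TF to report the allocated tensors when an OOM occurs,
# which is what the "Hint:" lines above recommend (TF 1.x API).
run_opts = tf.RunOptions(report_tensor_allocations_upon_oom=True)
run_meta = tf.RunMetadata()

# Graph-mode tf.keras forwards these kwargs to the underlying Session.run.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              options=run_opts,
              run_metadata=run_meta)

With this in place, the next OOM error should include per-tensor allocation info rather than just the summary above.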

How to fix the error

Reduce memory usage.
Specifically, I kept reducing batch_size until the error no longer occurred.
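
For example, with fit_generator as in the traceback above, the batch size is set where the data generator is built. A minimal sketch of that change, assuming a CIFAR-10 setup as the script name in the traceback suggests; the stand-in model and the original value of 128 (inferred from shape[128, ...] in the OOM message) are assumptions:

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# CIFAR-10 data, matching the "cifar" in the script name above.
(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype('float32') / 255.0

# Tiny stand-in model; the real one comes from the original script.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile('adam', 'sparse_categorical_crossentropy')

# The OOM tensor shape[128, ...] suggests the failing batch size was 128.
# Halve it, and keep halving, until the OOM no longer occurs.
batch_size = 64

datagen = ImageDataGenerator()
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train) // batch_size,
                    epochs=600)

A smaller batch_size shrinks every activation tensor whose leading dimension is the batch, at the cost of noisier gradients and longer wall-clock epochs.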

Supplementary note

Strictly speaking, one ought to reason it through carefully: given the environment, how much GPU memory there is, how much the job being run needs, and therefore why memory was exhausted. To save time, I'll skip that here, apart from the rough check sketched below.
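
Just the single tensor named in the error message already accounts for one of the 40 MiB chunk classes in the allocator dump above (float32 is 4 bytes per element):

# Memory for one tensor of shape [128, 160, 32, 16], dtype float32,
# i.e. the tensor the OOM message complains about.
elements = 128 * 160 * 32 * 16     # 10,485,760 elements
size_bytes = elements * 4          # 41,943,040 bytes
print(size_bytes / 2**20)          # 40.0 MiB -- matches the
                                   # "Chunks of size 41943040" lines in the dump

With 17 such chunks alone totalling 680 MiB against a limit of roughly 3 GiB, it is no surprise the allocator ran out.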

Summary

If reading this leads someone to resolve the error by reducing batch_size, I'd be delighted.

Related articles (by the same author)

Using Python without stress! (Getting familiar with generators. Apparently since 1975.)
Using Python without stress! (In Python, everything is implemented as an object)
Using Python without stress! (Getting along with Pylint)
Using Python without stress! (Expressions and Statements)
Learning Python carefully, using both English and Japanese.

Future work

Comments and suggestions are welcome. :candy:
I'll keep studying...
