# Purpose
When training a deep-learning model on a GPU, you may run into an error like the one below. (This assumes code with a track record, e.g., fetched from GitHub, so the code itself is not the problem.)

```
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
```

With the surrounding output included, it looks like this:
```
2019-09-21 17:13:14.228372: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 13 Chunks of size 921600 totalling 11.43MiB
2019-09-21 17:13:14.231275: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 1707008 totalling 1.63MiB
2019-09-21 17:13:14.233812: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 2 Chunks of size 1843200 totalling 3.52MiB
2019-09-21 17:13:14.237542: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 3145728 totalling 3.00MiB
2019-09-21 17:13:14.240348: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 10 Chunks of size 3686400 totalling 35.16MiB
2019-09-21 17:13:14.242938: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 3791360 totalling 3.62MiB
2019-09-21 17:13:14.245915: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 2 Chunks of size 4194304 totalling 8.00MiB
2019-09-21 17:13:14.248532: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 5487616 totalling 5.23MiB
2019-09-21 17:13:14.252219: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 3 Chunks of size 16777216 totalling 48.00MiB
2019-09-21 17:13:14.254865: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 2 Chunks of size 20971520 totalling 40.00MiB
2019-09-21 17:13:14.257753: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 17 Chunks of size 41943040 totalling 680.00MiB
2019-09-21 17:13:14.260766: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 50331648 totalling 48.00MiB
2019-09-21 17:13:14.263354: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 74894336 totalling 71.42MiB
2019-09-21 17:13:14.266044: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 23 Chunks of size 83886080 totalling 1.80GiB
2019-09-21 17:13:14.270016: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 146800640 totalling 140.00MiB
2019-09-21 17:13:14.272684: I tensorflow/core/common_runtime/bfc_allocator.cc:816] Sum Total of in-use chunks: 2.87GiB
2019-09-21 17:13:14.275331: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 3146173440 memory_limit_: 3146173644 available bytes: 204 curr_region_allocation_bytes_: 4294967296
2019-09-21 17:13:14.279840: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
Limit:                  3146173644
InUse:                  3086549760
MaxInUse:               3086550272
NumAllocs:                     835
MaxAllocSize:           1363542016
2019-09-21 17:13:14.286537: W tensorflow/core/common_runtime/bfc_allocator.cc:319] ********************x******************************************************************************x
2019-09-21 17:13:14.290629: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at transpose_op.cc:199 : Resource exhausted: OOM when allocating tensor with shape[128,32,16,160] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "error_analysis_cifar_finish.py", line 341, in <module>
    train(7)
  File "error_analysis_cifar_finish.py", line 325, in train
    callbacks=[scheduler, cb, hist], epochs=600)  # 300 epochs when using cosine decay
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\training.py", line 1433, in fit_generator
    steps_name='steps_per_epoch')
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\training_generator.py", line 264, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\training.py", line 1175, in train_on_batch
    outputs = self.train_function(ins)  # pylint: disable=not-callable
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\client\session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[128,160,32,16] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node conv2d_15/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[loss/mul/_561]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[128,160,32,16] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node conv2d_15/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.
```
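As a sanity check, the log's numbers are internally consistent: the OOM message reports a float tensor of shape `[128, 160, 32, 16]`, which at 4 bytes per element is exactly 41,943,040 bytes, matching the "17 Chunks of size 41943040" line above. A quick back-of-envelope helper (mine, not from the original script) shows the arithmetic, and also that halving `batch_size` halves this allocation:

```python
def tensor_bytes(shape, bytes_per_elem=4):
    """Memory footprint of a dense tensor; float32 = 4 bytes per element."""
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem

# The failing tensor from the OOM message (batch_size = 128):
print(tensor_bytes([128, 160, 32, 16]))  # 41943040 bytes = 40 MiB

# Same tensor with batch_size halved to 64:
print(tensor_bytes([64, 160, 32, 16]))   # 20971520 bytes = 20 MiB
```

The 20,971,520-byte figure also appears in the chunk list above, which suggests the allocator is dominated by batch-sized activation tensors, and that shrinking `batch_size` attacks exactly the right term.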
# Fix
Reduce memory usage. Concretely, I kept shrinking `batch_size` until the error no longer appeared.
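The strategy above can be sketched as a halving loop. This is a simulation, not the original training script: the "training step" below is a plain function that raises `MemoryError` when the batch is too large (standing in for TensorFlow's `ResourceExhaustedError`), and the per-sample byte cost and budget are illustrative numbers, so the loop runs without TensorFlow installed:

```python
BUDGET = 3_086_549_760  # the "InUse" bytes from the log, used as a stand-in limit

def simulated_train_step(batch_size, per_sample_bytes=30_000_000):
    """Pretend to run one training step; fail if the batch exceeds the budget."""
    if batch_size * per_sample_bytes > BUDGET:
        raise MemoryError("OOM")  # stands in for ResourceExhaustedError

def find_working_batch_size(start=128):
    """Halve batch_size until one simulated step succeeds."""
    bs = start
    while bs >= 1:
        try:
            simulated_train_step(bs)
            return bs
        except MemoryError:
            bs //= 2
    return None

print(find_working_batch_size())  # with these numbers: 128 fails, 64 fits -> 64
```

In the real script you would simply edit the `batch_size` passed to your data generator or `fit` call; the point is that the search is cheap, so trying 64, then 32, and so on is a perfectly reasonable workflow.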
# Notes
Strictly speaking, I should reason this through properly: how much GPU memory the environment has, how much this run tries to allocate, and therefore why memory was exhausted. To save time, I'm skipping that analysis here.
# Summary
If this post helps anyone resolve the error by reducing `batch_size`, I'd be glad.
# Related (by the same author)
- Using Python stress-free! (Getting familiar with generators. Apparently "since 1975".)
- Using Python stress-free! (In Python, everything is implemented as an object)
- Using Python stress-free! (Getting along with Pylint)
- Using Python stress-free! (Expressions and Statements)
- Learning Python carefully, in both English and Japanese.
# Future work
Comments are welcome.
I'll keep studying.