LoginSignup
0
0

More than 5 years have passed since last update.

Ran out of memory when running ROLO

Posted at

I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GT 740M
major: 3 minor: 5 memoryClockRate (GHz) 1.0325
pciBusID 0000:0a:00.0
Total memory: 1.96GiB
Free memory: 1.75GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 740M, pci bus id: 0000:0a:00.0)
ROLO init
Utils init
self.cfgPath=
Default: running ROLO test.
Building ROLO graph...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 740M, pci bus id: 0000:0a:00.0)
Loading complete!

('TESTING ROLO on video sequence: ', 'Human2')
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 740M, pci bus id: 0000:0a:00.0)
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (512): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (4096): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (16384): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (32768): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (65536): Total Chunks: 1, Chunks in use: 0 96.5KiB allocated for chunks. 96.1KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (262144): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (524288): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (4194304): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (8388608): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (67108864): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (134217728): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:635] Bin (268435456): Total Chunks: 1, Chunks in use: 0 513.25MiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:652] Bin for 513.50MiB was 256.00MiB, Chunk State:
I tensorflow/core/common_runtime/bfc_allocator.cc:658] Size: 513.25MiB | Requested Size: 0B | in_use: 0, prev: Size: 96.2KiB | Requested Size: 96.1KiB | in_use: 1, next: Size: 513.50MiB | Requested Size: 513.50MiB | in_use: 1
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x5015c0000 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x5015c0100 of size 65792
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x5015d0200 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x5015d0300 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x5015d0400 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x5015d0500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x5015d0600 of size 65792
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x5015e0700 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x5015e0800 of size 98304
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x5015f8800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x5015f8900 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x5015f8a00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x5015f8b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x5015f8c00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x5015f8d00 of size 65792
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x501621000 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x501621100 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x501621200 of size 98560
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x521778f00 of size 538445056
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x5418f9400 of size 98560
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x541911500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:670] Chunk at 0x541911600 of size 588769792
I tensorflow/core/common_runtime/bfc_allocator.cc:679] Free at 0x501608e00 of size 98816
I tensorflow/core/common_runtime/bfc_allocator.cc:679] Free at 0x501639300 of size 538180608
I tensorflow/core/common_runtime/bfc_allocator.cc:685] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:688] 14 Chunks of size 256 totalling 3.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:688] 3 Chunks of size 65792 totalling 192.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:688] 1 Chunks of size 98304 totalling 96.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:688] 2 Chunks of size 98560 totalling 192.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:688] 1 Chunks of size 538445056 totalling 513.50MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:688] 1 Chunks of size 588769792 totalling 561.49MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] Sum Total of in-use chunks: 1.05GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:694] Stats:
Limit: 1665990656
InUse: 1127711232
MaxInUse: 1127810048
NumAllocs: 37
MaxAllocSize: 588769792

W tensorflow/core/common_runtime/bfc_allocator.cc:270] _______________________________****************************************************************xxx
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 513.50MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:900] Resource exhausted: OOM when allocating tensor with shape[8204,16408]
Traceback (most recent call last):
File "./experiments/testing/ROLO_network_test_all.py", line 277, in
main(' ')
File "./experiments/testing/ROLO_network_test_all.py", line 273, in main
ROLO_TF(argvs)
File "./experiments/testing/ROLO_network_test_all.py", line 97, in init
self.ROLO(argvs)
File "./experiments/testing/ROLO_network_test_all.py", line 268, in ROLO
self.testing(x_path, y_path)
File "./experiments/testing/ROLO_network_test_all.py", line 158, in testing
sess.run(init)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 340, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 564, in run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 637, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 659, in _do_call
e.code)
tensorflow.python.framework.errors.ResourceExhaustedError: OOM when allocating tensor with shape[8204,16408]
[[Node: RNN/LSTMCell/W_0/Initializer/random_uniform/mul = MulT=DT_FLOAT, _class=["loc:@RNN/LSTMCell/W_0"], _device="/job:localhost/replica:0/task:0/gpu:0"]]
Caused by op u'RNN/LSTMCell/W_0/Initializer/random_uniform/mul', defined at:
File "./experiments/testing/ROLO_network_test_all.py", line 277, in
main(' ')
File "./experiments/testing/ROLO_network_test_all.py", line 273, in main
ROLO_TF(argvs)
File "./experiments/testing/ROLO_network_test_all.py", line 97, in __init
_
self.ROLO(argvs)
File "./experiments/testing/ROLO_network_test_all.py", line 237, in ROLO
self.build_networks()
File "./experiments/testing/ROLO_network_test_all.py", line 129, in build_networks
self.lstm_module = self.LSTM_single('lstm_test', self.x, self.istate, self.weights, self.biases)
File "./experiments/testing/ROLO_network_test_all.py", line 112, in LSTM_single
outputs, state = tf.nn.rnn(cell, [X[step]], state)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 143, in rnn
(output, state) = call_cell()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 136, in
call_cell = lambda: cell(input
, state)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell.py", line 352, in call
dtype, self.num_unit_shards)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell.py", line 216, in _get_concat_variable
sharded_variable = _get_sharded_variable(name, shape, dtype, num_shards)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell.py", line 246, in _get_sharded_variable
dtype=dtype))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 339, in get_variable
collections=collections)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 262, in get_variable
collections=collections, caching_device=caching_device)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 158, in get_variable
dtype=variable_dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 209, in __init
_
dtype=dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 275, in init_from_args
self._initial_value = ops.convert_to_tensor(initial_value(),
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 149, in
init_val = lambda: initializer(shape.as_list(), dtype=dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/init_ops.py", line 200, in _initializer
dtype, seed=seed)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/random_ops.py", line 183, in random_uniform
return math_ops.add(rnd * (maxval - minval), minval, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 518, in binary_op_wrapper
return func(x, y, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1039, in mul
return _op_def_lib.apply_op("Mul", x=x, y=y, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1154, in __init
_
self._traceback = _extract_stack()

0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0