
TensorFlow > TFRecords > Reading with Queue and Threads > Read time increases as reading progresses > The culprit is np.append()

Last updated at 2018-09-17, posted at 2017-08-26

Environment

GeForce GTX 1070 (8GB)
ASRock Z170M Pro4S [Intel Z170chipset]
Ubuntu 16.04 LTS desktop amd64
TensorFlow v1.2.1
cuDNN v5.1 for Linux
CUDA v8.0
Python 3.5.2
IPython 6.0.0 -- An enhanced Interactive Python.
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)

TensorFlow > TFRecords > Read time seems to increase as reading with tf_record_iterator() progresses
In the article above, I used tf.python_io.tf_record_iterator().

There seems to be another way to read TFRecords, Queue and Thread, so I implemented it.

calc_mean_std_170826.py

import numpy as np
import tensorflow as tf
import sys
import time

"""
v0.1 Aug. 26, 2017
  - use Queue and Thread for reading

=== branched from [calc_mean_std_170819.py] ===
v0.1 Aug. 19, 2017
  - add time profiling
  - change [INP_FILE] to those with _170819 prefix

=== branched from [calc_mean_std_170812.py] ===

v0.1 Aug. 12, 2017
  - calculate [mean], [stddev]

=== branched from [test_readCombined_170722.py] ===
v0.2 Jul. 09, 2017
  - read [mr] and [mi]
v0.1 Jul. 09, 2017
  - read position and Ex, Ey, Ez
     + add get_feature_float32()
"""

# on
#   Ubuntu 16.04 LTS
#   TensorFlow v1.2.1
#   Python 3.5.2

# codingrule: PEP8


def print_mean_stddev(xs, label):
    print('%s mean:%f std:%f' % (label, xs.mean(), xs.std()))


INP_FILE = 'combined_IntField-Y_170819.tfrecords'


def read_with_queue(filename_queue):
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        features={
            'xpos_raw': tf.FixedLenFeature([], tf.string),
            'ypos_raw': tf.FixedLenFeature([], tf.string),
            'zpos_raw': tf.FixedLenFeature([], tf.string),
            'mr_raw': tf.FixedLenFeature([], tf.string),
            'mi_raw': tf.FixedLenFeature([], tf.string),
            'exr_raw': tf.FixedLenFeature([], tf.string),
            'exi_raw': tf.FixedLenFeature([], tf.string),
            'eyr_raw': tf.FixedLenFeature([], tf.string),
            'eyi_raw': tf.FixedLenFeature([], tf.string),
            'ezr_raw': tf.FixedLenFeature([], tf.string),
            'ezi_raw': tf.FixedLenFeature([], tf.string)
        })

    xpos_raw = tf.decode_raw(features['xpos_raw'], tf.float32)
    ypos_raw = tf.decode_raw(features['ypos_raw'], tf.float32)
    zpos_raw = tf.decode_raw(features['zpos_raw'], tf.float32)
    mr_raw = tf.decode_raw(features['mr_raw'], tf.float32)
    mi_raw = tf.decode_raw(features['mi_raw'], tf.float32)
    exr_raw = tf.decode_raw(features['exr_raw'], tf.float32)
    exi_raw = tf.decode_raw(features['exi_raw'], tf.float32)
    eyr_raw = tf.decode_raw(features['eyr_raw'], tf.float32)
    eyi_raw = tf.decode_raw(features['eyi_raw'], tf.float32)
    ezr_raw = tf.decode_raw(features['ezr_raw'], tf.float32)
    ezi_raw = tf.decode_raw(features['ezi_raw'], tf.float32)
    axpos = tf.reshape(xpos_raw, [1, -1])[0]
    aypos = tf.reshape(ypos_raw, [1, -1])[0]
    azpos = tf.reshape(zpos_raw, [1, -1])[0]
    amr = tf.reshape(mr_raw, [1, -1])[0]
    ami = tf.reshape(mi_raw, [1, -1])[0]
    aexr = tf.reshape(exr_raw, [1, -1])[0]
    aexi = tf.reshape(exi_raw, [1, -1])[0]
    aeyr = tf.reshape(eyr_raw, [1, -1])[0]
    aeyi = tf.reshape(eyi_raw, [1, -1])[0]
    aezr = tf.reshape(ezr_raw, [1, -1])[0]
    aezi = tf.reshape(ezi_raw, [1, -1])[0]

    res = (axpos, aypos, azpos, amr, ami)
    res += (aexr, aexi, aeyr, aeyi, aezr, aezi)
    return res

# record_iterator = tf.python_io.tf_record_iterator(path=INP_FILE)

# accumulators for the real and imaginary parts of Ex, Ey, Ez
exrs, exis = np.array([]), np.array([])
eyrs, eyis = np.array([]), np.array([])
ezrs, ezis = np.array([]), np.array([])

filename_queue = tf.train.string_input_producer([INP_FILE])
readvals = read_with_queue(filename_queue)

cnt = 0
start = time.time()
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    try:
        for loop in range(4113648):  # 4113648
            # print the elapsed time for every 10000 records
            if cnt % 10000 == 0:
                print("%d, %.3f" % (cnt, time.time() - start))
                start = time.time()
            cnt += 1
            res = sess.run(readvals)
            xpos, ypos, zpos, mr, mi = res[:5]
            exr, exi, eyr, eyi, ezr, ezi = res[5:]
            exrs, exis = np.append(exrs, exr), np.append(exis, exi)
            eyrs, eyis = np.append(eyrs, eyr), np.append(eyis, eyi)
            ezrs, ezis = np.append(ezrs, ezr), np.append(ezis, ezi)

            # print(xpos, ypos, zpos, mr, mi, exr, exi, eyr, eyi, ezr, ezi)
    finally:
        # stop and join the queue runner threads once reading is done
        coord.request_stop()
        coord.join(threads)

print_mean_stddev(exrs, 'exr')
print_mean_stddev(exis, 'exi')

print_mean_stddev(eyrs, 'eyr')
print_mean_stddev(eyis, 'eyi')

print_mean_stddev(ezrs, 'ezr')
print_mean_stddev(ezis, 'ezi')

run

$ python3 calc_mean_std_170826.py 
2017-08-26 09:39:02.005639: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-26 09:39:02.005660: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-26 09:39:02.005666: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-26 09:39:02.005686: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-26 09:39:02.005691: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
0, 0.005
10000, 2.727
20000, 2.875
30000, 3.017
40000, 3.216
50000, 4.435
60000, 12.200
70000, 14.021

The time per 10,000 records still grows as processing progresses.

The culprit is np.append()

I tried commenting out np.append() in the code above.
(I have a hunch that ...)

run

$ python3 calc_mean_std_170826.py 
2017-08-26 09:40:35.268727: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-26 09:40:35.268747: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-26 09:40:35.268766: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-26 09:40:35.268771: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-26 09:40:35.268776: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
0, 0.003
10000, 2.328
20000, 2.312
30000, 2.302
40000, 2.299
50000, 2.301
60000, 2.315
70000, 2.319

Each 10,000 records are now processed in roughly constant time.
The culprit is np.append().
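
The slowdown is consistent with how np.append() works: it does not append in place, but allocates a new array and copies all existing elements on every call, so the cost of each call grows with the amount of data already collected. A minimal sketch of one common workaround follows: keep the per-record arrays in a Python list and concatenate once at the end. The function names and the dummy `chunks` data are assumptions for illustration, not part of the original script.

```python
import numpy as np


def collect_with_append(chunks):
    """Grow one array with np.append(); every call copies all data so far."""
    out = np.array([])
    for chunk in chunks:
        out = np.append(out, chunk)  # allocates a new array each time
    return out


def collect_with_list(chunks):
    """Keep chunks in a Python list and concatenate once at the end."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)  # list append, no array copy
    if not parts:
        return np.array([])
    return np.concatenate(parts)


# Hypothetical stand-in for the per-record arrays returned by sess.run()
chunks = [np.full(3, i, dtype=np.float32) for i in range(1000)]
slow = collect_with_append(chunks)
fast = collect_with_list(chunks)
print(slow.shape, fast.shape, np.array_equal(slow, fast))
```

In the script above, this would mean replacing `exrs = np.append(exrs, exr)` with something like `exr_parts.append(exr)` and calling `np.concatenate(exr_parts)` before `print_mean_stddev()`.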

Similarly, I commented out np.append() in the code at
http://qiita.com/7of9/items/833268e83a8e449ce2bb
as well.

run

$ python3 calc_mean_std_170819.py 
0, 0.000
10000, 0.569
20000, 0.572
30000, 0.571
40000, 0.579
50000, 0.571
60000, 0.577
70000, 0.585

The culprit is np.append().
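
Since the script only needs the mean and standard deviation, another option is to avoid storing the values at all and keep running sums instead. The sketch below is one possible way to do that; the class `RunningStats` and its methods are hypothetical names introduced here, and the dummy data stands in for the arrays read from the TFRecords file. It uses the population standard deviation, which is the same definition as ndarray.std().

```python
import numpy as np


class RunningStats:
    """Accumulate count, sum and sum of squares in float64."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def update(self, values):
        # flatten the incoming chunk and add it to the running sums
        v = np.asarray(values, dtype=np.float64).ravel()
        self.n += v.size
        self.total += v.sum()
        self.total_sq += np.square(v).sum()

    def mean(self):
        return self.total / self.n

    def std(self):
        # population standard deviation: sqrt(E[x^2] - E[x]^2)
        var = self.total_sq / self.n - self.mean() ** 2
        return float(np.sqrt(max(var, 0.0)))


# Hypothetical usage: feed small chunks, as the read loop would
stats = RunningStats()
for chunk in np.split(np.arange(12, dtype=np.float32), 4):
    stats.update(chunk)
print('exr mean:%f std:%f' % (stats.mean(), stats.std()))
```

The sum-of-squares formula can lose precision when the mean is large compared to the spread of the data; in that case an incremental update such as Welford's algorithm would be a safer choice.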

Similar cases

(2018-09-17)

I found a similar article, so I am linking it here.

Numpyの配列でappendを使うと遅くなる ("Using append on NumPy arrays gets slow")
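
When the total number of values is known before reading starts, preallocating the array and writing into slices is another way to avoid repeated copying. This is only a sketch under assumptions: `n_records` and `values_per_record` are hypothetical sizes chosen for illustration, and the dummy `chunk` stands in for an array such as `exr` returned by the read loop.

```python
import numpy as np

n_records = 1000        # hypothetical record count
values_per_record = 3   # hypothetical; must match the decoded array length

# allocate the full buffer once
buf = np.empty(n_records * values_per_record, dtype=np.float32)
pos = 0
for i in range(n_records):
    chunk = np.full(values_per_record, i, dtype=np.float32)  # stand-in data
    buf[pos:pos + chunk.size] = chunk  # write in place, no reallocation
    pos += chunk.size

filled = buf[:pos]  # view of the part actually written
print(filled.mean(), filled.std())
```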
