Operating environment
GeForce GTX 1070 (8GB)
ASRock Z170M Pro4S [Intel Z170 chipset]
Ubuntu 16.04 LTS desktop amd64
TensorFlow v1.2.1
cuDNN v5.1 for Linux
CUDA v8.0
Python 3.5.2
IPython 6.0.0 -- An enhanced Interactive Python.
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)
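Related: TensorFlow > TFRecords > Read time seems to grow as reading with tf_record_iterator() progresses
Related: TensorFlow > TFRecords > Reading with Queue and Threads > Read time grows as reading progresses > The culprit is np.append()
I am trying to read a TFRecords file containing 4,113,648 records and compute (mean, stddev).
I found that calling numpy.append() once per record is slow, because each call copies the whole array.
https://stackoverflow.com/questions/7133885/fastest-way-to-grow-a-numpy-numeric-array
I changed the script to append to a Python list and to run the calculation on the resulting list.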
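The difference between the two approaches can be sketched as follows. This is a minimal illustration, not the script used here: the helper names collect_with_np_append() and collect_with_list(), the sample size, and the sample data are all made up for the example.

```python
import time

import numpy as np


def collect_with_np_append(values):
    """Grow a NumPy array one element at a time.

    np.append() returns a new array on every call, so the data
    collected so far is copied again for each record.
    """
    out = np.array([], dtype=np.float32)
    for v in values:
        out = np.append(out, v)
    return out


def collect_with_list(values):
    """Append to a Python list and convert to a NumPy array once."""
    out = []
    for v in values:
        out.append(v)
    return np.array(out, dtype=np.float32)


# hypothetical sample data (not from the TFRecords file)
values = np.arange(20000, dtype=np.float32)

start = time.perf_counter()
a = collect_with_np_append(values)
print('np.append: %.3f sec' % (time.perf_counter() - start))

start = time.perf_counter()
b = collect_with_list(values)
print('list     : %.3f sec' % (time.perf_counter() - start))
```

Both helpers return the same array. Only the time needed to build it differs, and the gap should widen as the number of records grows.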
calc_mean_std_170826b.py
import numpy as np
import tensorflow as tf
import sys
import time
"""
v0.1 Aug. 26, 2017
  - use [list] instead of [numpy.array] to append data
     + numpy.append() causes a bottleneck
=== branched from [calc_mean_std_170819.py] ===
v0.1 Aug. 19, 2017
  - add time profiling
  - change [INP_FILE] to those with _170819 prefix
=== branched from [calc_mean_std_170812.py] ===
v0.1 Aug. 12, 2017
  - calculate [mean], [stddev]
=== branched from [test_readCombined_170722.py] ===
v0.2 Jul. 09, 2017
  - read [mr] and [mi]
v0.1 Jul. 09, 2017
  - read position and Ex, Ey, Ez
     + add get_feature_float32()
"""
# on
# Ubuntu 16.04 LTS
# TensorFlow v1.1
# Python 3.5.2
# codingrule: PEP8
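def print_mean_stddev(xs, label):
    # xs: numpy array of all collected values for one component
    print('%s mean:%f std:%f' % (label, xs.mean(), xs.std()))


def get_feature_float32(example, feature_name):
    # the feature is stored as raw bytes in a bytes_list
    wrk_raw = (example.features.feature[feature_name]
               .bytes_list
               .value[0])
    # np.frombuffer() replaces np.fromstring(), whose binary mode
    # is deprecated in newer NumPy versions
    wrk_1d = np.frombuffer(wrk_raw, dtype=np.float32)
    wrk_org = wrk_1d.reshape([1, -1])
    return wrk_org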
INP_FILE = 'combined_IntField-Y_170819.tfrecords'
record_iterator = tf.python_io.tf_record_iterator(path=INP_FILE)
exrs, exis = [], []
eyrs, eyis = [], []
ezrs, ezis = [], []
cnt = 0
start = time.time()
for record in record_iterator:
    if cnt % 10000 == 0:
        print("%d, %.3f" % (cnt, time.time() - start))
        start = time.time()
    example = tf.train.Example()
    example.ParseFromString(record)
    xpos_org = get_feature_float32(example, 'xpos_raw')
    ypos_org = get_feature_float32(example, 'ypos_raw')
    zpos_org = get_feature_float32(example, 'zpos_raw')
    mr_org = get_feature_float32(example, 'mr_raw')
    mi_org = get_feature_float32(example, 'mi_raw')
    exr_org = get_feature_float32(example, 'exr_raw')
    exi_org = get_feature_float32(example, 'exi_raw')
    eyr_org = get_feature_float32(example, 'eyr_raw')
    eyi_org = get_feature_float32(example, 'eyi_raw')
    ezr_org = get_feature_float32(example, 'ezr_raw')
    ezi_org = get_feature_float32(example, 'ezi_raw')
    exrs.append(exr_org)
    exis.append(exi_org)
    eyrs.append(eyr_org)
    eyis.append(eyi_org)
    ezrs.append(ezr_org)
    ezis.append(ezi_org)
    cnt += 1
print_mean_stddev(np.array(exrs), 'exr')
print_mean_stddev(np.array(exis), 'exi')
print_mean_stddev(np.array(eyrs), 'eyr')
print_mean_stddev(np.array(eyis), 'eyi')
print_mean_stddev(np.array(ezrs), 'ezr')
print_mean_stddev(np.array(ezis), 'ezi')
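The decode-then-stack step in the script can be tried without TensorFlow. The sketch below is only an illustration under assumptions: decode_float32() is a hypothetical stand-in for get_feature_float32() that takes the raw bytes directly, and each made-up record holds a single float32 value, which may not match the real file.

```python
import numpy as np


def decode_float32(raw_bytes):
    """Decode raw bytes into a float32 array of shape (1, n_values)."""
    return np.frombuffer(raw_bytes, dtype=np.float32).reshape([1, -1])


# hypothetical records: one float32 value each, serialized to bytes
records = [np.array([v], dtype=np.float32).tobytes()
           for v in (0.5, -1.5, 2.0)]

# same pattern as the script: append per record, convert once at the end
decoded = []
for raw in records:
    decoded.append(decode_float32(raw))
stacked = np.array(decoded)

# mean() and std() without an axis argument run over all elements
print(stacked.shape, stacked.mean(), stacked.std())
```

Because every decoded item has shape (1, n_values), np.array() on the list gives a 3-D array of shape (n_records, 1, n_values). That is fine here since mean() and std() are taken over every element.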
run
$ python3 calc_mean_std_170826b.py
0, 0.000
10000, 0.629
20000, 0.633
30000, 0.629
40000, 0.628
50000, 0.629
60000, 0.630
70000, 0.629
80000, 0.636
...
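Each block of 10,000 records now takes a constant time (about 0.63 seconds), so the read time no longer grows as reading progresses.
Processing all 4,113,648 records finished in 249 seconds.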
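As a possible alternative that avoids keeping every record in memory, the mean and stddev could also be accumulated while reading. The following is a rough sketch and not what the script above does: the RunningStats class and the random sample data are hypothetical. The sum-of-squares formula can lose precision when the mean is large compared with the stddev, so the sums are kept in float64.

```python
import numpy as np


class RunningStats:
    """Accumulate count, sum and sum of squares in float64."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def update(self, xs):
        xs = np.asarray(xs, dtype=np.float64)
        self.n += xs.size
        self.total += float(xs.sum())
        self.total_sq += float(np.square(xs).sum())

    def mean(self):
        return self.total / self.n

    def std(self):
        # population stddev (ddof=0), same definition as ndarray.std()
        var = self.total_sq / self.n - self.mean() ** 2
        return float(np.sqrt(max(var, 0.0)))


# hypothetical data standing in for the decoded records
data = np.random.RandomState(0).randn(1000).astype(np.float32)

stats = RunningStats()
for chunk in np.split(data, 10):  # feed the data in 10 pieces
    stats.update(chunk)

print('mean:%f std:%f' % (stats.mean(), stats.std()))
```

With this pattern, update() would be called once per record inside the read loop, and only three numbers per component are kept.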
run
exr mean:0.000000 std:0.135288
exi mean:0.000000 std:0.108794
eyr mean:-0.020458 std:0.803743
eyi mean:0.015096 std:0.617177
ezr mean:-0.000000 std:0.201634
ezi mean:-0.000000 std:0.325543
Is a mean of 0.0 (for exr, exi, ezr, and ezi) actually correct?
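One way to look into this is to print the mean in exponent notation, because '%f' shows only six decimal places and any smaller value appears as 0.000000 or -0.000000. The sketch below uses made-up values, and describe_mean() is a hypothetical helper. It also asks NumPy to accumulate in float64, which is an extra precaution and not something the script above does.

```python
import numpy as np


def describe_mean(xs, label):
    """Format the mean both with %f and in exponent notation."""
    mean64 = xs.mean(dtype=np.float64)  # accumulate in float64
    return '%s mean:%f (%.3e)' % (label, mean64, mean64)


# hypothetical values whose mean is tiny but not exactly zero
xs = np.array([-3e-8, 1e-8], dtype=np.float32)
line = describe_mean(xs, 'exr')
print(line)
```

If the exponent form shows a value far below the stddev, the printed 0.000000 is a rounded small number and not an exact zero. A leading minus sign, as in -0.000000, points the same way.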