
TensorFlow > TFRecords > Computing mean and stddev (using a Python list)

Last updated at 2017-10-03, posted at 2017-08-26

Environment

GeForce GTX 1070 (8GB)
ASRock Z170M Pro4S [Intel Z170chipset]
Ubuntu 16.04 LTS desktop amd64
TensorFlow v1.2.1
cuDNN v5.1 for Linux
CUDA v8.0
Python 3.5.2
IPython 6.0.0 -- An enhanced Interactive Python.
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)

I am trying to compute (mean, stddev) by reading a TFRecords file that contains 4,113,648 records.

I found that calling numpy.append() once per record is slow.
https://stackoverflow.com/questions/7133885/fastest-way-to-grow-a-numpy-numeric-array

I changed the code to append each record to a Python list and to compute the statistics from the collected list afterwards.
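The difference between the two approaches can be illustrated with a small timing sketch. It is not taken from the script in this article: the function names and the element count `n` are hypothetical, and the data is a simple counter rather than TFRecords content. numpy.append() returns a new array on every call, so the whole array is copied each time, while list.append() only adds one element and the conversion to an array happens once at the end.

```python
import time

import numpy as np


def grow_with_numpy_append(n):
    """Grow an array one element at a time; every call copies the array."""
    xs = np.empty(0, dtype=np.float32)
    for i in range(n):
        xs = np.append(xs, np.float32(i))
    return xs


def grow_with_list(n):
    """Collect values in a Python list and convert to an array once."""
    buf = []
    for i in range(n):
        buf.append(np.float32(i))
    return np.array(buf, dtype=np.float32)


if __name__ == '__main__':
    n = 20000  # hypothetical size, far smaller than the real data set
    for func in (grow_with_numpy_append, grow_with_list):
        start = time.time()
        result = func(n)
        elapsed = time.time() - start
        print('%s: %.3f s, %d elements' % (func.__name__, elapsed, result.size))
```

Both functions return the same array. Only the time spent growing it differs, and the gap is expected to widen as `n` increases.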

calc_mean_std_170826b.py

import numpy as np
import tensorflow as tf
import sys
import time

"""
v0.1 Aug. 26, 2017
  - use [list] instead of [numpy.array] to append data
     + numpy.append() causes a bottleneck

=== branched from [calc_mean_std_170819.py] ===

v0.1 Aug. 19, 2017
  - add time profiling
  - change [INP_FILE] to those with _170819 prefix

=== branched from [calc_mean_std_170812.py] ===

v0.1 Aug. 12, 2017
  - calculate [mean], [stddev]

=== branched from [test_readCombined_170722.py] ===
v0.2 Jul. 09, 2017
  - read [mr] and [mi]
v0.1 Jul. 09, 2017
  - read position and Ex, Ey, Ez
     + add get_feature_float32()
"""

# on
#   Ubuntu 16.04 LTS
#   TensorFlow v1.1
#   Python 3.5.2

# codingrule: PEP8


def print_mean_stddev(xs, label):
    print('%s mean:%f std:%f' % (label, xs.mean(), xs.std()))


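def get_feature_float32(example, feature_name):
    # The feature is stored as raw bytes in a bytes_list
    wrk_raw = (example.features.feature[feature_name]
               .bytes_list
               .value[0])
    # np.frombuffer() replaces the deprecated np.fromstring() for binary data
    wrk_1d = np.frombuffer(wrk_raw, dtype=np.float32)
    # Return as a 2D array with a single row
    wrk_org = wrk_1d.reshape([1, -1])
    return wrk_org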

INP_FILE = 'combined_IntField-Y_170819.tfrecords'

record_iterator = tf.python_io.tf_record_iterator(path=INP_FILE)

exrs, exis = [], []
eyrs, eyis = [], []
ezrs, ezis = [], []

cnt = 0
start = time.time()
for record in record_iterator:
    if cnt % 10000 == 0:
        print("%d, %.3f" % (cnt, time.time() - start))
        start = time.time()
    example = tf.train.Example()
    example.ParseFromString(record)

    xpos_org = get_feature_float32(example, 'xpos_raw')
    ypos_org = get_feature_float32(example, 'ypos_raw')
    zpos_org = get_feature_float32(example, 'zpos_raw')
    mr_org = get_feature_float32(example, 'mr_raw')
    mi_org = get_feature_float32(example, 'mi_raw')
    exr_org = get_feature_float32(example, 'exr_raw')
    exi_org = get_feature_float32(example, 'exi_raw')
    eyr_org = get_feature_float32(example, 'eyr_raw')
    eyi_org = get_feature_float32(example, 'eyi_raw')
    ezr_org = get_feature_float32(example, 'ezr_raw')
    ezi_org = get_feature_float32(example, 'ezi_raw')

    exrs.append(exr_org)
    exis.append(exi_org)
    eyrs.append(eyr_org)
    eyis.append(eyi_org)
    ezrs.append(ezr_org)
    ezis.append(ezi_org)

    cnt += 1

print_mean_stddev(np.array(exrs), 'exr')
print_mean_stddev(np.array(exis), 'exi')

print_mean_stddev(np.array(eyrs), 'eyr')
print_mean_stddev(np.array(eyis), 'eyi')

print_mean_stddev(np.array(ezrs), 'ezr')
print_mean_stddev(np.array(ezis), 'ezi')

run

$ python3 calc_mean_std_170826b.py 
0, 0.000
10000, 0.629
20000, 0.633
30000, 0.629
40000, 0.628
50000, 0.629
60000, 0.630
70000, 0.629
80000, 0.636
...

Each block of 10,000 records is now processed in constant time (about 0.63 seconds).

Processing all 4,113,648 records completed in 249 seconds.

run

exr mean:0.000000 std:0.135288
exi mean:0.000000 std:0.108794
eyr mean:-0.020458 std:0.803743
eyi mean:0.015096 std:0.617177
ezr mean:-0.000000 std:0.201634
ezi mean:-0.000000 std:0.325543

Is it correct that the mean is 0.0?
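One observation that may help here: the `%f` format shows only six decimal places, so a line such as `-0.000000` indicates a tiny negative value rather than an exact zero. A minimal sketch for inspecting such values is shown below. The helper `describe()` and the synthetic data are hypothetical stand-ins, not the values from the TFRecords file. Passing `dtype=np.float64` only rules out float32 accumulation as a factor. Whether it changes anything for the real data is an assumption that has not been checked.

```python
import numpy as np


def describe(xs, label):
    """Print mean and std in exponent notation, accumulating in float64."""
    mean = xs.mean(dtype=np.float64)
    std = xs.std(dtype=np.float64)
    line = '%s mean:%e std:%e' % (label, mean, std)
    print(line)
    return line


# Synthetic stand-in data (hypothetical), not the real field values
rng = np.random.RandomState(0)
xs = (rng.randn(100000) * 0.1).astype(np.float32)
describe(xs, 'synthetic')

# A tiny negative value rounds to "-0.000000" under %f
print('%f' % -1e-8)  # prints -0.000000
```

Replacing `%f` with `%e` in print_mean_stddev() would show the actual magnitude of the exr, exi, ezr and ezi means.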
