Python > Comparing processing time across implementations > Using .sum() > 84 µs to 24 µs

Environment

GeForce GTX 1070 (8GB)
ASRock Z170M Pro4S [Intel Z170chipset]
Ubuntu 16.04 LTS desktop amd64
TensorFlow v1.1.0
cuDNN v5.1 for Linux
CUDA v8.0
Python 3.5.2
IPython 6.0.0 -- An enhanced Interactive Python.
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)

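Overview

This post compares the processing time of different implementations of a computation that uses the weights and biases of a network trained with TensorFlow.

Implementations compared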

    # weight
    for idx1 in range(wgt[0]):
        for idx2 in range(wgt[1]):
            conv[idx2] = conv[idx2] + src[idx1] * weight[idx1, idx2]

and

    # weight
    for idx2 in range(wgt[1]):
        tmp_vec = weight[:,idx2] * src[:]
        conv[idx2] = tmp_vec.sum()

Reference: https://stackoverflow.com/questions/32316978/numpy-array-multiplication-slower-than-for-loop-with-vector-multiplication
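Both snippets compute the same matrix-vector product, conv[j] = sum over i of src[i] * weight[i, j]. As a minimal sketch that was not part of the original measurements, the per-column multiply-and-sum can also be written as a single `np.dot` call. The helper names `conv_column_sum` and `conv_dot`, the fixed seed, and the array sizes are assumptions made for illustration.

```python
import numpy as np


def conv_column_sum(src, weight):
    """Per-column multiply-and-sum, as in the second snippet above."""
    n_out = weight.shape[1]
    conv = [0.0] * n_out
    for idx2 in range(n_out):
        conv[idx2] = (weight[:, idx2] * src).sum()
    return conv


def conv_dot(src, weight):
    """Hypothetical alternative: one call that computes every output column."""
    return np.dot(src, weight)  # result has shape (weight.shape[1],)


rng = np.random.RandomState(0)  # fixed seed so the sketch is repeatable
weight = rng.randn(100, 2)
src = rng.randn(100)

same = np.allclose(conv_column_sum(src, weight), conv_dot(src, weight))
print(same)
```

Whether `np.dot` is actually faster for such a small array (100 x 2) would need to be measured in the same environment; no timing for it is claimed here.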

Full code

Jupyter code.

profile_calc_conv_170722.ipynb

%%timeit

import numpy as np
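import math
import sys


# Assumed definition: calc_sigmoid is called below but is not shown in the
# original notebook. It is only reached when applyActFnc is True.
def calc_sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))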


def calc_conv1(src, weight, bias, applyActFnc):
    wgt = weight.shape
    conv = [0.0] * bias.size

    # weight
    for idx1 in range(wgt[0]):
        for idx2 in range(wgt[1]):
            conv[idx2] = conv[idx2] + src[idx1] * weight[idx1, idx2]
    # bias
    for idx2 in range(wgt[1]):
        conv[idx2] = conv[idx2] + bias[idx2]
    # activation function
    if applyActFnc:
        for idx2 in range(wgt[1]):
            conv[idx2] = calc_sigmoid(conv[idx2])
    return conv  # return list


def calc_conv3(src, weight, bias, applyActFnc):
    wgt = weight.shape
    conv = [0.0] * bias.size

    # weight
    for idx2 in range(wgt[1]):
        tmp_vec = weight[:,idx2] * src[:]
        conv[idx2] = tmp_vec.sum()

    # bias
    for idx2 in range(wgt[1]):
        conv[idx2] = conv[idx2] + bias[idx2]
    # activation function
    if applyActFnc:
        for idx2 in range(wgt[1]):
            conv[idx2] = calc_sigmoid(conv[idx2])
    return conv  # return list


INP_NODE = 100
weight = np.random.randn(INP_NODE,2)
src = np.random.randn(INP_NODE)
bias = np.random.randn(100)  # only the first weight.shape[1] (= 2) entries are used
res1 = calc_conv1(src, weight, bias, applyActFnc=False)
#res3 = calc_conv3(src, weight, bias, applyActFnc=False)
#for elem in zip(res1, res3):
#    #if (elem[0] > 0.0):
#        print(elem)

Results

calc_conv1
- 84.5 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
calc_conv3
- 24.1 µs ± 502 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The .sum() version is faster: 84.5 µs down to 24.1 µs, roughly 3.5 times.
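Because `%%timeit` times the whole cell, the figures above also include defining the functions and generating the random arrays. The following is a minimal sketch, using the standard `timeit` module, of timing only the weight loops. The function names, the seed, and the call count `n_calls` are assumptions, and the numbers it prints will differ from those reported above.

```python
import timeit

import numpy as np


def weight_double_loop(src, weight):
    """Element-by-element accumulation (the calc_conv1 weight loop)."""
    n_in, n_out = weight.shape
    conv = [0.0] * n_out
    for idx1 in range(n_in):
        for idx2 in range(n_out):
            conv[idx2] = conv[idx2] + src[idx1] * weight[idx1, idx2]
    return conv


def weight_column_sum(src, weight):
    """Vectorized multiply followed by .sum() (the calc_conv3 weight loop)."""
    n_out = weight.shape[1]
    conv = [0.0] * n_out
    for idx2 in range(n_out):
        conv[idx2] = (weight[:, idx2] * src).sum()
    return conv


rng = np.random.RandomState(0)
weight = rng.randn(100, 2)
src = rng.randn(100)

n_calls = 1000  # assumed repeat count; the inputs are created outside the timed call
t_loop = timeit.timeit(lambda: weight_double_loop(src, weight), number=n_calls)
t_sum = timeit.timeit(lambda: weight_column_sum(src, weight), number=n_calls)

print("double loop: {:.1f} us per call".format(t_loop / n_calls * 1e6))
print(".sum()     : {:.1f} us per call".format(t_sum / n_calls * 1e6))
```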

The results should be the same, at least in principle.
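One way to confirm this is to compare the two outputs numerically. The sketch below uses shortened stand-ins for calc_conv1 and calc_conv3 (weight and bias steps only, no activation); the names `conv_double_loop` and `conv_column_sum`, the seed, and the bias size of 2 are assumptions. It uses `np.allclose` rather than `==` because the two versions may add the terms in a different order, so the last few bits can differ.

```python
import numpy as np


def conv_double_loop(src, weight, bias):
    """Stand-in for calc_conv1: nested loops, then add the bias."""
    n_in, n_out = weight.shape
    conv = [0.0] * n_out
    for idx1 in range(n_in):
        for idx2 in range(n_out):
            conv[idx2] += src[idx1] * weight[idx1, idx2]
    return [conv[idx2] + bias[idx2] for idx2 in range(n_out)]


def conv_column_sum(src, weight, bias):
    """Stand-in for calc_conv3: per-column multiply and .sum(), then add the bias."""
    n_out = weight.shape[1]
    return [(weight[:, idx2] * src).sum() + bias[idx2] for idx2 in range(n_out)]


rng = np.random.RandomState(0)
weight = rng.randn(100, 2)
src = rng.randn(100)
bias = rng.randn(2)  # one bias per output node (assumed size)

res1 = conv_double_loop(src, weight, bias)
res3 = conv_column_sum(src, weight, bias)

max_diff = max(abs(a - b) for a, b in zip(res1, res3))
print(np.allclose(res1, res3), max_diff)
```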