Python > Comparing processing times across implementation changes > Using .sum() > 84 µs to 24 µs

Posted at 2017-07-22
Operating environment
GeForce GTX 1070 (8GB)
ASRock Z170M Pro4S [Intel Z170chipset]
Ubuntu 16.04 LTS desktop amd64
TensorFlow v1.1.0
cuDNN v5.1 for Linux
CUDA v8.0
Python 3.5.2
IPython 6.0.0 -- An enhanced Interactive Python.
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)

Overview

This post compares the processing time of different implementations of a computation that uses the weights and biases of a network trained with TensorFlow.

Related: http://qiita.com/7of9/items/f267e59790526e49a7fd

What is compared

    # weight (from calc_conv1: double loop over input and output indices)
    for idx1 in range(wgt[0]):
        for idx2 in range(wgt[1]):
            conv[idx2] = conv[idx2] + src[idx1] * weight[idx1, idx2]

    # weight (from calc_conv3: take each weight column, multiply elementwise, then .sum())
    for idx2 in range(wgt[1]):
        tmp_vec = weight[:,idx2] * src[:]
        conv[idx2] = tmp_vec.sum()

Reference: https://stackoverflow.com/questions/32316978/numpy-array-multiplication-slower-than-for-loop-with-vector-multiplication
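
The column-wise multiply followed by .sum() in calc_conv3 is mathematically a dot product per output node, so the whole weight loop could also be collapsed into a single matrix-vector product. That variant is not benchmarked in this post; below is a minimal sketch, assuming the same shapes as the full code (weight of shape (100, 2), src of length 100) and a hypothetical bias of length 2 (one per output node):

import numpy as np

INP_NODE = 100
weight = np.random.randn(INP_NODE, 2)
src = np.random.randn(INP_NODE)
bias = np.random.randn(2)  # assumed: one bias per output node

# calc_conv3 style: per output column, elementwise multiply then .sum()
conv3 = np.array([(weight[:, idx2] * src).sum() for idx2 in range(weight.shape[1])]) + bias

# single matrix-vector product (not timed in this post)
conv_dot = np.dot(src, weight) + bias

print(np.allclose(conv3, conv_dot))  # expected: True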

Full code

Jupyter code.

profile_calc_conv_170722.ipynb
%%timeit

import numpy as np
import math
import sys


def calc_sigmoid(x):
    # standard logistic sigmoid (assumed definition; not included in the original cell)
    return 1.0 / (1.0 + math.exp(-x))


def calc_conv1(src, weight, bias, applyActFnc):
    wgt = weight.shape
    conv = [0.0] * bias.size

    # weight
    for idx1 in range(wgt[0]):
        for idx2 in range(wgt[1]):
            conv[idx2] = conv[idx2] + src[idx1] * weight[idx1, idx2]
    # bias
    for idx2 in range(wgt[1]):
        conv[idx2] = conv[idx2] + bias[idx2]
    # activation function
    if applyActFnc:
        for idx2 in range(wgt[1]):
            conv[idx2] = calc_sigmoid(conv[idx2])
    return conv  # return list


def calc_conv3(src, weight, bias, applyActFnc):
    wgt = weight.shape
    conv = [0.0] * bias.size

    # weight
    for idx2 in range(wgt[1]):
        tmp_vec = weight[:,idx2] * src[:]
        conv[idx2] = tmp_vec.sum()

    # bias
    for idx2 in range(wgt[1]):
        conv[idx2] = conv[idx2] + bias[idx2]
    # activation function
    if applyActFnc:
        for idx2 in range(wgt[1]):
            conv[idx2] = calc_sigmoid(conv[idx2])
    return conv  # return list


INP_NODE = 100
weight = np.random.randn(INP_NODE,2)
src = np.random.randn(INP_NODE)
bias = np.random.randn(100)  # note: only bias[0] and bias[1] are used; conv is allocated with bias.size (= 100) entries
res1 = calc_conv1(src, weight, bias, applyActFnc=False)
#res3 = calc_conv3(src, weight, bias, applyActFnc=False)
#for elem in zip(res1, res3):
#    #if (elem[0] > 0.0):
#        print(elem)

Results

  • calc_conv1
    • 84.5 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  • calc_conv3
    • 24.1 µs ± 502 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
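
The two timings above presumably come from running the %%timeit cell twice, once with the res1 = calc_conv1(...) line active and once with the res3 = calc_conv3(...) line active instead. As an alternative (a sketch, not what was used here), the functions and test data could be defined in a cell of their own and each call timed with the %timeit line magic:

%timeit calc_conv1(src, weight, bias, applyActFnc=False)
%timeit calc_conv3(src, weight, bias, applyActFnc=False)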

It got faster: roughly a 3.5x speedup (84.5 µs to 24.1 µs).

The results should, at least in principle, be the same.
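
The commented-out check at the end of the notebook can be used to confirm this. A minimal sketch, assuming the functions and test data have been defined in a cell without the %%timeit magic, and using np.allclose instead of the element-by-element print:

res1 = calc_conv1(src, weight, bias, applyActFnc=False)
res3 = calc_conv3(src, weight, bias, applyActFnc=False)

# both variants should agree up to floating-point rounding
print(np.allclose(res1, res3))  # expected: True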
