# Python > Comparing processing time across implementations > Using .sum() > 84us to 24us

```
GeForce GTX 1070 (8GB)
ASRock Z170M Pro4S [Intel Z170chipset]
Ubuntu 16.04 LTS desktop amd64
TensorFlow v1.1.0
cuDNN v5.1 for Linux
CUDA v8.0
Python 3.5.2
IPython 6.0.0 -- An enhanced Interactive Python.
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)
```

### Overview

This post compares the processing time of different implementations of a calculation that uses the weight and bias of a network trained with TensorFlow.
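
Written out, the calculation for each output node $j$ is as follows, with $x$ = `src`, $W$ = `weight` (shape $n_\text{in} \times n_\text{out}$) and $b$ = `bias`. This assumes `calc_sigmoid` is the standard logistic sigmoid; the activation $f$ is applied only when `applyActFnc` is True, and is the identity otherwise.

$$
\mathrm{conv}_j = f\left(\sum_{i=0}^{n_\text{in}-1} x_i W_{ij} + b_j\right), \qquad f(z) = \frac{1}{1 + e^{-z}}
$$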

### Implementations compared

```python
# weight (calc_conv1: scalar double loop over inputs and outputs)
for idx1 in range(wgt[0]):
    for idx2 in range(wgt[1]):
        conv[idx2] = conv[idx2] + src[idx1] * weight[idx1, idx2]
```

```python
# weight (calc_conv3: one vectorized product and .sum() per output node)
for idx2 in range(wgt[1]):
    tmp_vec = weight[:,idx2] * src[:]
    conv[idx2] = tmp_vec.sum()
```
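
Both loops compute the same thing: for each output node `idx2`, the dot product of `src` with column `idx2` of `weight`, i.e. the matrix-vector product `src @ weight`. A minimal sketch to check this on random data, wrapping each snippet in a function; the names `conv_loop` and `conv_sum` are hypothetical, and the shapes (100 inputs, 2 outputs) follow the full code in this post.

```python
import numpy as np

def conv_loop(src, weight):
    """calc_conv1 style: Python double loop, one scalar multiply-add per step."""
    n_in, n_out = weight.shape
    conv = [0.0] * n_out
    for idx1 in range(n_in):
        for idx2 in range(n_out):
            conv[idx2] = conv[idx2] + src[idx1] * weight[idx1, idx2]
    return conv

def conv_sum(src, weight):
    """calc_conv3 style: one NumPy element-wise product and .sum() per output node."""
    n_out = weight.shape[1]
    conv = [0.0] * n_out
    for idx2 in range(n_out):
        tmp_vec = weight[:, idx2] * src  # the loop over inputs runs inside NumPy
        conv[idx2] = tmp_vec.sum()
    return conv

np.random.seed(0)
src = np.random.randn(100)
weight = np.random.randn(100, 2)

loop_res = conv_loop(src, weight)
sum_res = conv_sum(src, weight)
# Equal up to floating-point rounding (the summation order differs).
print(np.allclose(loop_res, sum_res), np.allclose(sum_res, src @ weight))
```

The second form still loops in Python, but only over the 2 output nodes instead of all 100 × 2 input/output pairs.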

### Full code

Jupyter notebook code.

profile_calc_conv_170722.ipynb
```python
%%timeit

import numpy as np
import math
import sys

def calc_sigmoid(x):
    # logistic sigmoid; only called when applyActFnc=True
    return 1.0 / (1.0 + math.exp(-x))

def calc_conv1(src, weight, bias, applyActFnc):
    wgt = weight.shape  # (number of inputs, number of outputs)
    conv = [0.0] * bias.size

    # weight: scalar double loop over inputs and outputs
    for idx1 in range(wgt[0]):
        for idx2 in range(wgt[1]):
            conv[idx2] = conv[idx2] + src[idx1] * weight[idx1, idx2]
    # bias
    for idx2 in range(wgt[1]):
        conv[idx2] = conv[idx2] + bias[idx2]
    # activation function
    if applyActFnc:
        for idx2 in range(wgt[1]):
            conv[idx2] = calc_sigmoid(conv[idx2])
    return conv  # return list

def calc_conv3(src, weight, bias, applyActFnc):
    wgt = weight.shape  # (number of inputs, number of outputs)
    conv = [0.0] * bias.size

    # weight: one vectorized product and .sum() per output node
    for idx2 in range(wgt[1]):
        tmp_vec = weight[:,idx2] * src[:]
        conv[idx2] = tmp_vec.sum()

    # bias
    for idx2 in range(wgt[1]):
        conv[idx2] = conv[idx2] + bias[idx2]
    # activation function
    if applyActFnc:
        for idx2 in range(wgt[1]):
            conv[idx2] = calc_sigmoid(conv[idx2])
    return conv  # return list

INP_NODE = 100
weight = np.random.randn(INP_NODE,2)
src = np.random.randn(INP_NODE)
bias = np.random.randn(2)  # one bias per output node (weight.shape[1])
# Times calc_conv1; to time calc_conv3, comment this line out and uncomment the next one
res1 = calc_conv1(src, weight, bias, applyActFnc=False)
#res3 = calc_conv3(src, weight, bias, applyActFnc=False)
#for elem in zip(res1, res3):
#    #if (elem[0] > 0.0):
#        print(elem)
```
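
`%%timeit` on this cell times everything in it, including the imports, the function definitions and the random-number generation, not only the call. A minimal sketch for timing just the call outside Jupyter with the standard-library `timeit` module, assuming the activation step is skipped as in the benchmark; the two functions are condensed versions of `calc_conv1`/`calc_conv3` without the `applyActFnc` argument, and `per_call_us` is a hypothetical name.

```python
import timeit
import numpy as np

def calc_conv1(src, weight, bias):
    # Scalar double loop (weight + bias only; activation skipped)
    n_in, n_out = weight.shape
    conv = [0.0] * n_out
    for idx1 in range(n_in):
        for idx2 in range(n_out):
            conv[idx2] = conv[idx2] + src[idx1] * weight[idx1, idx2]
    for idx2 in range(n_out):
        conv[idx2] = conv[idx2] + bias[idx2]
    return conv

def calc_conv3(src, weight, bias):
    # One vectorized product and .sum() per output node, then bias
    n_out = weight.shape[1]
    conv = [0.0] * n_out
    for idx2 in range(n_out):
        conv[idx2] = (weight[:, idx2] * src).sum() + bias[idx2]
    return conv

np.random.seed(0)
INP_NODE = 100
weight = np.random.randn(INP_NODE, 2)
src = np.random.randn(INP_NODE)
bias = np.random.randn(2)

per_call_us = {}
for fn in (calc_conv1, calc_conv3):
    # 7 runs of 1000 calls each; keep the fastest run, converted to microseconds per call
    runs = timeit.repeat(lambda fn=fn: fn(src, weight, bias), repeat=7, number=1000)
    per_call_us[fn.__name__] = min(runs) / 1000 * 1e6
    print("{}: {:.1f} us per call".format(fn.__name__, per_call_us[fn.__name__]))
```

This reports the best run rather than the mean ± std. dev. that `%timeit` prints, so the numbers are not directly comparable with the results below.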

### Results

- calc_conv1
    - 84.5 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
- calc_conv3
    - 24.1 µs ± 502 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
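
That is a speedup of 84.5 / 24.1 ≈ 3.5×, mostly because the loop over the 100 inputs moves from Python into NumPy's compiled code. The loop over output nodes that remains in `calc_conv3` can be removed as well. A minimal sketch, not measured here, assuming `calc_sigmoid` is the logistic sigmoid; `calc_conv_vec` is a hypothetical name, and `calc_conv3` below is a condensed version of the one in this post.

```python
import math
import numpy as np

def calc_sigmoid(x):
    # Assumed helper: standard logistic sigmoid for a scalar
    return 1.0 / (1.0 + math.exp(-x))

def calc_conv3(src, weight, bias, applyActFnc):
    # Condensed calc_conv3: Python loop over output nodes, .sum() inside
    n_out = weight.shape[1]
    conv = [0.0] * n_out
    for idx2 in range(n_out):
        conv[idx2] = (weight[:, idx2] * src).sum() + bias[idx2]
        if applyActFnc:
            conv[idx2] = calc_sigmoid(conv[idx2])
    return conv

def calc_conv_vec(src, weight, bias, applyActFnc):
    # Hypothetical fully vectorized variant: no Python-level loop
    conv = src @ weight + bias  # (n_in,) @ (n_in, n_out) -> (n_out,)
    if applyActFnc:
        conv = 1.0 / (1.0 + np.exp(-conv))  # element-wise sigmoid
    return conv.tolist()  # return list, like calc_conv3

np.random.seed(0)
INP_NODE = 100
weight = np.random.randn(INP_NODE, 2)
src = np.random.randn(INP_NODE)
bias = np.random.randn(2)

for act in (False, True):
    ref = calc_conv3(src, weight, bias, applyActFnc=act)
    vec = calc_conv_vec(src, weight, bias, applyActFnc=act)
    print(act, np.allclose(ref, vec))
```

With only 2 output nodes the gain over `calc_conv3` may be small; it should grow with the number of output nodes.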