1
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 1 year has passed since last update.

ポップカウント計算速度の比較[numpy]

Last updated at Posted at 2023-04-04

ポップカウント計算速度の比較

numpyで、m1,m2,m3の3種類の方法でポップカウント(uintのbitの1の数)の計算速度を比較した。

実験・結果

import numpy as np
import cupy as cp
import time
import pandas as pd
# 参考 https://stackoverflow.com/questions/63954102/numpy-vectorized-way-to-count-non-zero-bits-in-array-of-integers
#@Ehsan's solution
def m1(a):
  return np.unpackbits(a.view('uint8')).sum()

#@Valdi_Bo's solution
def m2(a):
  return sum([ bin(n).count('1') for n in a ])

def m3(a):  # ※uint8しか処理できない
    return np.count_nonzero(np.unpackbits(a))

# in_ = [np.random.randint(100000,size=(n)) for n in [10,100,1000,10000,100000]]

uint8の場合

ret = []
for name,func in zip('m1,m2,m3'.split(','),[m1,m2,m3]):
    for n in [10**2,10**3,10**4,10**5,10**6]:
        arr = np.random.randint(255,size=(n), dtype=np.uint8)
        ts = time.perf_counter()
        func(arr) # 実行
        ret.append({
                'name':name,
                'count':n,
                'time':time.perf_counter()-ts,
            })
df_result = pd.DataFrame(ret)
df_result
name count time
0 m1 100 0.000041
1 m1 1000 0.000032
2 m1 10000 0.000162
3 m1 100000 0.000326
4 m1 1000000 0.003844
5 m2 100 0.000058
6 m2 1000 0.000174
7 m2 10000 0.001542
8 m2 100000 0.015132
9 m2 1000000 0.162051
10 m3 100 0.000035
11 m3 1000 0.000004
12 m3 10000 0.000013
13 m3 100000 0.000109
14 m3 1000000 0.001422
df_result.pivot(index='count', columns='name', values='time').plot.line(legend=True, logx=True, logy=True)

output_5_1.png

uint32の場合

m3はuint8しか処理できなかったので実行してない

ret = []
for name,func in zip('m1,m2'.split(','),[m1,m2]):
    for n in [10**2,10**3,10**4,10**5,10**6]:
        arr = np.random.randint(2**32-1,size=(n), dtype=np.uint32)
        ts = time.perf_counter()
        func(arr) # 実行
        ret.append({
                'name':name,
                'count':n,
                'time':time.perf_counter()-ts,
            })
df_result = pd.DataFrame(ret)
df_result
name count time
0 m1 100 0.000032
1 m1 1000 0.000099
2 m1 10000 0.000216
3 m1 100000 0.001684
4 m1 1000000 0.017679
5 m2 100 0.000047
6 m2 1000 0.000283
7 m2 10000 0.002625
8 m2 100000 0.026185
9 m2 1000000 0.258491

m1 10**6件は、uint8の4倍より遅くなってる。(効率悪化)

df_result.pivot(index='count', columns='name', values='time').plot.line(legend=True, logx=True, logy=True)

output_9_1.png

結論

m3でuint8を処理するのが早そう

GPUの場合

10e9はout of memoryなので実施してない

def m4(a):  # ※uint8しか処理できない
    return cp.count_nonzero(cp.unpackbits(a))

ret = []
for name,func in zip('m1,m3,m4'.split(','),[m1,m3,m4]):
    for n in [10**2,10**3,10**4,10**5,10**6,10**7,10**8]:
        # arr = np.random.randint(255,size=(n), dtype=np.uint8)
        arr = cp.random.randint(255,size=(n), dtype=np.uint8)
        ts = time.perf_counter()
        func(arr) # 実行
        ret.append({
                'name':name,
                'count':n,
                'time':time.perf_counter()-ts,
            })
df_result = pd.DataFrame(ret)
df_result
df_result.pivot(index='count', columns='name', values='time').plot.line(legend=True, logx=True, logy=True)

image.png

10e8 (100MB)でも0.1ms前後で、ほとんど差が出ない。

1
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
1
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?