More than 1 year has passed since last update.

ポップカウント計算速度の比較[numpy]

Last updated at 2023-04-04Posted at 2023-04-04

ポップカウント計算速度の比較

numpyで、m1,m2,m3の３種類の方法でポップカウント（uintのbitの１の数）の計算速度を比較した。

実験・結果

import numpy as np
import cupy as cp
import time
import pandas as pd

# 参考　https://stackoverflow.com/questions/63954102/numpy-vectorized-way-to-count-non-zero-bits-in-array-of-integers
#@Ehsan's solution
def m1(a):
  return np.unpackbits(a.view('uint8')).sum()

#@Valdi_Bo's solution
def m2(a):
  return sum([ bin(n).count('1') for n in a ])

def m3(a):  # ※uint8しか処理できない
    return np.count_nonzero(np.unpackbits(a))

# in_ = [np.random.randint(100000,size=(n)) for n in [10,100,1000,10000,100000]]

uint8の場合

ret = []
for name,func in zip('m1,m2,m3'.split(','),[m1,m2,m3]):
    for n in [10**2,10**3,10**4,10**5,10**6]:
        arr = np.random.randint(255,size=(n), dtype=np.uint8)
        ts = time.perf_counter()
        func(arr) # 実行
        ret.append({
                'name':name,
                'count':n,
                'time':time.perf_counter()-ts,
            })
df_result = pd.DataFrame(ret)
df_result

	name	count	time
0	m1	100	0.000041
1	m1	1000	0.000032
2	m1	10000	0.000162
3	m1	100000	0.000326
4	m1	1000000	0.003844
5	m2	100	0.000058
6	m2	1000	0.000174
7	m2	10000	0.001542
8	m2	100000	0.015132
9	m2	1000000	0.162051
10	m3	100	0.000035
11	m3	1000	0.000004
12	m3	10000	0.000013
13	m3	100000	0.000109
14	m3	1000000	0.001422

df_result.pivot(index='count', columns='name', values='time').plot.line(legend=True, logx=True, logy=True)

uint32の場合

m3はuint8しか処理できなかったので実行してない

ret = []
for name,func in zip('m1,m2'.split(','),[m1,m2]):
    for n in [10**2,10**3,10**4,10**5,10**6]:
        arr = np.random.randint(2**32-1,size=(n), dtype=np.uint32)
        ts = time.perf_counter()
        func(arr) # 実行
        ret.append({
                'name':name,
                'count':n,
                'time':time.perf_counter()-ts,
            })
df_result = pd.DataFrame(ret)
df_result

	name	count	time
0	m1	100	0.000032
1	m1	1000	0.000099
2	m1	10000	0.000216
3	m1	100000	0.001684
4	m1	1000000	0.017679
5	m2	100	0.000047
6	m2	1000	0.000283
7	m2	10000	0.002625
8	m2	100000	0.026185
9	m2	1000000	0.258491

m1 10**6件は、uint8の４倍より遅くなってる。（効率悪化）

df_result.pivot(index='count', columns='name', values='time').plot.line(legend=True, logx=True, logy=True)

結論

m3でuint8を処理するのが早そう

GPUの場合

10e9はout of memoryなので実施してない

def m4(a):  # ※uint8しか処理できない
    return cp.count_nonzero(cp.unpackbits(a))

ret = []
for name,func in zip('m1,m3,m4'.split(','),[m1,m3,m4]):
    for n in [10**2,10**3,10**4,10**5,10**6,10**7,10**8]:
        # arr = np.random.randint(255,size=(n), dtype=np.uint8)
        arr = cp.random.randint(255,size=(n), dtype=np.uint8)
        ts = time.perf_counter()
        func(arr) # 実行
        ret.append({
                'name':name,
                'count':n,
                'time':time.perf_counter()-ts,
            })
df_result = pd.DataFrame(ret)
df_result

df_result.pivot(index='count', columns='name', values='time').plot.line(legend=True, logx=True, logy=True)

10e8 (100MB)でも0.1ms前後で、ほとんど差が出ない。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up