1
0

More than 5 years have passed since last update.

np.choiceとchainer.utils.WalkerAliasの速度対決

Posted at

chainerのマニュアルに「chainer.utils.WalkerAliasはnp.choiceよりも速い」と書いてあったので,速度を比較してみた.ただしCPUオンリー.

コード

乱数生成

import numpy as np
import chainer
from time import time
import matplotlib.pyplot as plt
%matplotlib inline

loops = 10  # for taking average
verc_size = 10 ** np.arange(1, 7)  # i
n_samples = 10 ** np.arange(1, 7)  # j

t_choice = np.zeros((len(verc_size), len(n_samples)))
t_alias  = np.zeros((len(verc_size), len(n_samples)))
t_aliasv = np.zeros((len(verc_size), len(n_samples)))



for i in range(len(verc_size)):

    p = np.random.randint(100, size=verc_size[i])
    p = p / p.sum()
    # p = p.astype(np.float32)  # for gpu?
    c = np.random.randint(100, size=verc_size[i])

    start_time = time()
    for loop in range(loops):
        sampler = chainer.utils.WalkerAlias(p)
    end_time = time()
    e_time = end_time - start_time
    t_aliasv[i, :] = e_time / loops


    for j in range(len(n_samples)):

        # numpy choice
        start_time = time()
        for loop in range(loops):
            res = np.random.choice(c, size=n_samples[j], p=p)
        end_time = time()
        e_time = end_time - start_time
        print('choice', e_time, n_samples[j], verc_size[i])
        t_choice[i, j] = e_time / loops

        # chainer alisas
        start_time = time()
        for loop in range(loops):
            res = sampler.sample(n_samples[j])
            res = c[res]
        end_time = time()
        e_time = end_time - start_time
        print('alias ', e_time, n_samples[j], verc_size[i])
        t_alias[i, j] = e_time / loopsimport numpy as np

プロット

プロット
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15, 5))

def myax(ax, title):
    ax.set_yscale('log')
    ax.set_xscale('log')
    ax.set_ylim(1e-4, 1e2)
    ax.set_title(title)
    ax.set_xlabel('# samples')
    ax.set_ylabel('time [s]')

for i in range(len(verc_size)):
    ax[0].plot(n_samples, t_choice[i], label=verc_size[i])
myax(ax[0], 'np.choice')
ax[0].legend(loc='best')

ax[1].plot(n_samples, t_alias)
myax(ax[1], 'chainer.alias sampling only')

ax[2].plot(n_samples, t_alias + t_aliasv)
myax(ax[2], 'chainer.alias init + sampling')

結果

  • alias methodは確かにO(1)ということが分かる(sampling onlyの場合.計算時間はサンプル数に依存しない).
  • alias methodは初期化に膨大な時間がかかるので,初期化の時間を含めると確実にnp.choiceのほうが速い.
  • 初期化を除くと,「1万個の中から1つ選ぶ」を1万回繰り返すときには,alias methodのほうが速そう.
  • GPUでalias methodがどの程度速いのかは未検証(chainer.utils.WalkerAliasにはto_gpu()がある).

choice.png

1
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
1
0