More than 5 years have passed since last update.

Measured effects of Numba's parallelization options


TL;DR

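  • Measured the execution speed of Numba's parallelization options (prompted by reading "Numba で並列処理ができることを知ったので - Qiita", i.e. "Since I learned Numba can do parallel processing")
  • Variants compared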
    • no numba
    • @jit
    • @jit(nopython=True)
    • @jit(nopython=True, parallel=True)
    • @jit(nopython=True, parallel=True) + numba.prange
    • @njit
    • @njit(parallel=True)
    • @njit(parallel=True) + numba.prange
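  • @njit(parallel=True) looks like a safe default: in these measurements it was never meaningfully slower
  • When the loop count is large, adding numba.prange as well pays off
  • The effect presumably depends on the workload, so always measure and **check the official documentation**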

検証環境

  • MacBook Pro (2016)
  • Jupyter Notebook
$ system_profiler SPHardwareDataType
Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro13,3
      Processor Name: Intel Core i7
      Processor Speed: 2.7 GHz
      Number of Processors: 1
      Total Number of Cores: 4
      L2 Cache (per Core): 256 KB
      L3 Cache: 8 MB
      Memory: 16 GB
      Boot ROM Version: 250.0.0.0.0
      SMC Version (system): 2.38f7

$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.14
BuildVersion:   18A391

$ pyenv --version
pyenv 1.1.5

$ python --version
Python 3.6.2 :: Anaconda, Inc.

$ pip show jupyter numba
Name: jupyter
Version: 1.0.0

Name: numba
Version: 0.40.1
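Because the prange speedup is bounded by the number of cores (4 on this machine), it may help to record the core and thread count next to the versions. The sketch below is a hypothetical helper, `describe_environment`, that works whether or not Numba is installed: `os.cpu_count()` is standard library, and `numba.config.NUMBA_NUM_THREADS` is the thread count Numba's parallel backend uses (by default the CPU count). Note that `os.cpu_count()` reports logical CPUs, so a 4-core machine with Hyper-Threading typically reports 8.

```python
import os
import platform


def describe_environment():
    """Collect details that bound parallel speedup (hypothetical helper)."""
    info = {
        "python": platform.python_version(),
        "logical_cpus": os.cpu_count(),  # may be None if undeterminable
    }
    try:
        import numba
        info["numba"] = numba.__version__
        # Number of threads Numba's parallel backend uses (defaults to the CPU count)
        info["numba_threads"] = numba.config.NUMBA_NUM_THREADS
    except ImportError:
        info["numba"] = None  # Numba is not installed in this interpreter
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```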

Benchmark

The code is the same as in "Numba で並列処理ができることを知ったので - Qiita".

Benchmark code

Function definitions

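import numba
from numba import jit, njit, prange
import random

# Monte Carlo estimate of pi: the fraction of uniform points (x, y) in [0, 1)^2
# that land inside the unit quarter circle approaches pi/4
def calc_pi(NUM):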
    counter = 0
    for i in range(NUM):
        x = random.random()
        y = random.random()
        if x*x+y*y < 1.0:
            counter += 1
    pi = 4.0*counter/NUM
    return pi

@jit
def calc_pi_jit(NUM):
    counter = 0
    for i in range(NUM):
        x = random.random()
        y = random.random()
        if x*x+y*y < 1.0:
            counter += 1
    pi = 4.0*counter/NUM
    return pi

@jit(nopython=True)
def calc_pi_jit_nopython(NUM):
    counter = 0
    for i in range(NUM):
        x = random.random()
        y = random.random()
        if x*x+y*y < 1.0:
            counter += 1
    pi = 4.0*counter/NUM
    return pi

@jit(nopython=True, parallel=True)
def calc_pi_jit_parallel(NUM):
    counter = 0
    for i in range(NUM):
        x = random.random()
        y = random.random()
        if x*x+y*y < 1.0:
            counter += 1
    pi = 4.0*counter/NUM
    return pi

@jit(nopython=True, parallel=True)
def calc_pi_jit_prange(NUM):
    counter = 0
    for i in prange(NUM):  # parallel loop; Numba handles `counter += 1` as a reduction
        x = random.random()
        y = random.random()
        if x*x+y*y < 1.0:
            counter += 1
    pi = 4.0*counter/NUM
    return pi

@njit
def calc_pi_njit(NUM):
    counter = 0
    for i in range(NUM):
        x = random.random()
        y = random.random()
        if x*x+y*y < 1.0:
            counter += 1
    pi = 4.0*counter/NUM
    return pi

@njit(parallel=True)
def calc_pi_njit_parallel(NUM):
    counter = 0
    for i in range(NUM):
        x = random.random()
        y = random.random()
        if x*x+y*y < 1.0:
            counter += 1
    pi = 4.0*counter/NUM
    return pi

@njit(parallel=True)
def calc_pi_njit_prange(NUM):
    counter = 0
    for i in prange(NUM):  # parallel loop; Numba handles `counter += 1` as a reduction
        x = random.random()
        y = random.random()
        if x*x+y*y < 1.0:
            counter += 1
    pi = 4.0*counter/NUM
    return pi
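What prange does with `counter` can be pictured as a reduction: the iteration range is split among threads, each thread counts hits in a private partial counter, and the partials are summed at the end. The sketch below mimics that decomposition in plain Python and runs the chunks one after another. It is a conceptual illustration only, not Numba's actual scheduler, and `count_hits` and `calc_pi_chunked` are hypothetical names. Each chunk gets its own `random.Random` instance, echoing Numba's documented behavior that each thread draws from an independent random stream.

```python
import math
import random


def count_hits(n, seed):
    """Count points in the unit quarter circle, using a private RNG for this chunk."""
    rng = random.Random(seed)  # independent stream per chunk (per "thread")
    hits = 0  # private partial counter, not shared with other chunks
    for _ in range(n):
        x = rng.random()
        y = rng.random()
        if x * x + y * y < 1.0:
            hits += 1
    return hits


def calc_pi_chunked(num, n_workers=4):
    """Split num iterations into n_workers chunks, then sum the partial counts."""
    base, extra = divmod(num, n_workers)
    # The first `extra` chunks get one extra iteration so the sizes add up to num
    sizes = [base + (1 if w < extra else 0) for w in range(n_workers)]
    # Chunks run sequentially here; prange would hand them to separate threads
    partial_counts = [count_hits(size, seed=w) for w, size in enumerate(sizes)]
    return 4.0 * sum(partial_counts) / num  # reduction step: combine the partials


if __name__ == "__main__":
    estimate = calc_pi_chunked(100_000)
    print(estimate, abs(estimate - math.pi))
```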

Measurement

Without Numba the measurement takes too long, so the plain-Python version is only run up to num = 1,000,000.

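# %timeit is an IPython magic, so this cell runs in Jupyter/IPython, not as a plain script
for i in range(4):
    num = pow(1000, i)  # 1, 1000, 1000000, 1000000000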
    print(f'{"="*10} num={num} {"="*10}')
    if i < 3:
        print("no numba")
        %timeit calc_pi(num)
    print("jit only")
    %timeit calc_pi_jit(num)
    print("jit nopython")
    %timeit calc_pi_jit_nopython(num)
    print("jit parallel")
    %timeit calc_pi_jit_parallel(num)
    print("jit prange")
    %timeit calc_pi_jit_prange(num)
    print("njit only")
    %timeit calc_pi_njit(num)
    print("njit parallel")
    %timeit calc_pi_njit_parallel(num)
    print("njit prange")
    %timeit calc_pi_njit_prange(num)
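`%timeit` only exists inside IPython/Jupyter. Outside a notebook, a rough equivalent can be built from the standard library's `timeit.repeat`, reporting the mean ± standard deviation of the per-call time as `%timeit` does. This is a minimal sketch: `bench` is a hypothetical helper, the fixed `number` of calls per run is an assumption (`%timeit` chooses it automatically), and an explicit warm-up call is made so that, for a Numba-compiled function, JIT compilation happens before timing starts.

```python
import random
import statistics
import timeit


def calc_pi(NUM):
    """Plain-Python version from the article (the "no numba" case)."""
    counter = 0
    for i in range(NUM):
        x = random.random()
        y = random.random()
        if x * x + y * y < 1.0:
            counter += 1
    return 4.0 * counter / NUM


def bench(func, arg, repeat=7, number=100):
    """Return (mean, stdev) in seconds per call over `repeat` runs of `number` calls."""
    func(arg)  # warm-up; for a Numba function this triggers compilation
    totals = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    per_call = [t / number for t in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


if __name__ == "__main__":
    for num in (1, 1000):
        mean, std = bench(calc_pi, num)
        print(f"num={num}: {mean * 1e6:.2f} µs ± {std * 1e6:.2f} µs per loop")
```

Passing any of the Numba-decorated functions above to `bench` instead of `calc_pi` should work the same way.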

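Results

Putting everything into a table was too much hassle, so the Jupyter Notebook output is pasted as-is.

  • parallel=True shows no noticeable downside, so adding it by default seems fine. Without prange it brought no speedup either, which suggests that parallel=True alone does not parallelize a plain range loop like the one here
  • numba.prange backfires when the loop count (i.e. the amount of parallel work) is small, because of the added overhead
    • Large loop count: numba.prange helps a lot
    • Small loop count: numba.prange hurts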
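A rough cost model makes the crossover in the bullets above concrete. Assume prange adds a fixed start-up overhead per call and then lowers the per-iteration cost. Plugging in mean times from the notebook output that follows (njit only vs. njit prange at num=1 for the overhead, and at num=1000000000 for the per-iteration costs) gives the loop count at which prange starts to pay off. This is a back-of-the-envelope sketch under that linear-cost assumption, not a direct measurement; `break_even_loops` is a hypothetical name.

```python
# Mean times copied from the notebook output (seconds)
NJIT_NUM1 = 196e-9          # njit only, num=1
NJIT_PRANGE_NUM1 = 55.8e-6  # njit prange, num=1
NJIT_NUM1E9 = 16.8          # njit only, num=1000000000
NJIT_PRANGE_NUM1E9 = 4.29   # njit prange, num=1000000000


def break_even_loops():
    """Loop count N where overhead + N * t_parallel == N * t_serial (linear model)."""
    overhead = NJIT_PRANGE_NUM1 - NJIT_NUM1  # fixed cost prange adds per call
    t_serial = NJIT_NUM1E9 / 1e9             # seconds per iteration, serial loop
    t_parallel = NJIT_PRANGE_NUM1E9 / 1e9    # seconds per iteration, with prange
    return overhead / (t_serial - t_parallel)


if __name__ == "__main__":
    print(f"prange overhead ~ {(NJIT_PRANGE_NUM1 - NJIT_NUM1) * 1e6:.1f} µs")
    print(f"break-even ~ {break_even_loops():,.0f} iterations")
```

Under this model the break-even point lands a little under 4,500 iterations, which is consistent with prange losing at num=1000 and winning at num=1000000.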
========== num=1 ==========
no numba
729 ns ± 44.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
jit only
246 ns ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
jit nopython
211 ns ± 15.1 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
jit parallel
203 ns ± 12.7 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
jit prange
43.7 µs ± 2.41 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
njit only # fastest
196 ns ± 4.94 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
njit parallel
209 ns ± 19.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
njit prange
55.8 µs ± 9.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

========== num=1000 ==========
no numba
344 µs ± 56 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
jit only
17.7 µs ± 562 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
jit nopython # fastest
16.6 µs ± 875 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
jit parallel
16.6 µs ± 1.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
jit prange
55.4 µs ± 3.51 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
njit only
17.3 µs ± 983 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
njit parallel
16.7 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
njit prange
52.3 µs ± 4.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

========== num=1000000 ==========
no numba
300 ms ± 19.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
jit only
16.8 ms ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
jit nopython
17.1 ms ± 953 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
jit parallel
17 ms ± 995 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
jit prange
5.33 ms ± 504 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
njit only
16.4 ms ± 803 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
njit parallel
16.2 ms ± 519 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
njit prange # fastest
4.96 ms ± 565 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

========== num=1000000000 ==========
jit only
16.5 s ± 481 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
jit nopython
16.2 s ± 385 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
jit parallel
16.4 s ± 876 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
jit prange
4.58 s ± 200 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
njit only
16.8 s ± 995 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
njit parallel
16.4 s ± 702 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
njit prange # fastest
4.29 s ± 51.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
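Since the raw output is not tabulated, the speedups can be derived from the mean times above. A small sketch, with values copied by hand from the output (only the njit variants and the plain-Python baseline, for brevity); `TIMES` and `speedups` are hypothetical names.

```python
# Mean time per call in seconds, copied from the notebook output above
TIMES = {
    1_000_000: {"no numba": 300e-3, "njit only": 16.4e-3,
                "njit parallel": 16.2e-3, "njit prange": 4.96e-3},
    1_000_000_000: {"njit only": 16.8, "njit parallel": 16.4,
                    "njit prange": 4.29},
}


def speedups(times, baseline):
    """Speedup of every variant relative to `baseline` (larger is faster)."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}


if __name__ == "__main__":
    for num, times in TIMES.items():
        baseline = "no numba" if "no numba" in times else "njit only"
        print(f"num={num:,} (baseline: {baseline})")
        for name, s in speedups(times, baseline).items():
            print(f"  {name:14s} x{s:5.1f}")
```

With these numbers, njit prange at num=1,000,000 is roughly 60× faster than plain Python and about 3.3× faster than serial njit; at num=1,000,000,000 it is about 3.9× faster than serial njit on this 4-core machine. Assuming plain Python scales linearly, num=1,000,000,000 would take about 300 s per call (0.3 s × 1000), which is why it was skipped.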

Closing remarks

The effect presumably depends on the workload, so always measure and **check the official documentation**.

