
Parallel processing in Python


Posted at 2018-10-22

Wanting to parallelize

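When working on big-data jobs, the processing is often so slow it makes me want to cry.
I could split the input file and run the same script on each piece, but I'd like a smarter approach.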

First, let's run a double loop in a single process.

{calculate_example.py}

import time

start = time.time()

L = 10000

# Double loop over i and j in [0, L), run in a single process
total = 0
for i in range(L):
    for j in range(L):
        total += i * j

print(total)

elapsed_time = time.time() - start
# Put "[sec]" inside the format string (print(...) + "[sec]" fails in Python 3)
print("elapsed_time:{0}[sec]".format(elapsed_time))

# result
>2499500025000000
>elapsed_time:23.1847140789[sec]

It takes about 23 seconds.
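Because the summand factorizes, the printed total can be checked without any loop: the double sum of i*j over i, j < L equals the square of the single sum of i over i < L, that is (L(L-1)/2)^2. A minimal sketch of that check (the function name `closed_form_total` is just for illustration):

```python
def closed_form_total(L):
    # sum_{i<L} sum_{j<L} i*j == (sum_{i<L} i) ** 2 == (L*(L-1)/2) ** 2
    s = L * (L - 1) // 2  # L*(L-1) is always even, so // is exact
    return s * s


print(closed_form_total(10000))  # 2499500025000000
```

This matches the result printed by the loop above, so it is a handy way to validate any parallel version.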

Now let's turn this into a multi-process version.

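import time
import multiprocessing as mp

# Defined at module level so worker processes can see them too,
# including under the "spawn" start method (Windows, macOS)
L = 10000
proc = 2               # number of parallel processes


# Calculation run by each process
def subcalc(p):  # p = 0, 1 when proc = 2
    subtotal = 0

    # Range of i handled by this process (// keeps the bounds integers)
    ini = L * p // proc
    fin = L * (p + 1) // proc

    # Run the calculation
    for i in range(ini, fin):
        for j in range(L):
            subtotal += i * j
    return subtotal

if __name__ == '__main__':
    start = time.time()

    with mp.Pool(proc) as pool:
        # Have each process run subcalc(p), here with p = 0, 1
        # map returns the return values as a list, ordered by p
        results = pool.map(subcalc, range(proc))

    # Sum the return values
    total = sum(results)

    print(total)

    elapsed_time = time.time() - start
    print("elapsed_time:{0}[sec]".format(elapsed_time))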

> 2499500025000000
> elapsed_time:6.75670814514[sec]
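A sketch of a variant whose worker does not rely on module-level globals at all: the parent computes each process's [start, stop) bounds and passes them, together with L, through `Pool.starmap`. The helper names `chunk_bounds` and `partial_sum` are hypothetical, chosen for this illustration; `Pool.starmap` and using `Pool` as a context manager are standard `multiprocessing` features in Python 3.3+.

```python
import multiprocessing as mp
import time


def chunk_bounds(L, proc):
    # Split range(L) into `proc` contiguous [start, stop) pieces.
    # Integer division keeps the bounds ints, and the pieces cover
    # 0..L-1 exactly once even when L is not divisible by proc.
    return [(L * p // proc, L * (p + 1) // proc) for p in range(proc)]


def partial_sum(start, stop, L):
    # Same work as subcalc, but every input arrives as an argument,
    # so the worker does not depend on module-level globals.
    subtotal = 0
    for i in range(start, stop):
        for j in range(L):
            subtotal += i * j
    return subtotal


if __name__ == "__main__":
    L = 10000
    proc = 2
    start_time = time.perf_counter()
    tasks = [(a, b, L) for a, b in chunk_bounds(L, proc)]
    with mp.Pool(proc) as pool:
        # starmap unpacks each (start, stop, L) tuple into partial_sum's arguments
        total = sum(pool.starmap(partial_sum, tasks))
    print(total)
    print("elapsed_time:{0:.2f}[sec]".format(time.perf_counter() - start_time))
```

Because `//` rounds down, the last chunk absorbs any remainder: `chunk_bounds(10, 3)` gives `[(0, 3), (3, 6), (6, 10)]`.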

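6.7 seconds. Dramatically faster! With only two processes the ideal speedup is about 2x, though, so part of this roughly 3.4x gain probably comes from the loop now running inside a function: in CPython, local-variable access is faster than the module-level global access the first script relied on.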
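To separate the gain from parallelism from the gain of moving the loop into a function, a fairer single-process baseline puts the same double loop inside a function. This is a sketch for comparison only (the name `single_process_total` is hypothetical); dividing its measured time by the multi-process time would estimate the speedup due to the two processes alone.

```python
import time


def single_process_total(L):
    # Same double loop as calculate_example.py, but inside a function,
    # so i, j and total are local variables instead of module-level globals
    total = 0
    for i in range(L):
        for j in range(L):
            total += i * j
    return total


if __name__ == "__main__":
    start = time.perf_counter()
    print(single_process_total(10000))
    print("elapsed_time:{0:.2f}[sec]".format(time.perf_counter() - start))
```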
