More than 1 year has passed since last update.

【Python】並行処理と、並列処理を実装してみる

Python

Posted at 2023-08-28

並行処理と、並列処理とは？

並行処理

ある時点では一つの作業しかしていないが、
特定の処理の 待ち時間 を利用して、交互に処理を進めることで同時進行しているかのようにする

並列処理

複数CPUコアが複数の処理を同時にを行うことです
作業を早く終わらせたい場合に人数を増やして早く終わらせるイメージです

並行処理と、並列処理をそれぞれどんな場面で使う？

並行処理・・・待ち時間の多い処理。I/Oバウンドな処理
- 複数APIの呼び出し
- 大量データの読み書き
- ネットワーク通信
並列処理・・・計算量が多いような処理。また、処理を分割しても影響がない処理。CPUバウンドな処理。
- 複雑なデータ検索
- ファイル変換

pythonで実際に処理を書いてみる

python3.2から追加されたconcurrent.futuresというビルドインモジュールを使用して試しに実装してみます

並行処理

まず、引数に指定された秒数だけ待って、その秒数を返すだけの関数time_funcを作成します

sample.py

import time


def time_func(t: int):
    print(f"{t} START time_func")
    time.sleep(t)
    print(f"{t} END time_func")
    return t

続いて、futuresモジュールのThreadPoolExecutorを使用して
並行処理を実装します

ThreadPoolExecutorの引数、max_workersでスレッド数を指定しています

また、Executorオブジェクトを使用して関数を実行しています

ここでは、submitを使用して
3秒待つ処理と、1秒待つ処理をそれぞれFeatureオブジェクトとして返しています

sample.py

import time
from concurrent.futures import ThreadPoolExecutor


def time_func(t: int):
    print(f"{t} START time_func")
    time.sleep(t)
    print(f"{t} END time_func")
    return t

# 以下、追加
with ThreadPoolExecutor(max_workers=2) as executor:
    start = time.perf_counter()

    futures = [executor.submit(time_func, t) for t in [3, 1]]

    result = [f.result() for f in futures]

    end = time.perf_counter()
    print(f"input: {result}")
    print(f"time: {end - start}")

実行結果が以下です　

$ python sample.py
3 START time_func
1 START time_func
1 END time_func
3 END time_func
input: [3, 1]
time: 3.0050152339972556

スレッドが2つ使用できるので、待ち時間を利用しているのが分かります

3秒待つ処理開始
待ち時間の間に1秒待つ処理開始
1秒待つ処理終了
3秒待つ処理終了

最終的にかかった時間は、3.0050152339972556なので3秒ほどでした

スレッドが1つの場合は？

先ほどと同じく、
処理が2つあるのに対して、スレッドを1つに変更してみます。

sample.py

- with ThreadPoolExecutor(max_workers=2) as executor:
+ with ThreadPoolExecutor(max_workers=1) as executor:

結界は以下です。

$ python sample1.py
3 START time_func
3 END time_func
1 START time_func
1 END time_func
input: [3, 1]
time: 4.005406697979197

スレッドが一つなので、待ち時間が利用できないので
3秒待つ処理が終わった後に、1秒待つ処理が実行されます

最終的にかかった時間は、4.005406697979197なので4秒ほどでした

スレッドが3つ以上の場合は？

先ほどと同じく、
処理が2つあるのに対して、今度はスレッドを3個また10個の場合に変更してみます。

sample.py

- with ThreadPoolExecutor(max_workers=2) as executor:
+ with ThreadPoolExecutor(max_workers=3) as executor:

$ python sample1.py
3 START time_func
1 START time_func
1 END time_func
3 END time_func
input: [3, 1]
time: 3.0037129590054974

sample.py

- with ThreadPoolExecutor(max_workers=2) as executor:
+ with ThreadPoolExecutor(max_workers=10) as executor:

$ python sample1.py
3 START time_func
1 START time_func
1 END time_func
3 END time_func
input: [3, 1]
time: 3.00476536701899

どちらも、3秒ちょいといったところで
当たり前ではありますが、処理そのものが早くなるわけではありませんでした

並列処理

関数time_funcは並行処理と時と同じにしています
並列処理では、ProcessPoolExecutorを使用しています

sample.py

import time
from concurrent.futures import ProcessPoolExecutor


def time_func(t: int):
    print(f"{t} START time_func")
    time.sleep(t)
    print(f"{t} END time_func")
    return t


def main():
    with ProcessPoolExecutor(max_workers=2) as executor:
        executor.map(time_func, [3, 1])


if __name__ == '__main__':
    start = time.perf_counter()
    main()
    end = time.perf_counter()
    print(f"time: {end - start}")

こちらも同じように実行してみます

$ python sample1.py
3 START time_func
1 START time_func
1 END time_func
3 END time_func
time: 3.1721984889591113

参考

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up