Testing speedups from multiprocessing on Lambda + Python (with a hand-rolled queue)

Last updated at 2022-01-13, posted at 2021-12-29

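I wanted to speed up some processing on Lambda, so I tried multiprocessing and am recording the results here.
Python's multiprocessing.Pool and multiprocessing.Queue cannot be used on Lambda, and this article works around that limitation, so I hope it helps anyone stuck on the same problem.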

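Premise

The processing to speed up is blocking (CPU-bound) work
⇒ work that gains nothing from concurrency via asyncio or threads
⇒ for the test, a simple loop of 10 million iterations that just assigns the index to a variable, run 10 times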
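
To illustrate why asyncio does not help with this kind of work, here is a minimal sketch. The names `busy_loop` and `cpu_task` are hypothetical, and the loop is shortened from 10 million iterations so it finishes quickly. Because the coroutines never await anything, each one runs to completion before the next starts.

```python
import asyncio

def busy_loop(n):
    # CPU-bound work with no I/O: nothing here yields to the event loop.
    last = 0
    for num in range(n):
        last = num
    return last

async def cpu_task(task_id, log):
    log.append(f"start {task_id}")
    busy_loop(100_000)  # shortened from the article's 10,000,000 iterations
    log.append(f"end {task_id}")

async def main():
    log = []
    await asyncio.gather(*(cpu_task(i, log) for i in range(3)))
    return log

log = asyncio.run(main())
print(log)
# ['start 0', 'end 0', 'start 1', 'end 1', 'start 2', 'end 2']
```

The log shows no interleaving: the tasks run strictly one after another, so the total time is the same as a plain sequential loop.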

Comparison results

Lambda scales the number of vCPUs with the configured memory, up to 6 vCPUs at 10 GB, so I compared execution times at each memory setting.

Memory	128MB (CPUx2)	512MB (CPUx2)	1GB (CPUx2)	2GB (CPUx2)	3GB (CPUx3)	6GB (CPUx4)	7GB (CPUx5)	10GB (CPUx6)
Sequential (s)	49.405	11.891	6.135	3.422	3.410	3.503	3.601	3.470
Multiprocess (s)	56.760	12.692	6.502	3.364	2.456	1.391	1.069	0.869

* All times are in seconds.
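
To make the comparison easier to read, the following sketch computes the speedup ratio (sequential time divided by multiprocess time) for each memory setting. The numbers are copied from the table; the variable names are only for illustration.

```python
# Times in seconds, copied from the results table.
memory = ["128MB", "512MB", "1GB", "2GB", "3GB", "6GB", "7GB", "10GB"]
sequential = [49.405, 11.891, 6.135, 3.422, 3.410, 3.503, 3.601, 3.470]
multiprocess = [56.760, 12.692, 6.502, 3.364, 2.456, 1.391, 1.069, 0.869]

# Speedup > 1 means the multiprocess run was faster.
speedups = {
    m: round(s / p, 2)
    for m, s, p in zip(memory, sequential, multiprocess)
}

for m, ratio in speedups.items():
    print(f"{m:>6}: {ratio:.2f}x")
```

Up to 2GB the ratio stays around 1 or below, meaning multiprocessing brings no gain or is slightly slower; from 3GB upward it climbs to roughly 4x at 10GB.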

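Faster just from adding memory?

(Added 2022/01/13)
The test code should not be doing anything memory-hungry, yet simply increasing memory made it run faster. That bothered me, so I looked into it, and it turns out CPU power also increases with the memory setting.
Judging from the results, sequential execution time stops improving from 2GB onward.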

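Q: How are compute resources assigned to an AWS Lambda function?

In the AWS Lambda resource model, you choose the amount of memory you want for your function, and are allocated proportional CPU power and other resources. For example, choosing 256 MB of memory allocates approximately twice as much CPU power to your Lambda function as requesting 128 MB of memory, and half as much CPU power as choosing 512 MB of memory. To learn more, see the Function Configuration documentation.

Source: https://aws.amazon.com/jp/lambda/faqs/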
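
As a rough check of that proportionality against the measurements, this sketch compares the memory ratio between adjacent settings with the ratio of sequential execution times. It assumes 1GB = 1024MB; the function name `scaling` is hypothetical.

```python
# Sequential times in seconds, copied from the results table (128MB to 3GB).
memory_mb = [128, 512, 1024, 2048, 3072]  # assumption: 1GB = 1024MB
sequential_s = [49.405, 11.891, 6.135, 3.422, 3.410]

def scaling(memory_mb, times):
    """Return (from_mb, to_mb, memory_ratio, time_ratio) for adjacent settings."""
    rows = []
    for i in range(1, len(memory_mb)):
        mem_ratio = memory_mb[i] / memory_mb[i - 1]
        time_ratio = round(times[i - 1] / times[i], 2)
        rows.append((memory_mb[i - 1], memory_mb[i], mem_ratio, time_ratio))
    return rows

rows = scaling(memory_mb, sequential_s)
for src, dst, mem_ratio, time_ratio in rows:
    print(f"{src}MB -> {dst}MB: memory x{mem_ratio:g}, time improved x{time_ratio:g}")
```

Below 2GB the time improvement tracks the memory ratio fairly closely (about x4.15 for x4 memory, x1.94 and x1.79 for x2), while going from 2GB to 3GB brings essentially no change for sequential execution.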

Test details

Test code

lambda_function.py

import time
import lambda_multiprocessing as mp

def lambda_handler(event, context):
    print('params', event)
    
    is_multiprocessing = event['multiprocess']
    start_time = time.time()
    print('start time: ', start_time)

    if is_multiprocessing:
        print('run multiprocess')
        events = []
        for num in range(10):
            events.append({'function': _calc, 'args': []})
            
        m = mp.Multiprocessing()
        m.run(events)
        
    else:
        print('run sequential')
        for num in range(10):
            _calc()
    
    end_time = time.time()
    print('end time: ', end_time)

    print('run time: ', end_time - start_time)
    return 'complete'

def _calc():
    print('calc start')
    for num in range(10000000):
        a = num
    print('calc end')

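This is the class that handles multiprocessing.
Since multiprocessing.Queue and multiprocessing.Pool are unavailable, it implements its own queueing.
Also, because these are vCPUs rather than physical CPUs, the number of processes allowed to run in parallel is capped at the number of usable CPUs to avoid overcommitting.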
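
The usable-CPU count mentioned here can be checked on its own. The sketch below uses a hypothetical helper, `usable_cpu_count`, which prefers os.sched_getaffinity and falls back to os.cpu_count() on platforms where the former is missing. The fallback is my addition and not part of the original class.

```python
import os

def usable_cpu_count():
    """Return the number of CPUs the current process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Available on Linux (which Lambda runs on); 0 means "this process".
        return len(os.sched_getaffinity(0))
    # Fallback for platforms without sched_getaffinity (e.g. macOS, Windows).
    return os.cpu_count() or 1

print("total CPUs :", os.cpu_count())
print("usable CPUs:", usable_cpu_count())
```

The two numbers can differ when the process is restricted to a subset of the machine's CPUs, which is why the usable count is the safer cap.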

lambda_multiprocessing.py

import os
import time
import multiprocessing as mp
import threading as th

class Multiprocessing:
    def __init__(self):
        self.queue = []
        # Cap parallelism at the number of CPUs this process can actually use.
        # Note: multiprocessing.cpu_count() returns the total CPU count, not the usable count.
        # See https://docs.python.org/ja/3/library/multiprocessing.html#multiprocessing.cpu_count
        self.limit = len(os.sched_getaffinity(0))
        
    def run(self, events):
        print('available CPU: ', self.limit)
        threads = []
        for event in events:
            t = th.Thread(target=self._proc, args=(event,))
            t.start()
            threads.append(t)
            
        # Wait for all threads to finish
        for thread in threads:
            thread.join()
                
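    def _proc(self, event):
        # Wait in a loop (not by recursion) until a slot is free, so that a
        # long wait cannot hit Python's recursion limit. The check-then-append
        # is not atomic across threads, hence '>=' rather than '=='.
        while len(self.queue) >= self.limit:
            time.sleep(0.01)

        self.queue.append(event)
        # args must be a tuple of positional arguments for the target
        p = mp.Process(target=event['function'], args=tuple(event['args']))
        p.start()
        p.join()
        self.queue.remove(event)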
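
The class above discards return values. As one possible variation, here is a minimal sketch that also collects results, using multiprocessing.Pipe together with multiprocessing.connection.wait in place of the thread-per-task approach. This is a swapped-in technique, not the article's implementation: `run_parallel`, `_child`, and `square` are hypothetical names, and the "fork" start method is assumed (Linux, as on Lambda).

```python
import multiprocessing as mp
from multiprocessing.connection import wait

# Assumption: Linux with the "fork" start method.
ctx = mp.get_context("fork")

def _child(conn, func, args):
    """Run func in the child process and send its result to the parent."""
    try:
        conn.send(func(*args))
    finally:
        conn.close()

def run_parallel(tasks, limit):
    """Run (func, args) pairs in processes, at most `limit` at a time."""
    pending = list(enumerate(tasks))
    running = {}  # receiving connection -> (task index, process)
    results = [None] * len(tasks)
    while pending or running:
        # Start new processes while there is a free slot.
        while pending and len(running) < limit:
            index, (func, args) = pending.pop(0)
            parent_conn, child_conn = ctx.Pipe(duplex=False)
            proc = ctx.Process(target=_child, args=(child_conn, func, args))
            proc.start()
            child_conn.close()  # the parent only reads
            running[parent_conn] = (index, proc)
        # Block until at least one child has sent its result.
        for conn in wait(list(running)):
            index, proc = running.pop(conn)
            results[index] = conn.recv()
            proc.join()
            conn.close()
    return results

def square(x):
    return x * x

print(run_parallel([(square, (n,)) for n in range(5)], limit=2))
# [0, 1, 4, 9, 16]
```

Results come back in task order regardless of which process finishes first. If a child exits without sending anything, `recv()` raises EOFError; a real implementation would want to handle that.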