More than 5 years have passed since last update.

Joblibで共有メモリを設定する時につまづいたこと

Posted at 2019-08-20

Pythonで並列処理をしたい時、選択肢としてmultiprocessingかJoblibの二択がまず出てきますが、サクッとやりたい時はJoblibを使うことになると思います。

しかしプロセス毎に参照可能なメモリ領域が異なるため、並列実行している関数から外部スコープの変数に代入するためには共有メモリを設定しなければいけません。そこで以下の記事を参考にValueクラスを使って共有メモリを設定しようとしましたが、

なぜか動きません。試しに記事内に書かれているコードをそのままコピペしても、

# -*- coding: utf-8 -*-
from joblib import Parallel, delayed
from multiprocessing import Value, Array

shared_int = Value('i', 1)

def process(n):
    shared_int.value = 3.14
    return sum([i*n for i in range(100000)])

# 繰り返し計算 (並列化)
Parallel(n_jobs=-1)( [delayed(process)(i) for i in range(10000)] )

print(shared_int.value)

RuntimeError: Synchronized objects should only be shared between processes through inheritance

同様のエラーが起きちゃいます。さあどうしよう。どこも同じやり方しか書いてなくて困った。じゃあ公式リファレンスを見てみよう。

Embarrassingly parallel for loops

However if the parallel function really needs to rely on the shared memory semantics of threads, it should be made explicit with require='sharedmem', for instance:

>>> shared_set = set()
>>> def collect(x):
...    shared_set.add(x)
...
>>> Parallel(n_jobs=2, require='sharedmem')(
...     delayed(collect)(i) for i in range(5))
[None, None, None, None, None]
>>> sorted(shared_set)
[0, 1, 2, 3, 4]

えっ、Parallelの引数にrequire='sharedmem'設定するだけでいいんですか？試しにさっきのコードを書き換えてみると、

# -*- coding: utf-8 -*-
from joblib import Parallel, delayed

shared_int = 1

def process(n):
    global shared_int
    shared_int = 3.14
    return sum([i*n for i in range(10000)])

# 繰り返し計算 (並列化)
Parallel(n_jobs=-1, require='sharedmem')([delayed(process)(i) for i in range(10000)])

print(shared_int)

3.14

できました。ちゃんと並列化した関数の中から外部スコープの変数に代入できています。

てことで、どうやらParallelの引数にrequire='sharedmem'を入れるだけで共有メモリを設定できちゃうようです。簡単！

でも

Keep in mind that relying a on the shared-memory semantics is probably suboptimal from a performance point of view as concurrent access to a shared Python object will suffer from lock contention.

ーー和訳ーー
共有Pythonオブジェクトへの同時アクセスではロック競合が発生するため、パフォーマンスの観点からは、共有メモリーのセマンティクスに依存するのは最適とは言えないことに注意してください。

とも書いてあるので、気をつけて使ったほうが良さそうです。

参照

[Python] Joblibでお手軽並列処理

Embarrassingly parallel for loops

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up