More than 5 years have passed since last update.

並列処理してみる

Last updated at 2013-04-01Posted at 2013-03-31

「並列化したのに却って遅くなった」が解決するかも？

次のループ処理について並列化してみます。

3で割った余りでそれぞれ総数を求める

// 初期化
int div = 3;
int[] array = new int[10000000];
Random random = new Random();
for (int i = 0; i < array.Length; i++) { array[i] = random.Next(); }

{	// ループ処理　for版
	int[] ct_mods = new int[div];
	for (int i = 0; i < array.Length; i++) {
		ct_mods[array[i] % div]++;
	}
}

{	// ループ処理　foreach版
	int[] ct_mods = new int[div];
	foreach (var x in array) {
		ct_mods[x % div]++;
	}
}

これを単純に並列処理にしたのが次の２つ。

並列処理(1-1)　for版

int[] ct_mods = new int[div];
object lockobj = new object();
Parallel.For(
	0,	// ループ変数の初期値
	array.Length,	// ループ変数の終了値
	(i) => {
		lock (lockobj) { ct_mods[array[i] % div]++; }
	});

並列処理(1-2)　foreach版

int[] ct_mods = new int[div];
object lockobj = new object();
Parallel.ForEach(
	array,
	(x) => {
		lock (lockobj) { ct_mods[x % div]++; }
	});

ロックとかする必要がなければこの書き方で十分高速化できるけど、今回の例では毎回結果格納用変数をロックしているから逆に遅くなったり。
ではどうすれば速くなるのかというと、次のように各スレッドごとに結果を求め、最後に結果をマージすれば良い。

並列処理(2-1)　結果を最後にマージする方式　for版

object lockobj = new object();
int[] ct_mods = new int[div];
Parallel.For<int[]>(	// int[]はスレッドのローカル変数の型
	0,
	array.Length,
	() => new int[div],	// 各スレッドのローカル変数（ここではct_mods_wk）の初期化
	(i, loopstate, ct_mods_wk) => {	// ct_mods_wkは並列の数だけ作成される
		ct_mods_wk[array[i] % div]++;
		return ct_mods_wk;
	},
	(ct_mods_wk) => {
		// 並列処理が終わったときに実行する処理
		// 各スレッドのローカル変数の値をマージする
		lock (lockobj) { for (int i = 0; i < div; i++) { ct_mods[i] += ct_mods_wk[i]; } }
	});

並列処理(2-2)　結果を最後にマージする方式　foreach版

object lockobj = new object();
int[] ct_mods = new int[div];
Parallel.ForEach<int, int[]>(	// intは要素の型、int[]はスレッドのローカル変数の型
	array,
	() => new int[div],
	(x, loopstate, ct_mods_wk) => {		ct_mods_wk[x % div]++;
		return ct_mods_wk;
	},
	(ct_mods_wk) => {
		lock (lockobj) { for (int i = 0; i < div; i++) { ct_mods[i] += ct_mods_wk[i]; } }
	});

参考

http://ufcpp.net/study/csharp/lib_parallel.html
http://msdn.microsoft.com/ja-jp/library/vstudio/dd460703.aspx
http://msdn.microsoft.com/ja-jp/library/vstudio/dd997393.aspx

2013/04/01追記

誤りがあったので修正しました。(2-1)および(2-2)で、マージ処理するところはロックする必要があります。
(1-1), (1-2)の総ロック回数 = 千万回
(2-1), (2-2)の総ロック回数 = 数十〜数百回（状況によって変動する）

参考までに、私の環境で並列化しない場合と比べて(2-1),(2-2)は約2倍高速化しました。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up