
Parallelizing pdist2


pdist2 is a handy function that computes pairwise distances between data points. However, MATLAB's built-in pdist2 does not support multicore parallelization, so computation time becomes a problem as the data size grows.

In this article, we aim to speed up the computation through data parallelism. Note that the Statistics Toolbox (pdist2) and the Parallel Computing Toolbox (parfor) are required.

pdist2_parallel
function [ dmat ] = pdist2_parallel( X, Y, block_size, fun )
%PDIST2_PARALLEL parallelized version of pdist2

if(nargin<4)
    fun='euclidean'; % default metric; change as you like
end

% decompose similarity matrix
step_x=unique([1:block_size:size(X,1),size(X,1)+1])';
step_x=[step_x(1:end-1),step_x(2:end)-1];
nsteps_x=size(step_x,1);
X_sub=cell(nsteps_x,1);
for i=1:nsteps_x
    X_sub{i}=X(step_x(i,1):step_x(i,2),:); % split X into blocks of block_size rows
end
step_y=unique([1:block_size:size(Y,1),size(Y,1)+1])';
step_y=[step_y(1:end-1),step_y(2:end)-1];
nsteps_y=size(step_y,1);
Y_sub=cell(nsteps_y,1);
for i=1:nsteps_y
    Y_sub{i}=Y(step_y(i,1):step_y(i,2),:); % split Y into blocks of block_size rows
end

dmat=cell(nsteps_x, nsteps_y);
parfor i=1:nsteps_x*nsteps_y % data-parallel loop over all block pairs
    i1=mod(i-1,nsteps_x)+1;                % block row index (from column-major linear index)
    i2=(i-mod(i-1,nsteps_x)-1)/nsteps_x+1; % block column index
    dmat{i}=pdist2(X_sub{i1},Y_sub{i2}, fun);
end
dmat=cell2mat(dmat); % stitch the block results back together

end
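A minimal usage sketch (my addition, not from the original article). It assumes a worker pool opened with matlabpool, which was the command on R2012b-era releases; newer releases use parpool instead.

```matlab
% Usage sketch (assumes Statistics and Parallel Computing Toolboxes).
% On R2012b-era MATLAB the pool is opened with matlabpool;
% on newer releases use parpool(12) / delete(gcp('nocreate')) instead.
matlabpool open 12

X = rand(5000, 100);   % 5000 samples, 100 dimensions
Y = rand(3000, 100);

% blocks of 500 rows; the metric defaults to 'euclidean'
D = pdist2_parallel(X, Y, 500);
assert(isequal(size(D), [5000 3000]));

% any metric string accepted by pdist2 can be passed through
Dc = pdist2_parallel(X, Y, 500, 'cosine');

matlabpool close
```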

Experiments

The experimental environment:
  • Mac OS X (10.7.5)
  • Processor: 2 × 2.66 GHz 6-Core Intel Xeon
  • Memory: 32 GB
  • MATLAB R2012b
We run the experiments with 12 workers (i.e., at most 12-way parallelism).

  • First, let's see how much the block size affects performance. We use 100-dimensional data with 5000 samples and measure distances with the Euclidean metric.
X=rand(5000,100);
block_size=[10, 50, 100, 500, 1000, 2500];

% preallocation
dmat=zeros(size(X,1),size(X,1));

for i=1:length(block_size)
    tic
    dmat=pdist2_parallel(X,X,block_size(i));
    t=toc;
    fprintf('block size %d: %f\n', block_size(i), t);
end
tic
dmat=pdist2(X,X);
t=toc;
fprintf('not parallelized: %f\n', t);
  • Results
block size 10: 6.520659
block size 50: 1.252058
block size 100: 1.172509
block size 500: 1.237493
block size 1000: 1.209794
block size 2500: 1.765514
not parallelized: 2.847441

Compared with the non-parallel version, we get a bit more than a 2x speedup at best. However, splitting too finely hurts performance, so an appropriate block size has to be found.
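As a rough starting point for choosing the block size, one heuristic (my assumption, not something the article prescribes) is to make the number of blocks per input a small multiple of the worker count, so every worker gets several block pairs without the per-task overhead dominating:

```matlab
% Hypothetical heuristic for block_size (an assumption, not from the article):
% aim for roughly two blocks per worker along each input.
n        = 5000;                  % number of samples
nworkers = 12;                    % matlabpool size used in this experiment
nblocks  = 2 * nworkers;          % blocks per input -> nblocks^2 parfor tasks
block_size = ceil(n / nblocks);   % here: ceil(5000/24) = 209
```

For n = 5000 this gives block_size = 209, which lands inside the 100–1000 range that performed well in the results above.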


  • Next, let's see how the results change with data size. Based on the previous experiment, we set block_size to 1/50 of the number of samples (so each input is split into 50 blocks, as in the code below).
%% test for data size
data_size=[100, 500, 1000, 5000, 10000];
for i=1:length(data_size)

    X=rand(data_size(i),100);
    block_size=data_size(i)/50;
    fprintf('data size: %d\n', data_size(i));

    % preallocation
    dmat=zeros(size(X,1),size(X,1));

    tic
    dmat=pdist2(X,X);
    t=toc;

    fprintf('not parallelized: %f\n', t);

    tic
    dmat=pdist2_parallel(X,X,block_size);
    t=toc;
    fprintf('parallelized: %f\n', t);

end
  • Results
data size: 100
not parallelized: 0.001552
parallelized: 0.055133
data size: 500
not parallelized: 0.029648
parallelized: 0.068817
data size: 1000
not parallelized: 0.113881
parallelized: 0.104183
data size: 5000
not parallelized: 2.892644
parallelized: 1.191135
data size: 10000
not parallelized: 11.506141
parallelized: 4.499590

While the data size is small, the overhead makes the non-parallel version faster, but once the data size grows, the parallel version becomes a bit more than 2x faster.
