Computing a distance matrix in Octave

Compute the distance between every pair of rows of a data matrix.

usage:
y = pdist (x, method)

x: data matrix
method: "euclidean" (default) Euclidean distance
        "squaredeuclidean"    squared Euclidean distance
        "seuclidean"          standardized Euclidean distance
        "mahalanobis"         Mahalanobis distance
        "cityblock"           city block (Manhattan) distance
        "minkowski"           Minkowski distance
        "cosine"              1 - cos(θ)
        "correlation"         1 - r
        "spearman"            1 - ρ
        "hamming"             Hamming distance
        "jaccard"             1 - Jaccard coefficient
        "chebychev"           Chebyshev distance

Example

pkg load statistics % the statistics package is required
format short
x = [49.627 50.610 51.617 52.013 45.099;
     52.549 56.746 46.140 39.278 37.066;
     48.272 32.762 54.648 49.537 63.903]
x =

   49.627   50.610   51.617   52.013   45.099
   52.549   56.746   46.140   39.278   37.066
   48.272   32.762   54.648   49.537   63.903
y = pdist(x)
y =

   17.404   26.254   38.618

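The distance matrix of an $m \times n$ data matrix is an $m \times m$ square matrix. Because it is symmetric with a zero diagonal, pdist() returns only its strict lower triangle, as a one-dimensional array (vector) of length $m (m-1) / 2$.

squareform() converts this vector back into the $m \times m$ square matrix.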

squareform(y)
ans =

         0   17.4039   26.2544
   17.4039         0   38.6184
   26.2544   38.6184         0
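
To make the layout of the condensed vector concrete, here is a minimal sketch that rebuilds both results for the Euclidean case with core Octave functions only. It assumes the pair order (2,1), (3,1), ..., (m,1), (3,2), ..., that is, the strict lower triangle read column by column, which agrees with the output above. The variable names are illustrative and this is not how pdist() or squareform() are implemented.

```octave
% Same 3 x 5 data matrix as in the example above.
x = [49.627 50.610 51.617 52.013 45.099;
     52.549 56.746 46.140 39.278 37.066;
     48.272 32.762 54.648 49.537 63.903];
m = rows(x);

% Condensed vector: one entry per pair of rows (i > j),
% walking down each column of the strict lower triangle.
y = zeros(1, m * (m - 1) / 2);
k = 0;
for j = 1:m-1
  for i = j+1:m
    k = k + 1;
    y(k) = norm(x(i, :) - x(j, :));   % Euclidean distance between rows i and j
  end
end

% Square form: write y into the strict lower triangle
% (logical indexing fills in column order), then mirror it.
mask = logical(tril(ones(m), -1));
D = zeros(m);
D(mask) = y;
D = D + D';

disp(y)
disp(D)
```

The entry for rows i > j sits at position (j - 1) * (2 * m - j) / 2 + (i - j) of the condensed vector, so a single distance can be looked up without building the square matrix.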
format long % show more digits (optional)
x = csvread("iris.csv")(2:151, 2:5);

"euclidean" Euclidean distance

y = pdist(x); # pdist(x, "euclidean")
y(1)
def = norm(x(1, :) - x(2, :), "rows")
ans = 0.538516480713450
def = 0.538516480713450

"squaredeuclidean" squared Euclidean distance

y = pdist(x, "squaredeuclidean");
y(1)
def = sumsq(x(1, :) - x(2, :))
ans = 0.290000000000000
def = 0.290000000000000

"seuclidean" standardized Euclidean distance

y = pdist(x, "seuclidean");
y(1)
z = zscore(x);
def = sqrt(sumsq(z(1, :) - z(2, :)))
pdist(z, "euclidean")(1)
ans = 1.172291398047053
def = 1.172291398047053
ans = 1.172291398047053

"mahalanobis" Mahalanobis distance

y = pdist(x, "mahalanobis");
y(1)
invCOV = inv(cov(x)); # inverse of the covariance matrix
def = sqrt((x(1, :) - x(2, :)) * invCOV * (x(1, :) - x(2, :))')
ans = 1.354457239896679
def = 1.354457239896679

"cityblock" city block (Manhattan) distance

y = pdist(x, "cityblock");
y(1)
def = sum(abs(x(1, :) - x(2, :)))
ans = 0.699999999999999
def = 0.699999999999999

"minkowski" Minkowski distance

y = pdist(x, "minkowski", 1); # the exponent p is the third, positional argument
y(1)
z = pdist(x, "cityblock");
z(1)
ans = 0.699999999999999
ans = 0.699999999999999
y = pdist(x, "minkowski", 2);
y(1)
z = pdist(x, "euclidean");
z(1)
ans = 0.538516480713450
ans = 0.538516480713450
p = 0.123;
y = pdist(x, "minkowski", p);
y(1)
norm(x(1, :) - x(2, :), p, "rows")
ans = 89.74319789431301
ans = 89.74319789431301
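
The three runs above all follow from one formula, the p-th root of the sum of the p-th powers of the absolute differences. The sketch below writes it out with core Octave only. `minkowski` is a hypothetical helper defined here for illustration, not a function of the statistics package, and the two vectors are assumed to be the first two rows of the standard iris measurements, which is consistent with the values printed above.

```octave
% Hypothetical helper: Minkowski distance between two row vectors.
minkowski = @(a, b, p) sum(abs(a - b) .^ p) ^ (1 / p);

% Assumption: first two rows of the standard iris measurements.
a = [5.1 3.5 1.4 0.2];
b = [4.9 3.0 1.4 0.2];

d1   = minkowski(a, b, 1)       % p = 1: city block distance
d2   = minkowski(a, b, 2)       % p = 2: Euclidean distance
dq   = minkowski(a, b, 0.123)   % p < 1: computable, but not a metric
dinf = max(abs(a - b))          % limit p -> Inf: Chebyshev distance
```

For p < 1 the triangle inequality no longer holds, so the result is a dissimilarity rather than a true distance, which is why the value for p = 0.123 is so much larger than the others.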

"cosine" 1 - cos(θ)

y = pdist(x, "cosine");
y(1)
1 - sum(x(1, :) .* x(2, :), 2) ./ sqrt(sumsq(x(1, :), 2) .* sumsq (x(2, :), 2))
ans = 1.420836495978128e-03
ans = 1.420836495978128e-03

"correlation" 1 - r

y = pdist(x, "correlation");
y(1)
def = 1 - corr(x(1, :), x(2, :))
ans = 4.001338759739625e-03
def = 4.001338759739625e-03

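"spearman" 1 - ρ (ρ is Spearman's rank correlation coefficient)

The rank correlation of the two rows compared here is exactly 1, so the true distance is 0. The tiny negative value in the output is floating-point rounding.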

y = pdist(x, "spearman");
y(1)
def = 1 - spearman(x(1, :), x(2, :))
ans = -2.220446049250313e-16
def = -2.220446049250313e-16

"hamming" Hamming distance

y = pdist(x, "hamming");
y(1)
def = 1 - mean(x(1, :) == x(2, :))
ans = 0.500000000000000
def = 0.500000000000000

"jaccard" 1 - Jaccard coefficient

y = pdist(x, "jaccard");
y(1)
weights = x(1, :) | x(2, :);
sum(logical(x(1, :) - x(2, :)) & weights, 2) ./ sum(weights, 2)
ans = 0.500000000000000
ans = 0.500000000000000
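
Both of these measures are easier to read on binary data than on the iris measurements. The following sketch uses two made-up 0/1 vectors (not taken from the article's data) and core Octave only. It follows the same definitions as the expressions above and also checks the Jaccard value against the set form, 1 minus intersection over union.

```octave
% Illustrative binary vectors, invented for this example.
a = [1 0 1 1 0];
b = [1 1 0 1 0];

% Hamming: fraction of all coordinates that differ.
d_hamming = mean(a != b)

% Jaccard: fraction of differing coordinates among those where
% at least one of the two vectors is nonzero.
active = a | b;
d_jaccard = sum((a != b) & active) / sum(active)

% Set view of the same quantity: 1 - |intersection| / |union|.
d_jaccard_sets = 1 - sum(a & b) / sum(a | b)
```

The two measures differ only in the denominator: coordinates where both vectors are zero count as agreement for Hamming but are ignored by Jaccard.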

"chebychev" Chebyshev distance

y = pdist(x, "chebychev");
y(1)
def = max(abs(x(1, :) - x(2, :)))
pdist(x, "minkowski", Inf)(1)
ans = 0.500000000000000
def = 0.500000000000000
ans = 0.500000000000000