
scipy.stats: Correlation coefficients pearsonr, spearmanr, kendalltau

Last updated at 2022-06-22 Posted at 2022-06-06


1. scipy.stats: Pearson product-moment correlation coefficient `pearsonr`

Computes the Pearson product-moment correlation coefficient (what is usually just called "the correlation coefficient").

pearsonr(x, y)

import numpy as np

x = np.arange(15)
y = x**2

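The return values are the Pearson product-moment correlation coefficient and the $p$-value of the test of no correlation (null hypothesis: the population correlation is 0).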

from scipy.stats import pearsonr

r, p_value = pearsonr(x, y)
(r, p_value)

(0.9644093612193902, 6.916724428470378e-09)
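As a check, here is a minimal sketch that computes r directly from its definition, $r = \sum_i (x_i-\bar x)(y_i-\bar y) / \sqrt{\sum_i (x_i-\bar x)^2 \sum_i (y_i-\bar y)^2}$, and compares it with `np.corrcoef` and `pearsonr` on the same data. The helper name `pearson_from_definition` is hypothetical, not part of scipy.

```python
import numpy as np
from scipy.stats import pearsonr

def pearson_from_definition(x, y):
    """Pearson's r computed directly from its definition (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

x = np.arange(15)
y = x**2

r_def = pearson_from_definition(x, y)
r_np = np.corrcoef(x, y)[0, 1]
r_scipy, _ = pearsonr(x, y)
print(r_def, r_np, r_scipy)  # all three agree up to floating-point rounding
```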

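The return value does not include the $t$ statistic, so, somewhat backwards, we recover it from the $p$-value: invert the two-sided $p$-value with the $t$ distribution on $n-2$ degrees of freedom and give the result the sign of r.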

from scipy.stats import t
np.copysign(t.isf(p_value / 2, len(x) - 2), r)

13.150710114342226
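Instead of inverting the $p$-value, the $t$ statistic can also be computed directly from r with the standard formula for the test of no correlation, $t = r\sqrt{n-2}/\sqrt{1-r^2}$ on $n-2$ degrees of freedom; the two-sided $p$-value is then twice the upper-tail probability of $|t|$. A minimal sketch, assuming the same x and y as above:

```python
import numpy as np
from scipy.stats import pearsonr, t

x = np.arange(15)
y = x**2
n = len(x)

r, p_value = pearsonr(x, y)

# t statistic of the test of no correlation, computed from r itself
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
# two-sided p-value from the t distribution with n - 2 degrees of freedom
p_from_t = 2 * t.sf(abs(t_stat), n - 2)

print(t_stat)             # about 13.15, matching the value recovered above
print(p_from_t, p_value)  # the two p-values agree
```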

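This is probably rare in practice, but if x or y has zero variance (all of its values are identical), the definition of the Pearson coefficient requires dividing by 0. The following therefore does not give a valid result: scipy emits a warning and returns nan.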

z = np.ones(15)

pearsonr(x, z)

/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scipy/stats/_stats_py.py:4068: PearsonRConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  warnings.warn(PearsonRConstantInputWarning())

(nan, nan)
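The nan comes from the denominator of r: every deviation of a constant array from its mean is 0, so $\sum_i (z_i-\bar z)^2 = 0$. Below is a minimal sketch of a guard that checks for this before calling pearsonr; `safe_pearsonr` is a hypothetical helper, not part of scipy.

```python
import numpy as np
from scipy.stats import pearsonr

def safe_pearsonr(x, y):
    """Hypothetical wrapper: return (nan, nan) without calling pearsonr
    when either input is constant (zero variance)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.ptp(x) == 0 or np.ptp(y) == 0:  # max - min == 0 means constant
        return float('nan'), float('nan')
    r, p = pearsonr(x, y)
    return r, p

x = np.arange(15)
z = np.ones(15)

print(np.sum((z - z.mean())**2))  # 0.0: the denominator of r vanishes
print(safe_pearsonr(x, z))        # (nan, nan), and no warning is emitted
```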

2. scipy.stats: Spearman's rank correlation coefficient `spearmanr`

Computes Spearman's rank correlation coefficient.

The return values are Spearman's rank correlation coefficient and the $p$-value of the test of no correlation.

spearmanr(a, b=None, axis=0, nan_policy='propagate', alternative='two-sided')

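Whereas pearsonr from the previous section measures the degree of linear correlation, spearmanr also captures nonlinear relationships, as long as they are monotonic.
For $y = x^2$ from the previous example (x takes only the values 0 to 14, so y increases with x), Spearman's coefficient is 1, i.e. a perfect correlation.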

from scipy.stats import spearmanr

r, p_value = spearmanr(x, y)
(r, p_value)

(1.0, 0.0)
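Strictly speaking, Spearman's coefficient measures how monotonic a relationship is, not curvature in general. In the example above, y = x^2 is increasing only because x is non-negative. If x is symmetric around 0, y is still completely determined by x, but the relationship is not monotonic and the coefficient comes out as 0. A minimal sketch:

```python
import numpy as np
from scipy.stats import spearmanr

x_sym = np.arange(-7, 8)  # -7, ..., 7: symmetric around 0
y_sym = x_sym**2          # decreasing for x < 0, increasing for x > 0

rho, p = spearmanr(x_sym, y_sym)
print(rho)  # 0 up to rounding: the rank correlation misses the U-shaped dependence
```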

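Even when the data do not follow a theoretical curve exactly, Spearman's coefficient is 1 as long as y is strictly increasing in x (each larger x is paired with a larger y).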

x2 = np.array([0, 1, 2.5, 3, 3.2])
y2 = np.array([1, 2, 2.5, 7, 10])
import matplotlib.pyplot as plt
plt.scatter(x2, y2)
spearmanr(x2, y2)

SpearmanrResult(correlation=0.9999999999999999, pvalue=1.4042654220543672e-24)
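The coefficient reaches exactly 1 only when larger x always goes with strictly larger y. If y merely never decreases, ties in y pull the coefficient below 1. A minimal sketch with a single tie (the data are made up for illustration):

```python
import numpy as np
from scipy.stats import spearmanr

x3 = np.array([1, 2, 3, 4])
y3 = np.array([1, 1, 2, 3])  # non-decreasing, but the first two values tie

rho, p = spearmanr(x3, y3)
print(rho)  # about 0.9487 = 4.5 / sqrt(22.5), i.e. below 1
```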

Here too, if x or y has zero variance (all of its values are identical), the computation would divide by 0, so scipy emits a warning and returns nan.

spearmanr(x, z)

/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scipy/stats/_stats_py.py:4529: SpearmanRConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  warnings.warn(SpearmanRConstantInputWarning())

SpearmanrResult(correlation=nan, pvalue=nan)

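Spearman's rank correlation is the same as replacing each variable by its ranks and computing the Pearson product-moment correlation coefficient of those ranks.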

np.random.seed(123)
a = np.random.randn(400)
b = np.random.randn(400)
spearmanr(a, b)

SpearmanrResult(correlation=0.07039862749142183, pvalue=0.15992901162148862)

Computing the Pearson coefficient on the ranks gives the same value:

from scipy.stats import rankdata
rank_a = rankdata(a)
rank_b = rankdata(b)
pearsonr(rank_a, rank_b)

(0.0703986274914219, 0.1599290116214847)
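When there are no ties, the Pearson coefficient of the ranks reduces to the well-known closed form $\rho = 1 - \frac{6\sum_i d_i^2}{n(n^2-1)}$, where $d_i$ is the difference between the ranks of $a_i$ and $b_i$. A minimal sketch that checks this against spearmanr on the same random data (continuous values, so ties practically never occur):

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

np.random.seed(123)
a = np.random.randn(400)
b = np.random.randn(400)
n = len(a)

d = rankdata(a) - rankdata(b)  # rank differences
rho_formula = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

rho_scipy, _ = spearmanr(a, b)
print(rho_formula, rho_scipy)  # both about 0.0704
```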

3. scipy.stats: Kendall's rank correlation coefficient `kendalltau`

Computes Kendall's rank correlation coefficient.

The return values are Kendall's rank correlation coefficient and the $p$-value of the test of no correlation.

kendalltau(x, y, nan_policy='propagate', method='auto', variant='b', alternative='two-sided')

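method controls how the $p$-value is computed: 'exact' (exact distribution) or 'asymptotic' (normal approximation). The default 'auto' picks whichever is appropriate, but it is better to specify it explicitly.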
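The difference between the two methods shows up clearly on a small sample. A minimal sketch using the same x2, y2 as in section 2: with n = 5 and $\tau = 1$, the exact two-sided $p$-value is $2/5! = 1/60$, while the normal approximation gives a somewhat different value.

```python
import numpy as np
from scipy.stats import kendalltau

x2 = np.array([0, 1, 2.5, 3, 3.2])
y2 = np.array([1, 2, 2.5, 7, 10])

tau, p_exact = kendalltau(x2, y2, method='exact')
_, p_asymptotic = kendalltau(x2, y2, method='asymptotic')
print(p_exact)       # 1/60 = 0.01666..., the value 'auto' also gives here
print(p_asymptotic)  # normal approximation; noticeably different for such a small n
```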

variant selects which variant of Kendall's tau ($\tau$) to compute: the default 'b' gives $\tau_b$, which adjusts for ties, and 'c' gives $\tau_c$ (Stuart's tau-c).
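To make $\tau_b$ concrete, here is a brute-force sketch that counts concordant pairs P, discordant pairs Q, pairs tied only in x (T) and pairs tied only in y (U), and evaluates $\tau_b = (P - Q)/\sqrt{(P+Q+T)(P+Q+U)}$, the formula given in the scipy documentation. The helper `kendall_tau_b` is hypothetical and O(n^2), meant only for checking small inputs against kendalltau.

```python
import itertools
import numpy as np
from scipy.stats import kendalltau

def kendall_tau_b(x, y):
    """Hypothetical brute-force tau_b over all pairs (O(n^2))."""
    P = Q = T = U = 0
    for i, j in itertools.combinations(range(len(x)), 2):
        sx = np.sign(x[i] - x[j])
        sy = np.sign(y[i] - y[j])
        if sx == 0 and sy == 0:
            continue  # tied in both x and y: counted in neither T nor U
        elif sx == 0:
            T += 1    # tied only in x
        elif sy == 0:
            U += 1    # tied only in y
        elif sx == sy:
            P += 1    # concordant pair
        else:
            Q += 1    # discordant pair
    return (P - Q) / np.sqrt((P + Q + T) * (P + Q + U))

rng = np.random.default_rng(0)
u = rng.integers(0, 5, size=30)  # small integer range, so many ties
v = rng.integers(0, 5, size=30)

tau_scipy, _ = kendalltau(u, v)  # variant='b' is the default
print(kendall_tau_b(u, v), tau_scipy)  # should agree up to rounding
```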

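Whereas pearsonr (section 1) measures the degree of linear correlation, kendalltau, like spearmanr, also captures nonlinear relationships as long as they are monotonic.
For $y = x^2$ from the example in section 1, Kendall's coefficient is 1, i.e. a perfect correlation.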

from scipy.stats import kendalltau

r, p_value = kendalltau(x, y)
(r, p_value)

(1.0, 1.5294327463639633e-12)

Even when the data do not follow a theoretical curve exactly, Kendall's coefficient is 1 as long as y is strictly increasing in x.

plt.scatter(x2, y2)
kendalltau(x2, y2)

KendalltauResult(correlation=0.9999999999999999, pvalue=0.016666666666666666)

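If x or y has zero variance (all of its values are identical), every pair is tied in that variable and the denominator of $\tau_b$ is 0. In this case no warning is emitted, but the result is nan.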

kendalltau(x, z)

KendalltauResult(correlation=nan, pvalue=nan)
