
scipy.stats: Anderson-Darling test `anderson`

Last updated at 2022-06-08. Posted at 2022-06-08.


Tests whether a sample was drawn from a specified distribution.

Whereas scipy.stats.shapiro targets only the normal distribution, `anderson` can test whether the data follow any of several distributions.

anderson(x, dist='norm')

`dist` accepts 'norm', 'expon', 'logistic', 'gumbel', 'gumbel_l', 'gumbel_r', or 'extreme1' (the default is 'norm').

from scipy.stats import anderson
import numpy as np

np.random.seed(123)
x = np.random.normal(50, 10, 20)
result = anderson(x)
result

AndersonResult(statistic=0.36238834776122175, critical_values=array([0.506, 0.577, 0.692, 0.807, 0.96 ]), significance_level=array([15. , 10. ,  5. ,  2.5,  1. ]))

Interpreting the result is cumbersome because this function does not return a $p$ value. The SciPy documentation describes the rule as follows:

If the returned statistic is larger than these critical values then for the corresponding significance level, the null hypothesis that the data come from the chosen distribution can be rejected.

Since 0.36238834776122175 < 0.692 (the critical value at the 5% significance level), the null hypothesis cannot be rejected.
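The comparison above can be repeated for every tabulated significance level. The following is a minimal sketch, not part of SciPy: the helper name `rejection_table` is hypothetical, and the numbers are copied by hand from the `AndersonResult` printed earlier rather than computed here.

```python
# Values copied from the AndersonResult shown above (not recomputed here).
statistic = 0.36238834776122175
critical_values = [0.506, 0.577, 0.692, 0.807, 0.96]
significance_levels = [15.0, 10.0, 5.0, 2.5, 1.0]  # in percent


def rejection_table(statistic, critical_values, significance_levels):
    """Return (level, critical value, reject?) for each tabulated level.

    Hypothetical helper: H0 is rejected at a level when the statistic
    is larger than the critical value for that level.
    """
    return [
        (level, crit, statistic > crit)
        for level, crit in zip(significance_levels, critical_values)
    ]


for level, crit, reject in rejection_table(
    statistic, critical_values, significance_levels
):
    verdict = "reject" if reject else "cannot reject"
    print(f"{level:>4}%: critical value {crit:.3f} -> {verdict} H0")
```

With the values above, the statistic is below all five critical values, so the null hypothesis is not rejected even at the 15% level.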

With R's nortest::ad.test, the result is

A = 0.36239, p-value = 0.4078
so the null hypothesis cannot be rejected (and this function does compute a proper $p$ value).

Looking at what ad.test does, the statistic is computed in 4 lines, while the approximation of the $p$ value takes 9 lines. Here the approximation is ported to Python.

pvalue(statistic, n)

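def pvalue(A, n):
    # Approximate p value for the normality test, ported from R's nortest::ad.test.
    # A: Anderson-Darling statistic, n: sample size.
    AA = (1 + 0.75/n + 2.25/n**2) * A  # sample-size adjustment of the statistic
    # The approximation is piecewise in the adjusted statistic AA.
    if AA < 0.2:
        return 1 - np.exp(-13.436 + 101.14 * AA - 223.73 * AA**2)
    elif AA < 0.34:
        return 1 - np.exp(-8.318 + 42.796 * AA - 59.938 * AA**2)
    elif AA < 0.6:
        return np.exp(0.9177 - 4.279 * AA - 1.38 * AA**2)
    elif AA < 10:
        return np.exp(1.2937 - 5.709 * AA + 0.0186 * AA**2)
    else:
        return 3.7e-24  # fixed value used for very large statistics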

pvalue(result[0], len(x))

0.40777940924245076
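For reference, the statistic part (the few lines that precede the $p$ value approximation in ad.test) can also be sketched in Python. This is an assumption-laden sketch rather than a transcription of either library: the function name `ad_statistic_norm` is hypothetical, it uses the textbook formula for $A^2$ with the mean and the sample standard deviation (n - 1 denominator) estimated from the data, and it relies only on the standard library so that it does not depend on SciPy.

```python
import math
from statistics import NormalDist, mean, stdev


def ad_statistic_norm(x):
    """Anderson-Darling A^2 for normality (hypothetical helper).

    The mean and standard deviation are estimated from x; stdev uses
    the n - 1 denominator. Observations far out in the tails could make
    cdf() return exactly 0 and break log(); this sketch ignores that.
    """
    xs = sorted(x)
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    std_normal = NormalDist()
    z = [(v - m) / s for v in xs]
    # log F(z_i), and log(1 - F(z_{n+1-i})) written as log F(-z_{n+1-i})
    log_cdf = [math.log(std_normal.cdf(v)) for v in z]
    log_sf_rev = [math.log(std_normal.cdf(-v)) for v in reversed(z)]
    # weights (2i - 1) for i = 1..n; enumerate is 0-based, hence 2*i + 1
    h = [
        (2 * i + 1) * (a + b)
        for i, (a, b) in enumerate(zip(log_cdf, log_sf_rev))
    ]
    return -n - sum(h) / n


# Made-up sample, used only to show the call.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]
print(ad_statistic_norm(sample))
```

If this reading of the formula is right, `pvalue(ad_statistic_norm(x), len(x))` should give roughly the same $p$ value as the combination of `anderson(x)` and `pvalue` used above.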
