
Kruskal-Wallis test in R and Python

Posted at 2018-02-19

Creating the data

First, generate some arbitrary sample data in Python.

>>> import numpy as np
>>> import pandas as pd
>>> n = 100
>>> val = np.random.randn(n)
>>> cls = np.random.choice(['A', 'B', 'C'], n)
>>> a = pd.DataFrame(dict(cls=cls, val=val))
>>> a.head()
  cls       val
0   B  0.040708
1   A  2.736491
2   C  0.348282
3   C -0.942777
4   C  0.463018
>>> a.to_csv('hoge.tsv', index=False, sep='\t')
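The values above change on every run because no random seed is set. If you want identical data each time, a minimal sketch is to use a seeded generator. The seed value 0 and the use of `np.random.default_rng` are my own choices, not part of the original steps, so the numbers it produces will not match the output shown above.

```python
import numpy as np
import pandas as pd

# Seeded generator so every run produces the same data (seed 0 is arbitrary)
rng = np.random.default_rng(0)
n = 100
val = rng.standard_normal(n)                 # values from a standard normal distribution
cls = rng.choice(['A', 'B', 'C'], n)         # random group label for each row
a = pd.DataFrame(dict(cls=cls, val=val))
a.to_csv('hoge.tsv', index=False, sep='\t')  # same TSV layout as above
```

The R and Python steps below read `hoge.tsv` the same way regardless of which version created it.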

R

> a <- read.table('hoge.tsv', header=T)
> head(a)
  cls         val
1   B  0.04070846
2   A  2.73649079
3   C  0.34828178
4   C -0.94277665
5   C  0.46301770
6   A -0.36467223
> kruskal.test(a$val ~ a$cls)

        Kruskal-Wallis rank sum test

data:  a$val by a$cls
Kruskal-Wallis chi-squared = 0.63086, df = 2, p-value = 0.7295

Python

>>> import pandas as pd
>>> from scipy import stats
>>> a = pd.read_table('hoge.tsv')
>>> stats.kruskal(*(x[1] for x in a.groupby('cls')['val']))
KruskalResult(statistic=0.63086166681182476, pvalue=0.72947452456379802)
>>> len(a['cls'].unique()) - 1  # df is not reported automatically; if needed, compute it as (number of groups - 1)
2

This gives the same result as R.
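To see what both functions compute, here is a minimal sketch of the statistic from ranks. All observations are ranked together (ties get average ranks), then H = 12/(N(N+1)) · Σ R_i²/n_i − 3(N+1), where R_i is the rank sum and n_i the size of group i. H is divided by the tie correction 1 − Σ(t³ − t)/(N³ − N), and the p-value is the upper tail of the chi-squared distribution with (number of groups − 1) degrees of freedom. The helper name `kruskal_by_hand` is hypothetical; it is meant only to cross-check `stats.kruskal`, not to replace it.

```python
import numpy as np
from scipy import stats

def kruskal_by_hand(*groups):
    """Hypothetical helper: Kruskal-Wallis H, degrees of freedom, and p-value."""
    data = np.concatenate(groups)
    N = len(data)
    ranks = stats.rankdata(data)  # pooled ranks; ties receive average ranks

    # Sum of R_i^2 / n_i, slicing the pooled ranks back into their groups
    total = 0.0
    start = 0
    for g in groups:
        n_i = len(g)
        r_i = ranks[start:start + n_i].sum()
        total += r_i ** 2 / n_i
        start += n_i

    h = 12.0 / (N * (N + 1)) * total - 3 * (N + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N)
    _, counts = np.unique(data, return_counts=True)
    h /= 1 - np.sum(counts ** 3 - counts) / (N ** 3 - N)

    df = len(groups) - 1       # number of groups minus 1
    p = stats.chi2.sf(h, df)   # upper-tail chi-squared probability
    return h, df, p
```

With the DataFrame `a` loaded above, `kruskal_by_hand(*(x[1] for x in a.groupby('cls')['val']))` should agree with `stats.kruskal` up to floating-point rounding, and it also returns the degrees of freedom that `stats.kruskal` leaves out.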
