More than 5 years have passed since last update.

Kruskal-Wallis test in R and Python

Creating the data

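First, generate some arbitrary data in Python: 100 standard-normal values, each randomly assigned to one of three classes.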

>>> import numpy as np
>>> import pandas as pd
>>> n = 100
>>> val = np.random.randn(n)
>>> cls = np.random.choice(['A', 'B', 'C'], n)
>>> a = pd.DataFrame(dict(cls=cls, val=val))
>>> a.head()
  cls       val
0   B  0.040708
1   A  2.736491
2   C  0.348282
3   C -0.942777
4   C  0.463018
>>> a.to_csv('hoge.tsv', index=False, sep='\t')
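
The data above are drawn without a seed, so every run produces different values. A minimal sketch of a reproducible variant, assuming NumPy's `default_rng`; the seed 0 and the in-memory round trip through a TSV string are illustrative choices, not part of the original session:

```python
import io

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed (assumption), fixed for reproducibility
n = 100
a = pd.DataFrame(dict(
    cls=rng.choice(['A', 'B', 'C'], n),  # random class label per row
    val=rng.standard_normal(n),          # standard-normal values
))

# Serialize to tab-separated text and read it back, mirroring the
# to_csv / read_table round trip without touching the filesystem.
tsv_text = a.to_csv(index=False, sep='\t')
b = pd.read_csv(io.StringIO(tsv_text), sep='\t')
print(b.groupby('cls')['val'].size())
```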

R

> a <- read.table('hoge.tsv', header=T)
> head(a)
  cls         val
1   B  0.04070846
2   A  2.73649079
3   C  0.34828178
4   C -0.94277665
5   C  0.46301770
6   A -0.36467223
> kruskal.test(a$val ~ a$cls)

        Kruskal-Wallis rank sum test

data:  a$val by a$cls
Kruskal-Wallis chi-squared = 0.63086, df = 2, p-value = 0.7295

Python

>>> import pandas as pd
>>> from scipy import stats
>>> a = pd.read_table('hoge.tsv')
>>> stats.kruskal(*(x[1] for x in a.groupby('cls')['val']))
KruskalResult(statistic=0.63086166681182476, pvalue=0.72947452456379802)
>>> len(a['cls'].unique())-1  # The degrees of freedom are not reported, so compute them as (number of categories - 1) if needed.
2
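
The degrees of freedom tie the statistic to the p-value: under the null hypothesis the Kruskal-Wallis statistic is approximately chi-squared distributed with (number of groups - 1) degrees of freedom. A minimal sketch, with the statistic copied from the `KruskalResult` above and variable names chosen only for illustration:

```python
from scipy import stats

# Statistic copied from the KruskalResult shown in the session above.
h_statistic = 0.63086166681182476
n_groups = 3           # classes A, B and C
df = n_groups - 1      # degrees of freedom

# Upper-tail probability of the chi-squared distribution with df degrees of freedom.
p_value = stats.chi2.sf(h_statistic, df)
print(df, p_value)
```

The resulting value should agree with the `pvalue` field reported by `stats.kruskal` and with the p-value printed by R.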

R and Python give the same test statistic and p-value.
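
To see what both implementations compute, the statistic can be rebuilt from pooled ranks as H = 12 / (n (n + 1)) * sum(R_g^2 / n_g) - 3 (n + 1), where R_g is the rank sum and n_g the size of group g. The sketch below is an illustration under stated assumptions: it uses freshly generated, seeded data rather than the article's `hoge.tsv`, and it omits the tie correction, which is unnecessary for continuous values that contain no ties.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)            # arbitrary seed (assumption)
values = rng.standard_normal(100)
labels = rng.choice(['A', 'B', 'C'], 100)
groups = np.unique(labels)

n = len(values)
ranks = stats.rankdata(values)            # ranks over the pooled sample

# Sum over groups of (rank sum)^2 / (group size).
rank_sum_term = sum(
    ranks[labels == g].sum() ** 2 / (labels == g).sum()
    for g in groups
)
h_manual = 12.0 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
p_manual = stats.chi2.sf(h_manual, len(groups) - 1)

# Reference values from SciPy for comparison.
h_scipy, p_scipy = stats.kruskal(*(values[labels == g] for g in groups))
print(h_manual, h_scipy)
print(p_manual, p_scipy)
```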
