
【Intro to R】R Basics: Comparing Two Means

Last updated at 2022-01-30 / Posted at 2022-01-30

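I'm studying statistics and the R language.
I'm learning by working through the examples in 「Rによるやさしい統計学」 (Easy Statistics with R), typing them out by hand.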

Execution environment

Windows 10 Pro 64-bit
R: 3.4.3
RStudio: 1.1.383
Everything is run in the RStudio Console

Comparing two means

We run the following test in R.

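Item	Description
Question	Is there a significant difference between males and females in the mean score on Statistics Test 1?
Null hypothesis	$\mu_1 = \mu_2$: the two population means are equal
Alternative hypothesis	$\mu_1 \ne \mu_2$: the two population means are not equal
Test statistic	$t=\frac{\overline{X}-\overline{Y}}{\sqrt{(\frac{1}{m}+\frac{1}{n}){S_{xy}}^2}}$, where ${S_{xy}}^2=\frac{1}{m+n-2}[\sum_{i=1}^m(X_i-\overline{X})^2+\sum_{j=1}^n(Y_j-\overline{Y})^2]$; under the null hypothesis it follows a t distribution with $m+n-2$ degrees of freedom
Significance level	5%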

First, compute the test statistic.

> stat1_male <- c(6,10,6,10,5,3,5,9,3,3)
> stat1_female <- c(11,6,11,9,7,5,8,7,7,9)

# pooled standard deviation
> pool_st <- sqrt(((length(stat1_male)-1)*var(stat1_male)+(length(stat1_female)-1)*var(stat1_female))/(length(stat1_male)+length(stat1_female)-2))

> tbunbo <- pool_st*sqrt(1/length(stat1_male)+1/length(stat1_female))  # denominator of t ("bunbo")
> tbunshi <- mean(stat1_male) - mean(stat1_female)  # numerator of t ("bunshi")
> tstat <- tbunshi/tbunbo
> tstat
[1] -1.842885
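As a hand check of the formula, here is a worked substitution (my own arithmetic, not from the book). From the data above, $\overline{X}=6$, $\overline{Y}=8$, and $m=n=10$. The sums of squared deviations are 70 for the males and 36 for the females, so:

$$
{S_{xy}}^2=\frac{1}{10+10-2}\left[\sum_{i=1}^{10}(X_i-\overline{X})^2+\sum_{j=1}^{10}(Y_j-\overline{Y})^2\right]=\frac{70+36}{18}\approx 5.889
$$

$$
t=\frac{6-8}{\sqrt{\left(\frac{1}{10}+\frac{1}{10}\right)\times 5.889}}\approx\frac{-2}{1.0853}\approx -1.843
$$

This agrees with the value of tstat printed above.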

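Use the qt function to compute the lower and upper critical values of the t distribution with 18 degrees of freedom. The statistic computed above, -1.842885, is greater than the lower critical value -2.100922 and less than the upper critical value 2.100922. It therefore does not fall in the rejection region, and the null hypothesis "the two population means are equal" is not rejected.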

> qt(0.025, 18)  # lower
[1] -2.100922
> qt(0.975, 18)  # upper
[1] 2.100922

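You can also run the test with the t.test function. The option var.equal = TRUE assumes the two population variances are equal, which gives Student's t-test with a pooled variance. With var.equal = FALSE (the default), t.test performs Welch's t-test instead, which is used when the population variances differ.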

> t.test(stat1_male,stat1_female,var.equal = TRUE)

	Two Sample t-test

data:  stat1_male and stat1_female
t = -1.8429, df = 18, p-value = 0.08188
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.2800355  0.2800355
sample estimates:
mean of x mean of y 
        6         8
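The p-value and 95% confidence interval that t.test reports can also be reproduced from the statistic with pt and qt. The following is a minimal sketch assuming equal variances, the same setting as var.equal = TRUE. The variable names m, n, dof, se, and mean_diff are my own, not from the book.

```r
# Reproduce the p-value and 95% CI of t.test(..., var.equal = TRUE) by hand
stat1_male   <- c(6, 10, 6, 10, 5, 3, 5, 9, 3, 3)
stat1_female <- c(11, 6, 11, 9, 7, 5, 8, 7, 7, 9)

m   <- length(stat1_male)
n   <- length(stat1_female)
dof <- m + n - 2                       # 18 degrees of freedom

# Pooled variance S_xy^2 and the standard error of the difference in means
pooled_var <- ((m - 1) * var(stat1_male) + (n - 1) * var(stat1_female)) / dof
se         <- sqrt((1 / m + 1 / n) * pooled_var)
mean_diff  <- mean(stat1_male) - mean(stat1_female)
tstat      <- mean_diff / se

# Two-sided p-value: probability that |T| >= |t| under the null hypothesis
p_value <- 2 * pt(-abs(tstat), dof)

# 95% confidence interval for the difference in means
ci <- mean_diff + c(-1, 1) * qt(0.975, dof) * se

print(c(t = tstat, p = p_value))
print(ci)
```

This should print t of about -1.8429, p of about 0.0819, and an interval of roughly -4.28 to 0.28, matching the t.test output above.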

Test for homogeneity of variances

Next, we test whether two population variances are equal.

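Item	Description
Question	Is there a significant difference in the variance of scores between Class A and Class B?
Null hypothesis	$\sigma_x^2 = \sigma_y^2$: the two population variances are equal
Alternative hypothesis	$\sigma_x^2 \ne \sigma_y^2$: the two population variances are not equal
Test statistic	$F=\frac{{S_x}^2}{{S_y}^2}$, where ${S_{x}}^2=\frac{1}{m-1}\sum_{i=1}^m(X_i-\overline{X})^2$ and ${S_{y}}^2=\frac{1}{n-1}\sum_{j=1}^n(Y_j-\overline{Y})^2$; under the null hypothesis it follows an F distribution with $(m-1, n-1)$ degrees of freedom
Significance level	5%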

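Use the var.test function to run the whole test in one step. Each class has 10 scores, so under the null hypothesis the test statistic follows an F distribution with (9, 9) degrees of freedom. The p-value of 0.03206 is below the 5% significance level, so the null hypothesis "the two population variances are equal" is rejected.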

> class_a <- c(54,55,52,48,50,38,41,40,53,52)
> class_b <- c(67,63,50,60,61,69,43,58,36,29)
> var.test(class_a,class_b)

	F test to compare two variances

data:  class_a and class_b
F = 0.21567, num df = 9, denom df = 9, p-value = 0.03206
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.05356961 0.86828987
sample estimates:
ratio of variances 
         0.2156709
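As with the t-test, the F test can be worked by hand. In this minimal sketch, the statistic is the ratio of the unbiased variances and the critical values come from qf. The two-sided p-value doubles the smaller tail probability from pf, which as far as I know is also how var.test computes it. The variable names are my own.

```r
class_a <- c(54, 55, 52, 48, 50, 38, 41, 40, 53, 52)
class_b <- c(67, 63, 50, 60, 61, 69, 43, 58, 36, 29)

df1 <- length(class_a) - 1   # numerator degrees of freedom (9)
df2 <- length(class_b) - 1   # denominator degrees of freedom (9)

# F statistic: ratio of the unbiased sample variances
f_stat <- var(class_a) / var(class_b)

# Lower and upper critical values for a two-sided test at the 5% level
crit <- qf(c(0.025, 0.975), df1, df2)

# Two-sided p-value: twice the smaller of the two tail probabilities
p_value <- 2 * min(pf(f_stat, df1, df2),
                   pf(f_stat, df1, df2, lower.tail = FALSE))

print(f_stat)
print(crit)
print(p_value)
```

Here f_stat falls below the lower critical value, which is the same conclusion as the p-value being under 0.05.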

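When the population variances differ, compare the population means with Welch's t-test by setting the var.equal option of t.test to FALSE. In this example the p-value of 0.2838 is above the significance level, so the null hypothesis that the population means are equal is not rejected.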

> t.test(class_a,class_b,var.equal = FALSE)

	Welch Two Sample t-test

data:  class_a and class_b
t = -1.1191, df = 12.71, p-value = 0.2838
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -15.554888   4.954888
sample estimates:
mean of x mean of y 
     48.3      53.6
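The df = 12.71 in the Welch output is not an integer. This is because Welch's test estimates the degrees of freedom with the Welch-Satterthwaite approximation. A minimal sketch reproducing t, df, and the p-value by hand (variable names are my own):

```r
class_a <- c(54, 55, 52, 48, 50, 38, 41, 40, 53, 52)
class_b <- c(67, 63, 50, 60, 61, 69, 43, 58, 36, 29)

m <- length(class_a)
n <- length(class_b)

# Estimated variance of each group's mean, without pooling
va <- var(class_a) / m
vb <- var(class_b) / n

# Welch's t statistic
tstat <- (mean(class_a) - mean(class_b)) / sqrt(va + vb)

# Welch-Satterthwaite approximation of the degrees of freedom
dof <- (va + vb)^2 / (va^2 / (m - 1) + vb^2 / (n - 1))

# Two-sided p-value from the t distribution with non-integer df
p_value <- 2 * pt(-abs(tstat), dof)

print(c(t = tstat, df = dof, p = p_value))
```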

Paired t-test

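This test is used when the two sets of data are paired. "Paired" means each individual observation in one set corresponds to a specific observation in the other; here, it is the same student's scores on two tests.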

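Item	Description
Question	A test was given twice. Can we say the scores changed between the two sittings?
Null hypothesis	$\mu_D=0$: the population mean of the score change is 0
Alternative hypothesis	$\mu_D \ne 0$: the population mean of the score change is not 0
Test statistic	$t=\frac{\overline{D}}{S_D/\sqrt{n}}$, where $D_i$ is the score change of student $i$ and $S_D$ is the unbiased sample standard deviation of the $D_i$; under the null hypothesis it follows a t distribution with $n-1$ degrees of freedom
Significance level	5%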

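As before, compute the test statistic and compare it with the t distribution with 19 degrees of freedom (20 students, so n-1 = 19). The statistic 4.839903 exceeds the upper critical value 2.093024 and lies in the rejection region. The null hypothesis "the population mean of the score change is 0" is therefore rejected, and we conclude that the scores changed.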

> statistics_test1 <- c(6, 10, 6, 10, 5, 3, 5, 9, 3, 3, 11, 6, 11, 9, 7, 5, 8, 7, 7, 9)
> statistics_test2 <- c(10, 13, 8, 15, 8, 6, 9, 10, 7, 3, 18, 14, 18, 11, 12, 5, 7,12, 7, 7)
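> henka <- statistics_test2 - statistics_test1  # per-student change in score ("henka")
> tbunbo <- sd(henka)/sqrt(length(henka))  # denominator: standard error of the mean change
> tbunshi <- mean(henka)  # numerator: mean change
> tstat <- tbunshi/tbunbo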
> tstat
[1] 4.839903
> qt(0.025,19)
[1] -2.093024
> qt(0.025,19,lower.tail = FALSE)
[1] 2.093024

The t.test function also works: passing it the differences runs a one-sample t-test against a mean of 0.

> t.test(henka)

	One Sample t-test

data:  henka
t = 4.8399, df = 19, p-value = 0.0001138
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 1.702645 4.297355
sample estimates:
mean of x 
        3
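The p-value and confidence interval of this one-sample test can again be reproduced with pt and qt. A minimal sketch using the same data (se, dof, and ci are my own names):

```r
statistics_test1 <- c(6, 10, 6, 10, 5, 3, 5, 9, 3, 3, 11, 6, 11, 9, 7, 5, 8, 7, 7, 9)
statistics_test2 <- c(10, 13, 8, 15, 8, 6, 9, 10, 7, 3, 18, 14, 18, 11, 12, 5, 7, 12, 7, 7)

henka <- statistics_test2 - statistics_test1  # per-student change in score
n   <- length(henka)
dof <- n - 1                                  # 19 degrees of freedom

# t statistic: mean change divided by its standard error
se    <- sd(henka) / sqrt(n)
tstat <- mean(henka) / se

# Two-sided p-value and 95% confidence interval for the mean change
p_value <- 2 * pt(-abs(tstat), dof)
ci      <- mean(henka) + c(-1, 1) * qt(0.975, dof) * se

print(c(t = tstat, p = p_value))
print(ci)
```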

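For a paired t-test you can also pass both samples with the paired = TRUE option, and the result is the same. The signs of t and of the mean difference are flipped because t.test takes the differences as the first argument minus the second (statistics_test1 - statistics_test2).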

> t.test(statistics_test1,statistics_test2,paired=TRUE)

	Paired t-test

data:  statistics_test1 and statistics_test2
t = -4.8399, df = 19, p-value = 0.0001138
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.297355 -1.702645
sample estimates:
mean of the differences 
                     -3
