WAA06 #R - Qiita

Fundamentals of Data Analysis (データ解析基礎論): weekly assignment A06

WAA06.1

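# Load and reshape the data
# On R >= 4.0 read.csv() no longer converts strings to factors by default,
# so request factors explicitly; the factor handling below relies on them
dat <- read.csv("http://peach.l.chiba-u.ac.jp/course_folder/waa05.csv",
                stringsAsFactors = TRUE)
# Reverse the level order so that "post" is compared against the other level
dat$condition <- factor(dat$condition, levels = levels(dat$condition)[2:1])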
medicineA <- dat[dat$medicine == "Medicine A",]
# lm
medicineA.lm <- lm(blood.pressure ~ condition,data = medicineA)
summary(medicineA.lm)
# Output
Call:
lm(formula = blood.pressure ~ condition, data = medicineA)

Residuals:
   Min     1Q Median     3Q    Max 
 -9.76  -3.88   0.18   4.12  11.12 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   130.8800     0.7014  186.61   <2e-16 ***
conditionpost -26.1200     0.9919  -26.33   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.959 on 98 degrees of freedom
Multiple R-squared:  0.8762,    Adjusted R-squared:  0.8749 
F-statistic: 693.5 on 1 and 98 DF,  p-value: < 2.2e-16

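The regression shows a significant difference in blood pressure before and after administration in the Medicine A condition (estimate = -26.12, t = -26.33, p < 2e-16).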
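As a minimal sketch of how to read the coefficients above: with a two-level factor as the only predictor, the intercept is the mean of the reference level and the `conditionpost` coefficient is the difference between the two group means. The data below are simulated; the values, the seed, the name `toy`, and the level name `pre` are assumptions for illustration, not the course data.

```r
# Simulated stand-in for the course data (all values are hypothetical)
set.seed(1)
toy <- data.frame(
  condition = factor(rep(c("pre", "post"), each = 50), levels = c("pre", "post")),
  blood.pressure = c(rnorm(50, mean = 130, sd = 5), rnorm(50, mean = 105, sd = 5))
)

toy.lm <- lm(blood.pressure ~ condition, data = toy)
b <- unname(coef(toy.lm))  # b[1] = intercept, b[2] = conditionpost

mean.pre  <- mean(toy$blood.pressure[toy$condition == "pre"])
mean.post <- mean(toy$blood.pressure[toy$condition == "post"])

# Intercept equals the mean of the reference level ("pre")
print(all.equal(b[1], mean.pre))
# Slope equals mean("post") - mean("pre")
print(all.equal(b[2], mean.post - mean.pre))
```

Under these assumptions both comparisons hold, which is why the t-test on the `conditionpost` row can be read as a test of the difference between the two group means.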

WAA06.2

# Reshape the data
post <- dat[dat$condition == "post",]
# lm
dat.lm <- lm(blood.pressure ~ medicine,data = post)
# plot
plot(as.numeric(factor(post$medicine))-1,post$blood.pressure,pch = 19,
     ylab = "blood pressure",xlab = "medicine type",xaxt = "n",
     xlim=c(-0.5,1.5))
axis(1,c(0,1),c("Medicine A","Medicine B"))
abline(dat.lm,col = "red")

summary(dat.lm)
# Output
Call:
lm(formula = blood.pressure ~ medicine, data = post)

Residuals:
    Min      1Q  Median      3Q     Max 
-11.180  -4.760  -0.470   4.385  17.820 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)        104.7600     0.8742  119.84   <2e-16 ***
medicineMedicine B  25.4200     1.2363   20.56   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.181 on 98 degrees of freedom
Multiple R-squared:  0.8118,    Adjusted R-squared:  0.8099 
F-statistic: 422.8 on 1 and 98 DF,  p-value: < 2.2e-16

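The regression shows a significant difference in post-administration blood pressure between Medicine A and Medicine B (estimate = 25.42, t = 20.56, p < 2e-16).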
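The plot above places the two medicines at x = 0 and x = 1 and then draws the fitted model with `abline()`. A rough sketch of why that works: regressing on the factor and regressing on its 0/1 codes give the same intercept and slope, so the line passes through the two group means. The data are simulated, and the names `toy`, `x01`, `fit.factor`, and `fit.x01` are hypothetical.

```r
# Simulated stand-in for the post-administration data (values are hypothetical)
set.seed(2)
toy <- data.frame(
  medicine = factor(rep(c("Medicine A", "Medicine B"), each = 50)),
  blood.pressure = c(rnorm(50, mean = 105, sd = 6), rnorm(50, mean = 130, sd = 6))
)
# as.numeric() on a factor returns the level codes 1, 2; subtracting 1 gives 0, 1
toy$x01 <- as.numeric(toy$medicine) - 1

fit.factor <- lm(blood.pressure ~ medicine, data = toy)
fit.x01    <- lm(blood.pressure ~ x01, data = toy)

# Same intercept and slope under both codings
same.coef <- all.equal(unname(coef(fit.factor)), unname(coef(fit.x01)))

# Height of the fitted line at x = 0 and x = 1, next to the group means
line.at.01  <- unname(coef(fit.factor)[1] + coef(fit.factor)[2] * c(0, 1))
group.means <- c(mean(toy$blood.pressure[toy$x01 == 0]),
                 mean(toy$blood.pressure[toy$x01 == 1]))
print(same.coef)
print(cbind(line.at.01, group.means))
```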

WAA06.3

dat <- read.table("http://www.matsuka.info/data_folder/tdkPATH01.txt", header = TRUE)
plot(dat)

All

varianceAll.lm <- lm(grade ~ study + absence + knowledge + interest,dat)
summary(varianceAll.lm)
# Output
Call:
lm(formula = grade ~ study + absence + knowledge + interest, 
    data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-17.900  -5.146  -0.587   6.524  13.202 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 54.94744    9.97293   5.510 9.83e-07 ***
study        0.08355    0.03667   2.279  0.02660 *  
absence     -0.58676    0.12810  -4.580 2.70e-05 ***
knowledge    0.36450    0.12650   2.882  0.00563 ** 
interest     0.86450    1.56025   0.554  0.58177    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.463 on 55 degrees of freedom
Multiple R-squared:  0.7608,    Adjusted R-squared:  0.7434 
F-statistic: 43.74 on 4 and 55 DF,  p-value: < 2.2e-16

3 variables

variance3.lm <- lm(grade ~ study + absence + knowledge,dat)
summary(variance3.lm)
# Output

Call:
lm(formula = grade ~ study + absence + knowledge, data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-17.9720  -4.9393  -0.6443   6.6919  13.2094 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 55.70510    9.81742   5.674 5.12e-07 ***
study        0.09713    0.02711   3.583 0.000713 ***
absence     -0.61701    0.11516  -5.358 1.64e-06 ***
knowledge    0.39900    0.10943   3.646 0.000585 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.417 on 56 degrees of freedom
Multiple R-squared:  0.7595,    Adjusted R-squared:  0.7466 
F-statistic: 58.95 on 3 and 56 DF,  p-value: < 2.2e-16

2 variables

variance2.lm <- lm(grade ~ absence + knowledge,dat)
summary(variance2.lm)
# Output

Call:
lm(formula = grade ~ absence + knowledge, data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-17.8179  -5.2150   0.2177   4.5910  18.1371 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 76.89733    8.61051   8.931 2.00e-12 ***
absence     -0.88513    0.09619  -9.201 7.25e-13 ***
knowledge    0.29979    0.11634   2.577   0.0126 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.151 on 57 degrees of freedom
Multiple R-squared:  0.7044,    Adjusted R-squared:  0.694 
F-statistic:  67.9 on 2 and 57 DF,  p-value: 8.246e-16

1 variable

variance1.lm <- lm(grade ~ absence ,dat)
summary(variance1.lm)
# Output
Call:
lm(formula = grade ~ absence, data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-17.5207  -6.5420  -0.0378   6.5188  19.6460 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 98.31673    2.35253   41.79  < 2e-16 ***
absence     -0.99020    0.09126  -10.85 1.38e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.538 on 58 degrees of freedom
Multiple R-squared:  0.6699,    Adjusted R-squared:  0.6642 
F-statistic: 117.7 on 1 and 58 DF,  p-value: 1.384e-15

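Adjusted R-squared was highest for the 3-variable model (0.7466, versus 0.7434 with all variables, 0.694 with 2 variables, and 0.6642 with 1 variable), so the 3-variable model is the most suitable of the four.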
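The comparison can be checked by hand from the reported Multiple R-squared values, using the usual definition adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where p is the number of predictors. The sketch below only re-enters numbers printed in the outputs above; n = 60 is inferred from the residual degrees of freedom (for example 55 + 4 + 1), and the vector names are hypothetical.

```r
# Values copied from the summary() outputs above
r2 <- c(all = 0.7608, three = 0.7595, two = 0.7044, one = 0.6699)  # Multiple R-squared
p  <- c(all = 4, three = 3, two = 2, one = 1)                       # number of predictors
n  <- 60  # inferred: residual df + p + 1, e.g. 55 + 4 + 1

# The penalty (n - 1) / (n - p - 1) grows with the number of predictors
adj.r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(adj.r2, 4))
# Model with the highest adjusted R-squared
print(names(which.max(adj.r2)))
```

The recomputed values agree with the Adjusted R-squared lines printed by `summary()` to the digits shown. In a live session the same number is available directly as `summary(variance3.lm)$adj.r.squared`.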

Sample answer