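Continuing from the __previous article__, I used the pipe operator %>% in R to write, as one continuous pipeline, code that does the following.
- Build multiple combinations of left-hand and right-hand sides of a regression formula by pasting character vectors together
- Apply a large number of regression formulas to the data in one go with the __*map()* function__
- Define the columns that store the fitted regression models and their results with the __*mutate()* function__
I ran the code from __pp. 71-72 of Shota Yasui, 『効果検証入門』 (Gijutsu-Hyoron Sha)__ with the variable names changed.
The dataset comes from the following GitHub repository.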
> library(remotes)
> remotes::install_github("itamarcaspi/experimentdatar")
Skipping install of 'experimentdatar' from a github remote, the SHA1 (f71a9d07) has not changed since last install.
Use `force = TRUE` to force installation
>
> library(experimentdatar)
> library(broom)
> library(tidyverse)
─ Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ─
✓ ggplot2 3.3.3 ✓ purrr 0.3.4
✓ tibble 3.0.4 ✓ dplyr 1.0.2
✓ tidyr 1.1.2 ✓ stringr 1.4.0
✓ readr 1.4.0 ✓ forcats 0.5.0
─ Conflicts ──────────────────────────────────────── tidyverse_conflicts() ─
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
>
> data(vouchers)
>
> summary(vouchers)
ID BOG95SMP BOG97SMP JAM93SMP SEX AGE
Min. : 1 Min. :0.00000 Min. :0.00000 Min. :0.000000 Min. :0.000 Min. :10.0
1st Qu.: 6333 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.000 1st Qu.:14.0
Median : 12665 Median :0.00000 Median :0.00000 Median :0.000000 Median :1.000 Median :15.0
Mean : 27670 Mean :0.04643 Mean :0.01094 Mean :0.006514 Mean :0.526 Mean :14.7
3rd Qu.: 18997 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:1.000 3rd Qu.:16.0
Max. :141226 Max. :1.00000 Max. :1.00000 Max. :1.000000 Max. :1.000 Max. :20.0
NA's :1 NA's :5403 NA's :23392
AGE2 HSVISIT SCYFNSH INSCHL PRSCH_C PRSCHA_1 PRSCHA_2
Min. :-2.00 Min. :0.000 Min. : 4.000 Min. :0.000 Min. :0.000 Min. :0.00 Min. :0.000
1st Qu.:12.00 1st Qu.:0.000 1st Qu.: 5.000 1st Qu.:1.000 1st Qu.:0.000 1st Qu.:1.00 1st Qu.:1.000
Median :13.00 Median :0.000 Median : 5.000 Median :1.000 Median :1.000 Median :1.00 Median :1.000
Mean :13.14 Mean :0.184 Mean : 5.173 Mean :0.861 Mean :0.658 Mean :0.89 Mean :0.756
3rd Qu.:14.00 3rd Qu.:0.000 3rd Qu.: 5.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.00 3rd Qu.:1.000
Max. :84.00 Max. :1.000 Max. :11.000 Max. :1.000 Max. :1.000 Max. :1.00 Max. :1.000
NA's :5519 NA's :23384 NA's :23399 NA's :23407 NA's :23409 NA's :23409
VOUCH0 BOG95ASD BOG97ASD JAM93ASD DBOGOTA DJAMUNDI
Min. :0.0000 Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.0000 Min. :0.00000
1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000
Median :1.0000 Median :0.00000 Median :0.00000 Median :0.00000 Median :0.0000 Median :0.00000
Mean :0.7178 Mean :0.08879 Mean :0.01804 Mean :0.01101 Mean :0.2296 Mean :0.01354
3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000
Max. :1.0000 Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.0000 Max. :1.00000
NA's :1
D1995 D1997 RESPONSE TEST_TAK SEX_NAME SVY D1993
Min. :0.0000 Min. :0.00000 Min. :0.000 Min. :0.000 Min. :0.00 Min. :0.00 Min. :0.00000
1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:0.00000
Median :0.0000 Median :0.00000 Median :1.000 Median :0.000 Median :0.00 Median :1.00 Median :0.00000
Mean :0.1795 Mean :0.06988 Mean :0.541 Mean :0.046 Mean :0.49 Mean :0.51 Mean :0.04828
3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:1.000 3rd Qu.:0.000 3rd Qu.:1.00 3rd Qu.:1.00 3rd Qu.:0.00000
Max. :1.0000 Max. :1.00000 Max. :1.000 Max. :1.000 Max. :1.00 Max. :1.00 Max. :1.00000
NA's :22258 NA's :19173 NA's :20487 NA's :23384
PHONE DAREA1 DAREA2 DAREA3 DAREA4 DAREA5
Min. :0.0000 Min. :0.0000000 Min. :0.0000000 Min. :0.00e+00 Min. :0.000000 Min. :0.00000
1st Qu.:0.0000 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0.00e+00 1st Qu.:0.000000 1st Qu.:0.00000
Median :1.0000 Median :0.0000000 Median :0.0000000 Median :0.00e+00 Median :0.000000 Median :0.00000
Mean :0.5659 Mean :0.0007501 Mean :0.0001579 Mean :3.95e-05 Mean :0.005448 Mean :0.01958
3rd Qu.:1.0000 3rd Qu.:0.0000000 3rd Qu.:0.0000000 3rd Qu.:0.00e+00 3rd Qu.:0.000000 3rd Qu.:0.00000
Max. :1.0000 Max. :1.0000000 Max. :1.0000000 Max. :1.00e+00 Max. :1.000000 Max. :1.00000
DAREA6 DAREA7 DAREA8 DAREA9 DAREA10 DAREA11
Min. :0.000000 Min. :0.00000 Min. :0.00000 Min. :0.000000 Min. :0.000000 Min. :0.000000
1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.000000
Median :0.000000 Median :0.00000 Median :0.00000 Median :0.000000 Median :0.000000 Median :0.000000
Mean :0.006317 Mean :0.01176 Mean :0.00379 Mean :0.001027 Mean :0.006948 Mean :0.008685
3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.000000
Max. :1.000000 Max. :1.00000 Max. :1.00000 Max. :1.000000 Max. :1.000000 Max. :1.000000
DAREA12 DAREA13 DAREA14 DAREA15 DAREA16 DAREA17
Min. :0.000000 Min. :0.0000000 Min. :0.0000000 Min. :0.000000 Min. :0.000000 Min. :0.00000
1st Qu.:0.000000 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.00000
Median :0.000000 Median :0.0000000 Median :0.0000000 Median :0.000000 Median :0.000000 Median :0.00000
Mean :0.002842 Mean :0.0001184 Mean :0.0002764 Mean :0.003277 Mean :0.001895 Mean :0.01208
3rd Qu.:0.000000 3rd Qu.:0.0000000 3rd Qu.:0.0000000 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.00000
Max. :1.000000 Max. :1.0000000 Max. :1.0000000 Max. :1.000000 Max. :1.000000 Max. :1.00000
DAREA18 DAREA19 DMONTH1 DMONTH2 DMONTH3 DMONTH4
Min. :0.000000 Min. :0.00000 Min. :0.000000 Min. :0.000000 Min. :0.00000 Min. :0.00000
1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.00000
Median :0.000000 Median :0.00000 Median :0.000000 Median :0.000000 Median :0.00000 Median :0.00000
Mean :0.003632 Mean :0.03095 Mean :0.001027 Mean :0.001658 Mean :0.02084 Mean :0.01311
3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.00000
Max. :1.000000 Max. :1.00000 Max. :1.000000 Max. :1.000000 Max. :1.00000 Max. :1.00000
DMONTH5 DMONTH6 DMONTH7 DMONTH8 DMONTH9 DMONTH10
Min. :0.000000 Min. :0.000000 Min. :0.00000 Min. :0.000000 Min. :0.0000000 Min. :0.0000000
1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.0000000 1st Qu.:0.0000000
Median :0.000000 Median :0.000000 Median :0.00000 Median :0.000000 Median :0.0000000 Median :0.0000000
Mean :0.008685 Mean :0.006514 Mean :0.01686 Mean :0.006317 Mean :0.0006317 Mean :0.0003553
3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:0.0000000 3rd Qu.:0.0000000
Max. :1.000000 Max. :1.000000 Max. :1.00000 Max. :1.000000 Max. :1.0000000 Max. :1.0000000
DMONTH11 DMONTH12 BOG95 BOG97 MOM_SCH MOM_AGE
Min. :0.0000000 Min. :0.0000000 Min. :0.0000 Min. :0.00000 Min. : 0.000 Min. : 8.00
1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.: 5.000 1st Qu.:35.00
Median :0.0000000 Median :0.0000000 Median :0.0000 Median :0.00000 Median : 5.000 Median :39.00
Mean :0.0004343 Mean :0.0003948 Mean :0.1597 Mean :0.06988 Mean : 5.894 Mean :40.41
3rd Qu.:0.0000000 3rd Qu.:0.0000000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.: 8.000 3rd Qu.:45.00
Max. :1.0000000 Max. :1.0000000 Max. :1.0000 Max. :1.00000 Max. :11.000 Max. :97.00
NA's :23607 NA's :23517
MOM_MW DAD_SCH DAD_AGE DAD_MW SEX2 STRATA1 STRATA2
Min. :0.00 Min. : 0.000 Min. : 1.00 Min. :0.000 Min. :0.000 Min. :0.00000 Min. :0.00000
1st Qu.:0.00 1st Qu.: 4.000 1st Qu.:38.00 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.00000 1st Qu.:0.00000
Median :0.00 Median : 5.000 Median :43.00 Median :0.000 Median :0.000 Median :0.00000 Median :0.00000
Mean :0.02 Mean : 5.856 Mean :44.25 Mean :0.098 Mean :0.486 Mean :0.01176 Mean :0.04177
3rd Qu.:0.00 3rd Qu.: 8.000 3rd Qu.:49.00 3rd Qu.:0.000 3rd Qu.:1.000 3rd Qu.:0.00000 3rd Qu.:0.00000
Max. :1.00 Max. :11.000 Max. :91.00 Max. :1.000 Max. :1.000 Max. :1.00000 Max. :1.00000
NA's :23546 NA's :23946 NA's :23806 NA's :23933 NA's :20180
STRATA3 STRATA4 STRATA5 STRATA6 STRATAMS REPT6
Min. :0.00000 Min. :0.0000000 Min. :0.0e+00 Min. :0.00e+00 Min. :0.0000 Min. :0.000
1st Qu.:0.00000 1st Qu.:0.0000000 1st Qu.:0.0e+00 1st Qu.:0.00e+00 1st Qu.:1.0000 1st Qu.:0.000
Median :0.00000 Median :0.0000000 Median :0.0e+00 Median :0.00e+00 Median :1.0000 Median :0.000
Mean :0.01046 Mean :0.0004343 Mean :7.9e-05 Mean :3.95e-05 Mean :0.9355 Mean :0.139
3rd Qu.:0.00000 3rd Qu.:0.0000000 3rd Qu.:0.0e+00 3rd Qu.:0.00e+00 3rd Qu.:1.0000 3rd Qu.:0.000
Max. :1.00000 Max. :1.0000000 Max. :1.0e+00 Max. :1.00e+00 Max. :1.0000 Max. :3.000
NA's :23411
TOTSCYRS HASCHILD MARRIED WORKING REPT NREPT FINISH6
Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:2.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:1.000
Median :4.000 Median :0.000 Median :0.000 Median :0.000 Median :0.000 Median :0.000 Median :1.000
Mean :3.368 Mean :0.025 Mean :0.012 Mean :0.149 Mean :0.164 Mean :0.183 Mean :0.936
3rd Qu.:4.000 3rd Qu.:0.000 3rd Qu.:0.000 3rd Qu.:0.000 3rd Qu.:0.000 3rd Qu.:0.000 3rd Qu.:1.000
Max. :6.000 Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000 Max. :3.000 Max. :1.000
NA's :23395 NA's :23400 NA's :23399 NA's :23395 NA's :23395 NA's :23395 NA's :23395
FINISH7 FINISH8 SEX_MISS USNGSCH HOURSUM TAB3SMPL WORKING3
Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. : 0.000 Min. :1 Min. :0.000
1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.: 0.000 1st Qu.:1 1st Qu.:0.000
Median :1.000 Median :1.000 Median :0.000 Median :0.000 Median : 0.000 Median :1 Median :0.000
Mean :0.661 Mean :0.522 Mean :0.001 Mean :0.351 Mean : 3.626 Mean :1 Mean :0.134
3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:0.000 3rd Qu.:1.000 3rd Qu.: 0.000 3rd Qu.:1 3rd Qu.:0.000
Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000 Max. :40.000 Max. :1 Max. :1.000
NA's :23395 NA's :23395 NA's :23395 NA's :23395 NA's :23395 NA's :23753 NA's :23395
>
> head(vouchers)
# A tibble: 6 x 89
ID BOG95SMP BOG97SMP JAM93SMP SEX AGE AGE2 HSVISIT SCYFNSH INSCHL PRSCH_C PRSCHA_1 PRSCHA_2 VOUCH0 BOG95ASD
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 NA 0 0 0 NA NA NA NA 5 NA NA NA NA NA 0
2 1 0 0 0 1 NA 12 NA 5 NA NA NA NA 0 1
3 2 0 0 0 0 NA 13 NA 5 NA NA NA NA 0 1
4 3 1 0 0 0 14 12 0 8 1 1 1 1 1 1
5 4 1 0 0 1 14 12 0 8 1 1 1 1 0 1
6 5 1 0 0 0 14 12 0 8 1 0 1 0 0 1
# … with 74 more variables: BOG97ASD <dbl>, JAM93ASD <dbl>, DBOGOTA <dbl>, DJAMUNDI <dbl>, D1995 <dbl>, D1997 <dbl>,
# RESPONSE <dbl>, TEST_TAK <dbl>, SEX_NAME <dbl>, SVY <dbl>, D1993 <dbl>, PHONE <dbl>, DAREA1 <dbl>, DAREA2 <dbl>,
# DAREA3 <dbl>, DAREA4 <dbl>, DAREA5 <dbl>, DAREA6 <dbl>, DAREA7 <dbl>, DAREA8 <dbl>, DAREA9 <dbl>, DAREA10 <dbl>,
# DAREA11 <dbl>, DAREA12 <dbl>, DAREA13 <dbl>, DAREA14 <dbl>, DAREA15 <dbl>, DAREA16 <dbl>, DAREA17 <dbl>,
# DAREA18 <dbl>, DAREA19 <dbl>, DMONTH1 <dbl>, DMONTH2 <dbl>, DMONTH3 <dbl>, DMONTH4 <dbl>, DMONTH5 <dbl>,
# DMONTH6 <dbl>, DMONTH7 <dbl>, DMONTH8 <dbl>, DMONTH9 <dbl>, DMONTH10 <dbl>, DMONTH11 <dbl>, DMONTH12 <dbl>,
# BOG95 <dbl>, BOG97 <dbl>, MOM_SCH <dbl>, MOM_AGE <dbl>, MOM_MW <dbl>, DAD_SCH <dbl>, DAD_AGE <dbl>, DAD_MW <dbl>,
# SEX2 <dbl>, STRATA1 <dbl>, STRATA2 <dbl>, STRATA3 <dbl>, STRATA4 <dbl>, STRATA5 <dbl>, STRATA6 <dbl>,
# STRATAMS <dbl>, REPT6 <dbl>, TOTSCYRS <dbl>, HASCHILD <dbl>, MARRIED <dbl>, WORKING <dbl>, REPT <dbl>,
# NREPT <dbl>, FINISH6 <dbl>, FINISH7 <dbl>, FINISH8 <dbl>, SEX_MISS <dbl>, USNGSCH <dbl>, HOURSUM <dbl>,
# TAB3SMPL <dbl>, WORKING3 <dbl>
>
#### Right-hand side of the multiple regression (several independent variables joined additively, written as a string literal)
> lm_independent_values_expression <- "SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3"
#### Define the base variable, the most fundamental of the independent variables
> formula_x_base = "VOUCH0"
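#### Define the list of dependent (outcome) variables. Each value is taken out one at a time and placed on the left-hand side of the regression formulas defined later.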
> dependent_value_list <- c("TOTSCYRS", "INSCHL", "PRSCH_C", "USNGSCH", "PRSCHA_1", "FINISH6")
>
> print(dependent_value_list)
[1] "TOTSCYRS" "INSCHL" "PRSCH_C" "USNGSCH" "PRSCHA_1" "FINISH6"
>
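#### Define the simple regression formulas
The left-hand side (to the left of "~") receives one value at a time from the list of dependent (outcome) variables; the right-hand side is the base variable alone.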
> reg_formula_list <- paste(dependent_value_list, "~", formula_x_base)
> print(reg_formula_list)
[1] "TOTSCYRS ~ VOUCH0" "INSCHL ~ VOUCH0" "PRSCH_C ~ VOUCH0" "USNGSCH ~ VOUCH0" "PRSCHA_1 ~ VOUCH0"
[6] "FINISH6 ~ VOUCH0"
>
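As a small aside, the step above works because paste() is vectorized: the length-1 strings "~" and the base variable are recycled across every element of the outcome vector. The sketch below uses hypothetical toy names (y1, y2, y3, x) rather than columns of the vouchers data, and also shows that each string can be converted to a formula object explicitly with as.formula().

```r
# Hypothetical toy names, not columns of the vouchers data
outcomes <- c("y1", "y2", "y3")
treatment <- "x"

# paste() recycles the length-1 arguments "~" and treatment across outcomes
formula_strings <- paste(outcomes, "~", treatment)
print(formula_strings)

# Each string can be turned into a formula object explicitly
formula_objects <- lapply(formula_strings, as.formula)
print(class(formula_objects[[1]]))
```

lm() also accepts a plain string and converts it internally, which is what the code in this article relies on.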
#### Name the simple regression formulas
> names(reg_formula_list) <- paste(dependent_value_list, "base", sep="_")
> print(reg_formula_list)
TOTSCYRS_base INSCHL_base PRSCH_C_base USNGSCH_base PRSCHA_1_base
"TOTSCYRS ~ VOUCH0" "INSCHL ~ VOUCH0" "PRSCH_C ~ VOUCH0" "USNGSCH ~ VOUCH0" "PRSCHA_1 ~ VOUCH0"
FINISH6_base
"FINISH6 ~ VOUCH0"
>
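#### Define the multiple regression formulas
Again, the left-hand side (to the left of "~") receives one value at a time from the list of dependent (outcome) variables; the right-hand side is the base variable plus the additional independent variables.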
> all_reg_formula_list <- paste(dependent_value_list, "~", formula_x_base, "+", lm_independent_values_expression)
>
> print(all_reg_formula_list)
[1] "TOTSCYRS ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3"
[2] "INSCHL ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3"
[3] "PRSCH_C ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3"
[4] "USNGSCH ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3"
[5] "PRSCHA_1 ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3"
[6] "FINISH6 ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3"
>
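Instead of pasting strings, the same formulas could probably also be built with reformulate() from the stats package, which takes a vector of term labels and a response. This is an alternative sketch, not the approach used in this article; the "_covariate" suffix for the names is an assumption chosen for illustration.

```r
# Variable names taken from the formulas above; only two outcomes for brevity
covariates <- c("VOUCH0", "SVY", "HSVISIT", "AGE", "STRATA1", "STRATA2", "STRATA3")
outcomes <- c("TOTSCYRS", "INSCHL")

# reformulate() builds a formula object from term labels and a response
formulas <- lapply(outcomes, function(y) reformulate(covariates, response = y))

# The "_covariate" suffix is an illustrative naming choice
names(formulas) <- paste(outcomes, "covariate", sep = "_")

print(formulas[[1]])
```

The result is a list of formula objects rather than a character vector, so it would be stored in a list column rather than a character column.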
#### Combine the simple and multiple regression formulas (the string literals written above)
Only the simple regression formulas have names.
> table_formula <- c(reg_formula_list, all_reg_formula_list)
> print(table_formula)
TOTSCYRS_base
"TOTSCYRS ~ VOUCH0"
INSCHL_base
"INSCHL ~ VOUCH0"
PRSCH_C_base
"PRSCH_C ~ VOUCH0"
USNGSCH_base
"USNGSCH ~ VOUCH0"
PRSCHA_1_base
"PRSCHA_1 ~ VOUCH0"
FINISH6_base
"FINISH6 ~ VOUCH0"
"TOTSCYRS ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3"
"INSCHL ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3"
"PRSCH_C ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3"
"USNGSCH ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3"
"PRSCHA_1 ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3"
"FINISH6 ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3"
>
#### Convert the character vector to a data frame (tibble)
> models <- table_formula %>% enframe(name = "model_index", value = "formula")
> print(models)
# A tibble: 12 x 2
model_index formula
<chr> <chr>
1 "TOTSCYRS_base" TOTSCYRS ~ VOUCH0
2 "INSCHL_base" INSCHL ~ VOUCH0
3 "PRSCH_C_base" PRSCH_C ~ VOUCH0
4 "USNGSCH_base" USNGSCH ~ VOUCH0
5 "PRSCHA_1_base" PRSCHA_1 ~ VOUCH0
6 "FINISH6_base" FINISH6 ~ VOUCH0
7 "" TOTSCYRS ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3
8 "" INSCHL ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3
9 "" PRSCH_C ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3
10 "" USNGSCH ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3
11 "" PRSCHA_1 ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3
12 "" FINISH6 ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3
>
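In the tibble above, model_index is an empty string for rows 7 to 12 because only the simple regression formulas were named. One way to avoid this, sketched here under the assumption that a "_covariate" suffix is acceptable, is to name the multiple regression formulas before combining the two vectors. The example uses a shortened set of outcomes and a shortened right-hand side.

```r
library(tibble)

outcomes <- c("TOTSCYRS", "INSCHL")

base_formulas <- paste(outcomes, "~", "VOUCH0")
names(base_formulas) <- paste(outcomes, "base", sep = "_")

covariate_formulas <- paste(outcomes, "~", "VOUCH0 + AGE")
# The "_covariate" suffix is an assumption chosen for illustration
names(covariate_formulas) <- paste(outcomes, "covariate", sep = "_")

# Every row now gets a non-empty model_index
models <- enframe(c(base_formulas, covariate_formulas),
                  name = "model_index", value = "formula")
print(models)
```

With distinct names in model_index, each fitted model can later be identified without reading the formula column.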
#### Define the data to which the regression formulas are applied
> regression_data <- vouchers %>% filter(TAB3SMPL == 1, BOG95SMP == 1)
> head(regression_data)
# A tibble: 6 x 89
ID BOG95SMP BOG97SMP JAM93SMP SEX AGE AGE2 HSVISIT SCYFNSH INSCHL PRSCH_C PRSCHA_1 PRSCHA_2 VOUCH0 BOG95ASD
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 3 1 0 0 0 14 12 0 8 1 1 1 1 1 1
2 4 1 0 0 1 14 12 0 8 1 1 1 1 0 1
3 5 1 0 0 0 14 12 0 8 1 0 1 0 0 1
4 6 1 0 0 0 12 10 0 7 1 0 1 1 0 1
5 10 1 0 0 1 14 11 0 8 1 0 0 0 1 1
6 11 1 0 0 0 14 12 0 8 1 0 0 0 1 1
# … with 74 more variables: BOG97ASD <dbl>, JAM93ASD <dbl>, DBOGOTA <dbl>, DJAMUNDI <dbl>, D1995 <dbl>, D1997 <dbl>,
# RESPONSE <dbl>, TEST_TAK <dbl>, SEX_NAME <dbl>, SVY <dbl>, D1993 <dbl>, PHONE <dbl>, DAREA1 <dbl>, DAREA2 <dbl>,
# DAREA3 <dbl>, DAREA4 <dbl>, DAREA5 <dbl>, DAREA6 <dbl>, DAREA7 <dbl>, DAREA8 <dbl>, DAREA9 <dbl>, DAREA10 <dbl>,
# DAREA11 <dbl>, DAREA12 <dbl>, DAREA13 <dbl>, DAREA14 <dbl>, DAREA15 <dbl>, DAREA16 <dbl>, DAREA17 <dbl>,
# DAREA18 <dbl>, DAREA19 <dbl>, DMONTH1 <dbl>, DMONTH2 <dbl>, DMONTH3 <dbl>, DMONTH4 <dbl>, DMONTH5 <dbl>,
# DMONTH6 <dbl>, DMONTH7 <dbl>, DMONTH8 <dbl>, DMONTH9 <dbl>, DMONTH10 <dbl>, DMONTH11 <dbl>, DMONTH12 <dbl>,
# BOG95 <dbl>, BOG97 <dbl>, MOM_SCH <dbl>, MOM_AGE <dbl>, MOM_MW <dbl>, DAD_SCH <dbl>, DAD_AGE <dbl>, DAD_MW <dbl>,
# SEX2 <dbl>, STRATA1 <dbl>, STRATA2 <dbl>, STRATA3 <dbl>, STRATA4 <dbl>, STRATA5 <dbl>, STRATA6 <dbl>,
# STRATAMS <dbl>, REPT6 <dbl>, TOTSCYRS <dbl>, HASCHILD <dbl>, MARRIED <dbl>, WORKING <dbl>, REPT <dbl>,
# NREPT <dbl>, FINISH6 <dbl>, FINISH7 <dbl>, FINISH8 <dbl>, SEX_MISS <dbl>, USNGSCH <dbl>, HOURSUM <dbl>,
# TAB3SMPL <dbl>, WORKING3 <dbl>
>
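A note on the filter() call: conditions separated by commas are combined with AND, and rows where a condition evaluates to NA are dropped. The summary() output earlier shows that TAB3SMPL only takes the value 1 or NA, so TAB3SMPL == 1 effectively removes the NA rows. The sketch below illustrates this with a made-up four-row data frame; the values are not taken from the vouchers data.

```r
library(dplyr)

# Made-up values for illustration only
toy <- data.frame(
  id       = 1:4,
  TAB3SMPL = c(1, NA, 1, 1),
  BOG95SMP = c(1, 1, 0, 1)
)

# Comma-separated conditions are combined with AND;
# rows where a condition evaluates to NA are dropped
kept <- toy %>% filter(TAB3SMPL == 1, BOG95SMP == 1)
print(kept)
```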
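#### Apply multiple regression formulas to the dataset at once with the __*map()* function__
- Values to iterate over (first argument of map, .x): the regression formulas stored in models$formula
- Function to apply (second argument of map, .f): the __*lm()* function__. lm() receives the formulas from models$formula one at a time.
- Data to fit on (further argument of map, forwarded to lm() as data): regression_data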
> df_models <- models %>% mutate(model = map(.x = formula, .f = lm, data = regression_data)) %>% mutate(lm_result = map(.x = model, .f = tidy))
>
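To try the same pattern without installing experimentdatar, here is a minimal sketch on the built-in mtcars dataset. The formulas and the names mpg_base and mpg_covariate are assumptions made for illustration. It differs from the code above in one respect: the string is converted with as.formula() inside a lambda, which makes the conversion explicit.

```r
library(dplyr)
library(purrr)
library(tibble)
library(broom)

# Illustrative formulas on the built-in mtcars data
formulas <- c(mpg_base = "mpg ~ wt", mpg_covariate = "mpg ~ wt + hp")

fitted_models <- formulas %>%
  enframe(name = "model_index", value = "formula") %>%
  mutate(
    # one lm fit per formula string, stored in a list column
    model = map(formula, ~ lm(as.formula(.x), data = mtcars)),
    # tidy() turns each fit into a tibble of coefficients
    lm_result = map(model, tidy)
  )
print(fitted_models)
```

Because lm() is called directly inside the lambda here, the Call recorded in each model refers to lm rather than to .f, which may make the printed models slightly easier to read.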
#### Display the __result of the *map()* function__
> print(df_models)
# A tibble: 12 x 4
model_index formula model lm_result
<chr> <chr> <list> <list>
1 "TOTSCYRS_base" TOTSCYRS ~ VOUCH0 <lm> <tibble [2 × 5]>
2 "INSCHL_base" INSCHL ~ VOUCH0 <lm> <tibble [2 × 5]>
3 "PRSCH_C_base" PRSCH_C ~ VOUCH0 <lm> <tibble [2 × 5]>
4 "USNGSCH_base" USNGSCH ~ VOUCH0 <lm> <tibble [2 × 5]>
5 "PRSCHA_1_base" PRSCHA_1 ~ VOUCH0 <lm> <tibble [2 × 5]>
6 "FINISH6_base" FINISH6 ~ VOUCH0 <lm> <tibble [2 × 5]>
7 "" TOTSCYRS ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3 <lm> <tibble [8 × 5]>
8 "" INSCHL ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3 <lm> <tibble [8 × 5]>
9 "" PRSCH_C ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3 <lm> <tibble [8 × 5]>
10 "" USNGSCH ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3 <lm> <tibble [8 × 5]>
11 "" PRSCHA_1 ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3 <lm> <tibble [8 × 5]>
12 "" FINISH6 ~ VOUCH0 + SVY + HSVISIT + AGE + STRATA1 + STRATA2 + STRATA3 <lm> <tibble [8 × 5]>
>
> print(df_models$model)
[[1]]
Call:
.f(formula = .x[[i]], data = ..1)
Coefficients:
(Intercept) VOUCH0
3.65302 0.05809
[[2]]
Call:
.f(formula = .x[[i]], data = ..1)
Coefficients:
(Intercept) VOUCH0
0.83096 0.01861
[[3]]
Call:
.f(formula = .x[[i]], data = ..1)
Coefficients:
(Intercept) VOUCH0
0.5391 0.1600
[[4]]
Call:
.f(formula = .x[[i]], data = ..1)
Coefficients:
(Intercept) VOUCH0
0.05694 0.50887
[[5]]
Call:
.f(formula = .x[[i]], data = ..1)
Coefficients:
(Intercept) VOUCH0
0.87722 0.06295
[[6]]
Call:
.f(formula = .x[[i]], data = ..1)
Coefficients:
(Intercept) VOUCH0
0.94306 0.02617
[[7]]
Call:
.f(formula = .x[[i]], data = ..1)
Coefficients:
(Intercept) VOUCH0 SVY HSVISIT AGE STRATA1 STRATA2 STRATA3
6.22680 0.04594 0.05080 0.05236 -0.17671 -0.07833 0.10382 0.14800
[[8]]
Call:
.f(formula = .x[[i]], data = ..1)
Coefficients:
(Intercept) VOUCH0 SVY HSVISIT AGE STRATA1 STRATA2 STRATA3
2.262204 0.011583 -0.007225 0.051464 -0.098327 -0.003393 0.065811 0.062946
[[9]]
Call:
.f(formula = .x[[i]], data = ..1)
Coefficients:
(Intercept) VOUCH0 SVY HSVISIT AGE STRATA1 STRATA2 STRATA3
1.70532 0.15482 0.01149 0.03878 -0.08108 -0.06651 0.08655 0.06113
[[10]]
Call:
.f(formula = .x[[i]], data = ..1)
Coefficients:
(Intercept) VOUCH0 SVY HSVISIT AGE STRATA1 STRATA2 STRATA3
0.74951 0.50564 -0.04750 0.02388 -0.05027 0.06116 0.10749 0.01893
[[11]]
Call:
.f(formula = .x[[i]], data = ..1)
Coefficients:
(Intercept) VOUCH0 SVY HSVISIT AGE STRATA1 STRATA2 STRATA3
1.17091 0.06190 -0.01092 0.05269 -0.01795 -0.02522 -0.02327 -0.05940
[[12]]
Call:
.f(formula = .x[[i]], data = ..1)
Coefficients:
(Intercept) VOUCH0 SVY HSVISIT AGE STRATA1 STRATA2 STRATA3
1.2825853 0.0246813 0.0062274 0.0220922 -0.0223925 -0.0293528 -0.0026688 -0.0001548
>
> print(df_models$lm_result)
[[1]]
# A tibble: 2 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 3.65 0.0374 97.7 0
2 VOUCH0 0.0581 0.0524 1.11 0.267
[[2]]
# A tibble: 2 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.831 0.0155 53.8 1.64e-315
2 VOUCH0 0.0186 0.0216 0.860 3.90e- 1
[[3]]
# A tibble: 2 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.539 0.0202 26.7 2.21e-122
2 VOUCH0 0.160 0.0283 5.66 1.96e- 8
[[4]]
# A tibble: 2 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.0569 0.0164 3.46 5.52e- 4
2 VOUCH0 0.509 0.0230 22.1 1.80e-90
[[5]]
# A tibble: 2 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.877 0.0120 72.8 0
2 VOUCH0 0.0629 0.0169 3.73 0.000200
[[6]]
# A tibble: 2 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.943 0.00860 110. 0
2 VOUCH0 0.0262 0.0120 2.17 0.0300
[[7]]
# A tibble: 8 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 6.23 0.298 20.9 2.12e-82
2 VOUCH0 0.0459 0.0504 0.912 3.62e- 1
3 SVY 0.0508 0.0607 0.837 4.03e- 1
4 HSVISIT 0.0524 0.111 0.470 6.38e- 1
5 AGE -0.177 0.0191 -9.26 9.79e-20
6 STRATA1 -0.0783 0.0911 -0.860 3.90e- 1
7 STRATA2 0.104 0.0715 1.45 1.46e- 1
8 STRATA3 0.148 0.0941 1.57 1.16e- 1
[[8]]
# A tibble: 8 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 2.26 0.119 19.0 2.27e-70
2 VOUCH0 0.0116 0.0201 0.576 5.64e- 1
3 SVY -0.00722 0.0242 -0.298 7.66e- 1
4 HSVISIT 0.0515 0.0444 1.16 2.47e- 1
5 AGE -0.0983 0.00761 -12.9 1.03e-35
6 STRATA1 -0.00339 0.0363 -0.0934 9.26e- 1
7 STRATA2 0.0658 0.0285 2.31 2.11e- 2
8 STRATA3 0.0629 0.0376 1.68 9.40e- 2
[[9]]
# A tibble: 8 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 1.71 0.162 10.5 8.49e-25
2 VOUCH0 0.155 0.0274 5.65 2.00e- 8
3 SVY 0.0115 0.0330 0.348 7.28e- 1
4 HSVISIT 0.0388 0.0605 0.641 5.22e- 1
5 AGE -0.0811 0.0104 -7.81 1.25e-14
6 STRATA1 -0.0665 0.0495 -1.34 1.79e- 1
7 STRATA2 0.0866 0.0389 2.23 2.61e- 2
8 STRATA3 0.0611 0.0512 1.19 2.33e- 1
[[10]]
# A tibble: 8 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.750 0.133 5.63 2.30e- 8
2 VOUCH0 0.506 0.0225 22.5 1.03e-92
3 SVY -0.0475 0.0271 -1.75 8.05e- 2
4 HSVISIT 0.0239 0.0498 0.480 6.32e- 1
5 AGE -0.0503 0.00853 -5.89 5.04e- 9
6 STRATA1 0.0612 0.0407 1.50 1.33e- 1
7 STRATA2 0.107 0.0319 3.36 7.93e- 4
8 STRATA3 0.0189 0.0421 0.450 6.53e- 1
[[11]]
# A tibble: 8 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 1.17 0.0996 11.8 3.21e-30
2 VOUCH0 0.0619 0.0168 3.68 2.47e- 4
3 SVY -0.0109 0.0203 -0.538 5.91e- 1
4 HSVISIT 0.0527 0.0372 1.42 1.57e- 1
5 AGE -0.0180 0.00638 -2.82 4.96e- 3
6 STRATA1 -0.0252 0.0304 -0.829 4.08e- 1
7 STRATA2 -0.0233 0.0239 -0.974 3.30e- 1
8 STRATA3 -0.0594 0.0315 -1.89 5.93e- 2
[[12]]
# A tibble: 8 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 1.28 0.0706 18.2 6.11e-65
2 VOUCH0 0.0247 0.0119 2.07 3.90e- 2
3 SVY 0.00623 0.0144 0.433 6.65e- 1
4 HSVISIT 0.0221 0.0264 0.837 4.03e- 1
5 AGE -0.0224 0.00452 -4.95 8.54e- 7
6 STRATA1 -0.0294 0.0216 -1.36 1.74e- 1
7 STRATA2 -0.00267 0.0169 -0.158 8.75e- 1
8 STRATA3 -0.000155 0.0223 -0.00694 9.94e- 1
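A natural follow-up to the list output above is to flatten the lm_result column and keep only the coefficient of interest, so that the estimates of all models can be compared in one table. The sketch below shows this on the built-in mtcars dataset with illustrative formulas; with the objects from this article, the corresponding filter would presumably be term == "VOUCH0".

```r
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)
library(broom)

# Illustrative formulas on the built-in mtcars data
formulas <- c(mpg_base = "mpg ~ wt", mpg_covariate = "mpg ~ wt + hp")

wt_effects <- formulas %>%
  enframe(name = "model_index", value = "formula") %>%
  mutate(
    model = map(formula, ~ lm(as.formula(.x), data = mtcars)),
    lm_result = map(model, tidy)
  ) %>%
  select(model_index, lm_result) %>%
  # expand each coefficient table into ordinary rows
  unnest(cols = lm_result) %>%
  # keep only the coefficient of the variable of interest
  filter(term == "wt")
print(wt_effects)
```

Each row of the result corresponds to one model, which makes it easy to see how the estimate changes when covariates are added.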