More than 3 years have passed since last update.

R言語のパイプ演算子（%>%）とmap関数とbroomパッケージの組合せ技で、3本の回帰分析式をDataFrame型に一括適用する

Last updated at 2020-12-31Posted at 2020-12-31

安井翔太（著）『効果検証入門』（技術評論社）を読んでいて、Rのパイプ関数（%>%）が使い勝手が良さそうだったので、写経して使ってみた。

（関連記事）

【 R言語】大量の回帰式のmap一括実行と、解析結果（偏回帰係数とt値・p値）の格納列のmutateによる生成を%>%パイプで一筆書き

実行環境

RStudio
OS: macOS Catalina

なお、別の記事では、この記事と同じコードを、Google Colaboratory上でも実行してみました。
詳細は次の記事を参照ください。

Google ColaboratoryでR言語を動かした（GPU選択）

パッケージのインストールと読込み

RStudio

> install.packages("tidyverse")
（ ・・・省略・・・　）
>

RStudio

> library("tidyverse")
─ Attaching packages ───────────────────── tidyverse 1.3.0 ─
✓ ggplot2 3.3.3     ✓ purrr   0.3.4
✓ tibble  3.0.4     ✓ dplyr   1.0.2
✓ tidyr   1.1.2     ✓ stringr 1.4.0
✓ readr   1.4.0     ✓ forcats 0.5.0
─ Conflicts ────────────────────── tidyverse_conflicts() ─
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
>

データセットのダウンロードと読込み

RStudio

> csv_url = "http://www.minethatdata.com/Kevin_Hillstrom_MineThatData_E-MailAnalytics_DataMiningChallenge_2008.03.20.csv"
> email_data <- read_csv(csv_url)

─ Column specification ─────────────────────────────
cols(
  recency = col_double(),
  history_segment = col_character(),
  history = col_double(),
  mens = col_double(),
  womens = col_double(),
  zip_code = col_character(),
  newbie = col_double(),
  channel = col_character(),
  segment = col_character(),
  visit = col_double(),
  conversion = col_double(),
  spend = col_double()
)

>

データのカラム構造と分布の確認

RStudio

> summary(email_data)
    recency       history_segment       history             mens      
 Min.   : 1.000   Length:64000       Min.   :  29.99   Min.   :0.000  
 1st Qu.: 2.000   Class :character   1st Qu.:  64.66   1st Qu.:0.000  
 Median : 6.000   Mode  :character   Median : 158.11   Median :1.000  
 Mean   : 5.764                      Mean   : 242.09   Mean   :0.551  
 3rd Qu.: 9.000                      3rd Qu.: 325.66   3rd Qu.:1.000  
 Max.   :12.000                      Max.   :3345.93   Max.   :1.000  
     womens         zip_code             newbie         channel         
 Min.   :0.0000   Length:64000       Min.   :0.0000   Length:64000      
 1st Qu.:0.0000   Class :character   1st Qu.:0.0000   Class :character  
 Median :1.0000   Mode  :character   Median :1.0000   Mode  :character  
 Mean   :0.5497                      Mean   :0.5022                     
 3rd Qu.:1.0000                      3rd Qu.:1.0000                     
 Max.   :1.0000                      Max.   :1.0000                     
   segment              visit          conversion           spend        
 Length:64000       Min.   :0.0000   Min.   :0.000000   Min.   :  0.000  
 Class :character   1st Qu.:0.0000   1st Qu.:0.000000   1st Qu.:  0.000  
 Mode  :character   Median :0.0000   Median :0.000000   Median :  0.000  
                    Mean   :0.1468   Mean   :0.009031   Mean   :  1.051  
                    3rd Qu.:0.0000   3rd Qu.:0.000000   3rd Qu.:  0.000  
                    Max.   :1.0000   Max.   :1.000000   Max.   :499.000  
>

RStudio

> head(email_data)
# A tibble: 6 x 12
  recency history_segment history  mens womens zip_code newbie channel segment
    <dbl> <chr>             <dbl> <dbl>  <dbl> <chr>     <dbl> <chr>   <chr>  
1      10 2) $100 - $200    142.      1      0 Surburb…      0 Phone   Womens…
2       6 3) $200 - $350    329.      1      1 Rural         1 Web     No E-M…
3       7 2) $100 - $200    181.      0      1 Surburb…      1 Web     Womens…
4       9 5) $500 - $750    676.      1      0 Rural         1 Web     Mens E…
5       2 1) $0 - $100       45.3     1      0 Urban         0 Web     Womens…
6       6 2) $100 - $200    135.      0      1 Surburb…      0 Phone   Womens…
# … with 3 more variables: visit <dbl>, conversion <dbl>, spend <dbl>
>

RStudio

> tail(email_data)
# A tibble: 6 x 12
  recency history_segment history  mens womens zip_code newbie channel segment
    <dbl> <chr>             <dbl> <dbl>  <dbl> <chr>     <dbl> <chr>   <chr>  
1       7 1) $0 - $100       86.5     0      1 Urban         0 Web     Mens E…
2      10 2) $100 - $200    106.      1      0 Urban         0 Web     Mens E…
3       5 1) $0 - $100       38.9     0      1 Urban         1 Phone   Mens E…
4       6 1) $0 - $100       30.0     1      0 Urban         1 Phone   Mens E…
5       1 5) $500 - $750    553.      1      0 Surburb…      1 Multic… Womens…
6       1 4) $350 - $500    473.      0      1 Surburb…      0 Web     Mens E…
# … with 3 more variables: visit <dbl>, conversion <dbl>, spend <dbl>
>

__email_dataをデータ加工する処理を、%>%__記法でパイプで繋いで、実行

なお、__"+"__は、RStudioでコードの途中で改行した際に付される記号です。
コードに*"+"*を記述する必要はありません。

RStudio

> male_df <- email_data %>%
+ filter(segment != "Womens E-Mail")
>

RStudio

> male_df
# A tibble: 42,613 x 12
   recency history_segment history  mens womens zip_code newbie channel segment
     <dbl> <chr>             <dbl> <dbl>  <dbl> <chr>     <dbl> <chr>   <chr>  
 1       6 3) $200 - $350    329.      1      1 Rural         1 Web     No E-M…
 2       9 5) $500 - $750    676.      1      0 Rural         1 Web     Mens E…
 3       9 5) $500 - $750    675.      1      1 Rural         1 Phone   Mens E…
 4       2 2) $100 - $200    102.      0      1 Urban         0 Web     Mens E…
 5       4 3) $200 - $350    241.      0      1 Rural         1 Multic… No E-M…
 6       3 1) $0 - $100       58.1     1      0 Urban         1 Web     No E-M…
 7       5 1) $0 - $100       30.0     1      0 Surburb…      0 Phone   Mens E…
 8       9 2) $100 - $200    112.      1      0 Rural         0 Web     Mens E…
 9      11 3) $200 - $350    219.      1      1 Surburb…      0 Phone   Mens E…
10       5 6) $750 - $1,0…   828.      1      0 Surburb…      1 Multic… Mens E…
# … with 42,603 more rows, and 3 more variables: visit <dbl>, conversion <dbl>,
#   spend <dbl>
>

RStudio

> male_df <- email_data %>%
+ filter(segment != "Womens E-Mail") %>%
+ mutate(treatment = if_else(segment == "Mens E-Mail", 1, 0))
>

RStudio

> summary(male_df)
    recency       history_segment       history             mens       
 Min.   : 1.000   Length:42613       Min.   :  29.99   Min.   :0.0000  
 1st Qu.: 2.000   Class :character   1st Qu.:  64.50   1st Qu.:0.0000  
 Median : 6.000   Mode  :character   Median : 157.00   Median :1.0000  
 Mean   : 5.762                      Mean   : 241.86   Mean   :0.5521  
 3rd Qu.: 9.000                      3rd Qu.: 325.21   3rd Qu.:1.0000  
 Max.   :12.000                      Max.   :3345.93   Max.   :1.0000  
     womens         zip_code             newbie         channel         
 Min.   :0.0000   Length:42613       Min.   :0.0000   Length:42613      
 1st Qu.:0.0000   Class :character   1st Qu.:0.0000   Class :character  
 Median :1.0000   Mode  :character   Median :1.0000   Mode  :character  
 Mean   :0.5495                      Mean   :0.5017                     
 3rd Qu.:1.0000                      3rd Qu.:1.0000                     
 Max.   :1.0000                      Max.   :1.0000                     
   segment              visit          conversion           spend        
 Length:42613       Min.   :0.0000   Min.   :0.000000   Min.   :  0.000  
 Class :character   1st Qu.:0.0000   1st Qu.:0.000000   1st Qu.:  0.000  
 Mode  :character   Median :0.0000   Median :0.000000   Median :  0.000  
                    Mean   :0.1445   Mean   :0.009129   Mean   :  1.038  
                    3rd Qu.:0.0000   3rd Qu.:0.000000   3rd Qu.:  0.000  
                    Max.   :1.0000   Max.   :1.000000   Max.   :499.000  
   treatment  
 Min.   :0.0  
 1st Qu.:0.0  
 Median :1.0  
 Mean   :0.5  
 3rd Qu.:1.0  
 Max.   :1.0  
> 
> head(male_df)
# A tibble: 6 x 13
  recency history_segment history  mens womens zip_code newbie channel segment
    <dbl> <chr>             <dbl> <dbl>  <dbl> <chr>     <dbl> <chr>   <chr>  
1       6 3) $200 - $350    329.      1      1 Rural         1 Web     No E-M…
2       9 5) $500 - $750    676.      1      0 Rural         1 Web     Mens E…
3       9 5) $500 - $750    675.      1      1 Rural         1 Phone   Mens E…
4       2 2) $100 - $200    102.      0      1 Urban         0 Web     Mens E…
5       4 3) $200 - $350    241.      0      1 Rural         1 Multic… No E-M…
6       3 1) $0 - $100       58.1     1      0 Urban         1 Web     No E-M…
# … with 4 more variables: visit <dbl>, conversion <dbl>, spend <dbl>,
#   treatment <dbl>
> head(male_df$treatment)
[1] 0 1 1 1 0 0
> 
> tail(male_df$treatment)
[1] 1 1 1 1 1 1
> 
> male_df$treatment
   [1] 0 1 1 1 0 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 0 0 1 0 1 1 1 0
  [39] 0 0 1 1 1 0 0 0 1 1 1 0 1 0 0 1 1 1 0 1 1 0 0 1 1 1 1 0 1 1 0 0 1 0 1 1 0 1
  [77] 0 1 0 0 1 1 1 0 1 0 1 1 0 0 1 0 1 1 1 1 1 1 0 0 1 0 0 0 0 0 1 1 0 1 1 0 1 1
 [115] 1 0 0 1 0 0 1 1 1 0 1 1 0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0
 [153] 0 1 0 1 0 1 0 1 1 1 0 1 1 0 0 0 0 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 1 0
 [191] 0 1 0 0 1 1 0 1 0 0 1 1 0 1 0 0 1 0 1 1 0 1 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0
 [229] 0 0 1 1 0 1 0 0 1 0 0 0 1 0 0 1 0 1 1 1 1 0 0 1 0 1 1 0 1 0 0 1 0 0 0 0 1 0
 [267] 0 1 0 1 0 1 0 1 0 0 0 0 0 1 1 0 1 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0
 [305] 0 1 0 0 0 0 0 1 1 0 0 0 1 1 1 0 1 1 0 1 1 0 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0
 [343] 0 0 0 1 0 1 0 0 1 1 1 0 1 1 1 1 0 1 0 0 0 1 0 0 0 0 1 1 1 1 0 0 1 0 0 1 1 0
 [381] 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 0 0 1 1 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0
 [419] 1 0 0 0 1 1 0 1 0 0 0 1 1 1 0 0 1 0 1 1 1 0 0 0 0 1 1 0 0 0 0 0 1 1 1 1 1 0
 [457] 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 1 0 0 1 1 0 1 1 0 1 1
 [495] 0 1 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 1 1 1 0 1 1 0 0 0 1 1 0 0 0 1 1 1 0 1 1 0
 [533] 1 0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 1 1 1 1 0 1 0 1 0 0 1 0 0 1 1 1 1 0 1 1
 [571] 1 0 1 0 0 0 0 1 1 1 0 0 0 0 1 1 1 0 1 0 0 1 0 1 1 0 1 1 1 0 0 1 1 0 1 0 0 0
 [609] 0 1 1 0 0 1 0 1 1 1 1 0 1 1 0 0 0 0 0 1 1 1 0 0 0 1 1 1 0 0 0 1 0 1 0 0 1 0
 [647] 1 1 0 0 1 1 0 1 0 1 1 0 0 1 1 0 1 1 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0
 [685] 1 1 1 1 1 0 1 0 1 0 1 1 1 1 0 0 1 0 0 0 1 0 0 1 0 0 1 1 0 0 1 1 0 1 0 0 1 1
 [723] 1 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0 1 0 1 1 0 1 0 1 1 1 1 1 1 1 0 1 0 0 0 1 1 1
 [761] 0 1 1 0 1 1 1 0 0 0 1 0 1 1 1 1 0 1 1 0 1 1 1 0 1 1 0 1 0 1 1 0 1 1 0 1 1 1
 [799] 1 1 0 0 0 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1 0 0 0 0 1 0 0 1
 [837] 0 0 0 0 1 1 1 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 1 1 0 0 1 1 0 1 1 0 0 1
 [875] 0 0 0 1 1 0 0 1 0 1 0 0 1 1 1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 0
 [913] 1 0 1 0 1 0 1 1 1 0 1 0 0 0 1 1 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 0 1 1 0 1 1 0
 [951] 0 0 0 0 1 0 1 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 1 0 1 1 0 1 0 1 1 1 0 1 1 0 0 1
 [989] 1 0 0 0 0 1 0 0 1 1 1 1
 [ reached getOption("max.print") -- omitted 41613 entries ]
>

RStudio

> summary_by_segment <- male_df %>% group_by(treatment) %>%
+ summarise(conversion_rate = mean(conversion), spend_mean = mean(spend), count = n())
`summarise()` ungrouping output (override with `.groups` argument)
>

RStudio

> summary(summary_by_segment)
   treatment    conversion_rate      spend_mean         count      
 Min.   :0.00   Min.   :0.005726   Min.   :0.6528   Min.   :21306  
 1st Qu.:0.25   1st Qu.:0.007427   1st Qu.:0.8452   1st Qu.:21306  
 Median :0.50   Median :0.009129   Median :1.0377   Median :21306  
 Mean   :0.50   Mean   :0.009129   Mean   :1.0377   Mean   :21306  
 3rd Qu.:0.75   3rd Qu.:0.010830   3rd Qu.:1.2302   3rd Qu.:21307  
 Max.   :1.00   Max.   :0.012531   Max.   :1.4226   Max.   :21307  
> 
> summary_by_segment
# A tibble: 2 x 4
  treatment conversion_rate spend_mean count
      <dbl>           <dbl>      <dbl> <int>
1         0         0.00573      0.653 21306
2         1         0.0125       1.42  21307
> 
>

broomパッケージを使うことで、複数本の回帰分析モデルを一括実行することができる。

まず最初に、複数本の重回帰モデル式を配列に格納する。

RStudio

> library(broom)
> formulae_vec <- c(spend ~ treatment + recency + channel,
+                   spend ~ treatment + recency + channel + history,
+                   history ~ treatment + recency + channel)
> 
> 
> names(formulae_vec) <- paste("reg", LETTERS[1:3], sep="_")

RStudio

> formulae_vec
$reg_A
spend ~ treatment + recency + channel

$reg_B
spend ~ treatment + recency + channel + history

$reg_C
history ~ treatment + recency + channel

>

RStudio

> names(formulae_vec)
[1] "reg_A" "reg_B" "reg_C"
>

配列をDataFrame型に型変換する。

RStudio

> models <- formulae_vec %>% enframe(name = "model_index", value="formula")

RStudio

> models
# A tibble: 3 x 2
  model_index formula  
  <chr>       <list>   
1 reg_A       <formula>
2 reg_B       <formula>
3 reg_C       <formula>
> 
> models$formula
[[1]]
spend ~ treatment + recency + channel

[[2]]
spend ~ treatment + recency + channel + history

[[3]]
history ~ treatment + recency + channel

>

modelsデータフレームオブジェクト__のformula列__に格納されている3本の重回帰モデルの定義式を、map関数__を使って、lm()関数に渡した上で、male_dfオブジェクト__に対して関数適用する。以上を、__*%>%*パイプ演算子__を使ってパイプライン形式で順番に実行する。
出力された結果のオプジェクトを、__$%>%$パイプ演算子__を使って、__tidy()関数__に渡して、データの出力書式を見やすい形に整形する。

RStudio

> df_models <- models %>%
+                         mutate(model = map(.x = formula, .f = lm, data = male_df)) %>%
+                         mutate(lm_result = map(.x = model, .f = tidy))
>

結果

RStudio

> df_models
# A tibble: 3 x 4
  model_index formula   model  lm_result       
  <chr>       <list>    <list> <list>          
1 reg_A       <formula> <lm>   <tibble [5 × 5]>
2 reg_B       <formula> <lm>   <tibble [6 × 5]>
3 reg_C       <formula> <lm>   <tibble [5 × 5]>
>

RStudio

> df_models$lm_result
[[1]]
# A tibble: 5 x 5
  term         estimate std.error statistic     p.value
  <chr>           <dbl>     <dbl>     <dbl>       <dbl>
1 (Intercept)    1.17      0.242      4.83  0.00000138 
2 treatment      0.771     0.145      5.31  0.000000112
3 recency       -0.0698    0.0208    -3.35  0.000812   
4 channelPhone  -0.216     0.237     -0.911 0.362      
5 channelWeb    -0.0429    0.236     -0.182 0.856      

[[2]]
# A tibble: 6 x 5
  term         estimate std.error statistic     p.value
  <chr>           <dbl>     <dbl>     <dbl>       <dbl>
1 (Intercept)   0.482    0.306        1.57  0.116      
2 treatment     0.768    0.145        5.29  0.000000125
3 recency      -0.0525   0.0214      -2.46  0.0139     
4 channelPhone  0.136    0.256        0.533 0.594      
5 channelWeb    0.307    0.255        1.20  0.229      
6 history       0.00116  0.000318     3.64  0.000272   

[[3]]
# A tibble: 5 x 5
  term         estimate std.error statistic p.value
  <chr>           <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)    593.       3.69     161.     0    
2 treatment        2.72     2.21       1.23   0.220
3 recency        -14.9      0.318    -46.9    0    
4 channelPhone  -304.       3.61     -84.3    0    
5 channelWeb    -302.       3.60     -83.9    0    

>

unnest関数を使う場合

RStudio

> df_results <- df_models %>%
+                             mutate(formula = as.character(formula)) %>%
+                             select(formula, model_index, lm_result) %>%
+                             unnest(cols = c(lm_result))
>

RStudio

> df_results
# A tibble: 16 x 7
   formula                                         model_index term           estimate std.error statistic     p.value
   <chr>                                           <chr>       <chr>             <dbl>     <dbl>     <dbl>       <dbl>
 1 spend ~ treatment + recency + channel           reg_A       (Intercept)     1.17     0.242        4.83  0.00000138 
 2 spend ~ treatment + recency + channel           reg_A       treatment       0.771    0.145        5.31  0.000000112
 3 spend ~ treatment + recency + channel           reg_A       recency        -0.0698   0.0208      -3.35  0.000812   
 4 spend ~ treatment + recency + channel           reg_A       channelPhone   -0.216    0.237       -0.911 0.362      
 5 spend ~ treatment + recency + channel           reg_A       channelWeb     -0.0429   0.236       -0.182 0.856      
 6 spend ~ treatment + recency + channel + history reg_B       (Intercept)     0.482    0.306        1.57  0.116      
 7 spend ~ treatment + recency + channel + history reg_B       treatment       0.768    0.145        5.29  0.000000125
 8 spend ~ treatment + recency + channel + history reg_B       recency        -0.0525   0.0214      -2.46  0.0139     
 9 spend ~ treatment + recency + channel + history reg_B       channelPhone    0.136    0.256        0.533 0.594      
10 spend ~ treatment + recency + channel + history reg_B       channelWeb      0.307    0.255        1.20  0.229      
11 spend ~ treatment + recency + channel + history reg_B       history         0.00116  0.000318     3.64  0.000272   
12 history ~ treatment + recency + channel         reg_C       (Intercept)   593.       3.69       161.    0          
13 history ~ treatment + recency + channel         reg_C       treatment       2.72     2.21         1.23  0.220      
14 history ~ treatment + recency + channel         reg_C       recency       -14.9      0.318      -46.9   0          
15 history ~ treatment + recency + channel         reg_C       channelPhone -304.       3.61       -84.3   0          
16 history ~ treatment + recency + channel         reg_C       channelWeb   -302.       3.60       -83.9   0          

>

RStudio

> df_results %>% filter(model_index == "reg_A")
# A tibble: 5 x 7
  formula                               model_index term         estimate std.error statistic     p.value
  <chr>                                 <chr>       <chr>           <dbl>     <dbl>     <dbl>       <dbl>
1 spend ~ treatment + recency + channel reg_A       (Intercept)    1.17      0.242      4.83  0.00000138 
2 spend ~ treatment + recency + channel reg_A       treatment      0.771     0.145      5.31  0.000000112
3 spend ~ treatment + recency + channel reg_A       recency       -0.0698    0.0208    -3.35  0.000812   
4 spend ~ treatment + recency + channel reg_A       channelPhone  -0.216     0.237     -0.911 0.362      
5 spend ~ treatment + recency + channel reg_A       channelWeb    -0.0429    0.236     -0.182 0.856      
>

RStudio

> df_results %>% filter(model_index == "reg_B")
# A tibble: 6 x 7
  formula                                         model_index term         estimate std.error statistic     p.value
  <chr>                                           <chr>       <chr>           <dbl>     <dbl>     <dbl>       <dbl>
1 spend ~ treatment + recency + channel + history reg_B       (Intercept)   0.482    0.306        1.57  0.116      
2 spend ~ treatment + recency + channel + history reg_B       treatment     0.768    0.145        5.29  0.000000125
3 spend ~ treatment + recency + channel + history reg_B       recency      -0.0525   0.0214      -2.46  0.0139     
4 spend ~ treatment + recency + channel + history reg_B       channelPhone  0.136    0.256        0.533 0.594      
5 spend ~ treatment + recency + channel + history reg_B       channelWeb    0.307    0.255        1.20  0.229      
6 spend ~ treatment + recency + channel + history reg_B       history       0.00116  0.000318     3.64  0.000272   
>

RStudio

> df_results %>% filter(model_index == "reg_C")
# A tibble: 5 x 7
  formula                                 model_index term         estimate std.error statistic p.value
  <chr>                                   <chr>       <chr>           <dbl>     <dbl>     <dbl>   <dbl>
1 history ~ treatment + recency + channel reg_C       (Intercept)    593.       3.69     161.     0    
2 history ~ treatment + recency + channel reg_C       treatment        2.72     2.21       1.23   0.220
3 history ~ treatment + recency + channel reg_C       recency        -14.9      0.318    -46.9    0    
4 history ~ treatment + recency + channel reg_C       channelPhone  -304.       3.61     -84.3    0    
5 history ~ treatment + recency + channel reg_C       channelWeb    -302.       3.60     -83.9    0    
>

ファイル出力

ここからは、「効果検証入門」の本にはありませんが、参考まで。

RStudio

> linear_regression_1 <- lm(spend ~ treatment + recency + channel, data = male_df)
> linear_regression_2 <- lm(spend ~ treatment + recency + channel + history, data = male_df)

以下を実行すると、Desktop上にhtmlファイルと$LaTeX$ファイルが出力される。

RStudio

> getwd()
[1] "/Users/ocean"
> 
> setwd("/Users/ocean/Desktop/")
> getwd()
[1] "/Users/ocean/Desktop"
>

RStudio

> install.packages("stargazer")
 URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/stargazer_5.2.2.tgz' を試しています 
Content type 'application/x-gzip' length 619830 bytes (605 KB)
==================================================
downloaded 605 KB


The downloaded binary packages are in
	/var/folders/dw/kq_8pwps5s771nklvc8krws40000gn/T//RtmpwaZ17t/downloaded_packages
>

RStudio

> library(stargazer)

Please cite as: 

 Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.2. https://CRAN.R-project.org/package=stargazer 

>

htmlファイルへの書き出し

RStudio

> stargazer(linear_regression_1, linear_regression_2, title = "線形重回帰分析", type="html", out="重回帰分析.html")

<table style="text-align:center"><caption><strong>線形重回帰分析</strong></caption>
<tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="2"><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="2" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td colspan="2">spend</td></tr>
<tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td></tr>
<tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">treatment</td><td>0.771<sup>***</sup></td><td>0.768<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td>(0.145)</td><td>(0.145)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td></tr>
<tr><td style="text-align:left">recency</td><td>-0.070<sup>***</sup></td><td>-0.053<sup>**</sup></td></tr>
<tr><td style="text-align:left"></td><td>(0.021)</td><td>(0.021)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td></tr>
<tr><td style="text-align:left">channelPhone</td><td>-0.216</td><td>0.136</td></tr>
<tr><td style="text-align:left"></td><td>(0.237)</td><td>(0.256)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td></tr>
<tr><td style="text-align:left">channelWeb</td><td>-0.043</td><td>0.307</td></tr>
<tr><td style="text-align:left"></td><td>(0.236)</td><td>(0.255)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td></tr>
<tr><td style="text-align:left">history</td><td></td><td>0.001<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td></td><td>(0.0003)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td></tr>
<tr><td style="text-align:left">Constant</td><td>1.167<sup>***</sup></td><td>0.482</td></tr>
<tr><td style="text-align:left"></td><td>(0.242)</td><td>(0.306)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td></tr>
<tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>42,613</td><td>42,613</td></tr>
<tr><td style="text-align:left">R<sup>2</sup></td><td>0.001</td><td>0.001</td></tr>
<tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.001</td><td>0.001</td></tr>
<tr><td style="text-align:left">Residual Std. Error</td><td>14.990 (df = 42608)</td><td>14.988 (df = 42607)</td></tr>
<tr><td style="text-align:left">F Statistic</td><td>10.345<sup>***</sup> (df = 4; 42608)</td><td>10.930<sup>***</sup> (df = 5; 42607)</td></tr>
<tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="2" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
>

LaTeXファイルへの書き出し

RStudio

> stargazer(linear_regression_1, linear_regression_2, title = "線形重回帰分析", type="latex", out="重回帰分析.tex")

% Table created by stargazer v.5.2.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
% Date and time: 木, 12 31, 2020 - 17時45分38秒
\begin{table}[!htbp] \centering 
  \caption{線形重回帰分析} 
  \label{} 
\begin{tabular}{@{\extracolsep{5pt}}lcc} 
\\[-1.8ex]\hline 
\hline \\[-1.8ex] 
 & \multicolumn{2}{c}{\textit{Dependent variable:}} \\ 
\cline{2-3} 
\\[-1.8ex] & \multicolumn{2}{c}{spend} \\ 
\\[-1.8ex] & (1) & (2)\\ 
\hline \\[-1.8ex] 
 treatment & 0.771$^{***}$ & 0.768$^{***}$ \\ 
  & (0.145) & (0.145) \\ 
  & & \\ 
 recency & $-$0.070$^{***}$ & $-$0.053$^{**}$ \\ 
  & (0.021) & (0.021) \\ 
  & & \\ 
 channelPhone & $-$0.216 & 0.136 \\ 
  & (0.237) & (0.256) \\ 
  & & \\ 
 channelWeb & $-$0.043 & 0.307 \\ 
  & (0.236) & (0.255) \\ 
  & & \\ 
 history &  & 0.001$^{***}$ \\ 
  &  & (0.0003) \\ 
  & & \\ 
 Constant & 1.167$^{***}$ & 0.482 \\ 
  & (0.242) & (0.306) \\ 
  & & \\ 
\hline \\[-1.8ex] 
Observations & 42,613 & 42,613 \\ 
R$^{2}$ & 0.001 & 0.001 \\ 
Adjusted R$^{2}$ & 0.001 & 0.001 \\ 
Residual Std. Error & 14.990 (df = 42608) & 14.988 (df = 42607) \\ 
F Statistic & 10.345$^{***}$ (df = 4; 42608) & 10.930$^{***}$ (df = 5; 42607) \\ 
\hline 
\hline \\[-1.8ex] 
\textit{Note:}  & \multicolumn{2}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\ 
\end{tabular} 
\end{table}

Terminalでファイルが生成されていることを確認

Terminal

ocean@AfoGuardMacBook-Pro Desktop % ls | grep tex
mactex.txt
重回帰分析.tex
ocean@AfoGuardMacBook-Pro Desktop % ls | grep html
重回帰分析.html
ocean@AfoGuardMacBook-Pro Desktop %

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

R言語のパイプ演算子（%>%）とmap関数とbroomパッケージの組合せ技で、3本の回帰分析式をDataFrame型に一括適用する

（ 関連記事 ）

実行環境

パッケージのインストールと読込み

データセットのダウンロードと読込み

データのカラム構造と分布の確認

__email_dataをデータ加工する処理を、%>%__記法でパイプで繋いで、実行

__broomパッケージ__を使うことで、複数本の回帰分析モデルを一括実行することができる。

結果

unnest関数を使う場合

ファイル出力

以下を実行すると、Desktop上にhtmlファイルと$LaTeX$ファイルが出力される。

（関連記事）

broomパッケージを使うことで、複数本の回帰分析モデルを一括実行することができる。