Combining R's pipe operator (%>%), the map function, and the broom package to apply three regression formulas to a data frame in one pass

Posted at 2020-12-31

While reading Shota Yasui's 『効果検証入門』 (Gijutsu-Hyoronsha), R's pipe operator (%>%) looked convenient, so I typed out the book's code and tried it myself.


#Execution environment

  • RStudio
  • OS: macOS Catalina

In a separate article, I also ran the same code as in this article on Google Colaboratory.
See that article for details.


##Installing and loading the packages

RStudio
> install.packages("tidyverse")
( ... output omitted ... )
> 
RStudio
> library("tidyverse")
 Attaching packages ───────────────────── tidyverse 1.3.0 
 ggplot2 3.3.3      purrr   0.3.4
 tibble  3.0.4      dplyr   1.0.2
 tidyr   1.1.2      stringr 1.4.0
 readr   1.4.0      forcats 0.5.0
 Conflicts ────────────────────── tidyverse_conflicts() 
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
>

##Downloading and loading the dataset

RStudio
> csv_url = "http://www.minethatdata.com/Kevin_Hillstrom_MineThatData_E-MailAnalytics_DataMiningChallenge_2008.03.20.csv"
> email_data <- read_csv(csv_url)

 Column specification ─────────────────────────────
cols(
  recency = col_double(),
  history_segment = col_character(),
  history = col_double(),
  mens = col_double(),
  womens = col_double(),
  zip_code = col_character(),
  newbie = col_double(),
  channel = col_character(),
  segment = col_character(),
  visit = col_double(),
  conversion = col_double(),
  spend = col_double()
)

>
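
read_csv() guessed every numeric column as double above. If you would rather pin the column types yourself, a minimal sketch looks like the following. It assumes a small hypothetical three-row extract (`toy`, written to a temporary file so it runs offline), not the real dataset, and the values in it are made up for illustration.

```r
library(readr)

# Hypothetical three-row extract written to a temporary file (illustrative values only).
tmp <- tempfile(fileext = ".csv")
writeLines(c("recency,segment,spend",
             "10,Womens E-Mail,0",
             "6,No E-Mail,0",
             "9,Mens E-Mail,29.99"), tmp)

# Declare the type of each column instead of letting read_csv() guess.
toy <- read_csv(tmp, col_types = cols(
  recency = col_integer(),
  segment = col_character(),
  spend   = col_double()
))

str(toy)
```

Passing `col_types` also suppresses the column specification message.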

####Checking the column structure and distributions of the data

RStudio
> summary(email_data)
    recency       history_segment       history             mens      
 Min.   : 1.000   Length:64000       Min.   :  29.99   Min.   :0.000  
 1st Qu.: 2.000   Class :character   1st Qu.:  64.66   1st Qu.:0.000  
 Median : 6.000   Mode  :character   Median : 158.11   Median :1.000  
 Mean   : 5.764                      Mean   : 242.09   Mean   :0.551  
 3rd Qu.: 9.000                      3rd Qu.: 325.66   3rd Qu.:1.000  
 Max.   :12.000                      Max.   :3345.93   Max.   :1.000  
     womens         zip_code             newbie         channel         
 Min.   :0.0000   Length:64000       Min.   :0.0000   Length:64000      
 1st Qu.:0.0000   Class :character   1st Qu.:0.0000   Class :character  
 Median :1.0000   Mode  :character   Median :1.0000   Mode  :character  
 Mean   :0.5497                      Mean   :0.5022                     
 3rd Qu.:1.0000                      3rd Qu.:1.0000                     
 Max.   :1.0000                      Max.   :1.0000                     
   segment              visit          conversion           spend        
 Length:64000       Min.   :0.0000   Min.   :0.000000   Min.   :  0.000  
 Class :character   1st Qu.:0.0000   1st Qu.:0.000000   1st Qu.:  0.000  
 Mode  :character   Median :0.0000   Median :0.000000   Median :  0.000  
                    Mean   :0.1468   Mean   :0.009031   Mean   :  1.051  
                    3rd Qu.:0.0000   3rd Qu.:0.000000   3rd Qu.:  0.000  
                    Max.   :1.0000   Max.   :1.000000   Max.   :499.000  
>
RStudio
> head(email_data)
# A tibble: 6 x 12
  recency history_segment history  mens womens zip_code newbie channel segment
    <dbl> <chr>             <dbl> <dbl>  <dbl> <chr>     <dbl> <chr>   <chr>  
1      10 2) $100 - $200    142.      1      0 Surburb      0 Phone   Womens
2       6 3) $200 - $350    329.      1      1 Rural         1 Web     No E-M
3       7 2) $100 - $200    181.      0      1 Surburb      1 Web     Womens
4       9 5) $500 - $750    676.      1      0 Rural         1 Web     Mens E
5       2 1) $0 - $100       45.3     1      0 Urban         0 Web     Womens
6       6 2) $100 - $200    135.      0      1 Surburb      0 Phone   Womens
# … with 3 more variables: visit <dbl>, conversion <dbl>, spend <dbl>
>
RStudio
> tail(email_data)
# A tibble: 6 x 12
  recency history_segment history  mens womens zip_code newbie channel segment
    <dbl> <chr>             <dbl> <dbl>  <dbl> <chr>     <dbl> <chr>   <chr>  
1       7 1) $0 - $100       86.5     0      1 Urban         0 Web     Mens E
2      10 2) $100 - $200    106.      1      0 Urban         0 Web     Mens E
3       5 1) $0 - $100       38.9     0      1 Urban         1 Phone   Mens E
4       6 1) $0 - $100       30.0     1      0 Urban         1 Phone   Mens E
5       1 5) $500 - $750    553.      1      0 Surburb      1 Multic Womens
6       1 4) $350 - $500    473.      0      1 Surburb      0 Web     Mens E
# … with 3 more variables: visit <dbl>, conversion <dbl>, spend <dbl>
> 
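
The tibble printout truncates the segment column, and summary() only reports its length, so neither shows the full category labels. A minimal sketch of listing them with dplyr's count(), assuming a hypothetical four-row stand-in (`toy`) for email_data:

```r
library(dplyr)

# Hypothetical stand-in for email_data (illustrative rows only).
toy <- data.frame(
  segment = c("Mens E-Mail", "No E-Mail", "Womens E-Mail", "Mens E-Mail"),
  visit   = c(1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# One row per distinct segment value, with its frequency in column n.
seg_counts <- toy %>% count(segment)
seg_counts
```

On the real data the same call would be `email_data %>% count(segment)`.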

##Processing __email_data__ with steps chained together by the __%>%__ pipe notation

Note that __"+"__ is the continuation prompt the RStudio console shows when a statement continues onto the next line.
You do not need to type *"+"* in your own code.
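
For readers new to the pipe, here is a minimal sketch of what `%>%` does: `x %>% f(y)` is evaluated as `f(x, y)`. It uses the built-in mtcars dataset as a stand-in (an assumption for illustration, not the e-mail data).

```r
library(dplyr)  # also makes the %>% operator available

# Nested form: has to be read from the inside out.
nested <- head(filter(mtcars, cyl == 4), 3)

# Piped form: the same calls, read top to bottom.
piped <- mtcars %>%
  filter(cyl == 4) %>%
  head(3)

# Both forms perform the same computation.
identical(nested, piped)
```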

RStudio
> male_df <- email_data %>%
+ filter(segment != "Womens E-Mail")
>
RStudio
> male_df
# A tibble: 42,613 x 12
   recency history_segment history  mens womens zip_code newbie channel segment
     <dbl> <chr>             <dbl> <dbl>  <dbl> <chr>     <dbl> <chr>   <chr>  
 1       6 3) $200 - $350    329.      1      1 Rural         1 Web     No E-M
 2       9 5) $500 - $750    676.      1      0 Rural         1 Web     Mens E
 3       9 5) $500 - $750    675.      1      1 Rural         1 Phone   Mens E
 4       2 2) $100 - $200    102.      0      1 Urban         0 Web     Mens E
 5       4 3) $200 - $350    241.      0      1 Rural         1 Multic No E-M
 6       3 1) $0 - $100       58.1     1      0 Urban         1 Web     No E-M
 7       5 1) $0 - $100       30.0     1      0 Surburb      0 Phone   Mens E
 8       9 2) $100 - $200    112.      1      0 Rural         0 Web     Mens E
 9      11 3) $200 - $350    219.      1      1 Surburb      0 Phone   Mens E
10       5 6) $750 - $1,0   828.      1      0 Surburb      1 Multic Mens E
# … with 42,603 more rows, and 3 more variables: visit <dbl>, conversion <dbl>,
#   spend <dbl>
>
RStudio
> male_df <- email_data %>%
+ filter(segment != "Womens E-Mail") %>%
+ mutate(treatment = if_else(segment == "Mens E-Mail", 1, 0))
>
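
The filter-then-flag step above can be checked on a tiny example. This is a sketch on a hypothetical four-row data frame (`toy`); `treatment_alt` is an extra column I add only to show an equivalent way to build the 0/1 flag.

```r
library(dplyr)

# Hypothetical stand-in for email_data (illustrative rows only).
toy <- data.frame(
  segment = c("Mens E-Mail", "No E-Mail", "Womens E-Mail", "No E-Mail"),
  spend   = c(29.99, 0, 0, 0),
  stringsAsFactors = FALSE
)

toy_male <- toy %>%
  filter(segment != "Womens E-Mail") %>%                      # drop the Womens E-Mail group
  mutate(treatment     = if_else(segment == "Mens E-Mail", 1, 0),
         treatment_alt = as.numeric(segment == "Mens E-Mail"))  # same flag without if_else()

toy_male
```

if_else() is stricter than base ifelse(): it checks that the two branches have compatible types, which catches mistakes such as mixing a number with a string.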
RStudio
> summary(male_df)
    recency       history_segment       history             mens       
 Min.   : 1.000   Length:42613       Min.   :  29.99   Min.   :0.0000  
 1st Qu.: 2.000   Class :character   1st Qu.:  64.50   1st Qu.:0.0000  
 Median : 6.000   Mode  :character   Median : 157.00   Median :1.0000  
 Mean   : 5.762                      Mean   : 241.86   Mean   :0.5521  
 3rd Qu.: 9.000                      3rd Qu.: 325.21   3rd Qu.:1.0000  
 Max.   :12.000                      Max.   :3345.93   Max.   :1.0000  
     womens         zip_code             newbie         channel         
 Min.   :0.0000   Length:42613       Min.   :0.0000   Length:42613      
 1st Qu.:0.0000   Class :character   1st Qu.:0.0000   Class :character  
 Median :1.0000   Mode  :character   Median :1.0000   Mode  :character  
 Mean   :0.5495                      Mean   :0.5017                     
 3rd Qu.:1.0000                      3rd Qu.:1.0000                     
 Max.   :1.0000                      Max.   :1.0000                     
   segment              visit          conversion           spend        
 Length:42613       Min.   :0.0000   Min.   :0.000000   Min.   :  0.000  
 Class :character   1st Qu.:0.0000   1st Qu.:0.000000   1st Qu.:  0.000  
 Mode  :character   Median :0.0000   Median :0.000000   Median :  0.000  
                    Mean   :0.1445   Mean   :0.009129   Mean   :  1.038  
                    3rd Qu.:0.0000   3rd Qu.:0.000000   3rd Qu.:  0.000  
                    Max.   :1.0000   Max.   :1.000000   Max.   :499.000  
   treatment  
 Min.   :0.0  
 1st Qu.:0.0  
 Median :1.0  
 Mean   :0.5  
 3rd Qu.:1.0  
 Max.   :1.0  
> 
> head(male_df)
# A tibble: 6 x 13
  recency history_segment history  mens womens zip_code newbie channel segment
    <dbl> <chr>             <dbl> <dbl>  <dbl> <chr>     <dbl> <chr>   <chr>  
1       6 3) $200 - $350    329.      1      1 Rural         1 Web     No E-M
2       9 5) $500 - $750    676.      1      0 Rural         1 Web     Mens E
3       9 5) $500 - $750    675.      1      1 Rural         1 Phone   Mens E
4       2 2) $100 - $200    102.      0      1 Urban         0 Web     Mens E
5       4 3) $200 - $350    241.      0      1 Rural         1 Multic No E-M
6       3 1) $0 - $100       58.1     1      0 Urban         1 Web     No E-M
# … with 4 more variables: visit <dbl>, conversion <dbl>, spend <dbl>,
#   treatment <dbl>
> head(male_df$treatment)
[1] 0 1 1 1 0 0
> 
> tail(male_df$treatment)
[1] 1 1 1 1 1 1
> 
> male_df$treatment
   [1] 0 1 1 1 0 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 0 0 1 0 1 1 1 0
  [39] 0 0 1 1 1 0 0 0 1 1 1 0 1 0 0 1 1 1 0 1 1 0 0 1 1 1 1 0 1 1 0 0 1 0 1 1 0 1
  [77] 0 1 0 0 1 1 1 0 1 0 1 1 0 0 1 0 1 1 1 1 1 1 0 0 1 0 0 0 0 0 1 1 0 1 1 0 1 1
 [115] 1 0 0 1 0 0 1 1 1 0 1 1 0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0
 [153] 0 1 0 1 0 1 0 1 1 1 0 1 1 0 0 0 0 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 1 0
 [191] 0 1 0 0 1 1 0 1 0 0 1 1 0 1 0 0 1 0 1 1 0 1 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0
 [229] 0 0 1 1 0 1 0 0 1 0 0 0 1 0 0 1 0 1 1 1 1 0 0 1 0 1 1 0 1 0 0 1 0 0 0 0 1 0
 [267] 0 1 0 1 0 1 0 1 0 0 0 0 0 1 1 0 1 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0
 [305] 0 1 0 0 0 0 0 1 1 0 0 0 1 1 1 0 1 1 0 1 1 0 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0
 [343] 0 0 0 1 0 1 0 0 1 1 1 0 1 1 1 1 0 1 0 0 0 1 0 0 0 0 1 1 1 1 0 0 1 0 0 1 1 0
 [381] 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 0 0 1 1 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0
 [419] 1 0 0 0 1 1 0 1 0 0 0 1 1 1 0 0 1 0 1 1 1 0 0 0 0 1 1 0 0 0 0 0 1 1 1 1 1 0
 [457] 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 1 0 0 1 1 0 1 1 0 1 1
 [495] 0 1 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 1 1 1 0 1 1 0 0 0 1 1 0 0 0 1 1 1 0 1 1 0
 [533] 1 0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 1 1 1 1 0 1 0 1 0 0 1 0 0 1 1 1 1 0 1 1
 [571] 1 0 1 0 0 0 0 1 1 1 0 0 0 0 1 1 1 0 1 0 0 1 0 1 1 0 1 1 1 0 0 1 1 0 1 0 0 0
 [609] 0 1 1 0 0 1 0 1 1 1 1 0 1 1 0 0 0 0 0 1 1 1 0 0 0 1 1 1 0 0 0 1 0 1 0 0 1 0
 [647] 1 1 0 0 1 1 0 1 0 1 1 0 0 1 1 0 1 1 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0
 [685] 1 1 1 1 1 0 1 0 1 0 1 1 1 1 0 0 1 0 0 0 1 0 0 1 0 0 1 1 0 0 1 1 0 1 0 0 1 1
 [723] 1 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0 1 0 1 1 0 1 0 1 1 1 1 1 1 1 0 1 0 0 0 1 1 1
 [761] 0 1 1 0 1 1 1 0 0 0 1 0 1 1 1 1 0 1 1 0 1 1 1 0 1 1 0 1 0 1 1 0 1 1 0 1 1 1
 [799] 1 1 0 0 0 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1 0 0 0 0 1 0 0 1
 [837] 0 0 0 0 1 1 1 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 1 1 0 0 1 1 0 1 1 0 0 1
 [875] 0 0 0 1 1 0 0 1 0 1 0 0 1 1 1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 0
 [913] 1 0 1 0 1 0 1 1 1 0 1 0 0 0 1 1 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 0 1 1 0 1 1 0
 [951] 0 0 0 0 1 0 1 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 1 0 1 1 0 1 0 1 1 1 0 1 1 0 0 1
 [989] 1 0 0 0 0 1 0 0 1 1 1 1
 [ reached getOption("max.print") -- omitted 41613 entries ]
> 
RStudio
> summary_by_segment <- male_df %>% group_by(treatment) %>%
+ summarise(conversion_rate = mean(conversion), spend_mean = mean(spend), count = n())
`summarise()` ungrouping output (override with `.groups` argument)
> 
RStudio
> summary(summary_by_segment)
   treatment    conversion_rate      spend_mean         count      
 Min.   :0.00   Min.   :0.005726   Min.   :0.6528   Min.   :21306  
 1st Qu.:0.25   1st Qu.:0.007427   1st Qu.:0.8452   1st Qu.:21306  
 Median :0.50   Median :0.009129   Median :1.0377   Median :21306  
 Mean   :0.50   Mean   :0.009129   Mean   :1.0377   Mean   :21306  
 3rd Qu.:0.75   3rd Qu.:0.010830   3rd Qu.:1.2302   3rd Qu.:21307  
 Max.   :1.00   Max.   :0.012531   Max.   :1.4226   Max.   :21307  
> 
> summary_by_segment
# A tibble: 2 x 4
  treatment conversion_rate spend_mean count
      <dbl>           <dbl>      <dbl> <int>
1         0         0.00573      0.653 21306
2         1         0.0125       1.42  21307
> 
>
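
The `summarise()` message above is informational: with the dplyr version used here it appears whenever the grouping is dropped implicitly. A minimal sketch of stating the intent explicitly with `.groups = "drop"`, on a hypothetical four-row data frame (`toy`):

```r
library(dplyr)

# Hypothetical stand-in for male_df (illustrative values only).
toy <- data.frame(
  treatment  = c(0, 0, 1, 1),
  conversion = c(0, 1, 1, 1),
  spend      = c(0, 10, 20, 30)
)

toy_summary <- toy %>%
  group_by(treatment) %>%
  summarise(conversion_rate = mean(conversion),
            spend_mean      = mean(spend),
            count           = n(),
            .groups = "drop")   # return an ungrouped tibble, no message

toy_summary
```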

####Combining the __broom package__ with map() lets you fit several regression models and tidy their results in one pass.

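First, store the multiple-regression formulas in a single object. Calling c() on formulas produces a list, as the printed output shows.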

RStudio
> library(broom)
> formulae_vec <- c(spend ~ treatment + recency + channel,
+                   spend ~ treatment + recency + channel + history,
+                   history ~ treatment + recency + channel)
> 
> 
> names(formulae_vec) <- paste("reg", LETTERS[1:3], sep="_")
RStudio
> formulae_vec
$reg_A
spend ~ treatment + recency + channel

$reg_B
spend ~ treatment + recency + channel + history

$reg_C
history ~ treatment + recency + channel

>
RStudio
> names(formulae_vec)
[1] "reg_A" "reg_B" "reg_C"
> 
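
A quick sketch confirming what `c()` returns for formulas, using base R only. The formulas on mtcars columns and the names `f_vec` and `f_list` are placeholders of mine, not from the book.

```r
# c() on formula objects yields a list, not an atomic vector.
f_vec <- c(mpg ~ wt, mpg ~ wt + hp)
is.list(f_vec)
class(f_vec[[1]])

# Naming after the fact, as in the article ...
names(f_vec) <- paste("reg", LETTERS[1:2], sep = "_")

# ... is equivalent to building a named list directly.
f_list <- list(reg_A = mpg ~ wt, reg_B = mpg ~ wt + hp)
names(f_list)
```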

Convert the list into a data frame (tibble).

RStudio
> models <- formulae_vec %>% enframe(name = "model_index", value="formula")
RStudio
> models
# A tibble: 3 x 2
  model_index formula  
  <chr>       <list>   
1 reg_A       <formula>
2 reg_B       <formula>
3 reg_C       <formula>
> 
> models$formula
[[1]]
spend ~ treatment + recency + channel

[[2]]
spend ~ treatment + recency + channel + history

[[3]]
history ~ treatment + recency + channel

>
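
enframe() turns a named list into a two-column tibble: the names go into one column and the elements into a list column. A minimal sketch with placeholder formulas on mtcars columns (`f_list` and `toy_models` are hypothetical names):

```r
library(tibble)

f_list <- list(reg_A = mpg ~ wt, reg_B = mpg ~ wt + hp)

# Names become the model_index column; the formulas become a list column.
toy_models <- enframe(f_list, name = "model_index", value = "formula")
toy_models

# deframe() goes the other way and returns a named list again.
names(deframe(toy_models))
```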
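  • Using the __map function__, pass the three multiple-regression formulas stored in the __formula column__ of the models data frame to lm(), fitting each one to the __male_df object__. These steps run in order as a pipeline chained with the __%>% pipe operator__.
  • The resulting objects are then passed, again through the __%>% pipe operator__, to the __tidy() function__, which reshapes the output into an easy-to-read format.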
RStudio
> df_models <- models %>%
+                         mutate(model = map(.x = formula, .f = lm, data = male_df)) %>%
+                         mutate(lm_result = map(.x = model, .f = tidy))
> 
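
The same pattern can be tried without downloading anything. The sketch below assumes the built-in mtcars dataset and two placeholder formulas. The `fit_stats` column using broom's glance() is my own addition to show model-level statistics next to the coefficient tables.

```r
library(dplyr)
library(purrr)
library(tibble)
library(broom)

toy_models <- list(reg_A = mpg ~ wt, reg_B = mpg ~ wt + hp) %>%
  enframe(name = "model_index", value = "formula") %>%
  mutate(model     = map(formula, lm, data = mtcars),  # one lm fit per formula
         lm_result = map(model, tidy),                 # coefficient table per model
         fit_stats = map(model, glance))               # one-row summary (R^2 etc.) per model

toy_models$lm_result[[2]]
toy_models$fit_stats[[2]]$r.squared
```

Extra arguments given to map() after the function (here `data = mtcars`) are forwarded to every lm() call, which is why one data frame can be shared by all formulas.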

###Result

RStudio
> df_models
# A tibble: 3 x 4
  model_index formula   model  lm_result       
  <chr>       <list>    <list> <list>          
1 reg_A       <formula> <lm>   <tibble [5 × 5]>
2 reg_B       <formula> <lm>   <tibble [6 × 5]>
3 reg_C       <formula> <lm>   <tibble [5 × 5]>
>
RStudio
> df_models$lm_result
[[1]]
# A tibble: 5 x 5
  term         estimate std.error statistic     p.value
  <chr>           <dbl>     <dbl>     <dbl>       <dbl>
1 (Intercept)    1.17      0.242      4.83  0.00000138 
2 treatment      0.771     0.145      5.31  0.000000112
3 recency       -0.0698    0.0208    -3.35  0.000812   
4 channelPhone  -0.216     0.237     -0.911 0.362      
5 channelWeb    -0.0429    0.236     -0.182 0.856      

[[2]]
# A tibble: 6 x 5
  term         estimate std.error statistic     p.value
  <chr>           <dbl>     <dbl>     <dbl>       <dbl>
1 (Intercept)   0.482    0.306        1.57  0.116      
2 treatment     0.768    0.145        5.29  0.000000125
3 recency      -0.0525   0.0214      -2.46  0.0139     
4 channelPhone  0.136    0.256        0.533 0.594      
5 channelWeb    0.307    0.255        1.20  0.229      
6 history       0.00116  0.000318     3.64  0.000272   

[[3]]
# A tibble: 5 x 5
  term         estimate std.error statistic p.value
  <chr>           <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)    593.       3.69     161.     0    
2 treatment        2.72     2.21       1.23   0.220
3 recency        -14.9      0.318    -46.9    0    
4 channelPhone  -304.       3.61     -84.3    0    
5 channelWeb    -302.       3.60     -83.9    0    

>

###Using the unnest function

RStudio
> df_results <- df_models %>%
+                             mutate(formula = as.character(formula)) %>%
+                             select(formula, model_index, lm_result) %>%
+                             unnest(cols = c(lm_result))
>
RStudio
> df_results
# A tibble: 16 x 7
   formula                                         model_index term           estimate std.error statistic     p.value
   <chr>                                           <chr>       <chr>             <dbl>     <dbl>     <dbl>       <dbl>
 1 spend ~ treatment + recency + channel           reg_A       (Intercept)     1.17     0.242        4.83  0.00000138 
 2 spend ~ treatment + recency + channel           reg_A       treatment       0.771    0.145        5.31  0.000000112
 3 spend ~ treatment + recency + channel           reg_A       recency        -0.0698   0.0208      -3.35  0.000812   
 4 spend ~ treatment + recency + channel           reg_A       channelPhone   -0.216    0.237       -0.911 0.362      
 5 spend ~ treatment + recency + channel           reg_A       channelWeb     -0.0429   0.236       -0.182 0.856      
 6 spend ~ treatment + recency + channel + history reg_B       (Intercept)     0.482    0.306        1.57  0.116      
 7 spend ~ treatment + recency + channel + history reg_B       treatment       0.768    0.145        5.29  0.000000125
 8 spend ~ treatment + recency + channel + history reg_B       recency        -0.0525   0.0214      -2.46  0.0139     
 9 spend ~ treatment + recency + channel + history reg_B       channelPhone    0.136    0.256        0.533 0.594      
10 spend ~ treatment + recency + channel + history reg_B       channelWeb      0.307    0.255        1.20  0.229      
11 spend ~ treatment + recency + channel + history reg_B       history         0.00116  0.000318     3.64  0.000272   
12 history ~ treatment + recency + channel         reg_C       (Intercept)   593.       3.69       161.    0          
13 history ~ treatment + recency + channel         reg_C       treatment       2.72     2.21         1.23  0.220      
14 history ~ treatment + recency + channel         reg_C       recency       -14.9      0.318      -46.9   0          
15 history ~ treatment + recency + channel         reg_C       channelPhone -304.       3.61       -84.3   0          
16 history ~ treatment + recency + channel         reg_C       channelWeb   -302.       3.60       -83.9   0          

>
RStudio
> df_results %>% filter(model_index == "reg_A")
# A tibble: 5 x 7
  formula                               model_index term         estimate std.error statistic     p.value
  <chr>                                 <chr>       <chr>           <dbl>     <dbl>     <dbl>       <dbl>
1 spend ~ treatment + recency + channel reg_A       (Intercept)    1.17      0.242      4.83  0.00000138 
2 spend ~ treatment + recency + channel reg_A       treatment      0.771     0.145      5.31  0.000000112
3 spend ~ treatment + recency + channel reg_A       recency       -0.0698    0.0208    -3.35  0.000812   
4 spend ~ treatment + recency + channel reg_A       channelPhone  -0.216     0.237     -0.911 0.362      
5 spend ~ treatment + recency + channel reg_A       channelWeb    -0.0429    0.236     -0.182 0.856      
> 
RStudio
> df_results %>% filter(model_index == "reg_B")
# A tibble: 6 x 7
  formula                                         model_index term         estimate std.error statistic     p.value
  <chr>                                           <chr>       <chr>           <dbl>     <dbl>     <dbl>       <dbl>
1 spend ~ treatment + recency + channel + history reg_B       (Intercept)   0.482    0.306        1.57  0.116      
2 spend ~ treatment + recency + channel + history reg_B       treatment     0.768    0.145        5.29  0.000000125
3 spend ~ treatment + recency + channel + history reg_B       recency      -0.0525   0.0214      -2.46  0.0139     
4 spend ~ treatment + recency + channel + history reg_B       channelPhone  0.136    0.256        0.533 0.594      
5 spend ~ treatment + recency + channel + history reg_B       channelWeb    0.307    0.255        1.20  0.229      
6 spend ~ treatment + recency + channel + history reg_B       history       0.00116  0.000318     3.64  0.000272   
> 
RStudio
> df_results %>% filter(model_index == "reg_C")
# A tibble: 5 x 7
  formula                                 model_index term         estimate std.error statistic p.value
  <chr>                                   <chr>       <chr>           <dbl>     <dbl>     <dbl>   <dbl>
1 history ~ treatment + recency + channel reg_C       (Intercept)    593.       3.69     161.     0    
2 history ~ treatment + recency + channel reg_C       treatment        2.72     2.21       1.23   0.220
3 history ~ treatment + recency + channel reg_C       recency        -14.9      0.318    -46.9    0    
4 history ~ treatment + recency + channel reg_C       channelPhone  -304.       3.61     -84.3    0    
5 history ~ treatment + recency + channel reg_C       channelWeb    -302.       3.60     -83.9    0    
> 
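
If the goal is only a long table of coefficients, one alternative to the nested data frame plus unnest() is to tidy a named list of fits and stack the tables with bind_rows(). This is a sketch of that variant, not the book's approach. It assumes mtcars and placeholder formulas, and filters on one term to compare its estimate across models.

```r
library(dplyr)
library(purrr)
library(broom)

# Named list of fitted models (placeholder formulas on mtcars).
fits <- list(reg_A = mpg ~ wt, reg_B = mpg ~ wt + hp) %>%
  map(lm, data = mtcars)

# Tidy each fit, then stack; the list names go into the model_index column.
toy_results <- fits %>%
  map(tidy) %>%
  bind_rows(.id = "model_index")

# Compare the estimate of a single term across the models.
wt_rows <- toy_results %>%
  filter(term == "wt") %>%
  select(model_index, estimate, std.error)
wt_rows
```

On the article's data, the analogous comparison would filter on `term == "treatment"`.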

##Writing the results to files

What follows is not covered in 『効果検証入門』; it is included for reference.

RStudio
> linear_regression_1 <- lm(spend ~ treatment + recency + channel, data = male_df)
> linear_regression_2 <- lm(spend ~ treatment + recency + channel + history, data = male_df)

####Running the following writes an HTML file and a $LaTeX$ file to the Desktop.

RStudio
> getwd()
[1] "/Users/ocean"
> 
> setwd("/Users/ocean/Desktop/")
> getwd()
[1] "/Users/ocean/Desktop"
> 
RStudio
> install.packages("stargazer")
trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/stargazer_5.2.2.tgz'
Content type 'application/x-gzip' length 619830 bytes (605 KB)
==================================================
downloaded 605 KB


The downloaded binary packages are in
	/var/folders/dw/kq_8pwps5s771nklvc8krws40000gn/T//RtmpwaZ17t/downloaded_packages
>
RStudio
> library(stargazer)

Please cite as: 

 Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.2. https://CRAN.R-project.org/package=stargazer 

> 

Writing to an HTML file

RStudio
> stargazer(linear_regression_1, linear_regression_2, title = "線形重回帰分析", type="html", out="重回帰分析.html")

<table style="text-align:center"><caption><strong>線形重回帰分析</strong></caption>
<tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="2"><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="2" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td colspan="2">spend</td></tr>
<tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td></tr>
<tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">treatment</td><td>0.771<sup>***</sup></td><td>0.768<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td>(0.145)</td><td>(0.145)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td></tr>
<tr><td style="text-align:left">recency</td><td>-0.070<sup>***</sup></td><td>-0.053<sup>**</sup></td></tr>
<tr><td style="text-align:left"></td><td>(0.021)</td><td>(0.021)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td></tr>
<tr><td style="text-align:left">channelPhone</td><td>-0.216</td><td>0.136</td></tr>
<tr><td style="text-align:left"></td><td>(0.237)</td><td>(0.256)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td></tr>
<tr><td style="text-align:left">channelWeb</td><td>-0.043</td><td>0.307</td></tr>
<tr><td style="text-align:left"></td><td>(0.236)</td><td>(0.255)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td></tr>
<tr><td style="text-align:left">history</td><td></td><td>0.001<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td></td><td>(0.0003)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td></tr>
<tr><td style="text-align:left">Constant</td><td>1.167<sup>***</sup></td><td>0.482</td></tr>
<tr><td style="text-align:left"></td><td>(0.242)</td><td>(0.306)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td></tr>
<tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>42,613</td><td>42,613</td></tr>
<tr><td style="text-align:left">R<sup>2</sup></td><td>0.001</td><td>0.001</td></tr>
<tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.001</td><td>0.001</td></tr>
<tr><td style="text-align:left">Residual Std. Error</td><td>14.990 (df = 42608)</td><td>14.988 (df = 42607)</td></tr>
<tr><td style="text-align:left">F Statistic</td><td>10.345<sup>***</sup> (df = 4; 42608)</td><td>10.930<sup>***</sup> (df = 5; 42607)</td></tr>
<tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="2" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
> 
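
Changing the working directory is not strictly required: the `out` argument also accepts a full path. The sketch below assumes mtcars, two placeholder models, and a temporary directory instead of the Desktop; `fit1`, `fit2` and `out_path` are hypothetical names.

```r
library(stargazer)

# Placeholder models on the built-in mtcars dataset.
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

# Build the output path explicitly rather than relying on setwd().
out_path <- file.path(tempdir(), "regression.html")

# The table is printed to the console and also written to out_path.
tbl <- stargazer(fit1, fit2,
                 title = "Linear regression",
                 type  = "html",
                 out   = out_path)

file.exists(out_path)
```

Using `type = "text"` instead prints a plain-text table, which is handy for a quick look in the console before exporting.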

Writing to a LaTeX file

RStudio
> stargazer(linear_regression_1, linear_regression_2, title = "線形重回帰分析", type="latex", out="重回帰分析.tex")

% Table created by stargazer v.5.2.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
% Date and time: , 12 31, 2020 - 17時45分38秒
\begin{table}[!htbp] \centering 
  \caption{線形重回帰分析} 
  \label{} 
\begin{tabular}{@{\extracolsep{5pt}}lcc} 
\\[-1.8ex]\hline 
\hline \\[-1.8ex] 
 & \multicolumn{2}{c}{\textit{Dependent variable:}} \\ 
\cline{2-3} 
\\[-1.8ex] & \multicolumn{2}{c}{spend} \\ 
\\[-1.8ex] & (1) & (2)\\ 
\hline \\[-1.8ex] 
 treatment & 0.771$^{***}$ & 0.768$^{***}$ \\ 
  & (0.145) & (0.145) \\ 
  & & \\ 
 recency & $-$0.070$^{***}$ & $-$0.053$^{**}$ \\ 
  & (0.021) & (0.021) \\ 
  & & \\ 
 channelPhone & $-$0.216 & 0.136 \\ 
  & (0.237) & (0.256) \\ 
  & & \\ 
 channelWeb & $-$0.043 & 0.307 \\ 
  & (0.236) & (0.255) \\ 
  & & \\ 
 history &  & 0.001$^{***}$ \\ 
  &  & (0.0003) \\ 
  & & \\ 
 Constant & 1.167$^{***}$ & 0.482 \\ 
  & (0.242) & (0.306) \\ 
  & & \\ 
\hline \\[-1.8ex] 
Observations & 42,613 & 42,613 \\ 
R$^{2}$ & 0.001 & 0.001 \\ 
Adjusted R$^{2}$ & 0.001 & 0.001 \\ 
Residual Std. Error & 14.990 (df = 42608) & 14.988 (df = 42607) \\ 
F Statistic & 10.345$^{***}$ (df = 4; 42608) & 10.930$^{***}$ (df = 5; 42607) \\ 
\hline 
\hline \\[-1.8ex] 
\textit{Note:}  & \multicolumn{2}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\ 
\end{tabular} 
\end{table}

Confirming in Terminal that the files were generated

Terminal
ocean@AfoGuardMacBook-Pro Desktop % ls | grep tex
mactex.txt
重回帰分析.tex
ocean@AfoGuardMacBook-Pro Desktop % ls | grep html
重回帰分析.html
ocean@AfoGuardMacBook-Pro Desktop % 
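
The same check can be done without leaving R. Note that `grep tex` above also matched mactex.txt because the pattern is unanchored. The sketch below uses a regular expression anchored to the file extension, with a temporary directory and a placeholder file standing in for the real stargazer output (both are assumptions for illustration).

```r
# Temporary directory and placeholder file standing in for the Desktop output.
out_dir <- tempfile("stargazer_out_")
dir.create(out_dir)
tex_path <- file.path(out_dir, "regression.tex")
writeLines("% placeholder standing in for stargazer output", tex_path)
writeLines("notes", file.path(out_dir, "mactex.txt"))

# Check one specific file ...
file.exists(tex_path)

# ... or list only the files whose names end in .tex.
tex_files <- list.files(out_dir, pattern = "\\.tex$")
tex_files
```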