More than 1 year has passed since last update.

『効果検証入門』第3章_傾向スコア算出に入れるべき変数／入れるべきでない変数

#因果推論

Last updated at 2023-09-27Posted at 2023-09-27

1.傾向スコア算出のパターンを変える

①RCTの結果

# A tibble: 2 × 4
  treatment conversion_rate spend_mean count
      <dbl>           <dbl>      <dbl> <int>
1         0         0.00573      0.653 21306
2         1         0.0125       1.42  21307

ATE=ATT=0.747

②交絡変数だけを加える

 ## 傾向スコアを利用したマッチング
 m_near <- matchit(formula = treatment ~ recency + history + channel,
                   data = biased_data,
                   method = "nearest",
                   replace = TRUE)
 ## マッチング後のデータを作成
 matched_data <- match.data(m_near)
 ## マッチング後のデータで効果の推定
 PSM_result <- lm(data = matched_data,
                  formula = spend ~ treatment) %>%
   tidy()
 PSM_result

# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    0.665     0.187      3.55 0.000387
2 treatment      0.863     0.226      3.81 0.000137

ATT=0.863

③媒介変数を加える

#実験:媒介変数を加える
m_near_withM <- matchit(formula = treatment ~ recency + history + channel+visit+conversion,
                  data = biased_data,
                  method = "nearest",
                  replace = TRUE)
## マッチング後のデータを作成
matched_data_withM <- match.data(m_near_withM)
## マッチング後のデータで効果の推定
PSM_result_withM <- lm(data = matched_data_withM,
                 formula = spend ~ treatment) %>%
  tidy()
PSM_result_withM

# A tibble: 2 × 5
  term        estimate std.error statistic    p.value
  <chr>          <dbl>     <dbl>     <dbl>      <dbl>
1 (Intercept)    0.946     0.194      4.88 0.00000105
2 treatment      0.581     0.234      2.48 0.0131

ATT=0.581

④交絡変数の1つを落とす

#実験:交絡変数historryを落とす
m_near_ommit <- matchit(formula = treatment ~ recency + channel ,
                        data = biased_data,
                        method = "nearest",
                        replace = TRUE)
## マッチング後のデータを作成
matched_data_ommit <- match.data(m_near_ommit)
## マッチング後のデータで効果の推定
PSM_result_ommit <- lm(data = matched_data_ommit,
                       formula = spend ~ treatment) %>%
  tidy()
PSM_result_ommit

# A tibble: 2 × 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)    0.599     0.196      3.06 0.00224  
2 treatment      0.929     0.233      3.98 0.0000700

ATT=0.929

結果まとめ

RCT	全交絡変数	媒介変数入り	交絡変数欠損
0.747	0.863	0.581	0.929

考察

・一番RCTに推定が近いのは全交絡変数を入れたとき（②）
・バイアスのかかり方が重回帰分析で言われるバイアスのかかり方と同じ
・傾向スコアのときも因果ダイアグラム、コントロール変数の選択法を意識した方がよさそう

2.重みづけのパターンを変える

①RCT

# A tibble: 2 × 4
  treatment conversion_rate spend_mean count
      <dbl>           <dbl>      <dbl> <int>
1         0         0.00573      0.653 21306
2         1         0.0125       1.42  21307

ATE=0.747

②全ての交絡変数

## 重みの推定
weighting <- weightit(treatment ~ recency + history + channel,
              data = biased_data,
              method = "ps",
              estimand = "ATE")

## 重み付きデータでの効果の推定
IPW_result <- lm(data = biased_data,
                 formula = spend ~ treatment,
                 weights = weighting$weights) %>%
  tidy()
IPW_result

# A tibble: 2 × 5
  term        estimate std.error statistic     p.value
  <chr>          <dbl>     <dbl>     <dbl>       <dbl>
1 (Intercept)    0.580     0.116      4.99 0.000000601
2 treatment      0.870     0.165      5.27 0.000000136

ATE=0.870

③媒介変数を入れる

#実験：媒介変数を入れる
weighting_withM <- weightit(treatment ~ recency + history + channel+visit+conversion,
                      data = biased_data,
                      method = "ps",
                      estimand = "ATE")
IPW_result_withM <- lm(data = biased_data,
                 formula = spend ~ treatment,
                 weights = weighting_withM$weights) %>%
  tidy()
IPW_result_withM

# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)   1.01       0.115     8.79  1.59e-18
2 treatment     0.0846     0.163     0.519 6.04e- 1

ATE=0.0846

④交絡変数を1つ落とす

#実験：交絡変数historyを落とす
weighting_ommit <- weightit(treatment ~ recency +  channel,
                            data = biased_data,
                            method = "ps",
                            estimand = "ATE")
IPW_result_ommit <- lm(data = biased_data,
                       formula = spend ~ treatment,
                       weights = weighting_ommit$weights) %>%
  tidy()
IPW_result_ommit

# A tibble: 2 × 5
  term        estimate std.error statistic      p.value
  <chr>          <dbl>     <dbl>     <dbl>        <dbl>
1 (Intercept)    0.575     0.117      4.91 0.000000897 
2 treatment      0.889     0.166      5.37 0.0000000815

結果まとめ

RCT	全交絡変数	媒介変数入り	交絡変数欠損
0.747	0.870	0.0846	0.889

考察

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up