More than 3 years have passed since last update.

回帰不連続デザイン（Regression Discontinuity Design: RDD）を試してみた

Last updated at 2021-05-03Posted at 2021-05-03

はじめに

とあるセミナーで，単一事例研究の解析で回帰不連続デザインを用いていたのを見かけました。
面白そうだったので遊んでみた，という経緯でこの記事を書いています。
筆者は臨床心理学の研究者です。

回帰不連続デザイン

単一事例研究において，ABデザインなど，時系列の中でA（ベースライン）からB（介入）に移行する明らかなイベントが存在します。
そのイベント前後における回帰式の連続性が疑わしいとき，A期からB期への移行はアウトカムの変数に対して有効かもしれないと主張できます。

…というのが一応筆者のざっくりとした理解です。勉強してきます。

wikipediaでも取り上げられているので，引用しておきます。

実践

遊んでみた記録を記します。
以下の記事を参考にしました。

ここにもあるように，{rddtools}パッケージというのがあるみたいですが，今回は使わずやることにしました。
また，以下4パターンあるみたいですが，今回は"same slopes"でいくことにしました。

same slopes
different slopes
functional form
sensitivity analysis

使用したパッケージ

今回の遊びで初めて知ったけど，{magrittr}なんて便利なもんあったんですね。

library("ggplot2")
library("dplyr")
library("magrittr")

データ生成

適当に乱数生成しました。
なにかの望ましい行動の生起頻度くらいのイメージで作ってます。

Mikeさんが介入で大きな変化があったパターン，Janeさんは介入で変化が見られなかったパターンをイメージしています。
Mikeさんは6セッション目以降，値が一気に高くなっていますが，Janeさんはそうなりません。

ts <- 1:20                     # timestamps
bl <- rnorm(5, mean=3, sd=3)   # baseline
iv <- rnorm(15, mean=20, sd=5) # intervention
ct <- rnorm(15, mean=5, sd=5)  # control

Mike <- c(bl,iv)
Jane <- c(bl,ct)

df <- data.frame(ts, Mike, Jane)

こんなデータができます。
（すみません，縦軸のスケールがあっていないので見づらいです。）

回帰分析

上述の通り，"same slopes"モデルを実施しました。

Mikeさんの場合

m1_mike <- df %>%
	mutate(phase=ifelse(ts >= th, 1, 0)) %$%
	lm(Mike ~ phase + I(ts - th))

summary(m1_mike)

結果は以下の通りです。

Call:
lm(formula = Mike ~ phase + I(ts - th))

Residuals:
    Min      1Q  Median      3Q     Max 
-6.2705 -2.6743 -0.4577  1.8675 11.1294 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.8145     2.1420   1.314 0.206313    
phase        14.0681     3.4848   4.037 0.000856 ***
I(ts - th)    0.2905     0.2617   1.110 0.282381    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.456 on 17 degrees of freedom
Multiple R-squared:  0.7659,	Adjusted R-squared:  0.7384 
F-statistic: 27.82 on 2 and 17 DF,  p-value: 4.357e-06

phaseのところで傾きの推定値（Estimate）が有意に高い（p-value = 0.000856 < .05）ので，不連続と見ましょう，ということなんでしょうか。

Janeさんの場合

m1_jane <- df %>%
	mutate(phase=ifelse(ts >= th, 1, 0)) %$%
	lm(Jane ~ phase + I(ts - th))

summary(m1_jane)

結果は以下の通りです。

Call:
lm(formula = Jane ~ phase + I(ts - th))

Residuals:
    Min      1Q  Median      3Q     Max 
-5.7856 -2.4332 -0.0567  2.2956  8.1257 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.3596     1.8646   0.729    0.476
phase         2.9708     3.0335   0.979    0.341
I(ts - th)   -0.1945     0.2278  -0.854    0.405

Residual standard error: 3.879 on 17 degrees of freedom
Multiple R-squared:  0.05509,	Adjusted R-squared:  -0.05607 
F-statistic: 0.4956 on 2 and 17 DF,  p-value: 0.6177

こちらはphaseの傾きの推定値（Estimate）が有意に高いことは示されなかったです。

おわりに

とりあえず遊んでみただけなので理解が浅いのですが，何度もお会いして支援する立場として，この評価方法を知って，非常に有力な手札を得た感じがします。

（参考）コード全文

set.seed(1234)
# --------------
# call packages
# --------------
library("ggplot2")
library("dplyr")
library("magrittr")

# ---------------
# set variables
# ---------------
ts <- 1:20                     # timestamps
bl <- rnorm(5, mean=3, sd=3)   # baseline
iv <- rnorm(15, mean=20, sd=5) # intervention
ct <- rnorm(15, mean=5, sd=5)  # control

Mike <- c(bl,iv)
Jane <- c(bl,ct)

df <- data.frame(ts, Mike, Jane)

png("Mike.png")
g <- ggplot(df, aes(x=ts, y=Mike))
g <- g + geom_line()
g <- g + geom_vline(xintercept=5.5, linetype="dashed")
plot(g)
dev.off()

png("Jane.png")
g <- ggplot(df, aes(x=ts, y=Jane))
g <- g + geom_line()
g <- g + geom_vline(xintercept=5.5, linetype="dashed")
plot(g)
dev.off()

# ----------------
# regression (lm)
# ----------------
th <- 6
# model1 : same slopes
m1_mike <- df %>%
	mutate(phase=ifelse(ts >= th, 1, 0)) %$%
	lm(Mike ~ phase + I(ts - th))

summary(m1_mike)

m1_jane <- df %>%
	mutate(phase=ifelse(ts >= th, 1, 0)) %$%
	lm(Jane ~ phase + I(ts - th))

summary(m1_jane)

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up