
Computing the Variance of the Odds Ratio

Posted at 2018-02-01

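Notes from a statistics beginner.
Almost every page that covers the confidence interval of the odds ratio $\frac{ad}{bc}$ says only that "the log odds ratio can be approximated by a normal distribution, and its variance can be computed as $\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}$", with no derivation.
I could not accept that at face value, so I looked into it and worked it out... or so I believe.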

Method assuming binomial distributions

Consider two mutually independent random variables, $X\sim Bin(n_1,p_1)$ and $Y\sim Bin(n_2,p_2)$.

The 2x2 table built from $X=x$ and $Y=y$ can be written as

| | Group1 | Group2 | Total |
|---|---|---|---|
| $X$ | $n_{11}=x$ | $n_{12}=n_1-x$ | $n_1$ |
| $Y$ | $n_{21}=y$ | $n_{22}=n_2-y$ | $n_2$ |

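When $n_1,n_2$ are sufficiently large, the sample proportions $\hat{p_1}=X/n_1$ and $\hat{p_2}=Y/n_2$ are approximately normally distributed:

$\hat{p_1}\sim N(p_1,\frac{p_1(1-p_1)}{n_1})$

$\hat{p_2}\sim N(p_2,\frac{p_2(1-p_2)}{n_2})$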
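As a rough numerical check of this normal approximation, the sketch below compares the exact distribution function of the sample proportion $X/n$ with the normal one. The helper names (`binom_pmf`, `normal_cdf`, `max_cdf_gap`) and the values $p=0.3$, $n\in\{20,200,2000\}$ are assumptions made only for this illustration.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def normal_cdf(t: float, mean: float, var: float) -> float:
    """Distribution function of N(mean, var) evaluated at t."""
    return 0.5 * (1 + math.erf((t - mean) / math.sqrt(2 * var)))


def max_cdf_gap(n: int, p: float) -> float:
    """Largest gap between the exact CDF of X/n and N(p, p(1-p)/n)."""
    var = p * (1 - p) / n
    gap, cum = 0.0, 0.0
    for k in range(n + 1):
        cum += binom_pmf(k, n, p)  # exact P(X/n <= k/n)
        gap = max(gap, abs(cum - normal_cdf(k / n, p, var)))
    return gap


# Illustrative values only: the gap should shrink as n grows.
for n in (20, 200, 2000):
    print(n, max_cdf_gap(n, 0.3))
```

No continuity correction is applied here, so part of the remaining gap is simply the discreteness of the binomial distribution.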

Delta method

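For a population parameter $\theta$ and its estimator $\hat\theta$, if

$\displaystyle \frac{\hat\theta-\theta}{\widehat{SE}_{\hat\theta}}\to N(0,1)$

then, for a differentiable function $f$ with $f^{\prime}(\theta)\neq0$,

$\displaystyle \frac{f(\hat\theta)-f(\theta)}{f^{\prime}(\theta)\widehat{SE}_{\hat\theta}}\to N(0,1)$

holds approximately. In other words, $Var(f(\hat\theta))\approx\{f^{\prime}(\theta)\}^2Var(\hat\theta)$.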
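To see how well this works in one concrete case, the sketch below takes $f=\log$ and a binomial proportion, and compares the delta-method variance $\{f^{\prime}(p)\}^2\,p(1-p)/n$ with the exact variance of $\log(X/n)$ obtained by summing over the binomial distribution. The function names and the values $n=500$, $p=0.3$ are assumptions for illustration, and the outcome $X=0$ is excluded because $\log 0$ is undefined.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p), computed in log space."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def exact_var_log_phat(n: int, p: float) -> float:
    """Variance of log(X/n) for X ~ Bin(n, p), conditional on X >= 1."""
    ks = range(1, n + 1)
    weights = [binom_pmf(k, n, p) for k in ks]
    total = sum(weights)
    values = [math.log(k / n) for k in ks]
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total


def delta_var_log_phat(n: int, p: float) -> float:
    """Delta-method approximation with f = log, so f'(p) = 1/p."""
    return (1 / p) ** 2 * p * (1 - p) / n


# Illustrative values only.
print(exact_var_log_phat(500, 0.3), delta_var_log_phat(500, 0.3))
```

The two numbers should agree closely for large $n$ and drift apart as $n$ gets small or $p$ approaches 0.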

Deriving the confidence interval of the odds ratio

  • Solution 1
    Take the parameter $\theta=p_1$ with estimator $\hat\theta=\hat{p_1}$. Then
    $\displaystyle \widehat{SE}_{\hat{p_1}}=\sqrt\frac{\hat{p_1}(1-\hat{p_1})}{n_1}$
    Let $f(x)=\log(x)$, so that $f^{\prime}(x)=\frac{1}{x}$.
    Therefore,
\begin{aligned}
\widehat{SE}_{\log(\hat{p_1})}&= f^{\prime}(\hat\theta)\widehat{SE}_{\hat\theta} \\
&= \frac{1}{\hat{p_1}}\sqrt\frac{\hat{p_1}(1-\hat{p_1})}{n_1}\\
&=\sqrt\frac{(1-\hat{p_1})}{\hat{p_1}n_1}
\end{aligned}
\begin{aligned}
Var(\log(\hat{p_1}))\approx\frac{1-\hat{p_1}}{\hat{p_1}n_1}
\end{aligned}

The estimate of the odds ratio ($OR$) is
$\widehat{OR}=\displaystyle \frac{\hat{p_1}(1-\hat{p_2})}{(1-\hat{p_1})\hat{p_2}}$

Taking the logarithm,

\begin{aligned}
\log(\widehat{OR})&=\log\Bigl( \displaystyle \frac{\hat{p_1}(1-\hat{p_2})}{(1-\hat{p_1})\hat{p_2}}\Bigr) \\
&=\log\left(\frac{\hat{p_1}}{1-\hat{p_1}}\right)-\log\left(\frac{\hat{p_2}}{1-\hat{p_2}}\right)
\end{aligned}

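Applying the delta method in the same way to $\log(1-\hat{p_1})$, whose derivative is $-\frac{1}{1-\hat{p_1}}$, gives $Var(\log(1-\hat{p_1}))\approx\frac{\hat{p_1}}{(1-\hat{p_1})n_1}$ and $Cov(\log(\hat{p_1}),\log(1-\hat{p_1}))\approx-\frac{1}{n_1}$. The two groups are independent, but $\log(\hat{p_i})$ and $\log(1-\hat{p_i})$ are functions of the same $\hat{p_i}$, so their covariance has to be kept. Therefore,

\begin{aligned}
Var(\log(\widehat{OR}))&=Var\bigl((\log(\hat{p_1})-\log(1-\hat{p_1}))-(\log(\hat{p_2})-\log(1-\hat{p_2}))\bigr) \\
& \ \ \ \ \text{(since the two groups are assumed independent)} \\
&=Var(\log(\hat{p_1})-\log(1-\hat{p_1}))+Var(\log(\hat{p_2})-\log(1-\hat{p_2})) \\
&=Var(\log(\hat{p_1}))+Var(\log(1-\hat{p_1}))-2Cov(\log(\hat{p_1}),\log(1-\hat{p_1})) \\
& \ \ \ \ +Var(\log(\hat{p_2}))+Var(\log(1-\hat{p_2}))-2Cov(\log(\hat{p_2}),\log(1-\hat{p_2})) \\
& \approx \frac{1-\hat{p_1}}{\hat{p_1}n_1}+\frac{\hat{p_1}}{(1-\hat{p_1})n_1}+\frac{2}{n_1}+\frac{1-\hat{p_2}}{\hat{p_2}n_2}+\frac{\hat{p_2}}{(1-\hat{p_2})n_2}+\frac{2}{n_2} \\
&=\left(\frac{1}{\hat{p_1}n_1}-\frac{1}{n_1}\right)+\left(\frac{1}{(1-\hat{p_1})n_1}-\frac{1}{n_1}\right)+\frac{2}{n_1}+\left(\frac{1}{\hat{p_2}n_2}-\frac{1}{n_2}\right)+\left(\frac{1}{(1-\hat{p_2})n_2}-\frac{1}{n_2}\right)+\frac{2}{n_2} \\
&=\frac{1}{\hat{p_1}n_1}+\frac{1}{(1-\hat{p_1})n_1}+\frac{1}{\hat{p_2}n_2}+\frac{1}{(1-\hat{p_2})n_2} \\
&=\frac{1}{n_{11}}+\frac{1}{n_{12}}+\frac{1}{n_{21}}+\frac{1}{n_{22}}
\end{aligned}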
\begin{aligned}
\widehat{SE}_{\log(\widehat{OR})}=\sqrt{\frac{1}{n_{11}}+\frac{1}{n_{12}}+\frac{1}{n_{21}}+\frac{1}{n_{22}}}
\end{aligned}
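The result above translates into a confidence interval as follows: build a normal interval for $\log(OR)$ using the standard error of the log odds ratio, then exponentiate both ends. This is only a minimal sketch; the function name `odds_ratio_ci` and the cell counts in the example are hypothetical, and all four cells are assumed to be positive.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(n11: int, n12: int, n21: int, n22: int, level: float = 0.95):
    """Return (OR_hat, SE of log OR_hat, (lower, upper)) for a 2x2 table."""
    if min(n11, n12, n21, n22) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (n11 * n22) / (n12 * n21)
    # Variance of the log odds ratio: 1/n11 + 1/n12 + 1/n21 + 1/n22.
    se_log_or = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    # Two-sided normal quantile, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_or = math.log(odds_ratio)
    interval = (math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or))
    return odds_ratio, se_log_or, interval


# Hypothetical counts, chosen only for illustration.
or_hat, se, (lower, upper) = odds_ratio_ci(20, 80, 10, 90)
print(or_hat, se, lower, upper)
```

Because the interval is symmetric on the log scale, it is asymmetric around $\widehat{OR}$ after exponentiation.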

  • Solution 2

Let $\displaystyle \hat{\omega_1}=\frac{\hat{p_1}}{1-\hat{p_1}}, \hat{\omega_2}=\frac{\hat{p_2}}{1-\hat{p_2}}$ be the estimated odds in each group. Then

\begin{aligned}
\frac{\partial}{\partial p}\log(\hat{\omega_1})&=\frac{\partial}{\partial\omega}\log(\hat{\omega_1})\cdot\frac{\partial\omega}{\partial p}\\
&=\frac{1}{\hat{\omega_1}}\cdot\frac{\partial}{\partial p}\frac{\hat{p_1}}{1-\hat{p_1}}\\
&=\frac{1-\hat{p_1}}{\hat{p_1}}\cdot\frac{1}{(1-\hat{p_1})^2}\\
&=\frac{1}{\hat{p_1}(1-\hat{p_1})}
\end{aligned}

By the delta method,

\begin{aligned}
Var(\log(\hat{\omega_1}))&\approx \left\{\frac{\partial}{\partial p}\log(\hat{\omega_1})\right\}^2\cdot Var(\hat{p_1})\\
&=\left(\frac{1}{\hat{p_1}(1-\hat{p_1})}\right)^2\cdot \frac{\hat{p_1}(1-\hat{p_1})}{n_1}\\
&=\frac{1}{n_1\hat{p_1}(1-\hat{p_1})}
\end{aligned}

Similarly, for $\hat{\omega_2}$,

\begin{aligned}
Var(\log(\hat{\omega_2}))\approx\frac{1}{n_2\hat{p_2}(1-\hat{p_2})}
\end{aligned}

Since $\widehat{OR}=\displaystyle \frac{\hat{\omega_1}}{\hat{\omega_2}}$ and the two groups are independent,

\begin{aligned}
\log(\widehat{OR})&=\log\left( \displaystyle \frac{\hat{\omega_1}}{\hat{\omega_2}}\right)  \\
Var(\log(\widehat{OR}))&=Var\left(\log\left( \displaystyle \frac{\hat{\omega_1}}{\hat{\omega_2}}\right)\right)\\
&=Var(\log(\hat{\omega_1}))+Var(\log(\hat{\omega_2}))\\
&=\frac{1}{n_1\hat{p_1}(1-\hat{p_1})}+\frac{1}{n_2\hat{p_2}(1-\hat{p_2})}\\
&=\frac{1}{n_1\hat{p_1}}+\frac{1}{n_1(1-\hat{p_1})}+\frac{1}{n_2\hat{p_2}}+\frac{1}{n_2(1-\hat{p_2})} \\
&=\frac{1}{n_{11}}+\frac{1}{n_{12}}+\frac{1}{n_{21}}+\frac{1}{n_{22}}
\end{aligned}
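The partial-fraction step used above, $\frac{1}{n\hat{p}(1-\hat{p})}=\frac{1}{n\hat{p}}+\frac{1}{n(1-\hat{p})}$ with $n\hat{p}=x$ and $n(1-\hat{p})=n-x$, can be checked with exact rational arithmetic. The function name `var_log_odds_two_forms` and the values of $x$ and $n$ are assumptions for illustration.

```python
from fractions import Fraction


def var_log_odds_two_forms(x: int, n: int):
    """Return 1/(n p(1-p)) and 1/x + 1/(n-x) as exact fractions, with p = x/n."""
    p_hat = Fraction(x, n)
    combined = 1 / (n * p_hat * (1 - p_hat))
    split = Fraction(1, x) + Fraction(1, n - x)
    return combined, split


# Hypothetical counts: 20 successes out of 100 trials.
combined, split = var_log_odds_two_forms(20, 100)
print(combined, split)
```

Both forms are the same rational number for every $0<x<n$, which is why the variance can be written directly in terms of the cell counts.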

Method assuming a multinomial distribution

Suppose the random vector $\mathbb{X}=(X_1,...,X_k)^\mathrm{T}$ follows a multinomial distribution with sample size $n$ and probabilities $p_i$ (where $p_i\geq0$ for all $i$ and $\sum_{i=1}^k{p_i}=1$).
Then
$E[X_i]=np_i, \ V[X_i]=np_i(1-p_i), \ Cov(X_i,X_j)=-np_ip_j\ (i\neq j)$
Setting $Y_i=\displaystyle \frac{(X_i-np_i)}{\sqrt{np_i}}$ gives
$E[Y_i]=0, \ V[Y_i]=1-p_i, \ Cov(Y_i,Y_j)=-\sqrt{p_ip_j}\ (i\neq j)$
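These moment formulas can be confirmed exactly on a small case by enumerating every outcome of the multinomial distribution. The sketch below uses rational arithmetic; the function name `multinomial_moments` and the parameters $n=6$, $p=(1/2,1/3,1/6)$ are assumptions chosen to keep the enumeration tiny.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def multinomial_moments(n: int, probs):
    """Exact E[X_1], V[X_1] and Cov(X_1, X_2) by enumerating all outcomes."""
    k = len(probs)
    e1 = e2 = e11 = e12 = Fraction(0)
    for counts in product(range(n + 1), repeat=k):
        if sum(counts) != n:
            continue
        # Multinomial probability n! / (x_1! ... x_k!) * prod(p_i ** x_i).
        coef = factorial(n)
        prob = Fraction(1)
        for c, p in zip(counts, probs):
            coef //= factorial(c)
            prob *= p ** c
        prob *= coef
        x1, x2 = counts[0], counts[1]
        e1 += prob * x1
        e2 += prob * x2
        e11 += prob * x1 * x1
        e12 += prob * x1 * x2
    return e1, e11 - e1 * e1, e12 - e1 * e2


probs = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
mean1, var1, cov12 = multinomial_moments(6, probs)
print(mean1, var1, cov12)
```

The negative covariance reflects the constraint that the counts must sum to $n$: when one cell is larger than expected, the others tend to be smaller.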

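Now consider the case $k=4$.
Suppose the random variables $X_1,X_2,X_3,X_4$ follow a multinomial distribution with probabilities $p=(p_{11},\ p_{12},\ p_{21},\ p_{22})$, and write the observed counts as $(n_{11},\ n_{12},\ n_{21},\ n_{22})$, with $n=n_{11}+n_{12}+n_{21}+n_{22}$.

The 2x2 table of observed counts (expected counts in parentheses) can be written as

| Group1 | Group2 |
|---|---|
| $n_{11}\ (np_{11})$ | $n_{12}\ (np_{12})$ |
| $n_{21}\ (np_{21})$ | $n_{22}\ (np_{22})$ |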

$OR=\displaystyle \frac{p_{11}p_{22}}{p_{12}p_{21}}$

The maximum likelihood estimator of the odds ratio is $\widehat{OR}=\displaystyle \frac{n_{11}n_{22}}{n_{12}n_{21}}$

To simplify the notation, relabel the cell indices as $(O_{11},\ O_{12},\ O_{21},\ O_{22})=(O_1,\ O_2,\ O_3,\ O_4)$ for both $p$ and $n$, so that

$OR=\displaystyle \frac{p_1p_4}{p_2p_3},\ \widehat{OR}=\frac{n_1n_4}{n_2n_3}$

$Z_i=\displaystyle \frac{(n_i-np_i)}{\sqrt{n}} \Leftrightarrow Z_i=\sqrt{p_i}Y_i \ \ (i=1,2,3,4)$
When $n$ is large, $Z_i \sim N(0,p_i(1-p_i))$ approximately.

$n_i=np_i+\sqrt{n}Z_i \Rightarrow \displaystyle \frac{n_i}{n}=p_i\left(1+\frac{Z_i}{p_i\sqrt{n}}\right)$

Here, as $x \to 0$, the Taylor expansion gives the approximation $\log(1+x) \approx x$, so

\begin{aligned}
\displaystyle \log(\frac{n_i}{n})&=\log(p_i)+\log\left(1+\frac{Z_i}{p_i\sqrt{n}}\right) \\
&=\log(p_i)+\frac{Z_i}{p_i\sqrt{n}}+\varepsilon_i\ \ (\varepsilon_i=O_p(1/n),\  n \to \infty) \\ 
\end{aligned}

Rewriting each $n_i$ in $\widehat{OR}$ as $\frac{n_i}{n}$ (the factors of $n$ cancel),

\begin{aligned}
\displaystyle \log(\widehat{OR})&=\log\left(\frac{\frac{n_1}{n}\frac{n_4}{n}}{\frac{n_2}{n}\frac{n_3}{n}}\right) \\
&=\log(p_1)+\log(p_4)-\log(p_2)-\log(p_3) \\ &\ \ \ +\frac{Z_1}{p_1\sqrt{n}}+\frac{Z_4}{p_4\sqrt{n}}-\frac{Z_2}{p_2\sqrt{n}}-\frac{Z_3}{p_3\sqrt{n}}+\varepsilon \\
&=\log(OR)+\frac{1}{\sqrt{n}}\left(\frac{Z_1}{p_1}+\frac{Z_4}{p_4}-\frac{Z_2}{p_2}-\frac{Z_3}{p_3}\right)+\varepsilon\ \ (\varepsilon=O_p(1/n) )\\
\end{aligned}

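Because the $Z_i$ are approximately jointly normal with mean 0 when $n$ is large, the linear term above is approximately normal, so $\log(\widehat{OR})$ can be approximated by a normal distribution with mean $\log(OR)$. This is the delta method applied to $\log(\widehat{OR})$.

Moving $\log(OR)$ to the left-hand side,

\begin{aligned}
\displaystyle \log(\widehat{OR})-\log(OR)&=\frac{1}{\sqrt{n}}\left(\frac{Z_1}{p_1}+\frac{Z_4}{p_4}-\frac{Z_2}{p_2}-\frac{Z_3}{p_3}\right)+\varepsilon \\
&\approx N\left(0,\left(\frac{d}{dOR}\log(OR)\right)^2V\left(\widehat{OR}\right)\right)
\end{aligned}

Since $\frac{d}{dOR}\log(OR)=\frac{1}{OR}$, the delta method also gives $V(\log(\widehat{OR}))\approx\frac{1}{OR^2}V(\widehat{OR})$ (under the null hypothesis $OR=1$ this factor equals 1), so

\begin{aligned}
\displaystyle \log(\widehat{OR})-\log(OR)\approx N\left(0,V\left(\log(\widehat{OR})\right)\right)
\end{aligned}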

Moreover, $\varepsilon \to 0$ when $n$ is large, so

\begin{aligned}
\displaystyle \log(\widehat{OR})-\log(OR)&\approx\frac{1}{\sqrt{n}}\left(\frac{Z_1}{p_1}+\frac{Z_4}{p_4}-\frac{Z_2}{p_2}-\frac{Z_3}{p_3}\right)
\end{aligned}

Taking the variance of both sides,

\begin{aligned}
\displaystyle V\left(\log(\widehat{OR})-\log(OR)\right)&=V\left(\log(\widehat{OR})\right) \\
&=V\left(\frac{1}{\sqrt{n}}\left(\frac{Z_1}{p_1}+\frac{Z_4}{p_4}-\frac{Z_2}{p_2}-\frac{Z_3}{p_3}\right)\right) \\
&=\sum_{i=1}^4{V\left(\frac{Y_i}{\sqrt{np_i}}\right)}+\frac{1}{n}Cov \\
&=\sum_{i=1}^4{\frac{1-p_i}{np_i}}+\frac{1}{n}Cov\ \ (\because V(Y_i)=1-p_i) \\
&=\frac{1}{n}\left(-4+\sum_{i=1}^4{\frac{1}{p_i}}\right)+\frac{1}{n}Cov
\end{aligned}

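Here $Cov$ stands for twice the sum, over the six pairs $i<j$, of $Cov(\frac{Z_i}{p_i},\frac{Z_j}{p_j})$ multiplied by the product of the signs with which the two terms enter the sum ($+$ for $i=1,4$ and $-$ for $i=2,3$). Note that $Cov(Z_i,Z_j)=-p_ip_j \Rightarrow Cov(\frac{Z_i}{p_i},\frac{Z_j}{p_j})=-1\ \ (i \neq j)$.
There are $_4C_2=6$ pairs among the four variables, and the sign product is positive only for $(1,4)$ and $(2,3)$. Hence $Cov=2(2-4)(-1)=4$, and the $-4/n$ in the first term cancels against the $+4/n$ from the second term, giving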

\begin{aligned}
\displaystyle V(\log(\widehat{OR}))&=\frac{1}{n}\left(\sum_{i=1}^4{\frac{1}{p_i}}\right) \\
&\approx\sum_{i=1}^4{\frac{1}{n_i}}\ \ \left(\text{replacing } p_i \text{ by } \frac{n_i}{n}\right)
\end{aligned}
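The final formula can also be checked by simulation: draw many 2x2 tables from a multinomial distribution, compute $\log(\widehat{OR})$ for each, and compare the empirical variance with $\frac{1}{n}\sum_i\frac{1}{p_i}$. This is a rough Monte Carlo sketch, not a proof; the function name `simulate_log_or`, the cell probabilities, the sample size, the number of replicates and the seed are all assumptions for illustration.

```python
import math
import random
from statistics import variance


def simulate_log_or(n: int, probs, reps: int, seed: int = 0):
    """Draw `reps` multinomial 2x2 tables of size n and return log(OR_hat) values."""
    rng = random.Random(seed)
    cells = range(4)
    log_ors = []
    for _ in range(reps):
        draws = rng.choices(cells, weights=probs, k=n)
        n1, n2, n3, n4 = (draws.count(c) for c in cells)
        if min(n1, n2, n3, n4) == 0:
            continue  # log(OR_hat) is undefined when a cell is empty
        log_ors.append(math.log(n1 * n4 / (n2 * n3)))
    return log_ors


# Illustrative setup: cells ordered as (p_1, p_2, p_3, p_4), so OR = p_1 p_4 / (p_2 p_3).
n, probs = 400, (0.3, 0.2, 0.2, 0.3)
log_ors = simulate_log_or(n, probs, reps=2000)
empirical = variance(log_ors)
theoretical = sum(1 / (n * p) for p in probs)
print(empirical, theoretical)
```

With expected cell counts this large, the empirical and theoretical variances should be close; with small expected counts the approximation gets noticeably worse.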