Notes on the LR update equations in Mikolov's word2vec implementation

Posted at 2017-08-01

Context-word prediction model with LR (logistic regression)

context word

P(y=1| v_{context}\cdot v_{target}) = \frac{1}{1 + \exp(- v_{context}\cdot v_{target}) }

negative sampling word

P(y=-1 | v_{context}\cdot v_{target}) = 1 - P(y=1 | v_{context}\cdot v_{target})
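As a minimal sketch of the two probabilities above (the vectors here are hypothetical, chosen only for illustration):

```python
import math

def sigmoid(x):
    """Logistic function: P(y=1) for score x = v_context . v_target."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical context/target vectors for illustration.
v_context = [0.2, -0.1, 0.4]
v_target = [0.3, 0.5, -0.2]

dot = sum(c * t for c, t in zip(v_context, v_target))
p_pos = sigmoid(dot)   # P(y=1  | v_context . v_target)
p_neg = 1.0 - p_pos    # P(y=-1 | v_context . v_target)
```

The two probabilities always sum to 1, which is what lets the negative-sample case be written as `1 - P(y=1)`.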

y=1 => vector update rule from maximizing the likelihood of a context word

\begin{eqnarray}
L &=& \log P(y=1) = -\log\left(1 + \exp(-v_c \cdot v_t)\right) \\
\frac{\partial L}{\partial v_{c}} &=&
-\frac{\partial}{\partial v_c}\log\left(1 + \exp(-v_c \cdot v_t)\right) \\
&=& -\frac{1}{1 + \exp(-v_c \cdot v_t)} \exp(-v_c \cdot v_t) \times (-v_t) \\
&=& \frac{\exp(-v_c \cdot v_t)}{1 + \exp(-v_c \cdot v_t)} v_t \\
&=& (1 - P(y=1)) v_t \\
\frac{\partial L}{\partial v_{t}} &=& (1 - P(y=1)) v_c
\end{eqnarray}
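A quick way to sanity-check the derivation is to compare the closed-form gradient $(1 - P(y=1))\,v_t$ against central finite differences of $L$ (hypothetical vectors, pure-Python sketch):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_likelihood_pos(v_c, v_t):
    """L = log P(y=1) = -log(1 + exp(-v_c . v_t))."""
    dot = sum(a * b for a, b in zip(v_c, v_t))
    return -math.log(1.0 + math.exp(-dot))

# Hypothetical vectors for the check.
v_c = [0.1, -0.3, 0.5]
v_t = [0.4, 0.2, -0.1]

dot = sum(a * b for a, b in zip(v_c, v_t))
analytic = [(1.0 - sigmoid(dot)) * t for t in v_t]  # (1 - P(y=1)) v_t

# Central finite differences on each component of v_c.
eps = 1e-6
numeric = []
for i in range(len(v_c)):
    plus, minus = v_c[:], v_c[:]
    plus[i] += eps
    minus[i] -= eps
    numeric.append(
        (log_likelihood_pos(plus, v_t) - log_likelihood_pos(minus, v_t)) / (2 * eps)
    )
```

If the derivation is right, `analytic` and `numeric` agree to within the finite-difference error.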

y=-1 => vector update rule from maximizing the likelihood of a negative-sampled word

\begin{eqnarray}
L &=& \log P(y=-1) = \log(1 - P(y=1)) \\
\frac{\partial L}{\partial v_{c}} &=&
\frac{\partial}{\partial v_c}\log(1 - P(y=1)) \\
&=&
\frac{1}{1 - P(y=1)} \times (-1) \times \frac{\partial P(y=1)}{\partial v_c} \\
&=&
\frac{1 + \exp(-v_c \cdot v_t)}{\exp(-v_c \cdot v_t)}
\times (-1) \times \left(-(1 + \exp(-v_c \cdot v_t))^{-2}\right) \exp(-v_c \cdot v_t) \times (-v_t) \\
&=&
-\frac{1}{1 + \exp(-v_c \cdot v_t)} v_t \\
&=&
-P(y=1) v_t
\end{eqnarray}
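The same finite-difference check works for the negative-sample case, where the closed form is $-P(y=1)\,v_t$ (again a pure-Python sketch with hypothetical vectors):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_likelihood_neg(v_c, v_t):
    """L = log P(y=-1) = log(1 - sigmoid(v_c . v_t))."""
    dot = sum(a * b for a, b in zip(v_c, v_t))
    return math.log(1.0 - sigmoid(dot))

# Hypothetical vectors for the check.
v_c = [0.1, -0.3, 0.5]
v_t = [0.4, 0.2, -0.1]

dot = sum(a * b for a, b in zip(v_c, v_t))
analytic = [-sigmoid(dot) * t for t in v_t]  # -P(y=1) v_t

# Central finite differences on each component of v_c.
eps = 1e-6
numeric = []
for i in range(len(v_c)):
    plus, minus = v_c[:], v_c[:]
    plus[i] += eps
    minus[i] -= eps
    numeric.append(
        (log_likelihood_neg(plus, v_t) - log_likelihood_neg(minus, v_t)) / (2 * eps)
    )
```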

Putting the two cases together:

  • label = 1 <-> context word
  • label = 0 <-> negative sampling word
\begin{eqnarray}
g = \frac{\partial L}{\partial v_{c}} &=& 
(\mbox{label} - P(y=1)) v_t \\
g = \frac{\partial L}{\partial v_{t}} &=& 
(\mbox{label} - P(y=1)) v_c \\
\end{eqnarray}
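The unified gradient above can be sketched as a single SGD step (a simplification of what the word2vec C code does — the real implementation accumulates the context-vector update in a buffer across all samples before applying it; the learning rate and vectors here are hypothetical):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sgd_step(v_c, v_t, label, lr=0.025):
    """One gradient-ascent step on the unified objective.

    label = 1 for a context word, label = 0 for a negative sample.
    The shared scalar g = lr * (label - P(y=1)) scales both updates.
    """
    dot = sum(a * b for a, b in zip(v_c, v_t))
    g = lr * (label - sigmoid(dot))
    new_v_c = [c + g * t for c, t in zip(v_c, v_t)]
    new_v_t = [t + g * c for c, t in zip(v_c, v_t)]
    return new_v_c, new_v_t

# A positive pair should push the dot product up,
# a negative sample should push it down.
v_c = [0.2, -0.1, 0.4]
v_t = [0.3, 0.5, -0.2]
pc, pt = sgd_step(v_c, v_t, label=1)
nc, nt = sgd_step(v_c, v_t, label=0)
```

Note that both cases share one scalar `g`; this is why the implementation only needs a single code path with a `label` flag rather than separate branches for context words and negative samples.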