Introduction
These notes cover RNN, Week 2 (RNN W2) of the Deep Learning Specialization.
(RNN W2L01) Word Representation
Contents
Word representation
- 1-hot representation
- $V = [a, aaron, ... , zulu, <\textrm{UNK}>] $ ($|V| = 10000$)
- For example, man is a 10,000-element vector whose 5391st entry is 1 and whose other entries are all 0; written $O_{5391}$
Featurized representation : word embedding
|  | Man | Woman | King | Queen | Apple | Orange | ... | 
| Gender | -1 | 1 | -0.95 | 0.97 | 0.00 | 0.00 |  | 
| Royal | 0.01 | 0.02 | 0.93 | 0.95 | -0.01 | 0.00 |  | 
| Age | 0.03 | 0.02 | 0.7 | 0.69 | 0.03 | -0.02 |  | 
| Food | 0.09 | 0.01 | 0.02 | 0.01 | 0.95 | 0.97 |  | 
| Size |  |  |  |  |  |  |  | 
| Cost |  |  |  |  |  |  |  | 
| Verb |  |  |  |  |  |  |  | 
| ... |  |  |  |  |  |  |  | 
- Each column of this table is a vector; for Man it is written $e_{5391}$
Visualizing word embeddings
(RNN W2L02) Using Word Embeddings
Contents
- Named entity recognition example
- Transfer learning and word embeddings
- Learn word embeddings from large text corpus (1B - 100B words) (Or download pre-trained embedding online)
- Transfer embedding to new task with smaller training set (Say, 100k words)
- Optional : Continue to finetune the word embeddings with the new data
 
- The idea is similar to face encoding
(RNN W2L03) Properties of word embedding
Contents
Analogies
- Man → Woman as King → ?
- Find the ? such that $e_{man} - e_{woman} \approx e_{king} - e_{?}$
Analogies using word vectors
\textrm{arg} \max_w \textrm{sim} \left( e_w, e_{king} - e_{man} + e_{woman} \right)
Cosine similarity
\textrm{sim}\left( u, v \right) = \frac{u^T v}{\|u\|_2 \|v\|_2}
- Squared Euclidean distance ($\|u - v\|^2$) can also be used (it is nonlinear, and it measures dissimilarity, so smaller means more similar)
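The analogy search and the cosine similarity above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's implementation: the 4-dimensional vectors are the Gender, Royal, Age, and Food values copied from the featurized-representation table, and the function names `cosine_similarity` and `complete_analogy` are made up for this sketch.

```python
import numpy as np

def cosine_similarity(u, v):
    """sim(u, v) = u^T v / (||u||_2 ||v||_2)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings: rows are (Gender, Royal, Age, Food) from the table in these notes.
embeddings = {
    "man":    np.array([-1.00,  0.01,  0.03, 0.09]),
    "woman":  np.array([ 1.00,  0.02,  0.02, 0.01]),
    "king":   np.array([-0.95,  0.93,  0.70, 0.02]),
    "queen":  np.array([ 0.97,  0.95,  0.69, 0.01]),
    "apple":  np.array([ 0.00, -0.01,  0.03, 0.95]),
    "orange": np.array([ 0.00,  0.00, -0.02, 0.97]),
}

def complete_analogy(a, b, c, embeddings):
    """Return argmax_w sim(e_w, e_c - e_a + e_b), skipping the three input words."""
    target = embeddings[c] - embeddings[a] + embeddings[b]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(embeddings[w], target))

# Man -> Woman as King -> ?
answer = complete_analogy("man", "woman", "king", embeddings)
print(answer)  # queen
```

Excluding the three input words from the candidates is a common practical choice, because otherwise one of them is often the nearest vector to the target.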
(RNN W2L04) Embedding matrix
Contents
- $E o_j = e_j$ ; embedding for word $j$
- In practice, use a specialized function to look up an embedding, because computing $e_j$ with a full matrix-vector product every time is computationally expensive
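The difference between the product $E o_j$ and a direct lookup can be checked with a small sketch. The 300-dimensional embedding size, the random values in `E`, and the 0-based use of index 5391 are assumptions for illustration only.

```python
import numpy as np

vocab_size, embed_dim = 10000, 300            # embed_dim = 300 is an assumption
rng = np.random.default_rng(0)
E = rng.normal(size=(embed_dim, vocab_size))  # random stand-in for a learned E

j = 5391                                      # index of the word (0-based here)
o_j = np.zeros(vocab_size)
o_j[j] = 1.0                                  # one-hot vector o_j

e_by_matmul = E @ o_j                         # 300 x 10000 multiply-adds, almost all by zero
e_by_lookup = E[:, j]                         # simply reads column j

print(np.allclose(e_by_matmul, e_by_lookup))
```

Both paths return the same vector; the lookup just avoids multiplying by 9,999 zeros, which is what embedding layers in deep learning frameworks do internally.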
(RNN W2L05) Learning word embeddings
Contents
Neural language model
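- Example sentence: I want a glass of orange ____.
- Using only the previous 4 words (4 is a hyperparameter) with 300-dimensional embeddings, the input is 300 * 4 = 1200 dimensional. A softmax on top of this input then picks the next word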
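A forward pass of such a fixed-window language model might look like the sketch below. The hidden layer size, the tanh activation, the random weights, and the word ids are all assumptions made for illustration; the notes only fix the window of 4, the 300-dimensional embeddings, and the softmax output.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, window = 10000, 300, 4
hidden = 128                                   # hypothetical hidden size

E = rng.normal(scale=0.01, size=(embed_dim, vocab_size))
W1 = rng.normal(scale=0.01, size=(hidden, embed_dim * window))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.01, size=(vocab_size, hidden))
b2 = np.zeros(vocab_size)

def softmax(z):
    z = z - z.max()                            # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def predict_next(context_ids):
    """Concatenate the 4 context embeddings (1200 dims) and output p(next word)."""
    x = np.concatenate([E[:, i] for i in context_ids])
    h = np.tanh(W1 @ x + b1)
    return softmax(W2 @ h + b2)

context_ids = [11, 22, 33, 44]                 # hypothetical ids for "a glass of orange"
probs = predict_next(context_ids)
print(probs.shape, probs.sum())
```

Because the window is fixed, the input size stays at 1200 no matter how long the sentence is.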
Other context/target unit
- Example sentence: I want a glass of orange juice to go along with my cereal.
- Ways to choose the context
- Last 4 words (as above)
- 4 words on left & right
- last 1 word
- nearby 1 word
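The context choices listed above can be made concrete on the example sentence, taking "juice" as the target. This is only a sketch: the whitespace tokenization and the window of 4 used for the "nearby 1 word" case are assumptions.

```python
import random

sentence = "I want a glass of orange juice to go along with my cereal".split()
t = sentence.index("juice")                      # position of the target word

last_4 = sentence[max(0, t - 4):t]               # last 4 words
left_right_4 = last_4 + sentence[t + 1:t + 5]    # 4 words on the left and right
last_1 = sentence[t - 1:t]                       # last 1 word

# Nearby 1 word: one word drawn at random from a window around the target.
rng = random.Random(0)
nearby_1 = rng.choice(sentence[max(0, t - 4):t] + sentence[t + 1:t + 5])

print(last_4)        # ['a', 'glass', 'of', 'orange']
print(left_right_4)  # ['a', 'glass', 'of', 'orange', 'to', 'go', 'along', 'with']
print(last_1)        # ['orange']
```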
 
(RNN W2L06) Word2Vec
Contents
Skip-grams
- Example sentence: I want a glass of orange juice to go along with my cereal.
| Context | Target | 
| orange | juice | 
| orange | glass | 
| orange | my | 
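Context/target pairs like the ones in this table can be generated by picking a context word and then sampling targets from a window around it. The sketch below assumes a window of 10 words on each side and 3 targets per context; both numbers and the function name `skipgram_pairs` are illustrative choices, not values taken from these notes.

```python
import random

def skipgram_pairs(tokens, window=10, pairs_per_context=3, seed=0):
    """For each context word, sample target words from within +/- window positions."""
    rng = random.Random(seed)
    pairs = []
    for c, context in enumerate(tokens):
        lo, hi = max(0, c - window), min(len(tokens), c + window + 1)
        neighbours = [i for i in range(lo, hi) if i != c]
        n = min(pairs_per_context, len(neighbours))
        for i in rng.sample(neighbours, n):
            pairs.append((context, tokens[i]))
    return pairs

tokens = "I want a glass of orange juice to go along with my cereal".split()
pairs = skipgram_pairs(tokens)
orange_pairs = [p for p in pairs if p[0] == "orange"]
print(orange_pairs)
```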
Model
- Vocab size = 10k
- context $c$ ("orange") → target $t$ ("juice")
- $o_c$ → $E$ → $e_c$ → 〇(softmax) → $\hat{y}$
p\left( t | c \right) = \frac{e^{\theta_t^T e_c}}{\sum_{j=1}^{10000} e^{\theta_j^T e_c}}
- $\theta_t$ ; parameter associated with output $t$
L\left( \hat{y}, y \right) = -\sum_{i=1}^{10000} y_i \log\hat{y}_i
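The softmax $p(t|c)$ and the cross-entropy loss above can be written out directly. In this sketch the parameters are random, the 300-dimensional embedding size is an assumption, and the word ids for "orange" and "juice" are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10000, 300
E = rng.normal(scale=0.01, size=(embed_dim, vocab_size))      # columns are e_c
Theta = rng.normal(scale=0.01, size=(vocab_size, embed_dim))  # rows are theta_t

def skipgram_probs(c):
    """p(t | c) for every t: softmax over theta_j^T e_c, j = 1..10000."""
    logits = Theta @ E[:, c]
    logits = logits - logits.max()      # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def cross_entropy(y_hat, t):
    """With a one-hot y, -sum_i y_i log y_hat_i reduces to -log y_hat[t]."""
    return float(-np.log(y_hat[t]))

c, t = 100, 200                         # hypothetical ids for "orange" and "juice"
y_hat = skipgram_probs(c)
loss = cross_entropy(y_hat, t)
print(loss)
```

Every call to `skipgram_probs` touches all 10,000 rows of `Theta`, which is the cost that motivates hierarchical softmax and negative sampling.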
Problems with softmax classification
- Computing the softmax is slow, because the denominator sums over all 10,000 words in the vocabulary → Hierarchical softmax
(RNN W2L07) Negative sampling
Contents
Defining a new language problem
- Example sentence: I want a glass of orange juice to go along with my cereal.
| context | word | target | 
| orange | juice | 1 (positive) | 
| orange | king | 0 (negative) | 
| orange | book | 0 | 
| orange | the | 0 | 
- The negative words are chosen at random
- The number of negative words $k$ is
- $k = 5 \sim 20$ ; smaller data set
- $k = 2 \sim 5$ ; larger data set
 
Model
p\left( y=1 | c, t \right) = \sigma\left(\theta_t^T e_c \right)
- Instead of training on every word in the vocabulary at each step, train only on the positive word and the $k$ randomly chosen negative words
Selecting negative example
p(w_i) = \frac{f(w_i)^\frac{3}{4}}{\sum_{j=1}^{10000} f(w_j)^\frac{3}{4}}
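The sampling distribution and the logistic model can be combined into one training-step loss. This is a sketch under assumptions: the vocabulary is shrunk to 50 words, the counts and parameters are random, and the ids for "orange" and "juice" are hypothetical.

```python
import numpy as np

def negative_sampling_distribution(freq):
    """p(w_i) = f(w_i)^(3/4) / sum_j f(w_j)^(3/4)."""
    weights = np.asarray(freq, dtype=float) ** 0.75
    return weights / weights.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def negative_sampling_loss(e_c, theta_pos, theta_negs):
    """Logistic loss for one positive pair (y=1) plus k negative pairs (y=0)."""
    pos_loss = -np.log(sigmoid(theta_pos @ e_c))
    neg_loss = -np.log(1.0 - sigmoid(theta_negs @ e_c)).sum()
    return float(pos_loss + neg_loss)

rng = np.random.default_rng(0)
vocab_size, embed_dim, k = 50, 8, 5              # toy sizes, not the 10k vocabulary
freq = rng.integers(1, 1000, size=vocab_size)    # made-up word counts
p = negative_sampling_distribution(freq)

E = rng.normal(scale=0.1, size=(vocab_size, embed_dim))      # rows are e_c
Theta = rng.normal(scale=0.1, size=(vocab_size, embed_dim))  # rows are theta_t

c, t = 3, 7                                      # hypothetical ids: "orange", "juice"
negatives = rng.choice(vocab_size, size=k, p=p)  # k negative words drawn from p
loss = negative_sampling_loss(E[c], Theta[t], Theta[negatives])
print(negatives, loss)
```

Only $k + 1$ rows of `Theta` are used per step instead of all of them. Raising counts to the 3/4 power lowers the share of very frequent words relative to their raw frequency while still sampling them more often than rare words.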
(RNN W2L08) GloVe word vectors
Contents
- $X_{ij}$ ; #times $i$ appears in context of $j$
Model
\textrm{minimize} \sum_{i=1}^{10000} \sum_{j=1}^{10000} f(X_{ij}) \left( \theta_i^T e_j + b_i + b_j^\prime - \log X_{ij}  \right)^2 \\
f\left( X_{ij} \right) = 0 \ \ \textrm{if} \ X_{ij} = 0 \\
e_w^{(final)} = \frac{e_w + \theta_w}{2} 
- $\theta_i$, $e_j$ are symmetric
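The GloVe objective can be evaluated on a small co-occurrence matrix as a sanity check, with both bias terms $b_i$ and $b_j^\prime$ added to $\theta_i^T e_j$ as in the standard GloVe formulation. The notes only state that $f(0) = 0$; the specific weighting used here (with `x_max=100` and `alpha=0.75`) follows the original GloVe paper and is an assumption, as are the toy sizes and random values.

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(X_ij): 0 at X_ij = 0, capped at 1 for large counts (assumed form)."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(X, Theta, E, b, b_prime):
    """sum_ij f(X_ij) (theta_i^T e_j + b_i + b'_j - log X_ij)^2, skipping X_ij = 0."""
    mask = X > 0
    log_X = np.log(np.where(mask, X, 1.0))       # avoid log(0); those terms are masked
    residual = Theta @ E.T + b[:, None] + b_prime[None, :] - log_X
    return float(np.sum(glove_weight(X) * mask * residual ** 2))

rng = np.random.default_rng(0)
V, d = 6, 4                                      # toy vocabulary and embedding sizes
X = rng.integers(0, 20, size=(V, V)).astype(float)   # made-up co-occurrence counts
Theta = rng.normal(scale=0.1, size=(V, d))       # rows are theta_i
E = rng.normal(scale=0.1, size=(V, d))           # rows are e_j
b, b_prime = np.zeros(V), np.zeros(V)

loss = glove_loss(X, Theta, E, b, b_prime)
e_final = (E + Theta) / 2                        # e_w^(final) = (e_w + theta_w) / 2
print(loss, e_final.shape)
```

Averaging `E` and `Theta` at the end reflects the symmetric roles of $\theta_i$ and $e_j$ in the objective.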
(RNN W2L09) Sentiment classification
Contents
Sentiment classification
| Sentence | # stars | 
| The dessert is excellent | $\star \star \star \star $ | 
| Service was quite slow | $\star \star $ | 
| Good for a quick meal, but nothing special | $\star \star \star$ | 
- Simply averaging the $e_w$ of each word and feeding the result to a softmax does not work well, because word order also matters
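Why averaging ignores word order can be seen in a short sketch: the averaged vector, and therefore the softmax output, is identical for any permutation of the same words. The random embeddings, the 5 output classes (for 1 to 5 stars), and the function name `classify_by_average` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, n_classes = 300, 5                    # 5 classes for 1-5 stars (assumption)

def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def classify_by_average(tokens, embeddings, W, b):
    """Average the word embeddings, then apply a softmax layer."""
    avg = np.mean([embeddings[t] for t in tokens], axis=0)
    return softmax(W @ avg + b)

words = "service was quite slow".split()
embeddings = {w: rng.normal(size=embed_dim) for w in words}  # random placeholders
W = rng.normal(scale=0.01, size=(n_classes, embed_dim))
b = np.zeros(n_classes)

p_original = classify_by_average(words, embeddings, W, b)
p_reversed = classify_by_average(words[::-1], embeddings, W, b)
print(np.allclose(p_original, p_reversed))       # word order has no effect
```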
RNN for sentiment classification
(RNN W2L10) Debiasing word embeddings
Contents
The problem of bias in word embeddings
|  | judge | 
| Man : Woman as King : Queen | no problem | 
| Man : Computer_Programmer as Woman : Homemaker | problem | 
| Father : Doctor as Mother : Nurse | problem | 
- Word embeddings can reflect the gender, ethnicity, age, sexual orientation, and other biases of the text used to train the model.
Addressing bias in word embedding
- Identify bias direction
- $e_{he} - e_{she}$
- $e_{male} -e_{female}$
 
- Neutralize : For every word that is not definitional, project to get rid of bias
- Equalize pairs
- grandfather, grandmother
- boy, girl
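The three steps can be sketched with vector projections. This is a simplified illustration, not the exact published procedure: the bias direction is taken from a single pair ($e_{he} - e_{she}$) rather than an average over several pairs, the equalize step is a simplified variant that only makes the pair symmetric about the bias axis, and all vectors are random placeholders.

```python
import numpy as np

def project(v, g):
    """Component of v along the bias direction g."""
    return (v @ g) / (g @ g) * g

def neutralize(e, g):
    """Remove the bias component from a non-definitional word."""
    return e - project(e, g)

def equalize(e1, e2, g):
    """Simplified equalize: same off-axis part, opposite components along g."""
    mu = (e1 + e2) / 2
    mu_orth = mu - project(mu, g)
    half_gap = (project(e1, g) - project(e2, g)) / 2
    return mu_orth + half_gap, mu_orth - half_gap

rng = np.random.default_rng(0)
dim = 5
e_he, e_she, e_doctor, e_boy, e_girl = rng.normal(size=(5, dim))  # placeholders

g = e_he - e_she                                 # 1. identify the bias direction
doctor_neutral = neutralize(e_doctor, g)         # 2. neutralize
boy_eq, girl_eq = equalize(e_boy, e_girl, g)     # 3. equalize the pair

print(abs(doctor_neutral @ g) < 1e-9)
print(np.linalg.norm(boy_eq - doctor_neutral), np.linalg.norm(girl_eq - doctor_neutral))
```

After these steps the neutralized word has no component along `g`, and the equalized pair is equidistant from it, so the pair differs only along the bias direction.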
 
References