These are notes on RNN, Week 2 (RNN W2) of the Deep Learning Specialization.
(RNN W2L01) Word Representation
Word representation
- 1-hot representation
- $V = [a, aaron, ... , zulu, <\textrm{UNK}>] $ ($|V| = 10000$)
- For example, man is a 10,000-element vector whose 5391st entry is 1 and all others are 0; $O_{5391}$
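The one-hot encoding above can be sketched in a few lines. The helper name `one_hot` and the use of a plain Python list are assumptions made for illustration; positions are counted from 1 to match the notes.

```python
def one_hot(index, vocab_size=10000):
    """Return a one-hot list with a 1 at the 1-based position `index`."""
    vec = [0] * vocab_size
    vec[index - 1] = 1  # the notes count positions from 1
    return vec

o_man = one_hot(5391)
print(sum(o_man), o_man.index(1) + 1)  # 1 5391
```

Any two distinct one-hot vectors have a dot product of 0, so this representation carries no notion of similarity between words.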
Featurized representation : word embedding
| | Man | Woman | King | Queen | Apple | Orange | ... |
|---|---|---|---|---|---|---|---|
| Gender | -1 | 1 | -0.95 | 0.97 | 0.00 | 0.00 | |
| Royal | 0.01 | 0.02 | 0.93 | 0.95 | -0.01 | 0.00 | |
| Age | 0.03 | 0.02 | 0.7 | 0.69 | 0.03 | -0.02 | |
| Food | 0.09 | 0.01 | 0.02 | 0.01 | 0.95 | 0.97 | |
| Size | | | | | | | |
| Cost | | | | | | | |
| Verb | | | | | | | |
| ... | | | | | | | |
- For Man, this column vector is written as $e_{5391}$
Visualizing word embeddings
(RNN W2L02) Using Word Embeddings
- Named entity recognition example
- Transfer learning and word embedding
- Learn word embeddings from large text corpus (1B - 100B words) (Or download pre-trained embedding online)
- Transfer embedding to new task with smaller training set (Say, 100k words)
- Optional : Continue to finetune the word embeddings with the new data
- The idea is similar to face encoding
(RNN W2L03) Properties of word embedding
- Man → Woman as King → ?
- Find the ? such that $e_{man} - e_{woman} \approx e_{king} - e_{?}$
Analogies using word vectors
\textrm{arg} \max_w \textrm{sim} \left( e_w, e_{king} - e_{man} + e_{woman} \right)
Cosine similarity
\textrm{sim}\left( u, v \right) = \frac{u^T v}{\|u\|_2 \|v\|_2}
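- The squared Euclidean distance ($\|u - v\|^2$) also works (though it is nonlinear); since it measures dissimilarity, use its negative as the similarity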
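As a rough illustration of the analogy search with cosine similarity, here is a minimal sketch. It reuses the four filled-in feature rows (Gender, Royal, Age, Food) from the featurized-representation table as toy 4-dimensional embeddings. Excluding the three query words from the candidates is a common convention assumed here, not something stated in the notes.

```python
import math

def cosine_similarity(u, v):
    """sim(u, v) = u.v / (||u||_2 ||v||_2)"""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings: [Gender, Royal, Age, Food] values from the table.
embeddings = {
    "man":    [-1.00, 0.01, 0.03, 0.09],
    "woman":  [1.00, 0.02, 0.02, 0.01],
    "king":   [-0.95, 0.93, 0.70, 0.02],
    "queen":  [0.97, 0.95, 0.69, 0.01],
    "apple":  [0.00, -0.01, 0.03, 0.95],
    "orange": [0.00, 0.00, -0.02, 0.97],
}

def complete_analogy(a, b, c, embeddings):
    """Return the word w maximizing sim(e_w, e_c - e_a + e_b)."""
    target = [ec - ea + eb
              for ea, eb, ec in zip(embeddings[a], embeddings[b], embeddings[c])]
    # Assumption: the three query words are not allowed as answers.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(embeddings[w], target))

print(complete_analogy("man", "woman", "king", embeddings))  # queen
```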
(RNN W2L04) Embedding matrix
- $E o_j = e_j$ ; embedding for word $j$
- In practice, use a specialized function to look up an embedding, because computing $e_j$ with a full matrix-vector multiplication every time is computationally expensive
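The following sketch checks, on a tiny random matrix, that multiplying $E$ by a one-hot vector gives the same result as reading out one column. The sizes (3 features, 5 words), the 0-based index `j`, and the helper names are assumptions chosen to keep the example small.

```python
import random

random.seed(0)
n_features, vocab_size = 3, 5  # tiny stand-ins for e.g. 300 and 10000
# E is stored as n_features rows, each with one entry per word.
E = [[random.random() for _ in range(vocab_size)] for _ in range(n_features)]

def one_hot(j, size):
    """One-hot vector o_j with a 0-based index j."""
    return [1 if k == j else 0 for k in range(size)]

def matvec(E, o):
    """Full product E o: touches every entry of E."""
    return [sum(e * x for e, x in zip(row, o)) for row in E]

def lookup(E, j):
    """Direct lookup of column j: touches only n_features entries."""
    return [row[j] for row in E]

j = 2
assert matvec(E, one_hot(j, vocab_size)) == lookup(E, j)
```

The product visits all `n_features * vocab_size` entries while the lookup visits only `n_features`, which is why a dedicated lookup is preferred.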
(RNN W2L05) Learning word embeddings
Neural language model
- Example sentence: I want a glass of orange ____.
- Using only the previous 4 words (4 is a hyperparameter), the input has 300 * 4 = 1200 dimensions. A softmax on top of this picks the next word
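A minimal sketch of how the 1200-dimensional input could be assembled: the embeddings of the last 4 words are concatenated into one vector. The all-zero placeholder embeddings and the helper name `build_input` are assumptions for illustration only.

```python
EMBEDDING_DIM = 300
WINDOW = 4  # number of previous words used as context (a hyperparameter)

def build_input(words, embeddings, window=WINDOW):
    """Concatenate the embeddings of the last `window` words."""
    x = []
    for w in words[-window:]:
        x.extend(embeddings[w])
    return x

sentence = "I want a glass of orange".split()
# Placeholder embeddings; a real model would learn these values.
embeddings = {w: [0.0] * EMBEDDING_DIM for w in sentence}

x = build_input(sentence, embeddings)
print(len(x))  # 1200
```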
Other context/target unit
- Example sentence: I want a glass of orange juice to go along with my cereal.
- Ways to choose the context
- Last 4 words (as above)
- 4 words on left & right
- last 1 word
- nearby 1 word
(RNN W2L06) Word2Vec
- Example sentence: I want a glass of orange juice to go along with my cereal.
| Context | Target |
|---|---|
| orange | juice |
| orange | glass |
| orange | my |
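Context/target pairs like the ones in the table can be drawn by picking a target at random from a window around the context word. This is only a sketch: the window size of 10 words on each side, the fixed seed, and the function name `sample_pairs` are assumptions, not values given in the notes.

```python
import random

def sample_pairs(words, context_word, window=10, n_pairs=3, seed=0):
    """Draw (context, target) pairs with targets within +/- `window` words."""
    rng = random.Random(seed)
    c = words.index(context_word)
    lo, hi = max(0, c - window), min(len(words) - 1, c + window)
    neighbors = [i for i in range(lo, hi + 1) if i != c]
    return [(context_word, words[rng.choice(neighbors)]) for _ in range(n_pairs)]

sentence = "I want a glass of orange juice to go along with my cereal".split()
pairs = sample_pairs(sentence, "orange")
for context, target in pairs:
    print(context, target)
```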
- Vocab size = 10k
- context $c$ ("orange") → target $t$ ("juice")
- $o_c$ → $E$ → $e_c$ → 〇(softmax) → $\hat{y}$
p\left( t | c \right) = \frac{e^{\theta_t^T e_c}}{\sum_{j=1}^{10000} e^{\theta_j^T e_c}}
- $\theta_t$ ; parameter associated with output $t$
L\left( \hat{y}, y \right) = -\sum_{i=1}^{10000} y_i \log\hat{y}_i
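The softmax and the loss above can be sketched on a toy vocabulary of 4 words instead of 10,000. The parameter values below are made up for illustration, and subtracting the maximum score is a standard numerical-stability trick that is not part of the notes.

```python
import math

def softmax_probs(theta, e_c):
    """p(t | c) for every t: softmax over the scores theta_t . e_c."""
    scores = [sum(a * b for a, b in zip(theta_t, e_c)) for theta_t in theta]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

def cross_entropy(y_hat, target_index):
    """With a one-hot y, -sum_i y_i log(y_hat_i) reduces to -log(y_hat[target])."""
    return -math.log(y_hat[target_index])

# Made-up parameters: one theta_t per output word, and one context embedding.
theta = [[0.2, -0.1], [1.0, 0.5], [-0.3, 0.8], [0.0, 0.0]]
e_c = [0.9, 0.4]

y_hat = softmax_probs(theta, e_c)
loss = cross_entropy(y_hat, target_index=1)
print(y_hat, loss)
```

Note that the denominator needs one score per vocabulary word, so the cost of each prediction grows linearly with the vocabulary size.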
Problems with softmax classification
- Summing over all 10,000 words in the denominator is slow at every step → Hierarchical softmax
(RNN W2L07) Negative sampling
Defining a new language problem
- Example sentence: I want a glass of orange juice to go along with my cereal.
| context | word | target |
|---|---|---|
| orange | juice | 1 (positive) |
| orange | king | 0 (negative) |
| orange | book | 0 |
| orange | the | 0 |
- The negative words are picked at random
- The number of negative words $k$:
- $k = 5 \sim 20$ ; smaller data set
- $k = 2 \sim 5$ ; larger data set
p\left( y=1 | c, t \right) = \sigma\left(\theta_t^T e_c \right)
- Instead of training on every word in the vocabulary, train only on the $k$ randomly chosen words
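One way to read the model above is as $k + 1$ independent binary classifiers per training step: one for the positive pair and one per negative word. The sketch below computes the summed log loss for a single context; the toy vectors and the function names are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def negative_sampling_loss(e_c, theta_pos, theta_negs):
    """Sum of k + 1 binary log losses: one positive target, k negatives."""
    loss = -math.log(sigmoid(dot(theta_pos, e_c)))            # label y = 1
    for theta_neg in theta_negs:
        loss += -math.log(1.0 - sigmoid(dot(theta_neg, e_c)))  # label y = 0
    return loss

# Made-up vectors: context "orange", positive "juice", k = 2 negatives.
e_orange = [0.8, 0.1]
theta_juice = [1.5, 0.2]
theta_negatives = [[-1.0, 0.3], [-0.5, -0.4]]

print(negative_sampling_loss(e_orange, theta_juice, theta_negatives))
```

Only `k + 1` parameter vectors are touched per step, instead of all 10,000 as in the full softmax.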
Selecting negative example
p(w_i) = \frac{f(w_i)^\frac{3}{4}}{\sum_{j=1}^{10000} f(w_j)^\frac{3}{4}}
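The sampling distribution above can be sketched as follows, reading $f(w)$ as the observed frequency of word $w$. Raw counts are used instead of relative frequencies, which gives the same result after normalization. The counts are invented, and `random.choices` is used for the weighted draw.

```python
import random

def negative_sampling_distribution(counts):
    """p(w) proportional to f(w)^(3/4)."""
    weights = {w: c ** 0.75 for w, c in counts.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}

counts = {"the": 1000, "of": 500, "orange": 20, "king": 5}  # made-up counts
p = negative_sampling_distribution(counts)

rng = random.Random(0)
negatives = rng.choices(list(p), weights=list(p.values()), k=4)
print(p)
print(negatives)
```

Compared with sampling by raw frequency, the 3/4 power lowers the probability of very frequent words such as "the" and raises that of rare words, without going all the way to a uniform distribution.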
(RNN W2L08) GloVe word vectors
- $X_{ij}$ ; number of times $i$ appears in the context of $j$
\textrm{minimize} \sum_{i=1}^{10000} \sum_{j=1}^{10000} f(X_{ij}) \left( \theta_i^T e_j + b_i + b_j^\prime - \log X_{ij} \right)^2 \\
f\left( X_{ij} \right) = 0 \ \ \textrm{if} \ X_{ij} = 0 \\
e_w^{(final)} = \frac{e_w + \theta_w}{2}
- $\theta_i$, $e_j$ are symmetric
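A minimal sketch of evaluating the GloVe objective on a small co-occurrence matrix, with both bias terms added to the score as in the standard GloVe formulation. The notes only require $f(0) = 0$; the specific weighting function used here, with `x_max=100` and `alpha=0.75`, follows a commonly used choice and should be read as an assumption.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(X_ij): zero for X_ij = 0, capped at 1 (assumed form)."""
    if x == 0:
        return 0.0
    return min(1.0, (x / x_max) ** alpha)

def glove_objective(X, theta, e, b, b_prime):
    total = 0.0
    for i, row in enumerate(X):
        for j, x_ij in enumerate(row):
            if x_ij == 0:
                continue  # f(0) = 0, which also avoids log(0)
            pred = dot(theta[i], e[j]) + b[i] + b_prime[j]
            total += weight(x_ij) * (pred - math.log(x_ij)) ** 2
    return total

def final_embedding(e_w, theta_w):
    """e_w^(final) = (e_w + theta_w) / 2"""
    return [(a + b) / 2 for a, b in zip(e_w, theta_w)]

# Made-up 2-word example with 2-dimensional vectors.
X = [[0, 3], [3, 10]]
theta = [[0.1, 0.2], [0.3, -0.1]]
e = [[0.2, 0.1], [-0.2, 0.4]]
b, b_prime = [0.0, 0.1], [0.0, -0.1]
print(glove_objective(X, theta, e, b, b_prime))
print(final_embedding(e[0], theta[0]))
```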
(RNN W2L09) Sentiment classification
Sentiment classification
| Sentence | # stars |
|---|---|
| The dessert is excellent | $\star \star \star \star $ |
| Service was quite slow | $\star \star $ |
| Good for a quick meal, but nothing special | $\star \star \star$ |
- Simply averaging the $e_w$ of each word and feeding the result to a softmax does not work well (word order also matters)
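The weakness of averaging can be seen directly: two sentences with the same words in a different order produce exactly the same averaged vector, so no classifier on top can tell them apart. The embeddings and sentences below are made up for illustration.

```python
def average_embedding(words, embeddings):
    """Element-wise mean of the word embeddings in a sentence."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for w in words:
        for k, x in enumerate(embeddings[w]):
            total[k] += x
    return [t / len(words) for t in total]

# Made-up 2-dimensional embeddings.
embeddings = {
    "good": [1.0, 0.5],
    "not":  [-0.5, 0.25],
    "bad":  [-1.0, 0.5],
}

a = average_embedding("good not bad".split(), embeddings)
b = average_embedding("bad not good".split(), embeddings)
print(a == b)  # True
```

An RNN reads the words in sequence, so it can in principle distinguish the two orderings.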
RNN for sentiment classification
(RNN W2L10) Debiasing word embeddings
The problem of bias in word embeddings
| Analogy | Judgment |
|---|---|
| Man : Woman as King : Queen | no problem |
| Man : Computer_Programmer as Woman : Homemaker | problem |
| Father : Doctor as Mother : Nurse | problem |
- Word embeddings can reflect the gender, ethnicity, age, sexual orientation, and other biases of the text used to train the model.
Addressing bias in word embedding
- Identify bias direction
- $e_{he} - e_{she}$
- $e_{male} -e_{female}$
- Neutralize : For every word that is not definitional, project to get rid of bias
- Equalize pairs
- grandfather, grandmother
- boy, girl
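The neutralize step can be sketched as an orthogonal projection: subtract from a word vector its component along the bias direction $g$. The 3-dimensional vectors below are invented, and using a single difference $e_{he} - e_{she}$ as $g$ is a simplification of the "identify bias direction" step.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def neutralize(e, g):
    """Remove from e its component along the bias direction g."""
    scale = dot(e, g) / dot(g, g)
    return [x - scale * y for x, y in zip(e, g)]

# Made-up embeddings for illustration.
e_he = [-1.0, 0.2, 0.1]
e_she = [1.0, 0.2, 0.1]
g = [a - b for a, b in zip(e_he, e_she)]  # bias direction

e_doctor = [-0.3, 0.7, 0.5]  # a word that is not gender-definitional
e_doctor_debiased = neutralize(e_doctor, g)

print(e_doctor_debiased)
print(dot(e_doctor_debiased, g))  # component along g after neutralizing
```

After neutralizing, the vector has no component along `g`, while its components in the other directions are unchanged.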