Deep Learning Specialization (Coursera) Self-Study Notes (RNN W3)

Last updated at 2020-09-06, posted at 2020-08-13

Introduction

These are my notes on Week 3 of the RNN course (RNN W3) in the Deep Learning Specialization.

(RNN W3L01) Sequence to sequence model : Basic models

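Contents

Think of a many-to-many RNN: the first half is the encoder network and the second half is the decoder network.
The same idea works for image captioning: remove the softmax layer from a CNN and connect an RNN in its place.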

(RNN W3L02) Picking the most likely sentence

Contents

Machine translation as building a conditional language model

Language model ; generates sentences at random
Machine translation ; adds a condition, namely the input sentence
- $P(y^{<1>}, \cdots , y^{<T_y>} | x^{<1>}, \cdots , x^{<T_x>})$

Finding the most likely translation

\arg\max_y P(y^{<1>}, \cdots , y^{<T_y>} | x^{<1>}, \cdots , x^{<T_x>})

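(RNN W3L03) Beam search

Contents

Beam search algorithm (for beam width B=3)

Pick the 3 candidates for $y^{<1>}$ that maximize $P(y^{<1>} | x)$
For each of those 3, consider all 10000 words in the vocabulary as the next word, and keep the 3 candidates that maximize $P(y^{<1>}, y^{<2>} | x)$
Repeat this for the following words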
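
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the course's implementation: `step_log_probs` stands in for the decoder RNN, and `toy_model` with its tiny probability table is entirely made up for the example.

```python
import math

def beam_search(step_log_probs, beam_width=3, max_len=10, eos="<eos>"):
    """Return the (tokens, log-probability) pair with the highest score.

    step_log_probs(prefix) is a hypothetical stand-in for the decoder:
    it returns {token: log P(token | x, prefix)}.
    """
    beams = [((), 0.0)]          # hypotheses still being extended
    finished = []                # hypotheses that ended with <eos>
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, lp in step_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + lp))
        # Keep only the beam_width most probable extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_width]:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    finished.extend(beams)       # hypotheses cut off at max_len
    return max(finished, key=lambda c: c[1])

def toy_model(prefix):
    """Made-up conditional distribution over a 6-word vocabulary."""
    table = {
        (): {"jane": 0.5, "in": 0.4, "<eos>": 0.1},
        ("jane",): {"visits": 0.4, "is": 0.35, "<eos>": 0.25},
        ("in",): {"september": 0.9, "<eos>": 0.1},
    }
    probs = table.get(prefix, {"<eos>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}

best_tokens, best_log_prob = beam_search(toy_model, beam_width=3)
greedy_tokens, _ = beam_search(toy_model, beam_width=1)
print(best_tokens, math.exp(best_log_prob))
print(greedy_tokens)
```

With this toy table, B=1 (greedy search) commits to "jane" because it is the most probable first word, while B=3 keeps "in" alive and ends up with a more probable full sentence.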

(RNN W3L04) Refinements to beam search

Contents

Length normalization

\arg\max_y\prod_{t=1}^{T_y} P \left( y^{<t>} | x, y^{<1>}, \cdots , y^{<t-1>}  \right)

With the expression above, repeatedly multiplying values below 1 can make the floating-point computation underflow. Taking the $\log$ turns the product into a sum.


\arg\max_y\sum_{t=1}^{T_y} \log P \left( y^{<t>} | x, y^{<1>}, \cdots , y^{<t-1>}  \right)

Longer sentences are at a disadvantage, so normalize by the number of words.


\frac{1}{T_y^\alpha} \sum_{t=1}^{T_y} \log P \left( y^{<t>} | x, y^{<1>}, \cdots , y^{<t-1>}  \right)

$\alpha = 1$ ; full normalization
$\alpha = 0$ ; no normalization
$\alpha = 0.7$ ; in between full and no normalization
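
As a rough illustration of both points (underflow and length normalization), here is a small Python sketch. The function name `normalized_score` and all probability values are assumptions chosen for the example, not numbers from the course.

```python
import math

def normalized_score(token_probs, alpha=0.7):
    """Length-normalized log-likelihood: (1 / T_y^alpha) * sum_t log P(y_t | ...)."""
    log_sum = sum(math.log(p) for p in token_probs)
    return log_sum / (len(token_probs) ** alpha)

# Underflow: the raw product of many small probabilities collapses to 0.0,
# while the sum of logs stays a perfectly ordinary number.
many_small = [0.01] * 200
print(math.prod(many_small))                  # product underflows
print(sum(math.log(p) for p in many_small))   # sum of logs is fine

# Made-up per-token probabilities for a short and a long hypothesis.
short = [0.5, 0.5]
long = [0.6, 0.6, 0.6, 0.6]
for alpha in (0.0, 0.7, 1.0):
    print(alpha, normalized_score(short, alpha), normalized_score(long, alpha))
```

In this example the short hypothesis wins when $\alpha = 0$, but the long one wins once normalization is applied, even though each of its tokens is individually more probable.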

Beam search discussion

large B ; better result, slower
small B ; worse result, faster

(RNN W3L05) Error analysis on beam search

Contents

Human translation ; $y^\ast$
Algorithm ; $\hat{y}$
Case 1 ; $P(y^\ast | x) > P(\hat{y} | x)$
- Beam search chose $\hat{y}$. But $y^\ast$ attains higher $P(y|x)$.
- Conclusion ; Beam search is at fault
Case 2 ; $P(y^\ast | x) < P(\hat{y} | x)$
- $y^\ast$ is a better translation than $\hat{y}$. But RNN predicted $P(y^\ast | x) < P(\hat{y} | x)$
- Conclusion ; RNN model is at fault
Figure out what fraction of errors are due to beam search vs. the RNN model.
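
The bookkeeping for this error analysis could look like the sketch below. The function name and the probability values are hypothetical. Ties are counted against the RNN model, which is an assumption the cases above do not spell out.

```python
def attribute_errors(examples):
    """Split errors between beam search and the RNN model.

    examples: list of (p_star, p_hat) pairs, where p_star = P(y* | x) for the
    human translation and p_hat = P(y_hat | x) for the algorithm's output.
    """
    counts = {"beam search": 0, "RNN model": 0}
    for p_star, p_hat in examples:
        if p_star > p_hat:
            counts["beam search"] += 1   # Case 1: search missed the better y*
        else:
            counts["RNN model"] += 1     # Case 2 (ties included by assumption)
    total = len(examples)
    return {k: (v / total if total else 0.0) for k, v in counts.items()}

# Made-up probabilities for four mistranslated dev-set sentences.
dev_errors = [(2e-10, 1e-10), (1e-10, 3e-10), (5e-11, 8e-11), (1e-12, 4e-12)]
print(attribute_errors(dev_errors))
```

If most of the errors fall on beam search, increasing B is worth trying; if they fall on the RNN model, work on the model instead.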

(RNN W3L06) Bleu Score (Optional)

Contents

Bleu score on n-gram


P_n = \frac{ \sum_{\textrm{n-gram} \in \hat{y}} \textrm{Count}_{clip} (\textrm{n-gram}) }{ \sum_{\textrm{n-gram} \in \hat{y}} \textrm{Count} (\textrm{n-gram}) }
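
The clipped n-gram precision can be sketched as follows. The helper names are my own, and tokenization is assumed to be plain whitespace splitting. Clipping caps each n-gram's count at the maximum number of times it appears in any single reference.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """P_n: clipped n-gram count of the candidate divided by its total n-gram count."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(modified_precision("the the the the the the the".split(), refs, 1))
print(modified_precision("the cat the cat on the mat".split(), refs, 2))
```

The first candidate shows why clipping matters: without it, a sentence made only of "the" would get a unigram precision of 7/7.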

Combined Bleu score

BP \cdot \exp\left( \frac{1}{4} \sum_{n=1}^4 \log P_n \right)

BP (brevity penalty)
- 1 if MT_output_length > reference_output_length
- $\exp(1-\textrm{reference\_output\_length} / \textrm{MT\_output\_length})$ otherwise

Aside

This was my first time using $\in$ (\in). The reversed symbol $\ni$ turned out to be \ni.

(RNN W3L07) Attention model intuition

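Contents

The problem of long sequences

On long sentences (30 words or more), the Bleu score drops. A human translator does not read the whole text before translating; they translate while focusing on one part at a time.

Attention model intuition

Output each word while focusing on part of the bi-directional RNN's activations.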

(RNN W3L08) Attention model

Contents

While generating the output, focus on part of the input sentence.

$\alpha^{<t, t^\prime>}$ ; amount of attention $y^{<t>}$ should pay to $a^{<t^\prime>}$

a^{<t^\prime>} = \left( \overrightarrow{a}^{<t^\prime>}, \overleftarrow{a}^{<t^\prime>}  \right)

\sum_{t^\prime} \alpha^{<1, t^\prime>} = 1

c^{<1>} = \sum_{t^\prime} \alpha^{<1, t^\prime>}a^{<t^\prime>}

Computing attention


\alpha^{<t, t^\prime>} = \frac{\exp{e^{<t, t^\prime>}}}{ \sum_{t^\prime=1}^{T_x} \exp{e^{<t, t^\prime> }}}

$e^{<t, t^\prime>}$ is computed by a small neural network from $s^{<t-1>}$ and $a^{<t^\prime>}$
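
The softmax over the scores and the resulting context vector can be sketched in plain Python. The scores $e^{<t, t^\prime>}$ are simply given as numbers here; the small neural network that would produce them is not modelled. Subtracting the maximum score before exponentiating is a numerical-stability choice of mine that leaves the weights unchanged. All values are made up.

```python
import math

def attention_weights(scores):
    """alpha^{<t,t'>}: softmax over the scores e^{<t,t'>} for a fixed output step t."""
    m = max(scores)                               # stability shift (assumption)
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    return [v / total for v in exps]

def context_vector(weights, activations):
    """c^{<t>} = sum over t' of alpha^{<t,t'>} * a^{<t'>}."""
    dim = len(activations[0])
    return [sum(w * a[i] for w, a in zip(weights, activations)) for i in range(dim)]

# Made-up scores for T_x = 3 input positions and 2-dimensional activations.
scores = [2.0, 1.0, 0.1]
activations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alphas = attention_weights(scores)
print(alphas, sum(alphas))
print(context_vector(alphas, activations))
```

The weights always sum to 1, so the context vector is a weighted average of the encoder activations that leans toward the positions with the highest scores.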

Attention example

Date format normalization (e.g. July 20th 1969 → 1969-07-20)

(RNN W3L09) Speech Recognition

Contents

Speech recognition problem

audio clip → transcript

Attention Model for Speech

CTC cost for speech recognition

CTC = connectionist temporal classification
Basic rule : collapse repeated characters not separated by "blank"
ttt _ h _ eee ___ [space] ___ qqq ___ → the q…
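
The collapsing rule itself is easy to sketch. The function below only covers this decoding step, not the CTC cost. Writing the blank as `_` and the space as a literal space character is my own convention for the example.

```python
from itertools import groupby

def ctc_collapse(frames, blank="_"):
    """Merge runs of repeated characters, then drop the blank symbol."""
    merged = (ch for ch, _ in groupby(frames))   # "ttt" -> "t", "___" -> "_"
    return "".join(ch for ch in merged if ch != blank)

print(ctc_collapse("ttt_h_eee___ ___qqq__"))   # the q
print(ctc_collapse("hh_e_ll_ll_oo"))           # hello
```

The second call shows why the blank is needed: the two runs of "l" are separated by a blank, so they survive as a double letter instead of being merged into one.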

Notes

The video on YouTube had no subtitles.

(RNN W3L10) Trigger Word Detection

Contents

Trigger word ; e.g. "Hey Siri" on iPhone or "OK Google" on Google Home
Build an RNN that outputs 1 when the trigger word has been spoken and 0 otherwise
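
As a rough sketch of what the training targets for such an RNN might look like, the function below builds a 0/1 label per time step. The function name, the `hold` parameter (how many steps the label stays at 1), and the example indices are all assumptions for illustration.

```python
def trigger_labels(num_steps, trigger_end_steps, hold=1):
    """Return one 0/1 target per time step.

    trigger_end_steps: time steps at which a trigger word has just finished.
    hold: hypothetical number of consecutive steps to label as 1.
    """
    labels = [0] * num_steps
    for end in trigger_end_steps:
        for t in range(end, min(end + hold, num_steps)):
            labels[t] = 1
    return labels

print(trigger_labels(10, [3, 7]))          # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
print(trigger_labels(6, [3], hold=2))      # [0, 0, 0, 1, 1, 0]
```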

(RNN W3L11) Summary and Thank you

Contents

Good work, everyone!!

References

Deep Learning Specialization (Coursera) Self-Study Notes (Table of Contents)
