Deep Learning Specialization (Coursera) 自習記録 (RNN W1)

Posted at 2020-08-10

Introduction

These are my notes on RNN, Week 1 (RNN W1) of the Deep Learning Specialization.

(RNN W1L01) RNN Model - Introduction, Motivation, Why Sequence Model

Contents

  • Description of the RNN model
    • Outputs from the previous step are fed as input to the current step
    • Has a hidden state which remembers information about the sequence
    • Has memory
    • $h_t = f(h_{t-1}, X_t)$
  • Examples of sequential data
    • Machine translation ; words are not independent but depend on the surrounding words
    • Named entity recognition
    • Sentiment classification
    • Word prediction

(RNN W1L02) Sentence / Word Representation

Contents

  • Notation

    • $X^{(i)}$ ; $i$-th input sentence
    • $X^{(i)<t>}$ ; $t$-th word of the $i$-th sentence
    • $T_X$ ; number of words in the input sentence
    • $Y^{(i)}$ ; $i$-th output sentence
  • Word representation

    • Need to define a Dictionary (Vocabulary)
    • Represent each word by One-Hot Encoding (denotes the position of the word in the Vocabulary); a minimal sketch follows this list
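
As a concrete illustration (not from the lecture; the toy vocabulary below is a made-up assumption, the course uses a much larger dictionary), a minimal NumPy sketch of one-hot word encoding:

```python
import numpy as np

# Hypothetical toy vocabulary; a real one would have thousands of words.
vocabulary = ["a", "and", "harry", "potter", "<UNK>", "<EOS>"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    """Return a one-hot column vector marking the word's position in the Vocabulary."""
    index = word_to_index.get(word, word_to_index["<UNK>"])  # unknown words map to <UNK>
    x = np.zeros((len(vocabulary), 1))
    x[index] = 1.0
    return x

x = one_hot("harry")   # x[2] == 1, every other entry is 0
```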

(RNN W1L03) RNN Model

Contents

  • Problems with using an ordinary network
    • Inputs and outputs can have different lengths in different examples.
    • Features learned at one position of the text are not shared across other positions
  • Weakness of RNNs
    • Information from earlier in the sequence can be used, but information from later cannot (Bidirectional RNNs (BRNNs) address this)

Forward Propagation

a^{<0>} = \vec{0} \\
a^{<t>} = g_1 \left( W_{aa} a^{<t-1>} + W_{ax} X^{<t>} + b_a  \right) \\
\hat{y}^{<t>} = g_2 \left( W_{ya} a^{<t>} + b_y \right)

  • Activation functions
    • $g_1$ ; $\tanh$ or ReLU
    • $g_2$ ; sigmoid (a minimal sketch of one step follows this list)
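
A minimal NumPy sketch of one forward-propagation step following the formulas above (the dimensions and random weights are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(a_prev, x_t, W_aa, W_ax, W_ya, b_a, b_y):
    """One RNN time step: a<t> = g1(Waa a<t-1> + Wax x<t> + ba), y_hat<t> = g2(Wya a<t> + by)."""
    a_t = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)   # g1 = tanh
    y_hat_t = sigmoid(W_ya @ a_t + b_y)               # g2 = sigmoid
    return a_t, y_hat_t

# Example shapes: hidden size n_a = 4, one-hot input size n_x = 6, output size n_y = 1
n_a, n_x, n_y = 4, 6, 1
rng = np.random.default_rng(0)
W_aa, W_ax = rng.normal(size=(n_a, n_a)), rng.normal(size=(n_a, n_x))
W_ya = rng.normal(size=(n_y, n_a))
b_a, b_y = np.zeros((n_a, 1)), np.zeros((n_y, 1))

a0 = np.zeros((n_a, 1))               # a<0> = 0 vector
x1 = np.zeros((n_x, 1)); x1[2] = 1.0  # a one-hot input
a1, y_hat1 = rnn_step(a0, x1, W_aa, W_ax, W_ya, b_a, b_y)
```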

Simplified RNN notation


a^{<t>} = g\left( W_a \left[a^{<t-1>} , X^{<t>} \right] + b_a \right) \\
\hat{y}^{<t>} = g\left( W_y a^{<t>} + b_y  \right)

  • $W_a$ is $W_{aa}$ and $W_{ax}$ stacked horizontally (side by side)
  • $\left[a^{<t-1>} , X^{<t>} \right]$ is the two vectors stacked vertically (see the check below)
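
A quick NumPy check (shapes are arbitrary assumptions) that the simplified notation reproduces the original form:

```python
import numpy as np

rng = np.random.default_rng(1)
n_a, n_x = 4, 6
W_aa, W_ax = rng.normal(size=(n_a, n_a)), rng.normal(size=(n_a, n_x))
a_prev, x_t = rng.normal(size=(n_a, 1)), rng.normal(size=(n_x, 1))

W_a = np.hstack([W_aa, W_ax])        # [W_aa | W_ax], shape (n_a, n_a + n_x)
stacked = np.vstack([a_prev, x_t])   # [a<t-1>; x<t>], shape (n_a + n_x, 1)

# W_a [a<t-1>, x<t>] == W_aa a<t-1> + W_ax x<t>
assert np.allclose(W_a @ stacked, W_aa @ a_prev + W_ax @ x_t)
```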

(RNN W1L04) Backpropagation through time

Contents

L^{<t>}\left( \hat{y}^{<t>}, y^{<t>} \right) = -y^{<t>} \log \hat{y}^{<t>} - \left( 1-y^{<t>}\right) \log \left( 1-\hat{y}^{<t>} \right) \\
L\left( \hat{y}, y \right) = \sum_{t=1}^{T_y} L^{<t>} \left( \hat{y}^{<t>}, y^{<t>} \right)
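
A minimal sketch of the loss above (plain Python/NumPy, assuming scalar per-step predictions and binary labels):

```python
import numpy as np

def step_loss(y_hat_t, y_t, eps=1e-12):
    """Per-step binary cross-entropy: -y log(y_hat) - (1 - y) log(1 - y_hat)."""
    return -y_t * np.log(y_hat_t + eps) - (1 - y_t) * np.log(1 - y_hat_t + eps)

def sequence_loss(y_hat, y):
    """Total loss: sum of the per-step losses over t = 1 .. T_y."""
    return sum(step_loss(y_hat_t, y_t) for y_hat_t, y_t in zip(y_hat, y))

# Toy example: predictions and labels for a 3-step sequence
y_hat = [0.9, 0.2, 0.7]
y = [1, 0, 1]
L = sequence_loss(y_hat, y)
```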

(RNN W1L05) Different types of RNNs

Contents

  • Examples of sequence data
    • Speech recognition
    • Music generation
    • Sentiment classification
    • DNA sequence analysis
    • Machine translation
    • Video activity recognition
    • Named entity recognition
  • Examples of RNN architectures
    • one-to-one
    • many-to-one ; sentiment classification
    • one-to-many ; music generation
    • many-to-many ; machine translation

(RNN W1L06) Language model and sequence generation

Contents

What is language modelling?

  • Example: speech recognition must pick the more likely of similar-sounding sentences
    • A language model assigns a probability P(sentence) to a sentence

Language modelling with an RNN

  • Training set ; large corpus of English text
  • tokenize
    • <EOS> ; end of sentence
    • <UNK> ; unknown word
  • RNN model (a minimal sketch follows this list)
    • $a^{<0>} = \vec{0}$, $X^{<1>} = \vec{0}$
    • feed the previous true word as the next input: $X^{<2>} = y^{<1>}$, $X^{<3>} = y^{<2>}$, $\cdots$
    • $L(\hat{y}^{<t>}, y^{<t>}) = - \sum_i y_i^{<t>} \log\hat{y}_i^{<t>}$
    • $L = \sum_t L(\hat{y}^{<t>}, y^{<t>})$
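
A minimal sketch of the training-time input shifting and loss accumulation described above; rnn_step_fn is a hypothetical placeholder for a step function that returns the next activation and the vocabulary scores:

```python
import numpy as np

def softmax(z):
    """Column-vector softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def language_model_loss(sentence_indices, rnn_step_fn, n_a, vocab_size):
    """Feed x<1> = 0 and then x<t> = y<t-1> (the previous true word); sum -log p(true word)."""
    a_prev = np.zeros((n_a, 1))              # a<0> = 0
    x_t = np.zeros((vocab_size, 1))          # x<1> = 0
    total_loss = 0.0
    for target in sentence_indices:          # target = index of y<t>, the t-th true word
        a_prev, scores = rnn_step_fn(a_prev, x_t)   # hypothetical step: returns a<t> and vocab scores
        y_hat = softmax(scores)
        total_loss += -np.log(y_hat[target, 0] + 1e-12)   # cross-entropy with a one-hot y<t>
        x_t = np.zeros((vocab_size, 1))
        x_t[target] = 1.0                    # next input is the current true word
    return total_loss
```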

(RNN W1L07) Sampling novel sequences

Contents

  • Character-level language model
    • A word-level model normally uses Vocabulary = [a, aaron, ..., zulu, <UNK>]
    • A character-level language model uses Vocabulary = [a, b, c, ..., X, Y, Z]
    • Advantage ; can also handle unknown words
    • Disadvantage ; training is more costly (sequences become much longer); a sampling sketch follows this list
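
A sketch of sampling a novel sequence, drawing each next word from $\hat{y}^{<t>}$ with np.random.choice; rnn_step_fn and the vocabulary are hypothetical placeholders:

```python
import numpy as np

def sample_sequence(rnn_step_fn, vocabulary, eos_index, n_a, max_len=50):
    """Sample the next word from y_hat<t> and feed it back as the next input x<t+1>."""
    vocab_size = len(vocabulary)
    a_prev = np.zeros((n_a, 1))
    x_t = np.zeros((vocab_size, 1))
    sampled = []
    for _ in range(max_len):
        a_prev, y_hat = rnn_step_fn(a_prev, x_t)   # hypothetical step: y_hat is a softmax over the vocabulary
        idx = int(np.random.choice(vocab_size, p=y_hat.ravel()))
        sampled.append(vocabulary[idx])
        if idx == eos_index:                       # stop once <EOS> is sampled
            break
        x_t = np.zeros((vocab_size, 1))
        x_t[idx] = 1.0                             # the sampled word becomes the next input
    return sampled
```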

(RNN W1L08) Vanishing gradients with RNNs

Contents

  • Gradient clipping ; clip the gradient when it exceeds a threshold (a remedy for exploding gradients); a minimal sketch follows below
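
A minimal sketch of gradient clipping with np.clip (the threshold of 5.0 and the gradient names are arbitrary assumptions):

```python
import numpy as np

def clip_gradients(gradients, max_value=5.0):
    """Element-wise clip every gradient array into [-max_value, max_value]."""
    return {name: np.clip(grad, -max_value, max_value) for name, grad in gradients.items()}

# Example with made-up gradient names and values: large entries get capped at +/- 5.0
grads = {"dW_aa": np.array([[12.0, -0.3], [0.7, -40.0]]), "db_a": np.array([[0.1], [-9.0]])}
clipped = clip_gradients(grads)
```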

(RNN W1L09) Gated Recurrent Unit (GRU)

Contents

GRU (simplified)


c = \textrm{memory cell}\\
c^{<t>} = a^{<t>} \\
\tilde{c}^{<t>} = \tanh\left( W_c \left[ c^{<t-1>}, X^{<t>}\right]  + b_c\right) \\
\Gamma_u = \sigma\left(W_u \left[ c^{<t-1>}, X^{<t>}\right]  + b_u\right) \\
c^{<t>} = \Gamma_u \ast \tilde{c}^{<t>} + \left( 1-\Gamma_u \right) \ast c^{<t-1>}

  • Even when the gradient gets close to 0, the memory cell is preserved: with $\Gamma_u \approx 0$, $c^{<t>} \approx c^{<t-1>}$, so information is carried over long ranges and the vanishing-gradient problem is mitigated

Full GRU


c^{<t>} = a^{<t>} \\
\tilde{c}^{<t>} = \tanh\left( W_c \left[ \Gamma_r \ast c^{<t-1>}, X^{<t>}\right]  + b_c\right) \\
\Gamma_u = \sigma\left(W_u \left[ c^{<t-1>}, X^{<t>}\right]  + b_u\right) \\
\Gamma_r = \sigma\left(W_r \left[ c^{<t-1>}, X^{<t>}\right]  + b_r\right) \\
c^{<t>} = \Gamma_u \ast \tilde{c}^{<t>} + \left( 1-\Gamma_u \right) \ast c^{<t-1>}
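
A minimal NumPy sketch of one full GRU step following the equations above (weight shapes are assumptions; setting $\Gamma_r = 1$ recovers the simplified GRU):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x_t, W_c, W_u, W_r, b_c, b_u, b_r):
    """One full GRU step; '*' is element-wise, matching the Gamma gates above."""
    concat = np.vstack([c_prev, x_t])                    # [c<t-1>; x<t>]
    gamma_u = sigmoid(W_u @ concat + b_u)                # update gate
    gamma_r = sigmoid(W_r @ concat + b_r)                # relevance gate
    c_tilde = np.tanh(W_c @ np.vstack([gamma_r * c_prev, x_t]) + b_c)  # candidate memory cell
    c_t = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev   # old memory kept where gamma_u ~ 0
    a_t = c_t                                            # a<t> = c<t>
    return a_t, c_t
```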

Impressions

  • To be honest, I don't fully understand this yet

(RNN W1L10) LSTM (long short-term memory) unit

Contents

\tilde{c}^{<t>} =\tanh \left( W_c \left[ a^{<t-1>}, X^{<t>} \right] + b_c \right) \\
\Gamma_u = \sigma\left( W_u \left[ a^{<t-1>}, X^{<t>} \right] + b_u \right) \\
\Gamma_f = \sigma\left( W_f \left[ a^{<t-1>}, X^{<t>} \right] + b_f \right) \\
\Gamma_o = \sigma\left( W_o \left[ a^{<t-1>}, X^{<t>} \right] + b_o \right) \\
c^{<t>} = \Gamma_u \ast \tilde{c}^{<t>} + \Gamma_f \ast c^{<t-1>} \\
a^{<t>} = \Gamma_o \ast \tanh\left( c^{<t>} \right)

  • Subscripts (a minimal sketch of one step follows this list)
    • u ; update gate
    • f ; forget gate
    • o ; output gate
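
A minimal NumPy sketch of one LSTM step following the equations above (weight shapes are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t, W_c, W_u, W_f, W_o, b_c, b_u, b_f, b_o):
    """One LSTM step with update, forget and output gates."""
    concat = np.vstack([a_prev, x_t])            # [a<t-1>; x<t>]
    c_tilde = np.tanh(W_c @ concat + b_c)        # candidate memory cell
    gamma_u = sigmoid(W_u @ concat + b_u)        # update gate
    gamma_f = sigmoid(W_f @ concat + b_f)        # forget gate
    gamma_o = sigmoid(W_o @ concat + b_o)        # output gate
    c_t = gamma_u * c_tilde + gamma_f * c_prev   # unlike the GRU, separate update and forget gates
    a_t = gamma_o * np.tanh(c_t)
    return a_t, c_t
```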

(RNN W1L11) Bidirectional RNN

Contents

  • Getting information from the future

\hat{y}^{<t>} = g\left( W_y \left[ \overrightarrow{a}^{<t>} , \overleftarrow{a}^{<t>} \right]  + b_y \right)

  • $\overleftarrow{a}^{<t>}$ carries the information from the future (the backward pass); a minimal sketch follows below
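
A minimal sketch of combining the forward and backward activations into $\hat{y}^{<t>}$ (the sigmoid output and shapes are assumptions); in a BRNN, both passes run over the whole sequence before any output is computed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def brnn_output(a_forward_t, a_backward_t, W_y, b_y):
    """y_hat<t> = g(W_y [a_fwd<t>; a_bwd<t>] + b_y); the backward activation carries future context."""
    concat = np.vstack([a_forward_t, a_backward_t])
    return sigmoid(W_y @ concat + b_y)

# Example shapes: forward/backward hidden size 4 each, scalar output
rng = np.random.default_rng(3)
a_fwd, a_bwd = rng.normal(size=(4, 1)), rng.normal(size=(4, 1))
W_y, b_y = rng.normal(size=(1, 8)), np.zeros((1, 1))
y_hat_t = brnn_output(a_fwd, a_bwd, W_y, b_y)
```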

Thoughts

  • I'm pleased that I managed to write the reversed vector arrow ($\overleftarrow{a}$)

(RNN W1L12) Deep RNNs

Contents

  • $a^{[l]<t>}$ ; activation of layer $l$ at time $t$
  • For example (a minimal sketch follows the equation) ...

a^{[2]<3>} = g\left( W_a^{[2]} \left[  a^{[2]<2>}, a^{[1]<3>} \right] + b_a^{[2]} \right)
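
A minimal NumPy sketch of the example above, computing $a^{[2]<3>}$ from $a^{[2]<2>}$ (same layer, previous time step) and $a^{[1]<3>}$ (layer below, same time step); the shapes and $g = \tanh$ are assumptions:

```python
import numpy as np

def deep_rnn_activation(a_same_layer_prev_t, a_layer_below_t, W_a_l, b_a_l):
    """a[l]<t> = g(W_a[l] [a[l]<t-1>; a[l-1]<t>] + b_a[l])."""
    concat = np.vstack([a_same_layer_prev_t, a_layer_below_t])
    return np.tanh(W_a_l @ concat + b_a_l)

# Example for a[2]<3>: layer-2 hidden size 4, layer-1 hidden size 5
n2, n1 = 4, 5
rng = np.random.default_rng(2)
W_a_2 = rng.normal(size=(n2, n2 + n1))
b_a_2 = np.zeros((n2, 1))
a_2_2 = rng.normal(size=(n2, 1))   # a[2]<2>
a_1_3 = rng.normal(size=(n1, 1))   # a[1]<3>
a_2_3 = deep_rnn_activation(a_2_2, a_1_3, W_a_2, b_a_2)
```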
