Deep Learning Specialization (Coursera) 自習記録 (RNN W1)

Posted at 2020-08-10

Introduction

These are my notes on RNN, Week 1 (RNN W1) of the Deep Learning Specialization.

(RNN W1L01) RNN Model - Introduction, Motivation, Why Sequence Model

Contents

  • Description of the RNN model
    • Outputs from the previous step are fed as input to the current step
    • Has a hidden state which remembers information about the sequence
    • Has memory
    • $h_t = f(h_{t-1}, X_t)$
  • Examples of sequential data
    • Machine translation ; words are not independent but depend on the surrounding words
    • Named entity recognition
    • Sentiment classification
    • Word prediction

(RNN W1L02) Sentence / Word Representation

Contents

  • Notation

    • $X^{(i)}$ ; $i$-th input sentence
    • $X^{(i)<t>}$ ; $t$-th word of the $i$-th sentence
    • $T_X$ ; number of words in the input sentence
    • $Y^{(i)}$ ; $i$-th output sentence
  • Word representation

    • Need to define a Dictionary (Vocabulary)
    • Represent each word by One-Hot Encoding (denotes the position of the word in the Vocabulary); a minimal sketch follows this list
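
As a concrete illustration (not from the lecture; the toy vocabulary below is a made-up assumption, the course uses a much larger dictionary), a minimal NumPy sketch of one-hot word encoding:

```python
import numpy as np

# Hypothetical toy vocabulary; a real one would have thousands of words.
vocabulary = ["a", "and", "harry", "potter", "<UNK>", "<EOS>"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    """Return a one-hot column vector marking the word's position in the Vocabulary."""
    index = word_to_index.get(word, word_to_index["<UNK>"])  # unknown words map to <UNK>
    x = np.zeros((len(vocabulary), 1))
    x[index] = 1.0
    return x

x = one_hot("harry")   # x[2] == 1, every other entry is 0
```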

(RNN W1L03) RNN Model

Contents

  • Problems with using an ordinary network
    • Inputs and outputs can have different lengths in different examples.
    • Features learned at one position of the text are not shared across other positions
  • Weakness of RNNs
    • Information from earlier in the sequence can be used, but information from later cannot (Bidirectional RNNs (BRNNs) address this)

Forward Propagation

a^{<0>} = \vec{0} \\
a^{<t>} = g_1 \left( W_{aa} a^{<t-1>} + W_{ax} X^{<t>} + b_a  \right) \\
\hat{y}^{<t>} = g_2 \left( W_{ya} a^{<t>} + b_y \right)

  • Activation functions
    • $g_1$ ; $\tanh$ or ReLU
    • $g_2$ ; sigmoid (a minimal sketch of one step follows this list)
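
A minimal NumPy sketch of one forward-propagation step following the formulas above (the dimensions and random weights are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(a_prev, x_t, W_aa, W_ax, W_ya, b_a, b_y):
    """One RNN time step: a<t> = g1(Waa a<t-1> + Wax x<t> + ba), y_hat<t> = g2(Wya a<t> + by)."""
    a_t = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)   # g1 = tanh
    y_hat_t = sigmoid(W_ya @ a_t + b_y)               # g2 = sigmoid
    return a_t, y_hat_t

# Example shapes: hidden size n_a = 4, one-hot input size n_x = 6, output size n_y = 1
n_a, n_x, n_y = 4, 6, 1
rng = np.random.default_rng(0)
W_aa, W_ax = rng.normal(size=(n_a, n_a)), rng.normal(size=(n_a, n_x))
W_ya = rng.normal(size=(n_y, n_a))
b_a, b_y = np.zeros((n_a, 1)), np.zeros((n_y, 1))

a0 = np.zeros((n_a, 1))               # a<0> = 0 vector
x1 = np.zeros((n_x, 1)); x1[2] = 1.0  # a one-hot input
a1, y_hat1 = rnn_step(a0, x1, W_aa, W_ax, W_ya, b_a, b_y)
```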

Simplified RNN notation


a^{<t>} = g\left( W_a \left[a^{<t-1>} , X^{<t>} \right] + b_a \right) \\
\hat{y}^{<t>} = g\left( W_y a^{<t>} + b_y  \right)

  • $W_a$ is $W_{aa}$ and $W_{ax}$ stacked horizontally (side by side)
  • $\left[a^{<t-1>} , X^{<t>} \right]$ is the two vectors stacked vertically (see the check below)
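
A quick NumPy check (shapes are arbitrary assumptions) that the simplified notation reproduces the original form:

```python
import numpy as np

rng = np.random.default_rng(1)
n_a, n_x = 4, 6
W_aa, W_ax = rng.normal(size=(n_a, n_a)), rng.normal(size=(n_a, n_x))
a_prev, x_t = rng.normal(size=(n_a, 1)), rng.normal(size=(n_x, 1))

W_a = np.hstack([W_aa, W_ax])        # [W_aa | W_ax], shape (n_a, n_a + n_x)
stacked = np.vstack([a_prev, x_t])   # [a<t-1>; x<t>], shape (n_a + n_x, 1)

# W_a [a<t-1>, x<t>] == W_aa a<t-1> + W_ax x<t>
assert np.allclose(W_a @ stacked, W_aa @ a_prev + W_ax @ x_t)
```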

(RNN W1L04) Backpropagation through time

Contents

L^{<t>}\left( \hat{y}^{<t>}, y^{<t>} \right) = -y^{<t>} \log \hat{y}^{<t>} - \left( 1-y^{<t>}\right) \log \left( 1-\hat{y}^{<t>} \right) \\
L\left( \hat{y}, y \right) = \sum_{t=1}^{T_y} L^{<t>} \left( \hat{y}^{<t>}, y^{<t>} \right)
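
A minimal sketch of the loss above (plain Python/NumPy, assuming scalar per-step predictions and binary labels):

```python
import numpy as np

def step_loss(y_hat_t, y_t, eps=1e-12):
    """Per-step binary cross-entropy: -y log(y_hat) - (1 - y) log(1 - y_hat)."""
    return -y_t * np.log(y_hat_t + eps) - (1 - y_t) * np.log(1 - y_hat_t + eps)

def sequence_loss(y_hat, y):
    """Total loss: sum of the per-step losses over t = 1 .. T_y."""
    return sum(step_loss(y_hat_t, y_t) for y_hat_t, y_t in zip(y_hat, y))

# Toy example: predictions and labels for a 3-step sequence
y_hat = [0.9, 0.2, 0.7]
y = [1, 0, 1]
L = sequence_loss(y_hat, y)
```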

(RNN W1L05) Different types of RNNs

Contents

  • Examples of sequence data
    • Speech recognition
    • Music generation
    • Sentiment classification
    • DNA sequence analysis
    • Machine translation
    • Video activity recognition
    • Named entity recognition
  • Examples of RNN architectures
    • one-to-one
    • many-to-one ; sentiment classification
    • one-to-many ; music generation
    • many-to-many ; machine translation

(RNN W1L06) Language model and sequence generation

Contents

What is language modelling?

  • Example: speech recognition must pick the more likely of similar-sounding sentences
    • A language model assigns a probability P(sentence) to a sentence

Language modelling with an RNN

  • Training set ; large corpus of English text
  • tokenize
    • <EOS> ; end of sentence
    • <UNK> ; unknown word
  • RNN model (a minimal sketch follows this list)
    • $a^{<0>} = \vec{0}$, $X^{<1>} = \vec{0}$
    • feed the previous true word as the next input: $X^{<2>} = y^{<1>}$, $X^{<3>} = y^{<2>}$, $\cdots$
    • $L(\hat{y}^{<t>}, y^{<t>}) = - \sum_i y_i^{<t>} \log\hat{y}_i^{<t>}$
    • $L = \sum_t L(\hat{y}^{<t>}, y^{<t>})$
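
A minimal sketch of the training-time input shifting and loss accumulation described above; rnn_step_fn is a hypothetical placeholder for a step function that returns the next activation and the vocabulary scores:

```python
import numpy as np

def softmax(z):
    """Column-vector softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def language_model_loss(sentence_indices, rnn_step_fn, n_a, vocab_size):
    """Feed x<1> = 0 and then x<t> = y<t-1> (the previous true word); sum -log p(true word)."""
    a_prev = np.zeros((n_a, 1))              # a<0> = 0
    x_t = np.zeros((vocab_size, 1))          # x<1> = 0
    total_loss = 0.0
    for target in sentence_indices:          # target = index of y<t>, the t-th true word
        a_prev, scores = rnn_step_fn(a_prev, x_t)   # hypothetical step: returns a<t> and vocab scores
        y_hat = softmax(scores)
        total_loss += -np.log(y_hat[target, 0] + 1e-12)   # cross-entropy with a one-hot y<t>
        x_t = np.zeros((vocab_size, 1))
        x_t[target] = 1.0                    # next input is the current true word
    return total_loss
```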

(RNN W1L07) Sampling novel sequences

Contents

  • Character-level language model
    • A word-level model normally uses Vocabulary = [a, aaron, ..., zulu, <UNK>]
    • A character-level language model uses Vocabulary = [a, b, c, ..., X, Y, Z]
    • Advantage ; can also handle unknown words
    • Disadvantage ; training is more costly (sequences become much longer); a sampling sketch follows this list
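
A sketch of sampling a novel sequence, drawing each next word from $\hat{y}^{<t>}$ with np.random.choice; rnn_step_fn and the vocabulary are hypothetical placeholders:

```python
import numpy as np

def sample_sequence(rnn_step_fn, vocabulary, eos_index, n_a, max_len=50):
    """Sample the next word from y_hat<t> and feed it back as the next input x<t+1>."""
    vocab_size = len(vocabulary)
    a_prev = np.zeros((n_a, 1))
    x_t = np.zeros((vocab_size, 1))
    sampled = []
    for _ in range(max_len):
        a_prev, y_hat = rnn_step_fn(a_prev, x_t)   # hypothetical step: y_hat is a softmax over the vocabulary
        idx = int(np.random.choice(vocab_size, p=y_hat.ravel()))
        sampled.append(vocabulary[idx])
        if idx == eos_index:                       # stop once <EOS> is sampled
            break
        x_t = np.zeros((vocab_size, 1))
        x_t[idx] = 1.0                             # the sampled word becomes the next input
    return sampled
```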

(RNN W1L08) Vanishing gradients with RNNs

Contents

  • Gradient clipping ; clip the gradient when it exceeds a threshold (a remedy for exploding gradients); a minimal sketch follows below
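
A minimal sketch of gradient clipping with np.clip (the threshold of 5.0 and the gradient names are arbitrary assumptions):

```python
import numpy as np

def clip_gradients(gradients, max_value=5.0):
    """Element-wise clip every gradient array into [-max_value, max_value]."""
    return {name: np.clip(grad, -max_value, max_value) for name, grad in gradients.items()}

# Example with made-up gradient names and values: large entries get capped at +/- 5.0
grads = {"dW_aa": np.array([[12.0, -0.3], [0.7, -40.0]]), "db_a": np.array([[0.1], [-9.0]])}
clipped = clip_gradients(grads)
```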

(RNN W1L09) Gated Recurrent Unit (GRU)

Contents

GRU (simplified)


c = \textrm{memory cell}\\
c^{<t>} = a^{<t>} \\
\tilde{c}^{<t>} = \tanh\left( W_c \left[ c^{<t-1>}, X^{<t>}\right]  + b_c\right) \\
\Gamma_u = \sigma\left(W_u \left[ c^{<t-1>}, X^{<t>}\right]  + b_u\right) \\
c^{<t>} = \Gamma_u \ast \tilde{c}^{<t>} + \left( 1-\Gamma_u \right) \ast c^{<t-1>}

  • Even when the gradient gets close to 0, the memory cell is preserved: with $\Gamma_u \approx 0$, $c^{<t>} \approx c^{<t-1>}$, so information is carried over long ranges and the vanishing-gradient problem is mitigated

Full GRU


c^{<t>} = a^{<t>} \\
\tilde{c}^{<t>} = \tanh\left( W_c \left[ \Gamma_r \ast c^{<t-1>}, X^{<t>}\right]  + b_c\right) \\
\Gamma_u = \sigma\left(W_u \left[ c^{<t-1>}, X^{<t>}\right]  + b_u\right) \\
\Gamma_r = \sigma\left(W_r \left[ c^{<t-1>}, X^{<t>}\right]  + b_r\right) \\
c^{<t>} = \Gamma_u \ast \tilde{c}^{<t>} + \left( 1-\Gamma_u \right) \ast c^{<t-1>}
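
A minimal NumPy sketch of one full GRU step following the equations above (weight shapes are assumptions; setting $\Gamma_r = 1$ recovers the simplified GRU):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x_t, W_c, W_u, W_r, b_c, b_u, b_r):
    """One full GRU step; '*' is element-wise, matching the Gamma gates above."""
    concat = np.vstack([c_prev, x_t])                    # [c<t-1>; x<t>]
    gamma_u = sigmoid(W_u @ concat + b_u)                # update gate
    gamma_r = sigmoid(W_r @ concat + b_r)                # relevance gate
    c_tilde = np.tanh(W_c @ np.vstack([gamma_r * c_prev, x_t]) + b_c)  # candidate memory cell
    c_t = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev   # old memory kept where gamma_u ~ 0
    a_t = c_t                                            # a<t> = c<t>
    return a_t, c_t
```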

Impressions

  • To be honest, I don't fully understand this yet

(RNN W1L10) LSTM (long short-term memory) unit

Contents

\tilde{c}^{<t>} =\tanh \left( W_c \left[ a^{<t-1>}, X^{<t>} \right] + b_c \right) \\
\Gamma_u = \sigma\left( W_u \left[ a^{<t-1>}, X^{<t>} \right] + b_u \right) \\
\Gamma_f = \sigma\left( W_f \left[ a^{<t-1>}, X^{<t>} \right] + b_f \right) \\
\Gamma_o = \sigma\left( W_o \left[ a^{<t-1>}, X^{<t>} \right] + b_o \right) \\
c^{<t>} = \Gamma_u \ast \tilde{c}^{<t>} + \Gamma_f \ast c^{<t-1>} \\
a^{<t>} = \Gamma_o \ast \tanh\left( c^{<t>} \right)

  • Subscripts (a minimal sketch of one step follows this list)
    • u ; update gate
    • f ; forget gate
    • o ; output gate
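
A minimal NumPy sketch of one LSTM step following the equations above (weight shapes are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t, W_c, W_u, W_f, W_o, b_c, b_u, b_f, b_o):
    """One LSTM step with update, forget and output gates."""
    concat = np.vstack([a_prev, x_t])            # [a<t-1>; x<t>]
    c_tilde = np.tanh(W_c @ concat + b_c)        # candidate memory cell
    gamma_u = sigmoid(W_u @ concat + b_u)        # update gate
    gamma_f = sigmoid(W_f @ concat + b_f)        # forget gate
    gamma_o = sigmoid(W_o @ concat + b_o)        # output gate
    c_t = gamma_u * c_tilde + gamma_f * c_prev   # unlike the GRU, separate update and forget gates
    a_t = gamma_o * np.tanh(c_t)
    return a_t, c_t
```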

(RNN W1L11) Bidirectional RNN

Contents

  • Getting information from the future

\hat{y}^{<t>} = g\left( W_y \left[ \overrightarrow{a}^{<t>} , \overleftarrow{a}^{<t>} \right]  + b_y \right)

  • $\overleftarrow{a}^{<t>}$ carries the information from the future (the backward pass); a minimal sketch follows below
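
A minimal sketch of combining the forward and backward activations into $\hat{y}^{<t>}$ (the sigmoid output and shapes are assumptions); in a BRNN, both passes run over the whole sequence before any output is computed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def brnn_output(a_forward_t, a_backward_t, W_y, b_y):
    """y_hat<t> = g(W_y [a_fwd<t>; a_bwd<t>] + b_y); the backward activation carries future context."""
    concat = np.vstack([a_forward_t, a_backward_t])
    return sigmoid(W_y @ concat + b_y)

# Example shapes: forward/backward hidden size 4 each, scalar output
rng = np.random.default_rng(3)
a_fwd, a_bwd = rng.normal(size=(4, 1)), rng.normal(size=(4, 1))
W_y, b_y = rng.normal(size=(1, 8)), np.zeros((1, 1))
y_hat_t = brnn_output(a_fwd, a_bwd, W_y, b_y)
```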

Thoughts

  • I'm pleased that I managed to write the reversed vector arrow ($\overleftarrow{a}$)

(RNN W1L12) Deep RNNs

Contents

  • $a^{[l]<t>}$ ; activation of layer $l$ at time $t$
  • For example (a minimal sketch follows the equation) ...

a^{[2]<3>} = g\left( W_a^{[2]} \left[  a^{[2]<2>}, a^{[1]<3>} \right] + b_a^{[2]} \right)
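
A minimal NumPy sketch of the example above, computing $a^{[2]<3>}$ from $a^{[2]<2>}$ (same layer, previous time step) and $a^{[1]<3>}$ (layer below, same time step); the shapes and $g = \tanh$ are assumptions:

```python
import numpy as np

def deep_rnn_activation(a_same_layer_prev_t, a_layer_below_t, W_a_l, b_a_l):
    """a[l]<t> = g(W_a[l] [a[l]<t-1>; a[l-1]<t>] + b_a[l])."""
    concat = np.vstack([a_same_layer_prev_t, a_layer_below_t])
    return np.tanh(W_a_l @ concat + b_a_l)

# Example for a[2]<3>: layer-2 hidden size 4, layer-1 hidden size 5
n2, n1 = 4, 5
rng = np.random.default_rng(2)
W_a_2 = rng.normal(size=(n2, n2 + n1))
b_a_2 = np.zeros((n2, 1))
a_2_2 = rng.normal(size=(n2, 1))   # a[2]<2>
a_1_3 = rng.normal(size=(n1, 1))   # a[1]<3>
a_2_3 = deep_rnn_activation(a_2_2, a_1_3, W_a_2, b_a_2)
```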
