
Deep Learning Specialization (Coursera) Self-Study Notes (C4W2)

Last updated at 2020-07-05, posted at 2020-07-03

Introduction

These are my notes on Course 4, Week 2 (C4W2) of the Deep Learning Specialization.

(C4W2L01) Why look at case studies?

Contents

This week's outline
- Classic Networks
  - LeNet-5
  - AlexNet
  - VGG
- ResNet (152 layers)
- Inception

(C4W2L02) Classic Networks

Contents

LeNet-5 (1998)

Input ; 32 x 32 x 1
CONV (5x5, s=1) ; 28 x 28 x 6
Avg POOL (f=2, s=2) ; 14 x 14 x 6
CONV (5x5, s=1) ; 10 x 10 x 16
Avg POOL (f=2, s=2) ; 5 x 5 x 16
FC ; 120 units
FC ; 84 units
$\hat{y}$

Number of parameters ; $\sim$ 60k
As the network gets deeper, $n_H$ and $n_W$ shrink while $n_C$ grows
A typical layout: CONV, POOL, CONV, POOL, FC, FC
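
The shapes in the LeNet-5 table can be checked with a few lines of Python. This is a minimal sketch, assuming the standard output-size formula $\lfloor (n + 2p - f)/s \rfloor + 1$ with no padding; the names `out_size`, `trace_shapes`, and `LENET5` are made up for illustration.

```python
def out_size(n, f, s=1, p=0):
    """Spatial output size of a CONV or POOL layer: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

# (layer name, filter size f, stride s, number of filters or None for pooling)
LENET5 = [
    ("CONV", 5, 1, 6),
    ("Avg POOL", 2, 2, None),
    ("CONV", 5, 1, 16),
    ("Avg POOL", 2, 2, None),
]

def trace_shapes(n, c, layers):
    """Return the (n_H, n_W, n_C) shape after the input and after every layer."""
    shapes = [(n, n, c)]
    for _name, f, s, n_filters in layers:
        n = out_size(n, f, s)
        # Pooling keeps the channel count; a CONV layer sets it to the number of filters.
        c = c if n_filters is None else n_filters
        shapes.append((n, n, c))
    return shapes

shapes = trace_shapes(32, 1, LENET5)
for shape in shapes:
    print(shape)
```

The same `out_size` helper reproduces the first AlexNet layer: an 11x11 filter with stride 4 on a 227x227 input gives 55x55.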

AlexNet (2012)

Input ; 227x227x3
CONV (11x11, s=4) ; 55 x 55 x 96
Max POOL (3x3, s=2) ; 27 x 27 x 96
CONV (5x5, same) ; 27 x 27 x 256
Max POOL (3x3, s=2) ; 13 x 13 x 256
CONV (3x3, same) ; 13 x 13 x 384
CONV (3x3, same) ; 13 x 13 x 384
CONV (3x3, same) ; 13 x 13 x 256
Max POOL (3x3, s=2) ; 6 x 6 x 256
FC ; 4096 units
FC ; 4096 units
softmax ; 1000 units

Similar to LeNet, but much bigger ($\sim$ 60M parameters)
ReLU
Multiple GPUs
Local Response Normalization

VGG-16

CONV = 3x3 filter, s=1, same
Max POOL = 2x2, s=2

Input ; 224 x 224 x 3
CONV64 x 2 ; 224 x 224 x 64
POOL ; 112 x 112 x 64
CONV128 x 2 ; 112 x 112 x 128
POOL ; 56 x 56 x 128
CONV256 x 3; 56 x 56 x 256
POOL ; 28 x 28 x 256
CONV512 x 3; 28 x 28 x 512
POOL ; 14 x 14 x 512
CONV512 x 3 ; 14 x 14 x 512
POOL ; 7 x 7 x 512
FC ; 4096 units
FC ; 4096 units
Softmax ; 1000 units

Number of parameters ; $\sim$ 138M
The architecture is relatively uniform
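
The figure of roughly 138M parameters can be reproduced from the layer list. The sketch below assumes one bias per filter and per FC unit; the helper names are hypothetical.

```python
def conv_params(f, c_in, c_out):
    """Weights plus one bias per filter for an f x f convolution."""
    return (f * f * c_in + 1) * c_out

def fc_params(n_in, n_out):
    """Weights plus one bias per unit for a fully connected layer."""
    return (n_in + 1) * n_out

# Number of filters in each of the 13 CONV layers of VGG-16 (all 3x3).
VGG16_CONV = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]

conv_total = 0
c_in = 3  # RGB input
for c_out in VGG16_CONV:
    conv_total += conv_params(3, c_in, c_out)
    c_in = c_out

fc_total = 0
n_in = 7 * 7 * 512  # flattened output of the last POOL layer
for n_out in (4096, 4096, 1000):
    fc_total += fc_params(n_in, n_out)
    n_in = n_out

total = conv_total + fc_total
print(f"CONV: {conv_total:,}  FC: {fc_total:,}  total: {total:,}")
```

Most of the parameters sit in the FC layers, and the first FC layer alone accounts for about 100M of them.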

Impressions

A network from 2015 is already called "classic"; this field moves fast

(C4W2L03) Residual Networks (ResNet)

Contents

Residual block
- $a^{[l]}$
- $z^{[l+1]} = W^{[l+1]}a^{[l]} + b^{[l+1]}$
- $a^{[l+1]} = g(z^{[l+1]})$
- $z^{[l+2]} = W^{[l+2]}a^{[l+1]} + b^{[l+2]}$
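- $a^{[l+2]} = g(z^{[l+2]} + a^{[l]})$
In a plain network, the training error starts to increase once the network gets too deep
With ResNet, the training error keeps decreasing even beyond 100 layers, which makes it effective for training deep networks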
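
The residual block equations above can be sketched in NumPy as follows. This is only an illustration under simplifying assumptions: fully connected layers of equal width (so the shortcut needs no reshaping), ReLU as $g$, and random weights. The function names are not from the course.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(a_l, W1, b1, W2, b2):
    """Two-layer residual block: a^{[l+2]} = g(z^{[l+2]} + a^{[l]})."""
    z1 = W1 @ a_l + b1      # z^{[l+1]}
    a1 = relu(z1)           # a^{[l+1]}
    z2 = W2 @ a1 + b2       # z^{[l+2]}
    return relu(z2 + a_l)   # the shortcut adds a^{[l]} before the nonlinearity

rng = np.random.default_rng(0)
n = 4
a_l = relu(rng.standard_normal(n))  # activations coming out of a ReLU are non-negative
W1, W2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
b1, b2 = np.zeros(n), np.zeros(n)

a_out = residual_block(a_l, W1, b1, W2, b2)
print(a_out)
```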

(C4W2L04) Why ResNets work

Contents

If $W^{[l+2]} = 0$ and $b^{[l+2]} = 0$, then $a^{[l+2]} = g(a^{[l]}) = a^{[l]}$ (the last equality holds because $g$ is ReLU and $a^{[l]} \geq 0$)
The identity function is easy for a residual block to learn

(C4W2L05) Network in network and 1x1 convolutions

Contents

What does a 1x1 convolution do?
- $(6 \times 6 \times 32) \ast (1 \times 1 \times 32) = (6 \times 6 \times \textrm{#filters})$
- Conceptually, it applies a fully connected (FC) layer to each pixel of the input
- Also known as Network in Network
Using 1x1 convolutions
- Input ; $28 \times 28 \times 192$
- ReLU, CONV 1x1, 32 filters → Output ; $28 \times 28 \times 32$
- This reduces $n_C$
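
The "FC layer at every pixel" view can be checked directly. The sketch below assumes channels-last arrays and random weights, and `conv1x1` is a made-up helper, not a library function.

```python
import numpy as np

def conv1x1(x, W, b):
    """1x1 convolution followed by ReLU.

    x: (n_H, n_W, n_C_in), W: (n_C_in, n_filters), b: (n_filters,)
    """
    return np.maximum(np.einsum("hwc,cf->hwf", x, W) + b, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28, 192))
W = rng.standard_normal((192, 32))
b = np.zeros(32)

y = conv1x1(x, W, b)
print(y.shape)

# Applying the same weights as an FC layer to a single pixel's channel vector
# gives the same result as the convolution at that position.
pixel = np.maximum(x[3, 5] @ W + b, 0.0)
print(np.allclose(y[3, 5], pixel))
```

Only the channel dimension changes (192 to 32); the spatial size stays 28x28.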

(C4W2L06) Inception network motivation

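Contents

Apply each of the following to the input (28x28x192) and concatenate the outputs along the channel dimension
- 1x1 → Output ; 28x28x64
- 3x3 → Output ; 28x28x128
- 5x5 → Output ; 28x28x32
- Max POOL → Output ; 28x28x32
- Total: 28x28x256
Apply all of the filter sizes and pooling at once, and let the network learn which ones it needs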
The problem of computational cost
- Input ; 28x28x192
- CONV 5x5, same, 32 → Output ; 28x28x32
- Computational cost ; 28x28x32x5x5x192 ≈ 120M multiplications
Using 1x1 convolution
- Input ; 28x28x192
- CONV 1x1, 16 → Output ; 28x28x16 (bottleneck layer)
- CONV 5x5, 32 → Output ; 28x28x32
- Computational cost ; 28x28x16x192 + 28x28x32x5x5x16 ≈ 12.4M multiplications (about 1/10 of the above)
A well-designed bottleneck layer reduces the computation without hurting performance
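
The two cost figures can be recomputed as below. The sketch counts multiplications only (one per filter weight per output value), and `conv_mults` is a hypothetical helper.

```python
def conv_mults(n_h, n_w, n_c_out, f, n_c_in):
    """Multiplications for an f x f convolution producing an n_h x n_w x n_c_out output."""
    return n_h * n_w * n_c_out * f * f * n_c_in

# Direct 5x5 convolution: 28x28x192 -> 28x28x32
direct = conv_mults(28, 28, 32, 5, 192)

# 1x1 bottleneck to 16 channels, then 5x5 convolution to 32 channels
bottleneck = conv_mults(28, 28, 16, 1, 192) + conv_mults(28, 28, 32, 5, 16)

print(f"direct:     {direct:,}")
print(f"bottleneck: {bottleneck:,}")
print(f"ratio:      {bottleneck / direct:.3f}")
```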

(C4W2L07) Inception network

Contents

A walkthrough of the Inception network with bottleneck layers
This network is known as GoogLeNet

(C4W2L08) Using open-source implementation

Contents

How to download source code from GitHub (git clone)

(C4W2L09) Transfer Learning

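Contents

Given $x$ → layer → layer → $\cdots$ → layer → softmax → $\hat{y}$,
- With little data, train only the softmax layer (freeze all other parameters)
- With a larger dataset, train, for example, the later layers and freeze the earlier ones
- With a very large dataset, train the whole network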
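
The small-data case (train only the softmax layer) might look like the following toy sketch in NumPy. Everything here is a stand-in: `W_frozen` plays the role of pre-trained weights but is just random, the data is random, and the learning rate and step count are arbitrary. A real project would load an actual pre-trained network in a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pre-trained layers: fixed weights that are never updated.
W_frozen = rng.standard_normal((8, 16)) * 0.5
# New softmax layer for our own 3-class task: the only trainable parameters.
W_soft = np.zeros((16, 3))
b_soft = np.zeros(3)

def features(x):
    """Forward pass through the frozen layers."""
    return np.maximum(x @ W_frozen, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(x, y_onehot):
    """Cross-entropy loss and gradients with respect to the softmax layer only."""
    h = features(x)
    p = softmax(h @ W_soft + b_soft)
    loss = -np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=1))
    dz = (p - y_onehot) / len(x)
    return loss, h.T @ dz, dz.sum(axis=0)

# A small labeled dataset (random here, purely for illustration).
x = rng.standard_normal((30, 8))
y = np.eye(3)[rng.integers(0, 3, size=30)]

W_before = W_frozen.copy()
loss_start, _, _ = loss_and_grads(x, y)
for _ in range(300):
    _, dW, db = loss_and_grads(x, y)
    W_soft -= 0.05 * dW  # gradient step on the softmax layer
    b_soft -= 0.05 * db  # W_frozen is deliberately left untouched
loss_end, _, _ = loss_and_grads(x, y)
print(loss_start, loss_end)
```

Because the frozen layers never change, their outputs could also be computed once and cached, so each training step only touches the softmax layer.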

(C4W2L10) Data augmentation

Contents

Common augmentation methods
- mirroring
- random cropping
- the following are used less often
  - rotation
  - shearing
  - local warping
Color shifting
- Add (or subtract) values to the R, G, and B channels
- The AlexNet paper describes a PCA color augmentation method
Implementing distortions during training
- As each image is loaded, apply distortions to it, assemble the distorted images into a mini-batch, and train on it
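
Mirroring, random cropping, and color shifting can each be written in a few lines of NumPy. The sketch below assumes `uint8` images in height x width x channel layout; the crop size, shift range, and function names are arbitrary choices for illustration.

```python
import numpy as np

def mirror(img):
    """Flip the image horizontally."""
    return img[:, ::-1, :]

def random_crop(img, size, rng):
    """Cut a random size x size window out of the image."""
    h, w, _ = img.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size, :]

def color_shift(img, shift):
    """Add a per-channel offset to R, G, B and clip to the valid range."""
    shifted = img.astype(np.int16) + np.asarray(shift, dtype=np.int16)
    return np.clip(shifted, 0, 255).astype(np.uint8)

def augment(img, rng, crop=24, max_shift=20):
    """Apply a random crop, a random mirror, and a random color shift."""
    out = random_crop(img, crop, rng)
    if rng.random() < 0.5:
        out = mirror(out)
    shift = rng.integers(-max_shift, max_shift + 1, size=3)
    return color_shift(out, shift)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # stand-in for a loaded image

# Distort the loaded image several times and stack the results into a mini-batch.
batch = np.stack([augment(image, rng) for _ in range(8)])
print(batch.shape, batch.dtype)
```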

(C4W2L11) The state of computer vision

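Contents

In terms of the amount of data currently available, the rough ordering is speech recognition $\gt$ image recognition $\gt$ object detection (recognizing where objects are in the image)
With lots of data, simple algorithms work and little hand-engineering is needed
With little data, more hand-engineering (or hacks) is needed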
Two sources of knowledge
- labeled data
- hand-engineering features / network architecture / other components
When data is scarce, transfer learning helps
Tips for doing well on benchmark / winning competitions
- Ensemble
  - Train several networks independently and average their outputs (3 to 15 networks)
- Multi-crop at test time
  - Run classifier on multiple versions of test images and average results (10 crops)
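
Both tips amount to averaging predicted probabilities. The sketch below assumes the common 10-crop layout (four corners and the center, plus their mirror images) and represents each "network" as a hypothetical callable that maps a batch of crops to class probabilities. The toy models are random stand-ins, not trained classifiers.

```python
import numpy as np

def ten_crop(img, size):
    """Four corner crops and the center crop, plus the mirror image of each."""
    h, w, _ = img.shape
    corners = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
               ((h - size) // 2, (w - size) // 2)]
    crops = [img[t:t + size, l:l + size] for t, l in corners]
    crops += [c[:, ::-1] for c in crops]
    return np.stack(crops)

def predict_multi_crop(models, img, size):
    """Average class probabilities over all models and all 10 crops."""
    crops = ten_crop(img, size)
    probs = [model(crops) for model in models]  # each entry: (10, n_classes)
    return np.mean(probs, axis=(0, 1))

def make_toy_model(seed, n_classes=3):
    """Hypothetical stand-in for an independently trained network."""
    W = np.random.default_rng(seed).standard_normal((3, n_classes))
    def model(batch):
        z = batch.mean(axis=(1, 2)) @ W  # mean color per crop -> class scores
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)
    return model

models = [make_toy_model(seed) for seed in range(3)]
img = np.random.default_rng(0).random((32, 32, 3))
p = predict_multi_crop(models, img, 24)
print(p, p.sum())
```

The cost is that every test image needs one forward pass per model per crop, so this suits benchmarks better than latency-sensitive use.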
Use open source code
- Use architecture of network published in the literature
- Use open source implementations if possible
- Use pre-trained models and fine-tune on your data set

Reference

This week's programming assignment is an implementation of ResNet

Reference

Deep Learning Specialization (Coursera) Self-Study Notes (Table of Contents)
