Deep Learning Specialization (Coursera) Self-Study Notes (C4W4)

Last updated at 2020-08-01. Posted at 2020-07-19.

Introduction

These are my notes on Course 4, Week 4 (C4W4) of the Deep Learning Specialization.

(C4W4L01) What is face recognition?

Contents

Verification (1:1)
- Input image, name/ID
- Output whether the input image is that of the claimed person
Recognition (1:K)
- Has a database of K persons
- Get an input image
- Output ID if the image is any of the K persons (or not recognized)

(C4W4L02) One shot learning

Contents

Learning from one example to recognize the person again
Learning a similarity function
- $d(img1, img2)$ ; degree of difference between images
- if $d(img1, img2) \le \tau$ ; "same"
- if $d(img1, img2) \gt \tau$ ; "different"
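As a minimal sketch of how a thresholded difference function could serve both verification (1:1) and recognition (1:K): here `d` is assumed to be a squared Euclidean distance on made-up feature vectors, and the names `verify`, `recognize`, and `database` are hypothetical, not from the course.

```python
import numpy as np

def d(enc1, enc2):
    """Squared Euclidean distance between two feature vectors (one possible d)."""
    diff = np.asarray(enc1, dtype=float) - np.asarray(enc2, dtype=float)
    return float(np.dot(diff, diff))

def verify(enc_input, enc_claimed, tau):
    """1:1 verification: True ("same") when d <= tau."""
    return d(enc_input, enc_claimed) <= tau

def recognize(enc_input, database, tau):
    """1:K recognition: closest ID if its distance is within tau, else None."""
    best_id, best_dist = None, float("inf")
    for person_id, enc in database.items():
        dist = d(enc_input, enc)
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    return best_id if best_dist <= tau else None

# Made-up feature vectors, one example per person (one-shot setting).
database = {"alice": [0.0, 1.0], "bob": [1.0, 0.0]}
```

Adding a new person only requires adding one entry to `database`; nothing is retrained, which is the point of learning a similarity function instead of a K-way classifier.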

(C4W4L03) Siamese network

Contents

Feed an image $x^{(1)}$ into a CNN, and take the vector just before the softmax classification layer as $f(x^{(1)})$ (the encoding)
$d(x^{(1)}, x^{(2)}) = || f(x^{(1)}) - f(x^{(2)}) ||^2_2$
Parameters of NN define an encoding $f(x^{(i)})$
Learn parameters so that
- If $x^{(i)}$, $x^{(j)}$ are the same person, $|| f(x^{(i)}) - f(x^{(j)}) ||^2$ is small
- If $x^{(i)}$, $x^{(j)}$ are different persons, $|| f(x^{(i)}) - f(x^{(j)}) ||^2$ is large
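The key point of a Siamese network is that both inputs pass through the same encoder with the same parameters. A rough sketch, assuming a toy one-layer encoder in place of the CNN (the weight matrix `W` and the input size of 64 are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 64))  # hypothetical shared parameters

def f(x):
    """Toy encoder standing in for the CNN: one linear layer plus tanh."""
    return np.tanh(W @ x)

def d(x1, x2):
    """Squared L2 distance between encodings; both inputs use the same f."""
    diff = f(x1) - f(x2)
    return float(diff @ diff)

x1 = rng.normal(size=64)  # made-up flattened inputs
x2 = rng.normal(size=64)
```

Because `f` is shared, `d(x1, x2)` equals `d(x2, x1)` and `d(x1, x1)` is zero; training would adjust `W` so that the distance is small for the same person and large otherwise.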

(C4W4L04) Triplet loss

Contents

Compare images against one anchor image to see whether they are similar or not
- A ; Anchor
- P ; Positive (similar to the anchor)
- N ; Negative (not similar to the anchor)
- $\alpha$ ; margin

$$\| f(A) - f(P) \|^2 - \|f(A) - f(N)\|^2 + \alpha \le 0$$

Loss function
- Given 3 images ; A, P, N


$$L(A, P, N) = \max\left( \|f(A) - f(P)\|^2 - \|f(A) - f(N)\|^2 + \alpha , 0 \right)$$
$$J = \sum_{i=1}^m L(A^{(i)}, P^{(i)}, N^{(i)})$$
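The loss and cost above can be sketched in NumPy as follows. The encodings are made-up numbers rather than network outputs, and the default margin `alpha=0.2` is only an assumed value for illustration.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Per-triplet loss max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + alpha, 0).

    Inputs have shape (m, encoding_dim); the result has shape (m,).
    """
    d_ap = np.sum((f_a - f_p) ** 2, axis=-1)
    d_an = np.sum((f_a - f_n) ** 2, axis=-1)
    return np.maximum(d_ap - d_an + alpha, 0.0)

def triplet_cost(f_a, f_p, f_n, alpha=0.2):
    """Cost J: sum of the per-triplet losses over all m triplets."""
    return float(np.sum(triplet_loss(f_a, f_p, f_n, alpha)))

# Two made-up triplets with 2-dimensional encodings.
f_a = np.array([[0.0, 0.0], [0.0, 0.0]])
f_p = np.array([[0.1, 0.0], [1.0, 0.0]])
f_n = np.array([[1.0, 0.0], [0.5, 0.0]])
losses = triplet_loss(f_a, f_p, f_n)
```

The first triplet already satisfies the margin, so its loss is clipped to 0; the second has the negative closer than the positive and contributes a positive loss.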

Training set ; 10k pictures of 1k persons (10 images per person)
Choosing the triplets ; A, P, N
- During training, if A, P, N are chosen randomly, $d(A, P) + \alpha \le d(A, N)$ is easily satisfied.
- Choose triplets that are hard to train on ($d(A, P) \sim d(A, N)$)
See the FaceNet paper for details
Rather than training from scratch, it is better to start from publicly available pre-trained parameters
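One simple way to pick hard triplets can be sketched as follows. This is a simplification for illustration (for each anchor, take the farthest positive and the closest negative) and is not claimed to be the selection rule used in the FaceNet paper; the function name `hardest_triplets` and the data are hypothetical.

```python
import numpy as np

def hardest_triplets(encodings, labels):
    """For each anchor, pick the farthest positive and the closest negative.

    Returns a list of (anchor, positive, negative) index triples.
    """
    encodings = np.asarray(encodings, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared distances, shape (n, n).
    diff = encodings[:, None, :] - encodings[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    triplets = []
    for a in range(len(labels)):
        same = labels == labels[a]
        same[a] = False                  # the anchor is not its own positive
        other = labels != labels[a]
        if not same.any() or not other.any():
            continue                     # no valid triplet for this anchor
        p = np.where(same)[0][np.argmax(dist[a, same])]
        n = np.where(other)[0][np.argmin(dist[a, other])]
        triplets.append((a, int(p), int(n)))
    return triplets

# Made-up 1-dimensional encodings for two persons (labels 0 and 1).
encodings = [[0.0], [0.2], [1.0], [0.5], [3.0]]
labels = [0, 0, 0, 1, 1]
triplets = hardest_triplets(encodings, labels)
```

Triplets chosen this way tend to have $d(A, P)$ close to or larger than $d(A, N)$, so the loss is non-zero and the gradient is informative.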

(C4W4L05) Face verification and binary classification

Contents

From two faces $x^{(i)}$ and $x^{(j)}$, compute $f(x^{(i)})$ and $f(x^{(j)})$ with the NN, then compute the prediction $\hat{y}$

$$\hat{y} = \sigma\left( \sum_{k=1}^{128} w_i \left| f(x^{(i)})_k - f(x^{(j)})_k \right| + b \right)$$

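Instead of the absolute difference inside the sigmoid, the following quantity is sometimes used ($\chi^2$ similarity)

$$\frac{\left( f(x^{(i)})_k - f(x^{(j)})_k \right)^2}{ f(x^{(i)})_k + f(x^{(j)})_k }$$

The encodings $f(x^{(i)})$ of the database images can be computed in advance. There is no need to store the raw images; storing the precomputed encodings is enough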
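A minimal sketch of this binary-classification head, assuming the encodings are already computed. The weights `w`, bias `b`, and 2-dimensional encodings are made-up numbers, and the small `eps` in the $\chi^2$ features is an added safeguard against division by zero, not part of the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def abs_diff_features(f_i, f_j):
    """Element-wise |f(x_i)_k - f(x_j)_k|."""
    return np.abs(f_i - f_j)

def chi2_features(f_i, f_j, eps=1e-12):
    """Element-wise (f_i - f_j)^2 / (f_i + f_j); assumes non-negative encodings."""
    return (f_i - f_j) ** 2 / (f_i + f_j + eps)

def predict_same(f_i, f_j, w, b, features=abs_diff_features):
    """y_hat = sigmoid(sum_k w_k * feature_k + b)."""
    return float(sigmoid(np.dot(w, features(f_i, f_j)) + b))

# Made-up encodings and parameters.
f_i = np.array([1.0, 2.0])
f_j = np.array([1.0, 4.0])
w = np.array([-1.0, -1.0])
b = 2.0
y_hat = predict_same(f_i, f_j, w, b)
```

Since `predict_same` only needs encodings, the database side (`f_j`) can be looked up from a precomputed table and only the new image has to go through the network.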

Impressions

Shouldn't the $w_i$ inside the sigmoid be $w_k$?

(C4W4L06) What is neural style transfer?

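Contents

For example, converting a photograph so that it looks like a Van Gogh painting

Impressions

Would changing the tone of a voice work the same way?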

(C4W4L07) What are deep ConvNets learning

Contents

Units in shallow layers ; respond to fine details (edges, textures)
Units in deep layers ; respond to larger structures (dog faces, water, etc.)

(C4W4L08) Cost function

Contents

With C ; Content (the image to be transformed), S ; Style (the look to transform it into), and G ; Generated image, the cost function is

$$J(G) = \alpha J_{content}(C, G) + \beta J_{style}(S, G)$$

Having two hyperparameters $\alpha$ and $\beta$ looks redundant, but by convention both are used
Find the generated image G
1. Initialize G randomly (e.g. G ; 100 x 100 x 3)
2. Use gradient descent to minimize $J(G)$

$$G := G - \frac{\partial}{\partial G}J(G)$$
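The two steps can be sketched with a toy cost. This is only an illustration of updating the pixels of G directly: `toy_cost` (half the squared pixel difference to a target `C`) stands in for the real $J(G)$, and the `learning_rate` factor is an added assumption that does not appear in the update rule above.

```python
import numpy as np

def toy_cost(G, C):
    """Stand-in for J(G): half the squared pixel difference to a target C."""
    return 0.5 * float(np.sum((G - C) ** 2))

def toy_grad(G, C):
    """Gradient of toy_cost with respect to the pixels of G."""
    return G - C

def generate(C, steps=50, learning_rate=0.1, seed=0):
    rng = np.random.default_rng(seed)
    G = rng.uniform(size=C.shape)            # 1. initialize G randomly
    costs = [toy_cost(G, C)]
    for _ in range(steps):                   # 2. gradient descent on the cost
        G = G - learning_rate * toy_grad(G, C)
        costs.append(toy_cost(G, C))
    return G, costs

C = np.full((100, 100, 3), 0.5)              # made-up target image
G, costs = generate(C)
```

In real style transfer the gradient would come from backpropagating through a pre-trained network, but the loop has the same shape: the network weights stay fixed and only G changes.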

(C4W4L09) Content cost function

Contents

$$J(G) = \alpha J_{content}(C, G) + \beta J_{style}(S, G)$$

Say you use hidden layer $l$ to compute content cost
Use a pre-trained ConvNet (e.g. VGG network)
Let $a^{[l](C)}$ and $a^{[l](G)}$ be the activation of layer $l$ on the images.
If $a^{[l](C)}$ and $a^{[l](G)}$ are similar, both images have similar content

$$J_{content}(C, G) = \frac{1}{2}\| a^{[l](C)} - a^{[l](G)} \|^2$$
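As a small sketch, the content cost is half the squared difference between the two activation volumes. The activations below are made-up arrays of shape `(n_H, n_W, n_C)`, not outputs of a real VGG layer.

```python
import numpy as np

def content_cost(a_C, a_G):
    """Half the squared L2 distance between layer-l activations of C and G."""
    return 0.5 * float(np.sum((a_C - a_G) ** 2))

# Made-up activations of shape (n_H, n_W, n_C) = (4, 4, 3).
a_C = np.zeros((4, 4, 3))
a_G = np.ones((4, 4, 3))
cost = content_cost(a_C, a_G)
```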

(C4W4L10) Style cost function

Contents

Taking layer $l$'s activations as the representation of style, define style as the correlation between activations across channels
Style matrix
- Let $a_{i,j,k}^{[l]}$ be the activation at $(i, j, k)$. The matrix $G^{[l](s)}$ is $n_C^{[l]} \times n_C^{[l]}$, with entries

$$G_{KK^\prime}^{[l](s)} = \sum_{i=1}^{n_H^{[l]}} \sum_{j=1}^{n_W^{[l]}} a_{ijK}^{[l](s)} a_{ijK^\prime}^{[l](s)}$$
$$G_{KK^\prime}^{[l](G)} = \sum_{i=1}^{n_H^{[l]}} \sum_{j=1}^{n_W^{[l]}} a_{ijK}^{[l](G)} a_{ijK^\prime}^{[l](G)}$$

Style cost function

$$J_{style}^{[l]}(S, G) = \frac{1}{\left(2n_H^{[l]} n_W^{[l]} n_C^{[l]}\right)^2} \sum_{K} \sum_{K^\prime} \left( G_{KK^\prime}^{[l](s)} - G_{KK^\prime}^{[l](G)} \right)^2$$
$$J_{style}(S, G) = \sum_l \lambda^{[l]} J_{style}^{[l]}(S, G)$$
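The style matrix and style cost can be sketched in NumPy as below. Activations are assumed to have shape `(n_H, n_W, n_C)` and are made up for illustration; the function names are hypothetical.

```python
import numpy as np

def gram_matrix(a):
    """Style matrix of activations a with shape (n_H, n_W, n_C)."""
    n_H, n_W, n_C = a.shape
    flat = a.reshape(n_H * n_W, n_C)   # rows: positions (i, j); columns: channels
    return flat.T @ flat               # shape (n_C, n_C): sums over i and j

def layer_style_cost(a_S, a_G):
    """Style cost of one layer, normalized by (2 * n_H * n_W * n_C)^2."""
    n_H, n_W, n_C = a_S.shape
    diff = gram_matrix(a_S) - gram_matrix(a_G)
    return float(np.sum(diff ** 2)) / (2 * n_H * n_W * n_C) ** 2

def style_cost(acts_S, acts_G, lambdas):
    """Weighted sum of the per-layer style costs."""
    return sum(lam * layer_style_cost(a_S, a_G)
               for lam, a_S, a_G in zip(lambdas, acts_S, acts_G))

# Made-up activations for two layers.
a_S = np.ones((2, 2, 3))
a_G = np.zeros((2, 2, 3))
total = style_cost([a_S, a_S], [a_G, a_G], lambdas=[0.5, 0.5])
```

Flattening the spatial dimensions turns the double sum over $i$ and $j$ into a single matrix product, which is why the Gram matrix is cheap to compute.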

Cost function

$$J(G) = \alpha J_{content}(C, G) + \beta J_{style}(S, G)$$

(C4W4L11) 1D and 3D generalizations of model

Contents

Examples of 1D data ; electrocardiograms (ECG), etc.
Examples of 3D data ; CT scans, movies (2D + time), etc.
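For the 1D case, the same sliding-filter idea applies along a single axis. A minimal sketch, assuming stride 1 and no padding; the signal and the averaging filter are made up and do not come from real ECG data.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide kernel over signal with stride 1 and no padding (cross-correlation)."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    return np.correlate(signal, kernel, mode="valid")

signal = np.arange(14.0)       # made-up 1D signal of length 14
kernel = np.ones(5) / 5.0      # made-up averaging filter of length 5
out = conv1d_valid(signal, kernel)
```

With a signal of length $n$ and a filter of length $f$, the output has length $n - f + 1$ (here 14 - 5 + 1 = 10), which mirrors the 2D case along one dimension.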

Notes

The video seemed to repeat itself partway through

References

[Deep Learning Specialization (Coursera) Self-Study Notes (Table of Contents)](https://qiita.com/satomshr/items/dc4ae752c558798bbef6)
