
Deep Learning Specialization (Coursera) Self-Study Notes (C4W3)

Last updated at 2020-07-19, posted at 2020-07-05

Introduction

These notes cover Course 4, Week 3 (C4W3) of the Deep Learning Specialization.

What are localization and detection?
- classification ; what is it?
- localization ; where is it? (1 object)
- detection ; where are they? (multiple objects)
Output of localization
- classification output (what is it?)
- bounding box ($b_x, b_y, b_h, b_w$)
The localization output in more detail
- $p_c$ ; is there any object? (yes; 1, no; 0)
- $b_x$
- $b_y$
- $b_h$
- $b_w$
- $c_1$ ; is it pedestrian?
- $c_2$ ; is it car?
- $c_3$ ; is it motorcycle?
When $p_c = 0$ (no object), the remaining parameters are "don't care" values
Loss function ; $L(\hat{y}, y)$
- $\sum_{i=1}^{8} (\hat{y}_i - y_i)^2$ (if $y_1 = 1$)
- $(\hat{y}_1 - y_1)^2$ (if $y_1 = 0$)
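
The two cases of the loss can be sketched as follows. This is a minimal illustration, assuming the 8-element layout $(p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3)$ listed above and plain squared error for every component. The function name `localization_loss` and the sample vectors are hypothetical.

```python
import numpy as np

def localization_loss(y_hat, y):
    """Squared-error loss for one example laid out as
    [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    if y[0] == 1:
        # An object is present: all 8 components contribute.
        return float(np.sum((y_hat - y) ** 2))
    # No object: only p_c is penalized, the rest are "don't care".
    return float((y_hat[0] - y[0]) ** 2)

# Hypothetical labels and predictions, for illustration only.
y_car = [1, 0.5, 0.75, 0.25, 0.5, 0, 1, 0]
y_background = [0, 0, 0, 0, 0, 0, 0, 0]

pred_a = [0.5, 0.5, 0.75, 0.25, 0.5, 0, 1, 0]  # only p_c is off
pred_b = [0.5, 9, 9, 9, 9, 9, 9, 9]            # garbage outside p_c

print(localization_loss(pred_a, y_car))         # 0.25
print(localization_loss(pred_b, y_background))  # 0.25, the garbage is ignored
```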

When two bounding boxes overlap, IoU is the area of their intersection divided by the area of their union
By convention, a bounding box is judged correct if IoU $\ge$ 0.5
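
A minimal sketch of the IoU computation, assuming axis-aligned boxes given by their corners `(x1, y1, x2, y2)` rather than the $(b_x, b_y, b_h, b_w)$ center format used elsewhere in these notes. The function name `iou` is chosen for illustration.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) corners."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped at 0 when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 2 x 2 boxes sharing a 1 x 1 corner: intersection 1, union 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

With the 0.5 convention, this pair would not count as a correct match, since 1/7 is well below the threshold.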

An algorithm that guarantees each object is detected only once (non-max suppression)
discard all boxes with $p_c \le 0.6$
while there are any remaining boxes:
- pick the box with the largest $p_c$, output that as a prediction
- discard any remaining box with IoU $\ge 0.5$ with the box output in the previous step
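
The steps above can be sketched in plain Python. This is an illustrative version rather than an optimized one. It assumes boxes in `(x1, y1, x2, y2)` corner format, and the names `iou` and `non_max_suppression` and the sample data are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corners."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, score_threshold=0.6, iou_threshold=0.5):
    """Return the indices of the kept boxes, highest p_c first."""
    # Discard all boxes with p_c <= score_threshold.
    remaining = [i for i, s in enumerate(scores) if s > score_threshold]
    remaining.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    while remaining:
        # Pick the box with the largest p_c and output it as a prediction.
        best = remaining.pop(0)
        kept.append(best)
        # Discard any remaining box with IoU >= iou_threshold with that box.
        remaining = [i for i in remaining
                     if iou(boxes[best], boxes[i]) < iou_threshold]
    return kept

# Hypothetical detections: two boxes on one object, two on another.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (21, 21, 31, 31)]
scores = [0.9, 0.8, 0.7, 0.5]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

Box 1 is suppressed because it overlaps box 0 with IoU 81/119, and box 3 never enters the loop because its $p_c$ is below the threshold.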

To detect multiple objects in a single grid cell, set up two anchor boxes
Previously:
- Each object in a training image is assigned to the grid cell that contains that object's mid-point
With two anchor boxes:
- Each object in a training image is assigned to the grid cell that contains the object's mid-point and to the anchor box for that grid cell with the highest IoU
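
A minimal sketch of this assignment rule. It rests on assumptions not stated in the notes: box coordinates are normalized to the image (0 to 1), anchors are given as `(height, width)` pairs, and the IoU between an object and an anchor is computed on shapes only, as if both were centered at the same point. The names `shape_iou` and `assign` and the anchor values are hypothetical.

```python
def shape_iou(hw_a, hw_b):
    """IoU of two boxes sharing the same center, each given as (height, width)."""
    inter = min(hw_a[0], hw_b[0]) * min(hw_a[1], hw_b[1])
    union = hw_a[0] * hw_a[1] + hw_b[0] * hw_b[1] - inter
    return inter / union

def assign(b_x, b_y, b_h, b_w, anchors, grid_size=3):
    """Return (row, col, anchor_index) for one labeled object."""
    # The grid cell is the one that contains the object's mid-point.
    col = min(int(b_x * grid_size), grid_size - 1)
    row = min(int(b_y * grid_size), grid_size - 1)
    # The anchor is the one whose shape has the highest IoU with the object.
    ious = [shape_iou((b_h, b_w), anchor) for anchor in anchors]
    anchor_index = max(range(len(anchors)), key=lambda k: ious[k])
    return row, col, anchor_index

# Hypothetical anchors: a tall one (pedestrian-like) and a wide one (car-like).
anchors = [(0.6, 0.2), (0.2, 0.6)]

print(assign(0.5, 0.8, 0.5, 0.15, anchors))  # tall object -> (2, 1, 0)
print(assign(0.5, 0.8, 0.2, 0.5, anchors))   # wide object -> (2, 1, 1)
```

Both objects have their mid-point in the same cell, but they land in different anchor slots, which is what lets one cell describe two objects.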

$y$ ; 3 x 3 x 2 (#anchor) x 8 (5 + #classes)
- A 3 x 3 grid is assumed here, but in practice something like 19 x 19 is typical
- 5 + #classes ; $p_c$, $b_x$, $b_y$, $b_h$, $b_w$, $c_1$, $c_2$, $c_3$
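
To make the shape of $y$ concrete, here is a small NumPy sketch that builds the target tensor and fills two anchor slots of one grid cell. The helper `write_label` and all box values are hypothetical and only illustrate the layout.

```python
import numpy as np

GRID, N_ANCHORS, N_CLASSES = 3, 2, 3

# One 8-vector [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3] per cell and anchor.
y = np.zeros((GRID, GRID, N_ANCHORS, 5 + N_CLASSES))

def write_label(y, row, col, anchor, box, class_index):
    """Fill one anchor slot with p_c = 1, the box, and a one-hot class."""
    y[row, col, anchor, 0] = 1.0
    y[row, col, anchor, 1:5] = box  # (b_x, b_y, b_h, b_w)
    y[row, col, anchor, 5 + class_index] = 1.0

# A pedestrian (class 0) and a car (class 1) sharing the bottom-middle cell.
write_label(y, 2, 1, 0, (0.5, 0.25, 0.75, 0.25), 0)
write_label(y, 2, 1, 1, (0.5, 0.5, 0.25, 0.75), 1)

print(y.shape)                          # (3, 3, 2, 8)
print(y.reshape(GRID, GRID, -1).shape)  # (3, 3, 16)
print(y[2, 1])                          # the two filled anchor slots
```

Every other slot keeps $p_c = 0$, so its remaining entries are "don't care" values.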
Outputting the non-max suppressed outputs
- For each grid cell, get 2 predicted bounding boxes
- Get rid of low probability predictions
- For each class (pedestrian, car, motorcycle), use non-max suppression to generate final predictions
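
The three steps can be sketched end to end as below. This is an illustration under several assumptions: the grid predictions have already been flattened into `N` boxes in `(x1, y1, x2, y2)` corner format, low-probability predictions are filtered on $p_c$ alone, and each box is assigned to its most probable class. The names `iou` and `detect` and the sample data are hypothetical.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corners."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def detect(boxes, p_c, class_probs, score_threshold=0.6, iou_threshold=0.5):
    """boxes: (N, 4), p_c: (N,), class_probs: (N, C).
    Returns a list of (class_index, box_index) pairs."""
    best_class = class_probs.argmax(axis=1)
    order = np.argsort(-p_c)  # box indices, highest p_c first
    results = []
    for c in range(class_probs.shape[1]):
        # Get rid of low-probability predictions, keep only class c.
        idx = [i for i in order
               if best_class[i] == c and p_c[i] > score_threshold]
        # Non-max suppression within this class only.
        while idx:
            best = idx.pop(0)
            results.append((c, int(best)))
            idx = [i for i in idx if iou(boxes[best], boxes[i]) < iou_threshold]
    return results

# Classes: 0 = pedestrian, 1 = car, 2 = motorcycle (hypothetical data).
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11],
                  [0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)
p_c = np.array([0.9, 0.8, 0.7, 0.3])
class_probs = np.array([[0.1, 0.8, 0.1], [0.1, 0.8, 0.1],
                        [0.9, 0.05, 0.05], [0.1, 0.1, 0.8]])
print(detect(boxes, p_c, class_probs))  # [(0, 2), (1, 0)]
```

Running suppression per class is why the pedestrian in box 2 survives even though it coincides with the car in box 0.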

Region Proposal : R-CNN
- Do not run the sliding window over every location
- Segment the image and run the classifier only on regions likely to contain something
- Segmentation algorithm ($\sim$ 2000 regions)
R-CNN ; Propose regions. Classify proposed regions one at a time. Output label + bounding box.
Fast R-CNN ; Propose regions. Use convolutional implementation of sliding window to classify all the proposed regions.
Faster R-CNN ; Use convolutional network to propose regions.