
Relearning Computer Vision: From Traditional Approaches to GANs

Last updated at 2021-01-30. Posted at 2021-01-29.

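□ Introduction

I took a quick refresher on computer vision on Udemy, so I am leaving my notes here.¹
⇓ For the details of the algorithms and their implementations, refer to the course below.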

Deep Learning and Computer Vision A-Z™: OpenCV, SSD & GANs:
Become a Wizard of all the latest Computer Vision tools that exist out there. Detect anything and create powerful apps.

Keywords: OpenCV, PyTorch, SSD, GANs

□ Contents

■ Object detection without neural networks
    ★ The Viola-Jones Algorithm & implementation
■ Object detection with neural networks
    ★ Single Shot MultiBox Detector (SSD) & implementation
■ Image generation
    ★ Generative Adversarial Networks (GANs) & implementation

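■ Object detection without neural networks

★ The Viola-Jones Algorithm

The most basic object detection algorithm, developed by Paul Viola and Michael Jones (2001). It uses Haar-like features for feature extraction and AdaBoost to keep the training cost down.

Each kind of object to detect needs its own set of trained weights. It is very easy to implement, but it also produces many false detections. It is worth knowing as a legacy method.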
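
To make the idea of a Haar-like feature concrete, here is a minimal sketch in plain Python (it is not OpenCV's implementation). It builds an integral image (summed-area table) so that any rectangle sum costs four lookups, then evaluates a two-rectangle feature as the left half of a window minus the right half. The names `integral_image`, `rect_sum`, and `two_rect_feature`, the sign convention, and the toy image are all assumptions made for this illustration.

```python
def integral_image(img):
    """Summed-area table with a zero border: ii[y][x] = sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y) and size w x h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    left = rect_sum(ii, x, y, half, h)
    right = rect_sum(ii, x + half, y, half, h)
    return left - right


# A 4x4 toy image: bright on the left, dark on the right (a vertical edge).
toy = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(toy)
edge_response = two_rect_feature(ii, 0, 0, 4, 4)
```

A large absolute response means the window contains a strong vertical edge. A trained cascade combines many such features, each with its own threshold and weight.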

**Implementation example (OpenCV)**

**1. Setup**

The implementation uses OpenCV.

shell;setup.sh

pip install opencv-python
# Download the pretrained cascade models
repository=https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/
files=(haarcascade_eye.xml haarcascade_frontalface_default.xml)
for afile in "${files[@]}"; do wget -O "$afile" "$repository$afile"; done
touch face_detection.py

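The directory structure is as follows. The setup script does not download einstein.jpg, so place a test image with that name there yourself.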

tree
your_working_directory/
├── face_detection.py
├── einstein.jpg
├── haarcascade_eye.xml
└── haarcascade_frontalface_default.xml

2. Main code

face_detection.py

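import sys

import cv2

# Load the pretrained Haar cascades downloaded in the setup step.
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

img = cv2.imread('einstein.jpg')
if img is None:
    sys.exit('Could not read einstein.jpg')
# The cascades operate on grayscale images.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces, then search for eyes only inside each face region.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 4)  # blue box (BGR)
    roi_gray = gray[y:y + h, x:x + w]
    roi_color = img[y:y + h, x:x + w]  # view into img, so drawing here updates img
    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 4)  # green box

# Show the result until a key is pressed.
cv2.imshow('Einstein', img)
cv2.waitKey(0)
cv2.destroyAllWindows()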

3. Run & result

python face_detection.py

![image.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/172224/d4ac9c34-d67c-eb16-648d-c201306de7f0.png)

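■ Object detection with neural networks

★ Single Shot MultiBox Detector (SSD)

SSD recognizes objects by looking at the input image only once, which makes detection fast. It was developed by Wei Liu et al. (2015).

It uses MultiBox to predict bounding boxes. Objects of different sizes are handled by predicting from progressively downsampled feature maps of the input, so coarser maps pick up larger objects.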
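
As a rough illustration of how SSD covers several object sizes, the sketch below computes one default-box scale per feature map using the linear spacing rule from the SSD paper, with `s_min = 0.2` and `s_max = 0.9` as in that paper. The feature-map sizes (the SSD300 layout as I recall it) and the helper names `default_box_scales` and `default_box_centers` are assumptions for illustration. They are not taken from the pytorch-ssd repository used below.

```python
def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced box scales (relative to image size), one per feature map."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def default_box_centers(fmap_size):
    """Centers of the default boxes on a square feature map, in [0, 1] coordinates."""
    return [((j + 0.5) / fmap_size, (i + 0.5) / fmap_size)
            for i in range(fmap_size) for j in range(fmap_size)]


# Assumed feature-map sizes: fine maps first, coarse maps last.
feature_maps = [38, 19, 10, 5, 3, 1]
scales = default_box_scales(len(feature_maps))
for size, scale in zip(feature_maps, scales):
    centers = default_box_centers(size)
    print(f"{size:>2}x{size:<2} map: {len(centers):>4} locations, box scale {scale:.2f}")
```

The fine 38x38 map places many small boxes, while the 1x1 map places a single box covering most of the image. This is how one forward pass can detect both small and large objects.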

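**Implementation example (PyTorch)**

The implementation uses PyTorch.[^2]

[^2]: The implementation presented in [Deep Learning and Computer Vision A-Z™](https://www.udemy.com/course/computer-vision-a-z/learn/lecture/8127602#notes) felt somewhat outdated and complicated, so I later looked for one that is easier to set up. The source code is borrowed from [qfgaohao/pytorch-ssd](https://github.com/qfgaohao/pytorch-ssd). This code uses MobileNet as the base model (the original SSD is based on VGG).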

1. Setup

setup.sh

git clone https://github.com/qfgaohao/pytorch-ssd
wget -P pytorch-ssd/models https://storage.googleapis.com/models-hao/mobilenet-v1-ssd-mp-0_675.pth
wget -P pytorch-ssd/models https://storage.googleapis.com/models-hao/voc-model-labels.txt

2. Run & result

cd pytorch-ssd
python run_ssd_live_demo.py mb1-ssd models/mobilenet-v1-ssd-mp-0_675.pth models/voc-model-labels.txt

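**Debugging (bonus)**

Here are example fixes for errors that may occur when running the demo in step 2.

**● Changing the camera ID**

open VIDEOIO(AVFOUNDATION): raised unknown C++ exception!

If the error above appears, edit the file as follows.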

python;run_ssd_live_demo.py@line21

# before: cap = cv2.VideoCapture(0)
cap = cv2.VideoCapture(1)  # after: use camera ID 1
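
The right camera ID depends on the machine, so ID 1 may not work everywhere. The sketch below is one way to probe a few IDs and see which ones actually deliver a frame. `find_working_cameras` and its `open_capture` parameter are hypothetical names introduced here, not part of pytorch-ssd. By default the helper uses `cv2.VideoCapture`, and the injectable opener only exists so the logic can be checked without a camera.

```python
def find_working_cameras(max_index=4, open_capture=None):
    """Return the camera IDs below max_index that open and deliver one frame."""
    if open_capture is None:
        import cv2  # imported lazily so the helper can be exercised without OpenCV
        open_capture = cv2.VideoCapture
    working = []
    for index in range(max_index):
        cap = open_capture(index)
        try:
            if cap.isOpened():
                ok, _frame = cap.read()
                if ok:
                    working.append(index)
        finally:
            cap.release()  # always free the device, even if read() raised
    return working
```

Calling `find_working_cameras()` with no arguments probes IDs 0 to 3 through OpenCV. Any ID in the returned list is a candidate for the `cv2.VideoCapture(...)` call in the demo script.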

● Changing the data type (Tensor → int)

cv2.rectangle(orig_image, (box[0], box[1]), (box[2], box[3]), (255, 255, 0), 4)
TypeError: function takes exactly 4 arguments (2 given)

If the error above appears, edit the file as follows.

python;run_ssd_live_demo.py@line77&line80

# before: cv2.rectangle(orig_image, (box[0], box[1]), (box[2], box[3]), (255, 255, 0), 4)
cv2.rectangle(orig_image, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), (255, 255, 0), 4)  # after
~~
# before: (box[0]+20, box[1]+40),
(int(box[0])+20, int(box[1])+40),  # after
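
The same conversion can be written once instead of repeating `int(...)` at each call site. The sketch below assumes `box` is any sequence of four numeric values in `x1, y1, x2, y2` order (for example a 1-D tensor). `box_to_points` is a hypothetical helper, not something pytorch-ssd provides.

```python
def box_to_points(box):
    """Convert (x1, y1, x2, y2) values to two integer pixel points."""
    x1, y1, x2, y2 = (int(v) for v in box)
    return (x1, y1), (x2, y2)


# Plain floats stand in for tensor elements here.
top_left, bottom_right = box_to_points([12.7, 30.2, 88.9, 140.0])
label_origin = (top_left[0] + 20, top_left[1] + 40)
```

Because every value is a plain Python `int` after the conversion, the resulting tuples can be handed to OpenCV drawing calls such as `cv2.rectangle` without the `TypeError` shown above.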

■ Image generation

★ Generative Adversarial Networks (GANs)

A breakthrough algorithm that showed computers can be creative too. Given a large number of training images, it can generate new images. It was developed by Ian Goodfellow et al. (2014).
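
For reference, the adversarial setup can be written as the minimax objective from the original GAN paper. The discriminator $D$ tries to tell real images $x$ from generated ones $G(z)$, while the generator $G$ tries to fool it. The symbols $p_{\text{data}}$, $p_z$, and $V$ follow that paper's notation and do not appear elsewhere in these notes.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The first term rewards $D$ for scoring real images high, and the second rewards it for scoring generated images low. $G$ only affects the second term, which it tries to minimize.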

**Implementation example (PyTorch)**

The implementation uses PyTorch. In addition to [Deep Learning and Computer Vision A-Z™](https://www.udemy.com/course/computer-vision-a-z/learn/lecture/8127602#notes), the source code also draws on the [DCGAN Tutorial](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html). For training data, either use [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) or [CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) as in the tutorials, or prepare your own.[^3]