
Deep-learning-based face detection included in OpenCV

Last updated at 2022-04-26, posted at 2017-12-26

There is an article covering the OpenCV face detection and face matching samples available as of 2022.

I have noticed that parts of this article are out of date.
Some of the links are broken.

The scripts under OpenCV's dnn samples directory appear to have changed substantially.

# This script is used to estimate an accuracy of different face detection models.
# COCO evaluation tool is used to compute an accuracy metrics (Average Precision).
# Script works with different face detection datasets.

I recently noticed that OpenCV includes a deep-learning-based face detector.

resnet_ssd_face_python.py
https://github.com/opencv/opencv/blob/master/samples/dnn/resnet_ssd_face_python.py

Example run
Entering the following commands ran face detection on the input from the connected USB camera.

$ cd opencv/samples/dnn
$ python resnet_ssd_face_python.py

On my PC, it ran at roughly 30 ms per frame.

The model used is as follows.

prototxt = 'face_detector/deploy.prototxt'
caffemodel = 'face_detector/res10_300x300_ssd_iter_140000.caffemodel'

Net cv::dnn::readNetFromCaffe(const String& prototxt, const String& caffeModel = String())

void cv::dnn::Net::setInput(const Mat& blob, const String& name = "" )

Mat cv::dnn::Net::forward( const String& outputName = String())

In the Python script, the part corresponding to the C++ functions above is concise:

from cv2 import dnn

net = dnn.readNetFromCaffe(prototxt, caffemodel)

net.setInput(dnn.blobFromImage(frame, 1.0, (inWidth, inHeight), (104.0, 177.0, 123.0), False, False))
detections = net.forward()
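
The sample script reads the result of net.forward() as an array of shape (1, 1, N, 7), where each row is [image_id, class_id, confidence, x1, y1, x2, y2] and the coordinates are normalized to the range 0 to 1. The following is a minimal sketch of turning that array into pixel boxes. The function name parse_detections and the 0.5 threshold are illustrative choices, not part of OpenCV.

```python
def parse_detections(detections, frame_width, frame_height, conf_threshold=0.5):
    """Convert SSD output rows into (confidence, x1, y1, x2, y2) pixel boxes.

    `detections` is indexed as detections[0][0][i][k], which works both for
    the NumPy array returned by net.forward() and for plain nested lists.
    """
    boxes = []
    for row in detections[0][0]:
        confidence = float(row[2])
        if confidence < conf_threshold:
            continue  # skip low-confidence candidates
        x1 = int(float(row[3]) * frame_width)
        y1 = int(float(row[4]) * frame_height)
        x2 = int(float(row[5]) * frame_width)
        y2 = int(float(row[6]) * frame_height)
        boxes.append((confidence, x1, y1, x2, y2))
    return boxes


# Synthetic output with two candidate rows; only the first passes the threshold.
fake = [[[
    [0.0, 1.0, 0.9, 0.25, 0.25, 0.75, 0.5],
    [0.0, 1.0, 0.2, 0.0, 0.0, 0.5, 0.5],
]]]
print(parse_detections(fake, 400, 200))
```

With a real network, the call would be parse_detections(detections, frame.shape[1], frame.shape[0]).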

The value passed to setInput() appears to be the image converted into a 4-dimensional blob, which the help text below describes as a 4-dimensional Mat in NCHW order.

>>> help(cv2.dnn.blobFromImage)
Help on built-in function blobFromImage:

blobFromImage(...)
    blobFromImage(image[, scalefactor[, size[, mean[, swapRB[, crop]]]]]) -> retval
    .   @brief Creates 4-dimensional blob from image. Optionally resizes and crops @p image from center,
    .   *  subtract @p mean values, scales values by @p scalefactor, swap Blue and Red channels.
    .   *  @param image input image (with 1-, 3- or 4-channels).
    .   *  @param size spatial size for output image
    .   *  @param mean scalar with mean values which are subtracted from channels. Values are intended
    .   *  to be in (mean-R, mean-G, mean-B) order if @p image has BGR ordering and @p swapRB is true.
    .   *  @param scalefactor multiplier for @p image values.
    .   *  @param swapRB flag which indicates that swap first and last channels
    .   *  in 3-channel image is necessary.
    .   *  @param crop flag which indicates whether image will be cropped after resize or not
    .   *  @details if @p crop is true, input image is resized so one side after resize is equal to corresponing
    .   *  dimension in @p size and another one is equal or larger. Then, crop from the center is performed.
    .   *  If @p crop is false, direct resize without cropping and preserving aspect ratio is performed.
    .   *  @returns 4-dimansional Mat with NCHW dimensions order.

>>>
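
To make the NCHW layout concrete, here is a rough pure-Python sketch of the mean subtraction, scaling, and axis reordering that the help text describes. It is an illustration under simplifying assumptions, not the OpenCV implementation: it works on nested lists, and it leaves out resizing, cropping, and swapRB. The function name image_to_blob is hypothetical.

```python
def image_to_blob(image, mean=(0.0, 0.0, 0.0), scalefactor=1.0):
    """Build a nested-list blob[n][c][y][x] from image[y][x][c].

    The batch size N is always 1 here. Each value is (pixel - mean) * scalefactor,
    following the order of operations stated in the help text.
    """
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return [[
        [[(image[y][x][c] - mean[c]) * scalefactor for x in range(width)]
         for y in range(height)]
        for c in range(channels)
    ]]


# A tiny image of height 1 and width 2, with the mean values used in the sample.
image = [[(104, 177, 123), (114, 187, 133)]]
blob = image_to_blob(image, mean=(104.0, 177.0, 123.0))
shape = (len(blob), len(blob[0]), len(blob[0][0]), len(blob[0][0][0]))
print(shape)    # (N, C, H, W)
print(blob[0])  # one H x W plane per channel
```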

A third-party script on GitHub for evaluating the face detector above

~~https://github.com/KatsunoriWa/eval_resnet_ssd_face~~
https://github.com/KatsunoriWa/eval_faceDetectors/tree/master/resnetSSD

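That repository contains a sample script, adapted from resnet_ssd_face_python.py, that runs detection on individual images.
It also provides sample scripts that evaluate face detection on several databases.
These can be run as Jupyter notebook files, and the results are shown as graphs.
They compute the detection rate.
They also evaluate how the detection rate changes when the image is rotated in-plane.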

The evaluation does not measure false detections, so it leaves room for improvement.
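
One common way to count both detections and false detections is to match detected boxes against ground-truth boxes by intersection over union (IoU). The sketch below is a generic illustration of that idea, not the method used in the repository above. The greedy matching, the 0.5 threshold, and the function names are all assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evaluate(detected, ground_truth, iou_threshold=0.5):
    """Return (true_positives, false_positives, false_negatives) for one image."""
    unmatched = list(ground_truth)
    true_positives = 0
    false_positives = 0
    for det in detected:
        # Greedily pick the remaining ground-truth box that overlaps most.
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= iou_threshold:
            unmatched.remove(best)
            true_positives += 1
        else:
            false_positives += 1
    return true_positives, false_positives, len(unmatched)


ground_truth = [(10, 10, 50, 50), (100, 100, 140, 140)]
detected = [(12, 12, 52, 52), (200, 200, 240, 240)]
tp, fp, fn = evaluate(detected, ground_truth)
detection_rate = tp / (tp + fn)
print(tp, fp, fn, detection_rate)
```

Summing the three counts over a whole database gives the detection rate together with the number of false detections per image.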

Head Pose Image Database

Even on a database like this one, where the face orientation varies, faces are detected over a wide range of poses.

For how to supply the image data and annotations used for training, the following explanation is apparently worth reading.

(a) Download original face detection dataset
　The name of the database used to obtain this pretrained model has not been disclosed.
(b) Convert annotation to the PASCAL VOC format
　Convert the existing annotations to the PASCAL VOC format.
(c)Create LMDB database with images + annotations for training
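
Step (b) can be sketched as follows. Since the original dataset and its annotation format are not disclosed, this example assumes the source annotations are already available as (xmin, ymin, xmax, ymax) pixel boxes. It writes only a commonly used subset of the PASCAL VOC tags. The label "face", the file name, and the function name to_voc_xml are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_voc_xml(filename, width, height, boxes, label="face", depth=3):
    """Build a PASCAL VOC style annotation; boxes are (xmin, ymin, xmax, ymax)."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = to_voc_xml("0001.jpg", 640, 480, [(100, 120, 200, 260)])
print(xml_text)
```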

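I have heard that, unlike Haar cascade or HOG + SVM detectors, the time taken for detection does not depend on the number of target objects (faces, in this case) or on how confusing the background is.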

Deep Learning with Caffe, focusing on the places where it is easy to stumble (CaffeでDeep Learning つまずきやすいところを中心に)

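Identifying people from their faces in real time with OpenCV and TensorFlow and displaying the result on screen (OpenCVとTensorFlowを使ってリアルタイムに顔から人を識別し、結果を画面に表示する)

On reading that article, I found that the detection itself uses OpenCV's CascadeClassifier and that TensorFlow is used only for identifying the person.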

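Articles on ResNet

The secrets of deep learning ResNet (ディープラーニング ResNet のヒミツ)
 Understanding Residual Network (ResNet) and best practices for tuning (Residual Network(ResNet)の理解とチューニングのベストプラクティス)
 Deep Residual Learning for Image Recognition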

github SSD: Single Shot MultiBox Detector

Face detection with OpenCV and Deep Learning from image-part 1

I have written a separate article on deep-learning-based face detectors that live outside OpenCV.

Addendum: making use of cv2.dnn.readNetFromTensorflow()

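OpenCV provides a framework for running inference with pretrained models from the major deep learning frameworks.
This makes it possible to run inference with the models published in each framework's Model Zoo.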

The approach is to find a pretrained face detection model and then run it with the matching loader in cv2.dnn.
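
As a rough sketch of that idea, the helper below picks a cv2.dnn reader from the extension of the weights file. The READERS table and the function names reader_name and load_net are illustrative, and "model.pb" stands for any frozen TensorFlow graph you have found. cv2 is imported only inside load_net, so the name lookup can be tried without OpenCV installed.

```python
import os

# Maps a weights-file extension to the name of a cv2.dnn reader function.
READERS = {
    ".caffemodel": "readNetFromCaffe",
    ".pb": "readNetFromTensorflow",
    ".onnx": "readNetFromONNX",
}


def reader_name(model_path):
    """Return the cv2.dnn reader function name for a model file."""
    ext = os.path.splitext(model_path)[1].lower()
    if ext not in READERS:
        raise ValueError("unsupported model format: " + ext)
    return READERS[ext]


def load_net(model_path, config_path=None):
    """Load a network with cv2.dnn, choosing the reader by file extension."""
    import cv2  # imported lazily; requires the opencv-python package

    name = reader_name(model_path)
    if name == "readNetFromCaffe":
        # Caffe takes the .prototxt first, then the .caffemodel weights.
        return cv2.dnn.readNetFromCaffe(config_path, model_path)
    if name == "readNetFromTensorflow":
        # TensorFlow takes the frozen .pb graph first, then an optional .pbtxt.
        if config_path is None:
            return cv2.dnn.readNetFromTensorflow(model_path)
        return cv2.dnn.readNetFromTensorflow(model_path, config_path)
    return cv2.dnn.readNetFromONNX(model_path)


print(reader_name("face_detector/res10_300x300_ssd_iter_140000.caffemodel"))
print(reader_name("model.pb"))
```

Recent OpenCV versions also offer cv2.dnn.readNet, which chooses the reader from the file names by itself, so the table above mainly serves to show which loader corresponds to which framework.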
