
NPO法人AI開発推進協会 Advent Calendar 2022

@keisuke-okb (Keisuke Okubo) in NPO法人AI開発推進協会

[With Implementation] A Roundup of Face Detection Libraries for Python

Last updated at 2023-03-08, posted at 2022-11-27

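Goal

The aim is to extract only the face region from a photo taken with a camera (assumed here to contain one person) and pass it on to other AI models, such as image-generation models like StyleGAN.

This article collects my own face detection programs built with three libraries: Dlib, the OpenCV cascade classifier, and InsightFace.

Plenty of articles already cover installing these libraries, so installation is omitted here.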

Comparison of the Libraries

Any one of these libraries is enough to detect a face in a photo, crop the detected face region, and save it as an image.

CPU execution is assumed.

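| Library | Detection method | Accuracy | Detection speed | Setup cost | Characteristics |
| --- | --- | --- | --- | --- | --- |
| Dlib | Landmarks | ○ High | △ Slow* | △ Medium | Depending on the landmark detection model used, it can recognize detailed facial parts (68-point model, 5-point model, etc.). There is plenty of literature on face detection with it, which makes it easy to implement, but using a GPU requires care with compilation and package dependencies. |
| OpenCV cascade classifier | Bounding box | △ Medium | ◎ Fast | ◎ Low | Can be implemented intuitively with the OpenCV library for Python. Because it is a lightweight XML-based cascade classifier, detection is fast, but accuracy is lower than the others. |
| InsightFace | Landmarks | ◎ High | ◎ Fast | △ Medium | Supports landmark detection. Compared with Dlib, it detects quickly without reducing the resolution. It can detect profile faces turned so far that one eye is hidden, and it can detect out-of-focus faces. As with Dlib, take care with package dependencies. |

Ratings: ◎ excellent, ○ good, △ fair.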

* About detection speed
Dlib becomes markedly slower as the resolution of the input image increases. Downsampling the image beforehand speeds up detection.
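
As a rough illustration of the downsampling idea, the sketch below computes a scale factor that caps the longer side of the image and maps coordinates found on the shrunken image back to the original resolution. The names `downscale_factor` and `to_original` and the 640-pixel cap are hypothetical and do not come from this article; the actual resize would be done with something like `cv2.resize` before running the detector.

```python
def downscale_factor(height, width, max_side=640):
    """Return a factor <= 1.0 that shrinks the longer side to at most max_side."""
    longer = max(height, width)
    if longer <= max_side:
        return 1.0
    return max_side / longer


def to_original(coords_yx, factor):
    """Map (y, x) coordinates found on the shrunken image back to the original."""
    return [(y / factor, x / factor) for (y, x) in coords_yx]


# Example: a 3000x4000 photo (height x width) is shrunk before detection.
factor = downscale_factor(3000, 4000)
small_size = (round(3000 * factor), round(4000 * factor))

# A landmark found at (y=240, x=320) on the small image is mapped back
# to the corresponding position on the full-resolution photo.
restored = to_original([(240, 320)], factor)
```

Landmarks mapped back this way can then be used to crop from the full-resolution photo, so the output keeps its detail even though detection ran on a smaller copy.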

These ratings are impressions from trying the libraries in my own environment, so please treat them only as a rough guide.

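Images Used for Face Detection

I used images from the stock photo site 「ぱくたそ」 (PAKUTASO). Because I used the crop function, the aspect ratios and resolutions of the input images vary.
Since the target is people facing forward, no profile photos are included.

The photos include people wearing glasses, sunglasses, or hats, and people in cosplay.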

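Features to Implement

Correct the tilt of the detected face
Crop the photo to a square so that the framing is as consistent as possible whenever the subject faces forward, regardless of the environment or who took the photo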

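Face Detection Logic

All three libraries follow the flow below for face detection and face-region extraction.

With OpenCV, a face detector and an eye detector are used in combination.

Compute the centroid of each eye's landmarks → use it as the eye position
Compute the centroid of the mouth landmarks → use it as the mouth position
Let $w$ and $h$ be the horizontal and vertical differences between the left and right eye positions
Compute the tilt of the eyes $$ \theta = \arctan{\frac{h}{w}} $$
Rotate the photo by $\theta$ about its center (affine transformation)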

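Rotate the landmark coordinates by $\theta$ (def rotate(coordinates, theta_rad, h, w)); in this step $w$ and $h$ denote the width and height of the image
- Convert the landmark coordinates $(x, y)$, given in the image coordinate system, to the center-origin coordinate system $(x_0, y_0)$
  $$ x_0 = x - \frac{w}{2} $$
  $$ y_0 = -y + \frac{h}{2} $$
- Rotate the landmark coordinates $(x_0, y_0)$ by $\theta$
  $$ x_0' = x_0\cos{\theta} - y_0\sin{\theta} $$
  $$ y_0' = x_0\sin{\theta} + y_0\cos{\theta} $$
- Convert the rotated landmark coordinates back to the image coordinate system
  $$ x^* = x_0' + \frac{w}{2} $$
  $$ y^* = -y_0' + \frac{h}{2} $$
- Return the rotated landmark coordinates $(x^*, y^*)$ in the image coordinate system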

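Take the centroid of the two eyes and the mouth as the center of the face (y_center_face, x_center_face)
Derive the crop width from the distance between the two eyes (crop_width)
Compute the vertical position adjustment parameter (h_adjast)
Crop the photo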
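
To make the flow concrete, here is a minimal pure-Python sketch that runs the geometry on made-up coordinates: it computes the tilt, rotates the three points, and derives the square crop box. The eye and mouth positions and the helper `rotate_yx` are hypothetical; the factors 4.1 and 0.05 are the ones used in the code later in this article. Coordinates are in (y, x) order, as in that code.

```python
import math


def rotate_yx(point_yx, theta_rad, h, w):
    """Rotate a (y, x) image coordinate by theta about the image center.

    Shift to a center-origin system with y pointing up, rotate
    counter-clockwise, then shift back to image coordinates.
    """
    y, x = point_yx
    x0 = x - w / 2
    y0 = -y + h / 2
    x1 = x0 * math.cos(theta_rad) - y0 * math.sin(theta_rad)
    y1 = x0 * math.sin(theta_rad) + y0 * math.cos(theta_rad)
    return (-y1 + h / 2, x1 + w / 2)


# Hypothetical positions (y, x) in a 600x800 image; the right eye sits lower.
h, w = 600, 800
eye_left, eye_right, mouth = (280.0, 340.0), (300.0, 440.0), (390.0, 380.0)

# Tilt of the line through the eyes: theta = arctan(dy / dx).
dy = eye_right[0] - eye_left[0]
dx = eye_right[1] - eye_left[1]
theta = math.atan(dy / dx)

# Rotate the three points the same way the image would be rotated.
eye_left_r = rotate_yx(eye_left, theta, h, w)
eye_right_r = rotate_yx(eye_right, theta, h, w)
mouth_r = rotate_yx(mouth, theta, h, w)

# Face center = mean of the three points; crop width from the eye distance.
y_center_face = (eye_left_r[0] + eye_right_r[0] + mouth_r[0]) / 3
x_center_face = (eye_left_r[1] + eye_right_r[1] + mouth_r[1]) / 3
crop_width = (eye_right_r[1] - eye_left_r[1]) * 4.1
h_adjast = -0.05 * crop_width

# Square crop box as (top, bottom, left, right) in pixels.
box = (
    int(y_center_face - crop_width / 2 + h_adjast),
    int(y_center_face + crop_width / 2 + h_adjast),
    int(x_center_face - crop_width / 2),
    int(x_center_face + crop_width / 2),
)
```

After the rotation the two eyes share the same y coordinate, which is what "correcting the tilt" means here, and the crop width scales with the distance between the eyes rather than with the image size.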

1. Dlib

Detector class definition and tilt-rotation function definition

This is based on StyleGAN's face detection implementation (the part that uses Dlib).

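import cv2
import dlib
import numpy as np


class DlibLandmarksDetector:
    def __init__(self, landmarks_model_path):
        # Frontal face detector plus a landmark predictor loaded from the model file
        self.detector = dlib.get_frontal_face_detector()
        self.shape_predictor = dlib.shape_predictor(landmarks_model_path)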

    def get_landmarks(self, image):
        img = dlib.load_rgb_image(image)
        detection = self.detector(img, 1)[0]
        face_landmarks = [np.array([item.y, item.x]) for item in self.shape_predictor(img, detection).parts()]
        return face_landmarks


def rotate(coordinates, theta_rad, h, w):
    x = coordinates[1]
    y = coordinates[0]
    x = x - w / 2
    y = -y + h / 2
    x_ = x * np.cos(theta_rad) - y * np.sin(theta_rad)
    y_ = x * np.sin(theta_rad) + y * np.cos(theta_rad)
    x = x_ + w / 2
    y = -y_ + h / 2
    return np.array([y, x])
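
As a sanity check on `rotate`, the sketch below compares it with the 2x3 matrix form that the OpenCV documentation gives for `cv2.getRotationMatrix2D`. The matrix is written out by hand so that the example runs without OpenCV, and `rotate_yx`, `rotation_matrix_2d`, and `apply_affine` are hypothetical helper names (`rotate_yx` restates `rotate` with the `math` module).

```python
import math


def rotate_yx(point_yx, theta_rad, h, w):
    """Pure-Python restatement of rotate(): (y, x) in, (y, x) out."""
    y, x = point_yx
    x0, y0 = x - w / 2, -y + h / 2
    x1 = x0 * math.cos(theta_rad) - y0 * math.sin(theta_rad)
    y1 = x0 * math.sin(theta_rad) + y0 * math.cos(theta_rad)
    return (-y1 + h / 2, x1 + w / 2)


def rotation_matrix_2d(center_xy, angle_deg, scale=1.0):
    """2x3 matrix in the form documented for cv2.getRotationMatrix2D."""
    cx, cy = center_xy
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return [[a, b, (1 - a) * cx - b * cy],
            [-b, a, b * cx + (1 - a) * cy]]


def apply_affine(matrix, point_yx):
    """Apply a 2x3 affine matrix to a (y, x) point and return (y, x)."""
    y, x = point_yx
    new_x = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    new_y = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    return (new_y, new_x)


h, w = 600, 800
theta = math.radians(12.0)
point = (280.0, 340.0)

# With the exact center (w / 2, h / 2), both routes agree.
exact = rotation_matrix_2d((w / 2, h / 2), math.degrees(theta))
diff_exact = max(abs(p - q) for p, q in zip(rotate_yx(point, theta, h, w),
                                            apply_affine(exact, point)))

# With odd sizes, int(w / 2) drops half a pixel, so the results drift slightly.
h_odd, w_odd = 601, 801
truncated = rotation_matrix_2d((int(w_odd / 2), int(h_odd / 2)), math.degrees(theta))
diff_odd = max(abs(p - q) for p, q in zip(rotate_yx(point, theta, h_odd, w_odd),
                                          apply_affine(truncated, point)))
```

For even image sizes the rotated landmarks line up with the rotated image. For odd sizes the integer center used for the image and the `w / 2`, `h / 2` center used in `rotate` differ by half a pixel; in this example the resulting drift stays well below one pixel, so it is unlikely to matter for cropping.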

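Running face detection and cropping

img_path: path to the input image
landmarks_model_path: path to the landmark detector model
dst_file: path where the cropped image is saved
output_size: side length in pixels of the square output image (used by the code below)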

landmarks_detector = DlibLandmarksDetector(landmarks_model_path)
face_landmarks = landmarks_detector.get_landmarks(img_path) # Detect landmarks

lm = np.array(face_landmarks)
lm_chin          = lm[0  : 17] 
lm_eye_left      = lm[36 : 42]
lm_eye_right     = lm[42 : 48]
lm_mouth_outer   = lm[48 : 60]

# Compute the average position of each part
eye_left     = np.mean(lm_eye_left, axis=0)
eye_right    = np.mean(lm_eye_right, axis=0)
eye_avg      = (eye_left + eye_right) * 0.5
eye_to_eye   = eye_right - eye_left
mouth_left   = lm_mouth_outer[0]
mouth_right  = lm_mouth_outer[6]
mouth_avg    = (mouth_left + mouth_right) * 0.5

img = cv2.imread(img_path)
w = img.shape[1]
h = img.shape[0]

# Correct the tilt of the eyes
w_ = eye_to_eye[1]
h_ = eye_to_eye[0]
tan_rad = np.arctan(h_ / w_)
tan_deg = np.rad2deg(tan_rad)

center = (int(img.shape[1]/2), int(img.shape[0]/2))
trans = cv2.getRotationMatrix2D(center, tan_deg, 1)
img = cv2.warpAffine(img, trans, (img.shape[1],img.shape[0]))

eye_left = rotate(eye_left, tan_rad, h, w)
eye_right = rotate(eye_right, tan_rad, h, w)
mouth_avg = rotate(mouth_avg, tan_rad, h, w)

y_center_face, x_center_face = np.mean([eye_left, eye_right, mouth_avg], axis=0)
crop_width = (eye_right - eye_left)[1] * 4.1
h_adjast = - 0.05 * crop_width

img_crop = img[
    int(y_center_face-crop_width/2+h_adjast) : int(y_center_face+crop_width/2+h_adjast),
    int(x_center_face-crop_width/2) : int(x_center_face+crop_width/2)
]

img_crop = cv2.resize(img_crop, dsize=(output_size, output_size))
cv2.imwrite(dst_file, img_crop)
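
The slice above assumes that the crop box lies inside the image. When a face is close to the border, a start index can become negative, and NumPy then counts it from the end of the axis, which can yield an empty or unexpected crop. The sketch below shows one possible guard that shifts the square box back inside the image. `clamp_square_box` is a hypothetical helper, not part of this article's code, and the vertical offset `h_adjast` would be added to `y_center` before calling it.

```python
def clamp_square_box(y_center, x_center, crop_width, h, w):
    """Return (top, bottom, left, right) of a square box kept inside an h x w image.

    The box is shrunk to the shorter image side if needed, then shifted so
    that no edge falls outside the image.
    """
    size = int(min(crop_width, h, w))
    top = int(y_center - size / 2)
    left = int(x_center - size / 2)
    top = max(0, min(top, h - size))
    left = max(0, min(left, w - size))
    return top, top + size, left, left + size


# A face close to the top-left corner: the naive box would start at negative indices.
box = clamp_square_box(y_center=50, x_center=60, crop_width=200, h=600, w=800)
# box == (0, 200, 0, 200)

# A face well inside the image is left where it is.
inner = clamp_square_box(y_center=300, x_center=400, crop_width=200, h=600, w=800)
# inner == (200, 400, 300, 500)
```

The result can be used directly as `img[top:bottom, left:right]`. Shifting the box moves the face off center, so whether this or padding the image is preferable depends on how the crop is used afterwards.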

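Face detection results

* When detection failed, a blacked-out image is output.

Input images

Cropped results

Except for one photo of a person in witch cosplay, the faces were detected well with their tilt corrected.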

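2. OpenCV Cascade Classifier

Because the bounding box from face detection changed considerably with the tilt of the face, face detection is run again after correcting the tilt using the eye positions.

Detector definition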

import cv2
import numpy as np

# The cascade XML files are expected in the working directory
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

Running face detection and cropping

img_path: path to the input image
dst_file: path where the cropped image is saved

img = cv2.imread(img_path)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)


# ===== Initial detection ======
faces = face_cascade.detectMultiScale(img_gray, minSize=(200,200), minNeighbors=2)
eyes = eye_cascade.detectMultiScale(img_gray, minSize=(50,50))
# =========================

# Relax the thresholds when no face was detected
if len(faces) == 0:
    faces = face_cascade.detectMultiScale(img_gray, minSize=(100,100), minNeighbors=1)

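# Tilt correction based on the detected eye positions
if len(eyes) == 2: # Skip unless exactly two eyes were found (e.g. three or more false detections)
    
    # Differences between the centers of the two eye boxes (x, y, w, h)
    w_ = abs((eyes[0][0] + eyes[0][2] / 2) - (eyes[1][0] + eyes[1][2] / 2))
    h_ = abs((eyes[0][1] + eyes[0][3] / 2) - (eyes[1][1] + eyes[1][3] / 2)) 
    tan_rad = np.arctan(h_ / w_)
    tan_deg = np.rad2deg(tan_rad)
    
    if 5 <= tan_deg < 30: # Angle range in which the face is judged to be tilted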
        center = (int(img_gray.shape[1]/2), int(img_gray.shape[0]/2))
        trans = cv2.getRotationMatrix2D(center, tan_deg, 1)
        img = cv2.warpAffine(img, trans, (img_gray.shape[1],img_gray.shape[0]))
        img_gray = cv2.warpAffine(img_gray, trans, (img_gray.shape[1],img_gray.shape[0]))

        # Detect again after tilt correction
        faces_ = face_cascade.detectMultiScale(img_gray, minSize=(100,100), minNeighbors=2)
        
        if len(faces_) > 0:
            faces = faces_

# ========================================

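(x, y, w, _) = faces[-1] # Handling for multiple detections: use the last box

pad_width = int(w / 3.5) # Adjust the amount of padding
h_adjast = int(w / 10) # Vertical position adjustment

# Adjust the trimming based on the eye positions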
eyes_distance = 0
if len(eyes) == 2:
    eyes_distance = abs((eyes[0][0] + eyes[0][2] / 2) - (eyes[1][0] + eyes[1][2] / 2))

    if int(eyes_distance * 2.5) < w:
        w_new = int(eyes_distance * 2.5)

        x = int(x + (w-w_new) / 2)
        y = int(y + (w-w_new) / 2)
        w = w_new

if x - pad_width < 0:
    pad_width = x
if y - pad_width < 0:
    pad_width = y
if y - pad_width - h_adjast < 0:
    h_adjast = 0

face_crop = img[y-pad_width-h_adjast:y+w+pad_width-h_adjast, x-pad_width:x+w+pad_width]
face_crop = cv2.resize(face_crop, dsize=(1024, 1024))

cv2.imwrite(dst_file, face_crop)
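
The code above takes `faces[-1]` when several boxes are returned. One alternative, which was not tested in this article, is to pick the box with the largest area, on the assumption that the main subject is the largest face in the photo. `largest_box` is a hypothetical helper.

```python
def largest_box(boxes):
    """Return the (x, y, w, h) box with the largest area, or None if empty."""
    if len(boxes) == 0:
        return None
    return max(boxes, key=lambda box: box[2] * box[3])


# Hypothetical detections in the (x, y, w, h) layout returned by detectMultiScale.
detections = [(30, 40, 110, 110), (200, 150, 320, 320), (600, 90, 105, 105)]
best = largest_box(detections)  # (200, 150, 320, 320)
```

Because the helper returns `None` for an empty result, the caller can branch on that case instead of indexing into an empty sequence.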

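Face detection results

* When detection failed, a blacked-out image is output.

Input images

Cropped results

Faces were detected in all but one photo, but in several cases the tilt of the face was not corrected well.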
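
One detail of the code above may be relevant here: `w_` and `h_` are absolute values, so `tan_deg` is never negative and the image is always rotated in the same direction, whichever way the head tilts. The sketch below computes a signed angle by first ordering the two eye boxes from left to right. It is a hedged alternative that was not tested in this article, and `box_center` and `signed_tilt_deg` are hypothetical helpers.

```python
import math


def box_center(box):
    """Center (cx, cy) of an (x, y, w, h) box."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def signed_tilt_deg(eye_a, eye_b):
    """Angle in degrees to pass to cv2.getRotationMatrix2D to level the eyes.

    Positive when the eye on the right of the image is lower (larger y),
    which calls for a counter-clockwise rotation.
    """
    left, right = sorted((box_center(eye_a), box_center(eye_b)))
    dx = right[0] - left[0]
    dy = right[1] - left[1]
    return math.degrees(math.atan2(dy, dx))


# Hypothetical eye boxes (x, y, w, h): the right-hand eye is 20 px lower.
tilt_down = signed_tilt_deg((300, 200, 60, 60), (400, 220, 60, 60))  # about +11.3
# Mirror case: the right-hand eye is 20 px higher, so the sign flips.
tilt_up = signed_tilt_deg((300, 220, 60, 60), (400, 200, 60, 60))    # about -11.3
```

With a signed angle, the tilt threshold would be applied to the absolute value, for example `5 <= abs(angle) < 30`, while the signed value is passed to the rotation.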

3. InsightFace

Detector definition and tilt-rotation function definition

import cv2
import numpy as np
from insightface.app import FaceAnalysis

InsightFaceLandmarksDetector = FaceAnalysis(name="buffalo_sc") # Use the lightest model
InsightFaceLandmarksDetector.prepare(ctx_id=0, det_size=(640, 640))


def rotate(coordinates, theta_rad, h, w):
    x = coordinates[1]
    y = coordinates[0]
    x = x - w / 2
    y = -y + h / 2
    x_ = x * np.cos(theta_rad) - y * np.sin(theta_rad)
    y_ = x * np.sin(theta_rad) + y * np.cos(theta_rad)
    x = x_ + w / 2
    y = -y_ + h / 2
    return np.array([y, x])

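Running face detection and cropping

img_path: path to the input image
dst_file: path where the cropped image is saved
output_size: side length in pixels of the square output image (used by the code below)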

img = cv2.imread(img_path)
faces = InsightFaceLandmarksDetector.get(img) # Detect faces and their landmarks

# Assumes at least one face was found; kps holds five (x, y) keypoints
# (left eye, right eye, nose, left and right mouth corners)
lm = np.fliplr(np.array(faces[0].kps)) # Swap to (y, x) order
lm_eye_left      = lm[0]
lm_eye_right     = lm[1]
lm_nose          = lm[2]
lm_mouth         = lm[3:]

# Compute the average position of each part
eye_left     = lm_eye_left
eye_right    = lm_eye_right
eye_avg      = (eye_left + eye_right) * 0.5
eye_to_eye   = eye_right - eye_left
nose         = lm_nose
mouth_avg    = np.mean(lm_mouth, axis=0)

w = img.shape[1]
h = img.shape[0]

# Correct the tilt of the eyes
w_ = eye_to_eye[1]
h_ = eye_to_eye[0]
tan_rad = np.arctan(h_ / w_)
tan_deg = np.rad2deg(tan_rad)

center = (int(img.shape[1]/2), int(img.shape[0]/2))
trans = cv2.getRotationMatrix2D(center, tan_deg, 1)
img = cv2.warpAffine(img, trans, (img.shape[1],img.shape[0]))

eye_left = rotate(eye_left, tan_rad, h, w)
eye_right = rotate(eye_right, tan_rad, h, w)
mouth_avg = rotate(mouth_avg, tan_rad, h, w)

y_center_face, x_center_face = np.mean([eye_left, eye_right, mouth_avg], axis=0)
crop_width = (eye_right - eye_left)[1] * 4.1
h_adjast = - 0.05 * crop_width

img_crop = img[
    int(y_center_face-crop_width/2+h_adjast) : int(y_center_face+crop_width/2+h_adjast),
    int(x_center_face-crop_width/2) : int(x_center_face+crop_width/2)
]

img_crop = cv2.resize(img_crop, dsize=(output_size, output_size))
cv2.imwrite(dst_file, img_crop)

Face detection results

* When detection failed, a blacked-out image is output.

Input images

Cropped results

Face detection succeeded on every photo, with the tilt corrected.

Conclusion

I compared face detection across three libraries that support it. I hope this is useful to anyone who wants to do face detection.
