
Identifying Friends with TensorFlow + Keras - Part 4: Cropping the Area Around the Face

Last updated at 2017-08-20, posted at 2017-08-19

To improve classification accuracy, detect the position of the face in the input image and classify using only the area around it.

OpenCV and Google Cloud Vision API

There are many ways to detect faces, but for ease of implementation I considered either OpenCV or the Google Cloud Vision API.

Library/API	Successful detections (images)	False positives (count)
Cloud Vision API	53 / 71	0
OpenCV(default)	36 / 71	85
OpenCV(alt)	24 / 71	2

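I compared detection accuracy on 71 cosplay test images that I had on hand. The Google Cloud Vision API was far more accurate, detecting faces in 53 of the 71 images with no false positives, so I decided to use it this time.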
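To make the comparison easier to read, the counts in the table can be turned into rates. The sketch below only uses the numbers from the table; the names `results` and `summarize` are made up for this illustration and are not part of the original test script.

```python
# Counts copied from the comparison table (71 test images in total).
TOTAL_IMAGES = 71
results = {
    "Cloud Vision API": {"detected": 53, "false_positives": 0},
    "OpenCV(default)": {"detected": 36, "false_positives": 85},
    "OpenCV(alt)": {"detected": 24, "false_positives": 2},
}


def summarize(results, total):
    """Return {name: (detection_rate, false_positives_per_image)}."""
    return {
        name: (r["detected"] / total, r["false_positives"] / total)
        for name, r in results.items()
    }


summary = summarize(results, TOTAL_IMAGES)
for name, (rate, fp_per_image) in summary.items():
    print("%-17s detection rate %.1f%%, false positives per image %.2f"
          % (name, rate * 100, fp_per_image))
```

Seen this way, the Cloud Vision API finds a face in roughly three out of four images, while the default OpenCV setting finds one in about half of them and produces more than one false positive per image on average.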

Face detection with the Google Cloud Vision API

Setting up a Google Cloud Platform account and credentials is omitted here.

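from google.cloud import vision
# Written against the 2017-era client library. In releases 2.0 and later the
# `types` module was removed and vision.Image is used instead.
from google.cloud.vision import types

def detect_faces(image_path):
    """Return a list of (left, top, right, bottom) boxes, one per detected face."""
    client = vision.ImageAnnotatorClient()
    with open(image_path, 'rb') as file:
        image = types.Image(content=file.read())
    annotations = client.face_detection(image=image).face_annotations

    face_boxes = []
    for annotation in annotations:
        # Convert the bounding polygon into the rectangle that encloses it.
        x_s = [vertex.x for vertex in annotation.bounding_poly.vertices]
        y_s = [vertex.y for vertex in annotation.bounding_poly.vertices]
        face_boxes.append((min(x_s), min(y_s), max(x_s), max(y_s)))
    return face_boxes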

The Google Cloud Vision API can be used by issuing plain HTTP requests, but the client library, which can be installed with pip, makes the implementation very simple.

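Calling face_detection on ImageAnnotatorClient returns a list of the faces detected in the given image. Each annotation in the list has a bounding_poly property that holds the coordinates of the detected face. As the name suggests, the region is represented as a polygon, so here I take the minimum and maximum x and y over all of its vertices and convert it into the rectangle that encloses the polygon. This is because the image fed to the classifier in the next step has to be rectangular.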
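The polygon-to-rectangle conversion can be tried in isolation without calling the API. The following is a minimal sketch that uses plain (x, y) tuples instead of the library's vertex objects; `polygon_to_box` and the coordinates are made up for illustration.

```python
def polygon_to_box(vertices):
    """Return (left, top, right, bottom) enclosing all (x, y) vertices."""
    if not vertices:
        raise ValueError("polygon has no vertices")
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


# A slightly skewed quadrilateral (made-up coordinates).
polygon = [(10, 20), (110, 22), (108, 140), (12, 138)]
box = polygon_to_box(polygon)
print(box)  # (10, 20, 110, 140)
```

Because the minimum and maximum are taken over every vertex, the result still encloses the whole region even when the polygon is not axis-aligned.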

Cropping the area around the face (with a margin)

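from PIL import Image
...

max_margin = 0.2  # margin as a fraction of the face width/height

faces = detect_faces(image_path)
if len(faces) > 0:
    image = Image.open(image_path).convert("RGB")

    # Use only the first detected face.
    x1, y1, x2, y2 = faces[0]
    w = x2 - x1
    h = y2 - y1

    # Cap the margin at 20% of the face size and at the space left to each image edge.
    spaces_x = min(x1, image.width - x2, int(float(w) * max_margin))
    spaces_y = min(y1, image.height - y2, int(float(h) * max_margin))
    margin = min(spaces_x, spaces_y)

    # Crop the face plus the same margin on all four sides.
    img2 = image.crop((
        x1 - margin,
        y1 - margin,
        x2 + margin,
        y2 + margin
    ))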

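Cropping at exactly the detected face coordinates would leave out most of the hairstyle and costume, which are the distinguishing features of a cosplay, and classification accuracy would drop. So the crop is made about 20% wider than the detected region (less when the face is close to an edge of the image).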

To keep the processing and the UI/UX simple, only the first face returned by the API is used, even when multiple faces are detected.
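The margin calculation is easier to follow with concrete numbers. Below is a small sketch of the same logic pulled out into a standalone function so that it can be run without an image; the function name `expand_box` and the example boxes are made up for illustration.

```python
def expand_box(box, image_width, image_height, max_margin=0.2):
    """Grow a (left, top, right, bottom) box by the same margin on every side.

    The margin is at most max_margin times the box width or height, and it
    never pushes the box outside the image.
    """
    x1, y1, x2, y2 = box
    w = x2 - x1
    h = y2 - y1
    spaces_x = min(x1, image_width - x2, int(w * max_margin))
    spaces_y = min(y1, image_height - y2, int(h * max_margin))
    margin = min(spaces_x, spaces_y)
    return (x1 - margin, y1 - margin, x2 + margin, y2 + margin)


# Face well inside a 400x400 image: the margin is 20% of the width (20 px).
centered = expand_box((100, 100, 200, 250), 400, 400)
print(centered)  # (80, 80, 220, 270)

# Face 5 px from the left edge: the margin shrinks to 5 px on every side.
near_edge = expand_box((5, 100, 105, 250), 400, 400)
print(near_edge)  # (0, 95, 110, 255)
```

Using a single margin for all four sides keeps the face centered in the crop, at the cost of a smaller crop when the face is near any one edge.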

Classifying with the cropped image around the face

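import numpy as np

# image_size and model are assumed to be defined as in the previous article.
# Image.LANCZOS is the same filter as the older Image.ANTIALIAS alias,
# which recent Pillow releases no longer provide.
image = img2.resize((image_size, image_size), Image.LANCZOS)
image = np.asarray(image, dtype=np.float32).reshape(image_size, image_size, 3)
image /= 255.0  # scale pixel values to [0, 1]

result = model.predict(np.array([image]), batch_size=1)[0]
for i, score in enumerate(result):
    print("Category-%d: %.2f" % (i, score))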

The rest is the same as last time: classify the image with the trained model.
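When only the most likely categories matter, the per-category scores can be ranked before printing. This is a minimal sketch with made-up scores standing in for the model output; `top_categories` is a hypothetical helper, not something from the previous article.

```python
def top_categories(scores, k=3):
    """Return the k highest (index, score) pairs, best first."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up scores for three categories.
scores = [0.05, 0.80, 0.15]
best_index, best_score = top_categories(scores, k=1)[0]
print("Category-%d: %.2f" % (best_index, best_score))  # Category-1: 0.80
```

The same function should also accept the array returned by the model, since it only needs to iterate over the scores.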

Toward a web service

At this point I have the basis of a program that can identify Friends in an input image with reasonable accuracy, so next I will try turning it into a web service with Flask.
