
I Built a "Which Celebrity Do You Look Like?" Demo with Python + CNN

Posted at 2020-05-01, last updated at 2020-05-02

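As a first step into image processing and machine learning, I implemented face recognition with a CNN. (A labmate from my year and I built it as a demo for visitors touring our lab.)
Technically it is just face image classification. Because the training data covers a wide range of celebrities (plus a few exceptions), though, the demo effectively answers the question: "Which of the people in the training data does this unknown person look most like?"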

Environment

Anaconda 4.7.12
Python 3.7.3

Image collection

Face recognition needs a large amount of image data, so I collected images with a Chrome extension and the Bing Image Search API.

For the Chrome extension, I used Fatkun Batch Download Image.
With Chrome I only got about 400-600 images per person. That is possibly due to Google's restrictions on rights-protected images.

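To set up the Bing Image Search API, I followed 「Bingの画像検索APIを使って画像を大量に収集する」 (Collecting images in bulk with the Bing Image Search API).
I based my code on 「Bing画像検索API v7で画像を収集する」 (Collecting images with Bing Image Search API v7).
This yielded about 700-900 images per person.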

I then merged the two sets into a dataset of roughly 800-1,200 images per person.
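The merge itself can be as simple as copying both download folders into one folder per person. Below is a minimal sketch. It assumes each tool saved its files into its own directory per person; that layout and the helper name `merge_sources` are hypothetical, not from the article. Each file gets a source prefix so that identical file names from the two tools don't overwrite each other. Cross-source duplicates are left for the cleaning step.

```python
import shutil
from pathlib import Path


def merge_sources(sources, dest):
    """Copy every file from each source directory into `dest`.

    sources: dict mapping a short tag (e.g. "chrome", "bing") to a directory.
    Each copied file is renamed "<tag>_<original name>" so files with the same
    name from different tools don't overwrite each other. Returns the count.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    for tag, src in sources.items():
        src = Path(src)
        if not src.is_dir():  # e.g. one tool found nothing for this person
            continue
        for f in sorted(src.iterdir()):
            if f.is_file():
                shutil.copy2(f, dest / "{}_{}".format(tag, f.name))
                copied += 1
    return copied
```

For example, `merge_sources({"chrome": "chrome/吉沢亮", "bing": "bing/吉沢亮"}, "吉沢亮")` would gather one person's images into a single `吉沢亮` directory.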

The celebrities I collected images of are listed below.
*The selection partly reflects my personal taste.
(Name: Chrome images + Bing images)

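Actors
- Ryo Yoshizawa (吉沢亮): 476 + 743
- Ryoma Takeuchi (竹内涼真): 384 + 919
- Taishi Nakagawa (中川大志): 633 + 721
- Ryunosuke Kamiki (神木隆之介): 592 + 681
- Kei Tanaka (田中圭): 627 + 716
- Ryusei Yokohama (横浜流星): 604 + 808
- Osamu Mukai (向井理): 558 + 698
- Tori Matsuzaka (松坂桃李): 616 + 730
- Hiroshi Abe (阿部寛): 559 + 699
- Teruyuki Kagawa (香川照之): 506 + 669
- Yo Oizumi (大泉洋): 572 + 769
- Ryo Kato (加藤諒): 490 + 602
- Masaki Suda (菅田将暉): 377 + 783
- Gen Hoshino (星野源): 588 + 741
Actresses
- Minami Hamabe (浜辺美波): 447 + 748
- Yui Aragaki (新垣結衣): 437 + 748
- Kanna Hashimoto (橋本環奈): 419 + 757
- Satomi Ishihara (石原さとみ): 468 + 735
- Riho Yoshioka (吉岡里帆): 544 + 804
- Kasumi Arimura (有村架純): 452 + 751
- Tsubasa Honda (本田翼): 403 + 692
- Yuko Araki (新木優子): 392 + 749
- Keiko Kitagawa (北川景子): 375 + 689
- Kyoko Fukada (深田恭子): 434 + 770
Male idols
- Sho Hirano (平野紫耀): 408 + 846
- Tomohisa Yamashita (山下智久): 270 + 667
- Yuya Tegoshi (手越祐也): 274 + 682
- Takuya Kimura (木村拓哉): 137 + 686
- Takahiro Nishijima (西島隆弘): 343 + 691
Female idols
- Mai Shiraishi (白石麻衣): 442 + 771
- Asuka Saito (齋藤飛鳥): 516 + 770
- Erika Ikuta (生田絵梨花): 391 + 806
- Yurina Hirate (平手友梨奈): 480 + 768
- Nao Kosaka (小坂菜緒): 409 + 841
Comedians
- Sanma Akashiya (明石家さんま): 499 + 722
- Miyazon (みやぞん): 502 + 948
- Soshina (粗品): 410 + 804
- Yuriyan Retriever (ゆりやんレトリィバァ): 565 + 873
- Ryota Yamasato (山里亮太): 393 + 798
- Eiji Kotoge (小峠英二): 483 + 741
- Yu Sawabe (澤部佑): 352 + 827
Politicians
- Shinzo Abe (安倍晋三): 565 + 687
- Yoshihide Suga (菅義偉): 550 + 692
Characters
- Chiko-chan (チコちゃん): 407 + 734
- Anpanman (アンパンマン): 759 + 382
- Crayon Shin-chan (クレヨンしんちゃん): 662 + 663
- Mametchi (まめっち): 566 + 778
- Matt: 369 + 428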

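Johnny's idols yielded fewer images, probably because their agency tightly restricts photos online. Comedians mostly seem to be posted in web news or in edited and joke images, so usable face shots of them were also scarce.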

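(粗品 is also the everyday Japanese word for a small giveaway gift. I'd love him to deliver the retort himself: "No, I mean the *real* Soshina!!" From then on, I downloaded his images with the query 「霜降り明星_粗品」 (Shimofuri Myojo_Soshina).)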

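Cleaning the image dataset

Images collected from the web inevitably contain noise, so I cleaned the dataset once at this point. Specifically, I did the following:

- Check file extensions. You can specify the extension when collecting, so as an example the source code only checks that a file is not ".gif".
- Remove duplicate images.
- Check that each image can be opened with PIL and OpenCV and that some operation can actually be performed on it.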

Run `python cleanup.py` in the directory that contains the per-person directories (each named after the person).

cleanup.py

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import glob
import logging
import os

import cv2
import imagehash
from PIL import Image


def remove_quietly(path):
    """Delete a file, ignoring errors (e.g. it was already removed)."""
    try:
        os.remove(path)
    except OSError:
        pass


def check(root, filterGifs=True, filterDuplicates=True, filterCorrupts=True):
    logging.debug("check {}".format(root))
    # Recurse into subdirectories with the same filter settings
    for item in os.listdir(root):
        obj = os.path.join(root, item)
        if os.path.isdir(obj):
            check(obj, filterGifs, filterDuplicates, filterCorrupts)

    hashtable = {}  # dHash string -> first file seen with that hash
    pre_am = len(glob.glob(os.path.join(root, "*.*")))
    for imagePath in glob.glob(os.path.join(root, "*.*")):
        if not os.path.isfile(imagePath):
            continue

        # Filter GIFs
        if filterGifs and imagePath.lower().endswith(".gif"):
            logging.debug("gif file: {}".format(imagePath))
            remove_quietly(imagePath)
            continue

        # Filter duplicates (images whose dHash is exactly equal)
        if filterDuplicates:
            try:
                with Image.open(imagePath) as image:
                    h = str(imagehash.dhash(image))
            except Exception:
                logging.debug("unreadable file: {}".format(imagePath))
                remove_quietly(imagePath)
                continue
            if h in hashtable:
                logging.debug("collision: {} {}".format(imagePath, hashtable[h]))
                remove_quietly(imagePath)
                continue
            hashtable[h] = imagePath

        # Check corrupt files.
        # This approach is crude, but it makes sure that all
        # common imaging libraries can read this file.
        if filterCorrupts:
            try:
                with Image.open(imagePath) as image:  # PIL usually fails on corrupt files
                    imagehash.dhash(image)             # force a full decode
                image = cv2.imread(imagePath)           # None if OpenCV can't read it
                cv2.resize(image, (1234, 1234))         # raises cv2.error on None
            except Exception:
                logging.debug("unreadable file: {}".format(imagePath))
                remove_quietly(imagePath)
                continue

    post_am = len(glob.glob(os.path.join(root, "*.*")))
    logging.info("deleted {} files for {}".format(pre_am - post_am, root))


if __name__ == "__main__":
    logging.basicConfig(
        format='[%(asctime)s %(levelname)s] %(message)s',
        datefmt='%m/%d/%Y %I:%M:%S %p', level=logging.DEBUG)

    # Every directory (no "." in its name) here holds one person's images
    dirname = [d for d in glob.glob("*") if os.path.isdir(d) and "." not in d]
    print(dirname)

    # dirname = ["白石麻衣", "齋藤飛鳥", "生田絵梨花"]  # <- to process specific people only

    for name in dirname:
        check(name, filterGifs=True, filterDuplicates=True, filterCorrupts=True)
```
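cleanup.py deletes an image only when its dHash string exactly matches one it has already seen. To show what `imagehash.dhash` compares, here is a minimal re-implementation of the difference-hash idea with PIL and NumPy. It is a simplified sketch, not imagehash's exact code. It also adds a Hamming-distance comparison, which would let you catch near-duplicates (for example re-encoded or slightly edited copies) with a small threshold. The threshold of 5 is an illustrative assumption that would need tuning by eye.

```python
import numpy as np
from PIL import Image


def dhash_bits(image, hash_size=8):
    """Difference hash: shrink to (hash_size+1) x hash_size grayscale, then
    record whether each pixel is brighter than its left neighbour."""
    small = image.convert("L").resize((hash_size + 1, hash_size))
    pixels = np.asarray(small, dtype=np.int16)
    return pixels[:, 1:] > pixels[:, :-1]  # boolean array, hash_size x hash_size


def hamming(bits_a, bits_b):
    """Number of bits that differ between two hashes."""
    return int(np.count_nonzero(bits_a != bits_b))


def is_near_duplicate(img_a, img_b, threshold=5):
    # threshold=5 is an illustrative value, not taken from the article
    return hamming(dhash_bits(img_a), dhash_bits(img_b)) <= threshold
```

An exact-match check like the one in cleanup.py corresponds to `threshold=0`. A larger threshold removes more near-copies, at the risk of also removing genuinely different photos.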

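Cropping face images

The next step detects the face region in each collected image, crops it, and saves it as a separate image.
For face detection, I first tried OpenCV's cascade classifier, but its accuracy didn't look good enough. I switched to the dlib library, whose detector is based on HOG (Histogram of Oriented Gradients) features.
(Reference: 「Python + dlibで顔検出を行う」 (Face detection with Python + dlib))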

First, install dlib in the Anaconda environment (along with opencv and imutils):

```sh
conda install -c conda-forge dlib imutils opencv
```

Next, download "shape_predictor_68_face_landmarks.dat.bz2" and decompress it with bunzip2:

```sh
wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
bunzip2 shape_predictor_68_face_landmarks.dat.bz2
```

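Then place "shape_predictor_68_face_landmarks.dat" in the same directory as "fdet_align.py" below, and run
`python fdet_align.py (imagedir) (outputdir)`
For every image in (imagedir), the script detects the face region and saves an aligned, cropped 256×256 face image to (outputdir). Images in which no face, or more than one face, is detected are skipped.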

fdet_align.py

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Usage: python fdet_align.py (imagedir) (outputdir)

import sys
from pathlib import Path

import cv2
import dlib
from imutils.face_utils import FaceAligner
from imutils.face_utils import rect_to_bb

indir = Path(sys.argv[1])
outdir = Path(sys.argv[2])

outdir.mkdir(parents=True, exist_ok=True)

# Collect image files (case-insensitive, so ".JPG" is included too)
images = [p for p in indir.glob("*")
          if p.suffix.lower() in [".jpg", ".png", ".bmp", ".jpeg"]]

face_predictor = "shape_predictor_68_face_landmarks.dat"  # 68-point landmark model

detector = dlib.get_frontal_face_detector()       # HOG-based face detector
predictor = dlib.shape_predictor(face_predictor)
fa = FaceAligner(predictor, desiredFaceWidth=256)  # eyes level, 256x256 output

print(images)
for img_name in images:
    print(img_name)
    outfile = outdir / ("%s.jpg" % img_name.stem)
    img = cv2.imread(str(img_name))
    if img is None:  # OpenCV could not read the file
        print("unreadable", img_name)
        continue

    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector(img_gray, 2)  # upsample twice to find smaller faces

    # Keep only images with exactly one detected face
    if len(faces) == 0:
        print("==0", img_name)
        continue
    if len(faces) > 1:
        print(">1", img_name)
        continue

    (x, y, w, h) = rect_to_bb(faces[0])
    if w == 0 or h == 0:
        continue

    # Align the face using its landmarks and save the crop
    faceAligned = fa.align(img, img_gray, faces[0])
    cv2.imwrite(str(outfile), faceAligned)
```
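`fa.align(...)` (imutils' `FaceAligner`) uses the eye landmarks to rotate and scale each face so that the eyes are level and sit at fixed positions in the 256×256 output. The core geometry is a similarity transform, sketched below in NumPy. This is my own simplified illustration, not imutils' code. The target eye positions (35% / 65% of the width, 35% of the height) are assumptions chosen to match my understanding of imutils' default `desiredLeftEye=(0.35, 0.35)`. `eye_alignment_matrix` and `apply_affine` are hypothetical helper names.

```python
import numpy as np


def eye_alignment_matrix(left_eye, right_eye, size=256, eye_x=0.35, eye_y=0.35):
    """Return a 2x3 similarity transform (rotation + uniform scale + translation)
    mapping the two eye centres onto fixed points in a size x size crop.

    left_eye / right_eye are (x, y) in image coordinates, left_eye being the
    one with the smaller x. Target positions are assumptions for illustration.
    """
    src_l = np.asarray(left_eye, dtype=float)
    src_r = np.asarray(right_eye, dtype=float)
    dst_l = np.array([eye_x * size, eye_y * size])
    dst_r = np.array([(1.0 - eye_x) * size, eye_y * size])

    src_vec, dst_vec = src_r - src_l, dst_r - dst_l
    # Uniform scale: ratio of the inter-eye distances
    scale = np.linalg.norm(dst_vec) / np.linalg.norm(src_vec)
    # Rotation that turns the eye line into the (horizontal) target line
    angle = np.arctan2(dst_vec[1], dst_vec[0]) - np.arctan2(src_vec[1], src_vec[0])
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    A = np.array([[c, -s], [s, c]])
    # Translation chosen so the left eye lands exactly on its target
    t = dst_l - A @ src_l
    return np.hstack([A, t[:, None]])


def apply_affine(M, point):
    """Apply a 2x3 affine matrix to one (x, y) point."""
    return M @ np.array([point[0], point[1], 1.0])
```

In the 68-point scheme, the eye centres would be the means of landmarks 36-41 and 42-47. The resulting matrix could then be passed to `cv2.warpAffine(img, M, (256, 256))` to produce the aligned crop.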

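Cleansing the face images

After the faces are cropped automatically, they still need cleansing.
Specifically, I removed crops in which the face wasn't captured well, or in which face detection had simply failed.
There are probably many ways to automate this, but for now I did it by hand.
Failed detections included cases like the following, and I kept deleting them one after another.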

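If it's someone you're a fan of, this isn't the least bit painful (in fact, you end up gazing at the photos and making no progress). Otherwise, it's pure drudgery.
Gradually a kind of Gestalt collapse sets in. You start to feel that every photo is the right person, or that none of them are.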

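The "Characters" category in particular defeated face detection: Chiko-chan, Anpanman and Mametchi were all disastrous.
Among the images collected for "Chiko-chan", the ones showing Oka○ predictably had ○mura's face detected instead. (At this rate, Chiko-chan will scold me for living life in a daze...)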

Oddly enough, Shokupanman, who had sneaked into the "Anpanman" results, was detected fine. (Perhaps because he has a more human-looking face?)

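Results of cleaning and cleansing the face images

Number of images per person, in descending order:

Yui Aragaki (753), Riho Yoshioka (745), Kanna Hashimoto (736), Kasumi Arimura (731), Keiko Kitagawa (725), Tori Matsuzaka (708), Satomi Ishihara (697), Minami Hamabe (695), Yuko Araki (692), Shinzo Abe (692), Ryusei Yokohama (665), Kyoko Fukada (662), Taishi Nakagawa (657), Mai Shiraishi (619), Ryo Yoshizawa (608), Tsubasa Honda (601), Hiroshi Abe (598), Yoshihide Suga (594), Yuriyan Retriever (574), Asuka Saito (570), Masaki Suda (544), Kei Tanaka (537), Ryoma Takeuchi (533), Nao Kosaka (532), Erika Ikuta (521), Gen Hoshino (486), Yurina Hirate (477), Tomohisa Yamashita (477), Ryunosuke Kamiki (474), Sho Hirano (473), Ryo Kato (468), Yuya Tegoshi (453), Teruyuki Kagawa (450), Osamu Mukai (435), Miyazon (433), Sanma Akashiya (422), Takuya Kimura (411), Matt (409), Yo Oizumi (408), Takahiro Nishijima (406), Ryota Yamasato (382), Soshina (364), Yu Sawabe (353), Eiji Kotoge (343)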

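As you can see, the good-looking stars ended up with many images, and the comedians known for their "unattractive" looks ended up with few. (Takuya Kimura and Nissy, i.e. Takahiro Nishijima, are the exceptions near the bottom.)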

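Training

The implementation is based on 「乃木坂メンバーの顔をCNNで分類」 (Classifying Nogizaka46 members' faces with a CNN).

Accuracy drops as the number of classes grows, so I first reduced the number of people (classes).
Using people with a reasonable number of images according to the counts above, I tried various combinations and chose people who are
"a combination that seems easy to recognize (people who seem well spread out in feature space (?))"
and
"people you wouldn't mind being told you look like" ← this is the important part.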

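<People used in the end (15)>
Minami Hamabe (浜辺美波), Riho Yoshioka (吉岡里帆), Kanna Hashimoto (橋本環奈), Kasumi Arimura (有村架純), Keiko Kitagawa (北川景子), Mai Shiraishi (白石麻衣), Tsubasa Honda (本田翼), Ryo Yoshizawa (吉沢亮), Hiroshi Abe (阿部寛), Gen Hoshino (星野源), Teruyuki Kagawa (香川照之), Yo Oizumi (大泉洋), Shinzo Abe (安倍晋三), Yoshihide Suga (菅義偉), Kei Tanaka (田中圭)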

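*I can already hear the complaints: "Gakky, Satomi Ishihara and Fukakyon have more images, so why weren't they used?! 💢" Just in case, let me explain.
There is Short-haired Gakky and Long-haired Gakky, and the two make identification harder. (Personally, Short-haired Gakky is more my type.)
Her career is also long, so even Gakky's photos vary in look over the years. (I wanted to prove that Gakky defies the passage of time...)
For the same reasons, Satomi Ishihara and Fukakyon were also left out, this time only. Thank you for your understanding.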

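As in the reference article, I prepared two versions for evaluating accuracy: one without cross-validation and one with 5-fold cross-validation.
The differences from the reference are the addition of early stopping, so training can end before the maximum number of epochs, and code that is split into functions to some extent.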
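Early stopping with `patience=10`, as used in the scripts below, stops training once the monitored value has failed to improve for 10 consecutive epochs. Here is a minimal pure-Python sketch of that rule, for illustration only. Keras' `EarlyStopping` also has options such as `min_delta` and `restore_best_weights` that this sketch ignores, and `early_stop_epoch` is a hypothetical name.

```python
def early_stop_epoch(val_losses, patience=10):
    """Return the 0-based epoch at which training would stop,
    or None if it runs through every epoch. Lower val_loss is better."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None
```

For example, a validation loss that bottoms out at epoch 2 and then plateaus for 10 epochs stops at epoch 12. Keras also keeps the weights from that last epoch, not the best one, unless `restore_best_weights=True` is set or a checkpoint is saved.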

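I used 500 images per person. For people with fewer, images are randomly reused to pad the count to 500.
The file layout is a directory named 'face' that contains one directory per person (named after the person) holding the cropped face images. The scripts below expect it at ~/face.
Then run `python train.py`.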
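A caveat about padding to 500: `split_trainval` in train.py duplicates images *before* splitting them into train and test. A copy of the same photo can therefore land on both sides and inflate validation accuracy for anyone with fewer than 500 images. In the final 15, that applies to Gen Hoshino (486), Teruyuki Kagawa (450) and Yo Oizumi (408). The sketch below shows the safer order: split first, then oversample only the training part. It keeps the 400/100 split used here. The function name and the `seed` parameter are hypothetical.

```python
import random


def split_then_oversample(paths, n_train=400, n_test=100, seed=None):
    """Split unique image paths into test/train first, then pad only the
    training list with random repeats, so no image appears on both sides."""
    rng = random.Random(seed)
    unique = list(dict.fromkeys(paths))  # drop repeated paths, keep order
    rng.shuffle(unique)
    if len(unique) <= n_test:
        raise ValueError("not enough unique images for the test split")
    test = unique[:n_test]
    train = unique[n_test:n_test + n_train]
    # Oversample from the training split only
    train += [rng.choice(train) for _ in range(n_train - len(train))]
    rng.shuffle(train)
    return train, test
```

With 408 images, this yields 100 unique test images and 308 unique training images padded to 400 by repetition.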

Version without cross-validation

train.py

```python
import os
import glob
import random
from random import choice

import cv2
import numpy as np
import matplotlib.pyplot as plt
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.layers import Activation, Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential
from keras.utils import to_categorical

# glob does not expand "~" by itself, so expand it explicitly
FACE_DIR = os.path.expanduser("~/face")


def split_trainval(name):
    l = glob.glob(os.path.join(FACE_DIR, name, "*"))

    imgnum = 500  # images per person
    if len(l) < imgnum:
        # Pad by randomly reusing images. Because this happens before the
        # split, a copy of the same image can end up in both train and test.
        last = l + [choice(l) for _ in range(imgnum - len(l))]
    else:
        last = l[:imgnum]

    random.shuffle(last)

    t_imgnum = 400  # training images per person
    train = last[:t_imgnum]
    test = last[t_imgnum:]

    return train, test


def load_rgb(path):
    """Read an image with OpenCV (BGR) and convert it to RGB."""
    img = cv2.imread(path)
    if img is None:
        raise IOError("cannot read " + path)
    return cv2.cvtColor(img, cv2.COLOR_BGR2RGB)


def setdata(names):
    X_train, Y_train = [], []  # training images / labels
    X_test, Y_test = [], []    # test images / labels (label = index into names)
    for i, name in enumerate(names):
        train_split, test_split = split_trainval(name)
        for path in train_split:
            X_train.append(load_rgb(path))
            Y_train.append(i)
        for path in test_split:
            X_test.append(load_rgb(path))
            Y_test.append(i)

    return X_train, X_test, Y_train, Y_test


def model(X_train, X_test, Y_train, Y_test, ln):
    x_train = np.array(X_train)
    x_test = np.array(X_test)
    y_train = to_categorical(Y_train)
    y_test = to_categorical(Y_test)

    # Model: 5 x (3x3 conv + 2x2 max-pool), then 3 dense layers.
    # The Conv2D layers have no activation (i.e. linear), as in the original.
    model = Sequential()
    model.add(Conv2D(input_shape=(256, 256, 3), filters=32, kernel_size=(3, 3),
                     strides=(1, 1), padding="same"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    for _ in range(4):
        model.add(Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1), padding="same"))
        model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(256))
    model.add(Activation("sigmoid"))
    model.add(Dense(128))
    model.add(Activation("sigmoid"))
    model.add(Dense(ln))
    model.add(Activation("softmax"))

    model.compile(optimizer="sgd",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])

    return model, x_train, x_test, y_train, y_test


def acc_keys(history):
    """Keras < 2.3 logs 'acc'/'val_acc'; newer versions log 'accuracy'/'val_accuracy'."""
    acc = "acc" if "acc" in history.history else "accuracy"
    return acc, "val_" + acc


def last(model, x_train, x_test, y_train, y_test):
    os.makedirs("model", exist_ok=True)  # ModelCheckpoint needs the directory

    # Stop once val_loss hasn't improved for 10 epochs; keep the best model on disk
    early_stopping = EarlyStopping(monitor="val_loss", patience=10, verbose=1)
    model_checkpoint = ModelCheckpoint(
        filepath="model/BEST_model.h5",
        monitor="val_loss",
        save_best_only=True,
        verbose=1)

    history = model.fit(x_train, y_train, batch_size=32, epochs=150, verbose=1,
                        validation_data=(x_test, y_test),
                        callbacks=[early_stopping, model_checkpoint])

    _, val_acc = acc_keys(history)
    print("validation loss:", min(history.history["val_loss"]))
    print("validation accuracy:", max(history.history[val_acc]))

    return model, history


def result(model, history, x_test, y_test):
    # Evaluate generalization accuracy (of the final-epoch weights)
    score = model.evaluate(x_test, y_test, batch_size=32, verbose=0)
    print("validation loss:{0[0]}\nvalidation accuracy:{0[1]}".format(score))

    # Plot acc and val_acc per epoch
    acc, val_acc = acc_keys(history)
    plt.plot(history.history[acc], label="acc", ls="-", marker="o")
    plt.plot(history.history[val_acc], label="val_acc", ls="-", marker="x")
    plt.ylabel("accuracy")
    plt.xlabel("epoch")
    plt.legend(loc="best")
    plt.savefig("model/result.png")


def train(names):
    # Prepare the image data (train + test)
    X_train, X_test, Y_train, Y_test = setdata(names)
    # Build the model
    m, x_train, x_test, y_train, y_test = model(X_train, X_test, Y_train, Y_test, len(names))
    # Train
    m, history = last(m, x_train, x_test, y_train, y_test)
    # Show results and save the plot
    result(m, history, x_test, y_test)


if __name__ == "__main__":
    names = ['浜辺美波', '吉岡里帆', '橋本環奈', '有村架純', '北川景子', '白石麻衣', '本田翼',
             '吉沢亮', '阿部寛', '星野源', '香川照之', '大泉洋', '安倍晋三', '菅義偉', '田中圭']
    train(names)
```
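To get a sense of scale, the feature-map sizes and parameter counts of these models can be worked out by hand. Each `padding="same"` 3×3 convolution keeps the spatial size, and each 2×2 max-pool halves it. The five pools in train.py therefore take 256 down to 8, and `Flatten` yields 8·8·32 = 2048 features. The small sketch below only does this arithmetic and does not build a Keras model. `count_params` is a hypothetical helper. If I've counted correctly, the totals should match what `model.summary()` prints.

```python
def count_params(input_size=256, channels=3, n_conv=5, filters=32,
                 dense_units=(256, 128), n_classes=15):
    """Return (flattened feature count, total trainable parameters) for
    n_conv x [3x3 'same' conv + 2x2 max-pool] followed by dense layers."""
    size, in_ch, total = input_size, channels, 0
    for _ in range(n_conv):
        total += 3 * 3 * in_ch * filters + filters  # conv kernel weights + biases
        in_ch = filters                             # 'same' padding keeps the size,
        size //= 2                                  # the 2x2 pool halves it
    flat = size * size * in_ch                      # output of Flatten()
    in_features = flat
    for units in list(dense_units) + [n_classes]:
        total += in_features * units + units        # dense weights + biases
        in_features = units
    return flat, total
```

`count_params()` gives (2048, 597263) for the 5-conv model in train.py. The 3-conv model in train_crossval.py, `count_params(n_conv=3)`, gives (32768, 8443087). Of that total, 8,388,864 parameters sit in the first `Dense(256)` alone, so removing two pooling stages makes the network about 14 times larger.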

Training results

validation loss: 0.471
validation accuracy: 0.858

Accuracy per person:
- Minami Hamabe: 0.87
- Riho Yoshioka: 0.76
- Kanna Hashimoto: 0.86
- Kasumi Arimura: 0.71
- Keiko Kitagawa: 0.87
- Mai Shiraishi: 0.82
- Tsubasa Honda: 0.79
- Ryo Yoshizawa: 0.93
- Hiroshi Abe: 0.94
- Gen Hoshino: 0.85
- Teruyuki Kagawa: 0.9
- Yo Oizumi: 0.89
- Shinzo Abe: 0.96
- Yoshihide Suga: 0.94
- Kei Tanaka: 0.78

Confusion matrix (rows: true label, columns: predicted label; 100 test images per person)

| Person | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|:---|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| 1. Minami Hamabe | 87 | 5 | 1 | 2 | 0 | 1 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2. Riho Yoshioka | 6 | 76 | 1 | 7 | 4 | 1 | 1 | 1 | 0 | 1 | 0 | 2 | 0 | 0 | 0 |
| 3. Kanna Hashimoto | 2 | 2 | 86 | 0 | 2 | 2 | 3 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 4. Kasumi Arimura | 16 | 2 | 2 | 71 | 1 | 0 | 4 | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 5. Keiko Kitagawa | 3 | 1 | 0 | 0 | 87 | 6 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 |
| 6. Mai Shiraishi | 2 | 0 | 1 | 1 | 10 | 82 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7. Tsubasa Honda | 4 | 8 | 3 | 1 | 1 | 1 | 79 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 |
| 8. Ryo Yoshizawa | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 93 | 0 | 1 | 0 | 2 | 0 | 0 | 1 |
| 9. Hiroshi Abe | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 94 | 0 | 0 | 0 | 2 | 1 | 0 |
| 10. Gen Hoshino | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 85 | 2 | 1 | 0 | 0 | 6 |
| 11. Teruyuki Kagawa | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 90 | 2 | 1 | 1 | 1 |
| 12. Yo Oizumi | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 2 | 2 | 3 | 89 | 0 | 1 | 0 |
| 13. Shinzo Abe | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 96 | 2 | 0 |
| 14. Yoshihide Suga | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 3 | 0 | 94 | 0 |
| 15. Kei Tanaka | 2 | 1 | 1 | 1 | 0 | 1 | 0 | 8 | 1 | 5 | 0 | 2 | 0 | 0 | 78 |
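The per-person accuracies above are the diagonal of this matrix divided by each row total (100). The overall validation accuracy is the trace divided by the grand total: 1287 / 1500 = 0.858. The short NumPy sketch below recomputes both from the table and lists the most frequent confusions. The helper names are hypothetical.

```python
import numpy as np

names = ["Minami Hamabe", "Riho Yoshioka", "Kanna Hashimoto", "Kasumi Arimura",
         "Keiko Kitagawa", "Mai Shiraishi", "Tsubasa Honda", "Ryo Yoshizawa",
         "Hiroshi Abe", "Gen Hoshino", "Teruyuki Kagawa", "Yo Oizumi",
         "Shinzo Abe", "Yoshihide Suga", "Kei Tanaka"]

# rows = true label, columns = predicted label (copied from the table above)
cm = np.array([
    [87, 5, 1, 2, 0, 1, 2, 2, 0, 0, 0, 0, 0, 0, 0],
    [6, 76, 1, 7, 4, 1, 1, 1, 0, 1, 0, 2, 0, 0, 0],
    [2, 2, 86, 0, 2, 2, 3, 0, 1, 0, 0, 1, 0, 0, 1],
    [16, 2, 2, 71, 1, 0, 4, 3, 0, 0, 1, 0, 0, 0, 0],
    [3, 1, 0, 0, 87, 6, 0, 0, 1, 0, 1, 1, 0, 0, 0],
    [2, 0, 1, 1, 10, 82, 2, 2, 0, 0, 0, 0, 0, 0, 0],
    [4, 8, 3, 1, 1, 1, 79, 0, 0, 2, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1, 1, 93, 0, 1, 0, 2, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 1, 94, 0, 0, 0, 2, 1, 0],
    [0, 4, 0, 0, 0, 0, 0, 1, 1, 85, 2, 1, 0, 0, 6],
    [0, 1, 0, 0, 0, 0, 0, 0, 4, 0, 90, 2, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 3, 2, 2, 3, 89, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 96, 2, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 3, 0, 94, 0],
    [2, 1, 1, 1, 0, 1, 0, 8, 1, 5, 0, 2, 0, 0, 78],
])


def per_class_recall(cm):
    """Correct predictions for each person / number of test images of that person."""
    return np.diag(cm) / cm.sum(axis=1)


def overall_accuracy(cm):
    return np.trace(cm) / cm.sum()


def top_confusions(cm, k=3):
    """(true, predicted, count) for the k largest off-diagonal cells."""
    off = cm.copy()
    np.fill_diagonal(off, 0)
    flat = np.argsort(off, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, off.shape)
    return [(int(r), int(c), int(off[r, c])) for r, c in zip(rows, cols)]


for true, pred, count in top_confusions(cm, 4):
    print("{} -> {}: {}".format(names[true], names[pred], count))
```

The largest confusions are Kasumi Arimura predicted as Minami Hamabe (16) and Mai Shiraishi predicted as Keiko Kitagawa (10). These are followed by Tsubasa Honda → Riho Yoshioka and Kei Tanaka → Ryo Yoshizawa (8 each).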

You can see that early stopping kicked in as intended.

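Version with 5-fold cross-validation

I used this version to check rigorously whether this combination of people is really a good choice.
The script also saves the model after each fold of cross-validation, so the best of them can be used later.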
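`StratifiedKFold(n_splits=5, shuffle=True)` assigns samples to folds so that each fold keeps the class proportions. With 500 images for each of the 15 people, each test fold should get 100 images per person. The sketch below illustrates that idea in pure Python. It is a simplification (round-robin dealing per class), not scikit-learn's actual algorithm, and `stratified_folds` is a hypothetical name.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=None):
    """Yield (train_idx, test_idx) pairs in which every class is spread
    as evenly as possible over the test folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(n_splits)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal this class's samples round-robin into the folds
        for i, idx in enumerate(idxs):
            folds[i % n_splits].append(idx)
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
        yield train, test
```

Each sample appears in exactly one test fold, so the five fold accuracies together cover every image once.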

train_crossval.py

```python
import os
import glob
from random import choice

import cv2
import numpy as np
import matplotlib.pyplot as plt
from keras.callbacks import EarlyStopping
from keras.layers import Activation, Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential
from keras.utils import to_categorical

# k-fold cross-validation
from sklearn.model_selection import StratifiedKFold

FACE_DIR = os.path.expanduser("~/face")  # glob does not expand "~" itself
# Metric name logged by Keras: "acc" up to Keras 2.2, "accuracy" from 2.3 / tf.keras
ACC = "acc"


def kfolddata(names):
    X, Y = [], []
    for index, name in enumerate(names):
        l = glob.glob(os.path.join(FACE_DIR, name, "*"))

        imgnum = 500  # pad to 500 images per person by random reuse
        if len(l) < imgnum:
            last = l + [choice(l) for _ in range(imgnum - len(l))]
        else:
            last = l[:imgnum]

        for path in last:
            img = cv2.imread(path)
            if img is None:
                raise IOError("cannot read " + path)
            X.append(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))  # OpenCV loads BGR
            Y.append(index)

    return np.array(X), np.array(Y)


def model(ln):
    # 3 x (3x3 conv + 2x2 max-pool), then 3 dense layers (linear convs, as in train.py)
    model = Sequential()
    model.add(Conv2D(input_shape=(256, 256, 3), filters=32, kernel_size=(3, 3), strides=(1, 1), padding="same"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1), padding="same"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1), padding="same"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(256))
    model.add(Activation("sigmoid"))
    model.add(Dense(128))
    model.add(Activation("sigmoid"))
    model.add(Dense(ln))
    model.add(Activation("softmax"))

    model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
    return model


def last(model, X, Y, train_idx, test_idx, n_classes):
    # One-hot labels for this fold's train / test indices
    Y_train = to_categorical(Y[train_idx], n_classes)
    Y_test = to_categorical(Y[test_idx], n_classes)

    early_stopping = EarlyStopping(monitor="val_" + ACC, patience=10, verbose=1)
    history = model.fit(X[train_idx], Y_train,
                        batch_size=32,
                        epochs=150,
                        verbose=1,
                        validation_data=(X[test_idx], Y_test),
                        callbacks=[early_stopping])
    return model, history, Y_test


def result(model, history, X_test, Y_test, cvscores, _id):
    scores = model.evaluate(X_test, Y_test, verbose=0)
    print("%s: %.2f%%" % (model.metrics_names[1], scores[1] * 100))
    cvscores.append(scores[1] * 100)

    # One subplot per fold
    plt.subplot(5, 1, len(cvscores))
    plt.plot(history.history[ACC], label="acc", ls="-", marker="o")
    plt.plot(history.history["val_" + ACC], label="val_acc", ls="-", marker="x")
    plt.ylabel("accuracy")
    plt.xlabel("epoch")
    plt.legend(loc="best")
    plt.title("accuracy" + str(len(cvscores)))
    plt.savefig("model/5c_result.png")

    # Save each fold's model so the best one can be picked later
    model.save("model/" + str(_id) + "_my_model.h5", include_optimizer=False)
    return cvscores, _id + 1


def train(names):
    os.makedirs("model", exist_ok=True)
    X, Y = kfolddata(names)

    cvscores = []
    _id = 0

    kfold = StratifiedKFold(n_splits=5, shuffle=True)
    # Only the labels are needed to build stratified splits
    for train_idx, test_idx in kfold.split(np.zeros(len(Y)), Y):
        m = model(len(names))
        m, history, Y_test = last(m, X, Y, train_idx, test_idx, len(names))
        cvscores, _id = result(m, history, X[test_idx], Y_test, cvscores, _id)

    # Mean and standard deviation of accuracy over the 5 folds
    print("%.2f%% (+/- %.2f%%)" % (np.mean(cvscores), np.std(cvscores)))


if __name__ == "__main__":
    names = ['浜辺美波', '吉岡里帆', '橋本環奈', '有村架純', '北川景子', '白石麻衣', '本田翼',
             '吉沢亮', '阿部寛', '星野源', '香川照之', '大泉洋', '安倍晋三', '菅義偉', '田中圭']
    train(names)
```

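Face detection & prediction

The demo uses the face detection method above to crop faces from the camera feed in real time and runs the prediction on them.
With the saved trained model, run it as `python det_rec.py --(camera type)`.

Camera types
- --flir : FLIR camera
- --webcam N : webcam (N = index of the camera to use)

Pressing the space key on the result screen cycles the display: camera image → face image → facial landmark detection result (unrelated to recognition...) → camera image → ...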

det_rec.py

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
from argparse import ArgumentParser

import cv2
import dlib
import numpy as np
from imutils.face_utils import FaceAligner, rect_to_bb, shape_to_np, visualize_facial_landmarks
from tensorflow.keras.models import load_model

# Helper modules from the author's project (not included in this article)
from visualizer import draw
from utils import Normalizer


parser = ArgumentParser()
parser.add_argument("--flir", dest="flir", action="store_true")
parser.add_argument("--webcam", help="webcam number", default=0, type=int)
parser.add_argument("--scale", help="downscale factor for face detection", default=0.5, type=float)
args = parser.parse_args()

scale = args.scale

if args.flir:
    import PyCapture2
    # Build a lookup table for gamma correction
    gamma = 0.1
    lookUpTable = np.empty((1, 256), np.uint8)
    for i in range(256):
        lookUpTable[0, i] = np.clip(pow(i / 255.0, gamma) * 255.0, 0, 255)

# Display mode, cycled 0 -> 1 -> 2 with the space key (interpreted by draw())
command = 0

#### face detector
face_predictor = "shape_predictor_68_face_landmarks.dat"  # landmark model

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(face_predictor)
fa = FaceAligner(predictor, desiredFaceWidth=256)

#### load the celebrity classification model
model = load_model("15-BEST_model.h5")  # trained model


def noise_image():
    """Random-noise placeholder shown while no face is detected."""
    return (np.random.random((900, 900, 3)) * 255).astype(np.uint8)


def scaled_box(rect, shape):
    """Map a box found on the downscaled frame back to full resolution, padded by 10 px."""
    (x, y, w, h) = rect_to_bb(rect)
    x = int(max(0, x) / scale)
    y = int(max(0, y) / scale)
    w = min(shape[1], x + int(w / scale)) - x
    h = min(shape[0], y + int(h / scale)) - y
    return x - 10, y - 10, w + 20, h + 20


#### camera
if args.flir:
    bus = PyCapture2.BusManager()
    numCams = bus.getNumOfCameras()
    print("Number of cameras detected: ", numCams)
    if not numCams:  # check before trying to connect
        print("Insufficient number of cameras. Exiting...")
        sys.exit()
    cam = PyCapture2.Camera()
    cam.connect(bus.getCameraFromIndex(0))
    cam.startCapture()
else:
    cam = cv2.VideoCapture(args.webcam)

# Grab a first frame (gamma-corrected for the FLIR camera)
if args.flir:
    tmp_image = cam.retrieveBuffer()
    bayer_image = np.array(tmp_image.getData(), dtype="uint8").reshape((tmp_image.getRows(), tmp_image.getCols()))
    img = cv2.cvtColor(bayer_image, cv2.COLOR_BAYER_RG2BGR)
    img = cv2.LUT(img, lookUpTable)
else:
    _, img = cam.read()

cv2.namedWindow("image", cv2.WINDOW_NORMAL)
cv2.setWindowProperty("image", cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)
faceAligned = noise_image()
faceParts = noise_image()

normalizer = Normalizer(15)  # smooth the results over 15 frames

while True:
    ranking = [-1, -1, -1]  # -1 = no result

    if args.flir:
        tmp_image = cam.retrieveBuffer()
        bayer_image = np.array(tmp_image.getData(), dtype="uint8").reshape((tmp_image.getRows(), tmp_image.getCols()))
        img = cv2.cvtColor(bayer_image, cv2.COLOR_BAYER_RG2BGR)
        img = cv2.resize(img, (int(img.shape[1] / 1.5), int(img.shape[0] / 1.5)))
    else:
        _, img = cam.read()

    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Detect faces on a downscaled copy for speed
    fimg_gray = cv2.resize(img_gray, (int(img_gray.shape[1] * scale), int(img_gray.shape[0] * scale)))
    faces = detector(fimg_gray, 1)
    s = fimg_gray.shape

    # Sort faces by distance from the image centre; faces[0] is the most central one
    faces = list(sorted(faces, key=lambda f: abs(f.top() - s[0] / 2) + abs(f.left() - s[1] / 2)))

    for i, f in enumerate(faces):
        x, y, w, h = scaled_box(f, img_gray.shape)
        if i == 0:
            # Align the most central face on the full-resolution frame
            face = dlib.rectangle(top=int(f.top() / scale),
                                  bottom=int(f.bottom() / scale),
                                  left=int(f.left() / scale),
                                  right=int(f.right() / scale))
            faceAligned = fa.align(img, img_gray, face)
        img = cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 255), 3)

    if len(faces) == 0:
        faceAligned = noise_image()
        faceParts = noise_image()
    else:
        (x, y, w, h) = rect_to_bb(faces[0])
        if w == 0 or h == 0:
            faceAligned = noise_image()
            faceParts = noise_image()
        else:
            x, y, w, h = scaled_box(faces[0], img_gray.shape)
            img = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 3)

            # The model was trained on RGB images, while OpenCV frames are BGR
            face_rgb = cv2.cvtColor(faceAligned, cv2.COLOR_BGR2RGB)
            likelihoods = model.predict(face_rgb[np.newaxis, :, :, :].astype(np.float16))[0]
            ranking = np.argsort(-likelihoods)[:9]  # top-9 class indices

            # Smooth the result over recent frames
            ranking = normalizer(ranking)

            # Keep the first 3 distinct entries (-1 = none)
            res_rank = []
            for _ in range(3):
                p = ranking.pop(0)
                while (p != -1) and (p in res_rank):
                    if len(ranking) == 0:
                        p = -1
                        break
                    p = ranking.pop(0)
                res_rank.append(p)
            ranking = res_rank[:3]

            #### detect facial landmarks on faceAligned (display only)
            faligned = cv2.cvtColor(faceAligned, cv2.COLOR_BGR2GRAY)
            parts = detector(faligned, 1)
            if len(parts) > 0:
                shape = predictor(faligned, parts[0])
                faceParts = visualize_facial_landmarks(faceAligned, shape_to_np(shape))
            else:
                faceParts = faceAligned

    show_window = draw(ranking, img, faceAligned, faceParts, command)

    cv2.imshow("image", show_window)
    key = cv2.waitKey(1)

    if key == ord("q"):
        break
    elif key == ord(" "):
        command = (command + 1) % 3

# Release the camera and close the window
if args.flir:
    cam.stopCapture()
    cam.disconnect()
else:
    cam.release()
cv2.destroyAllWindows()
```
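`Normalizer(15)` comes from the author's `utils` module, which isn't included in the article, so its exact behavior is unknown. One possible way to smooth the result over 15 frames is the hypothetical `ProbabilitySmoother` below. It averages the softmax outputs of the last N frames and returns the top-3 class indices. Averaging probabilities rather than rankings is my own choice for this sketch.

```python
from collections import deque

import numpy as np


class ProbabilitySmoother:
    """Hypothetical helper: keep the last `window` likelihood vectors and rank
    classes by their mean, so a single noisy frame can't flip the top 3."""

    def __init__(self, window=15):
        self.history = deque(maxlen=window)  # the oldest frame drops out automatically

    def __call__(self, likelihoods, top_k=3):
        self.history.append(np.asarray(likelihoods, dtype=float))
        mean = np.mean(np.stack(list(self.history)), axis=0)
        # argsort of the negated mean = class indices from most to least likely
        return [int(i) for i in np.argsort(-mean)[:top_k]]

    def reset(self):
        """Forget past frames, e.g. when no face is visible."""
        self.history.clear()
```

In the loop above, `ranking = smoother(likelihoods)` could replace the `argsort`/`normalizer` calls and the de-duplication loop, because the returned indices are already distinct. Calling `smoother.reset()` when no face is detected would keep one visitor's history from leaking into the next.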

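To display the results, I prepared the following images and name plates.
(Thanks to the name plates, social distancing is maintained.)

The result screen shows the detected face image and the top 3 recognition results.
(The figure below is a mock-up. Moving the camera a little further to the right would probably change the result, but I'll refrain.)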

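Impressions

On the day, I put up slides like this one. I pointed the camera at the juniors who came to tour the lab, ~~looking like a bunch of stereotypical otaku guys~~ full of excitement, and classified them without asking. Young people these days tend to have cute faces (?), so women often ranked near the top. Even so, I can proudly say the results were never ones that would make anyone feel bad. (I actually classified a female student only once...)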

Incidentally, running it on my own face gave
1st: Ryo Yoshizawa
2nd: Kei Tanaka
3rd: Gen Hoshino
so I got to grin, exactly as planned.

For our lab's professor (a man in his 60s), it gave
1st: Yo Oizumi
2nd: Shinzo Abe
3rd: Yoshihide Suga
so it seems to have worked well.

And so, once again this year, our lab has welcomed good-looking newcomers who are the spitting image of Ryo Yoshizawa, Mai Shiraishi, Kei Tanaka and Tsubasa Honda.

