
[Keras] Batch inference with ArcFace

Last updated at 2020-04-11 Posted at 2020-04-11

Overview

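It is well known that deep learning generally needs a reasonably large amount of data to reach good accuracy. Metric learning, in contrast, can reach good accuracy even with a small number of samples.
For a detailed explanation and an implementation, see the following articles.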

Explanation: Modern deep metric learning methods: SphereFace, CosFace, ArcFace
Keras implementation: [Keras] Classifying PET bottles with MobileNetV2 + ArcFace!

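ArcFace, one of the metric learning methods, uses the feature vector that comes out of the CNN's Global Average Pooling layer at inference time.
Classification is done by taking a representative feature vector from the training data as the reference and computing its cosine similarity (the cosine of the angle between two multi-dimensional vectors) with the evaluation data.
Because it is a cosine, the value ranges from -1 to 1. A value of 1 means the angle between the two vectors is 0, so the two feature vectors can be considered similar.
For a 3-class problem, for example, you prepare one representative vector per class and output the class with the highest cosine similarity as the prediction.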
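The classification rule described above can be sketched with toy data. This is only a minimal illustration: the 4-dimensional vectors, `representatives`, `feature`, and `cos_sim_to_classes` are hypothetical names and values chosen for the example, not taken from a trained ArcFace model.

```python
import numpy as np

# Hypothetical representative vectors, one row per class (3 classes, 4 dimensions).
representatives = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])

# Hypothetical feature vector of a single evaluation sample.
feature = np.array([0.1, 0.9, 0.2, 0.0])


def cos_sim_to_classes(feature, representatives):
    """Cosine similarity between one vector and every row of `representatives`."""
    dots = representatives @ feature
    norms = np.linalg.norm(representatives, axis=1) * np.linalg.norm(feature)
    return dots / norms


similarities = cos_sim_to_classes(feature, representatives)
# The predicted class is the one whose representative is closest in angle.
predicted_class = int(np.argmax(similarities))
print(similarities, predicted_class)
```

Here the sample points mostly along the second axis, so the second class (index 1) gets the highest similarity.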

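Once a deep learning model has been trained, it is typically deployed on an edge PC to do the actual job. If a GPU is available and several images have to be processed, running inference on all of them at once naturally gives a shorter total computation time.
For ordinary multi-class classification, running inference on several images in one batch (I am not sure "batch inference" is the right term) is not particularly hard.
With cosine similarity, however, a little extra care is needed, so this article covers the following two topics.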

① How to compute the cosine similarity used in ArcFace inference in a batch
② How to build the cosine similarity into Keras (TensorFlow) by combining a Keras model with the backend

That is all.
Note that training ArcFace itself and obtaining the representative vectors are not covered here.

Prerequisites

tensorflow 1.15.0
keras 2.3.1
Python 3.7.6
numpy 1.18.1

core i7
GTX1080ti

① How to compute the cosine similarity used in ArcFace inference in a batch

This section is based on the following article.
Keras implementation: [Keras] Classifying PET bottles with MobileNetV2 + ArcFace!

First, here is the baseline function that computes the cosine similarity.

cosine_similarity.py

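import numpy as np


def cosine_similarity(x1, x2):
    # Promote 1-D vectors to shape (1, D)
    if x1.ndim == 1:
        x1 = x1[np.newaxis]
    if x2.ndim == 1:
        x2 = x2[np.newaxis]
    # Norms have shape (N1,) and (N2,), so this only broadcasts safely when x1 is a single vector
    x1_norm = np.linalg.norm(x1, axis=1)
    x2_norm = np.linalg.norm(x2, axis=1)
    cosine_sim = np.dot(x1, x2.T)/(x1_norm*x2_norm+1e-10)
    return cosine_sim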

First, here is how to compute the cosine similarity one image at a time.

backend_result.py

import os
import time

import numpy as np
from keras.applications.xception import Xception
from keras.layers import Input, GlobalAveragePooling2D
from keras.models import Model
from keras.preprocessing.image import load_img, img_to_array

imgpath = "path to a directory containing several images"

# Define the model
# As a stand-in, build a model with GAP attached to Xception
# Output shape is (N, 2048)
input_tensor = Input(shape=(299, 299, 3))
xception_model = Xception(include_top=False, weights='imagenet', input_tensor=input_tensor)
x = xception_model.output
outputs = GlobalAveragePooling2D()(x)
model = Model(inputs=input_tensor, outputs=outputs)

# Define placeholder representative vectors: one vector per class for 10 classes
# Here they are simply vectors filled with ones
master_vector = np.ones([10, 2048])

# List of image file names
# Using glob is fine too
img_name_list = os.listdir(imgpath)

# cosine_similarity is the function defined in cosine_similarity.py above
for i in img_name_list:
    absname = os.path.join(imgpath, i)
    img_rgb = load_img(absname, color_mode='rgb', target_size=(299,299))
    img_rgb = img_to_array(img_rgb)
    img_rgb = np.expand_dims(img_rgb, axis=0)
    pred_vector = model.predict(img_rgb)
    # Compute the cosine similarity
    # pred_vector=(1, 2048) x master_vector=(10, 2048) -> (1, 10)
    cos_sim = cosine_similarity(pred_vector, master_vector)
    pred_class = np.argmax(cos_sim[0])
    print(pred_class)

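Next is the way to compute the cosine similarity for multiple images at once.
The cosine similarity function above is changed slightly.
The change is that each vector is divided by its L2 norm first, and the dot product is taken afterwards.
Because np.linalg.norm is called with keepdims=True when computing the L2 norm, for example

Shape of the feature vectors for 10 images: (10, 2048)

gives

Shape of the L2 norms of the feature vectors for 10 images: (10, 1)

so the division works through ordinary broadcasting.
(A small value is also added so that nothing is divided by zero.)
With this approach the computation works regardless of how many vectors each input holds.
(If you are curious, check the shapes of the arrays with shape as you go.)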
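The shape behaviour described above can be checked with a small sketch. The random `features` array is a made-up stand-in for real feature vectors, and `flat_division_works` is a name introduced only for this illustration.

```python
import numpy as np

# Made-up feature vectors for 10 images
features = np.random.rand(10, 2048)

norm_flat = np.linalg.norm(features, axis=1)                 # shape (10,)
norm_keep = np.linalg.norm(features, axis=1, keepdims=True)  # shape (10, 1)
print(norm_flat.shape, norm_keep.shape)

# (10, 2048) / (10, 1) broadcasts row-wise: each row is divided by its own norm
unit_features = features / (norm_keep + 1e-10)
row_lengths = np.linalg.norm(unit_features, axis=1)  # every entry is close to 1

# Without keepdims the shapes (10, 2048) and (10,) do not line up
try:
    features / norm_flat
    flat_division_works = True
except ValueError:
    flat_division_works = False
print(flat_division_works)
```

Without keepdims=True, the norms would have to be reshaped by hand before dividing.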

cosine_similarity_batch.py

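import numpy as np


def cosine_similarity_batch(x1, x2):
    # Promote 1-D vectors to shape (1, D)
    if x1.ndim == 1:
        x1 = x1[np.newaxis]
    if x2.ndim == 1:
        x2 = x2[np.newaxis]
    # keepdims=True keeps the norms as (N, 1) so they broadcast row-wise
    x1_norm = np.linalg.norm(x1, axis=1, keepdims=True)
    x2_norm = np.linalg.norm(x2, axis=1, keepdims=True)
    # Divide each feature vector by its L2 norm (its length) first, then take the dot product
    cosine_sim = np.dot(x1/(x1_norm+1e-10), (x2/(x2_norm+1e-10)).T)
    return cosine_sim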
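As a sanity check, the batch version should give the same numbers as calling the original function once per image. The sketch below repeats both functions so that it runs on its own; the random `pred_vectors` and `master_vector` are hypothetical stand-ins for real features and representatives.

```python
import numpy as np


def cosine_similarity(x1, x2):
    # Original version: safe when x1 holds a single vector
    if x1.ndim == 1:
        x1 = x1[np.newaxis]
    if x2.ndim == 1:
        x2 = x2[np.newaxis]
    x1_norm = np.linalg.norm(x1, axis=1)
    x2_norm = np.linalg.norm(x2, axis=1)
    return np.dot(x1, x2.T) / (x1_norm * x2_norm + 1e-10)


def cosine_similarity_batch(x1, x2):
    # Batch version: normalize each row first, then take the dot product
    if x1.ndim == 1:
        x1 = x1[np.newaxis]
    if x2.ndim == 1:
        x2 = x2[np.newaxis]
    x1_norm = np.linalg.norm(x1, axis=1, keepdims=True)
    x2_norm = np.linalg.norm(x2, axis=1, keepdims=True)
    return np.dot(x1 / (x1_norm + 1e-10), (x2 / (x2_norm + 1e-10)).T)


rng = np.random.RandomState(0)
pred_vectors = rng.rand(5, 2048)    # hypothetical features for 5 images
master_vector = rng.rand(10, 2048)  # hypothetical representatives for 10 classes

# One call per image, stacked into (5, 10)
one_by_one = np.vstack([cosine_similarity(v, master_vector) for v in pred_vectors])
# A single call for all images, also (5, 10)
batched = cosine_similarity_batch(pred_vectors, master_vector)
print(np.allclose(one_by_one, batched))
```

The two results differ only by the placement of the small 1e-10 term, which is far below the tolerance of np.allclose.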

After that, run inference on the whole batch and pass the result to the function above.

backend_result.py

import os
import time

import numpy as np
from keras.applications.xception import Xception
from keras.layers import Input, GlobalAveragePooling2D
from keras.models import Model
from keras.preprocessing.image import load_img, img_to_array

imgpath = "path to a directory containing several images"

# Define the model
# As a stand-in, build a model with GAP attached to Xception
# Output shape is (N, 2048)
input_tensor = Input(shape=(299, 299, 3))
xception_model = Xception(include_top=False, weights='imagenet', input_tensor=input_tensor)
x = xception_model.output
outputs = GlobalAveragePooling2D()(x)
model = Model(inputs=input_tensor, outputs=outputs)

# Define placeholder representative vectors: one vector per class for 10 classes
# Here they are simply vectors filled with ones
master_vector = np.ones([10, 2048])

# List of image file names
# Using glob is fine too
img_name_list = os.listdir(imgpath)

# Collect the images in a list with a for loop, then convert to a numpy array
imgs = []
for i in img_name_list:
    absname = os.path.join(imgpath, i)
    img_rgb = load_img(absname, color_mode='rgb', target_size=(299,299))
    img_rgb = img_to_array(img_rgb)
    imgs.append(img_rgb)
imgs = np.array(imgs, np.float32)
# Run inference, then compute the cosine similarity with cosine_similarity_batch defined above
# pred_vector=(N images, 2048) x master_vector=(10, 2048) -> (N, 10)
pred_vector = model.predict(imgs)
cos_sim = cosine_similarity_batch(pred_vector, master_vector)
# Each image in the batch yields 10 cosine similarities, so argmax runs along axis=1
pred_class = np.argmax(cos_sim, axis=1)
print(pred_class)

After that, do whatever you like with the results.

② How to build the cosine similarity into Keras (TensorFlow) by combining a Keras model with the backend

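If anything, this is the main topic of the article.
Keras (TensorFlow) is good at applying the same operation to many images at once, so the idea is that moving the post-inference numpy computation into the model should make processing faster.
That said, it only means reimplementing the numpy computation with the Keras backend, so it is not very difficult.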

cosine_simlarity_kerasbackend.py

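from keras import backend as K


def cosine_similarity_eval(args):
    # x1: feature tensor from the model, x2: representative vectors as a numpy array
    x1, x2 = args
    # Convert the numpy array to a constant tensor
    x2 = K.constant(x2)
    # L2 norm per row, kept as (N, 1) for broadcasting
    x1_norm = K.sqrt(K.sum(K.square(x1), axis=1, keepdims=True))
    x2_norm = K.sqrt(K.sum(K.square(x2), axis=1, keepdims=True))
    cosine_sim = K.dot(x1/(x1_norm+1e-10), K.transpose(x2/(x2_norm+1e-10)))
    return cosine_sim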

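The differences from the numpy implementation are as follows.
・The representative vectors are prepared in numpy and converted to a constant tensor inside this function.
・There did not seem to be a backend function equivalent to numpy's L2 norm, so the norm is computed by hand.
(To keep things easy to follow it mirrors the numpy implementation above, but with l2_normalize it could probably be written in one line.)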
Next is how to connect this to the Keras model.

cosine_simlarity_kerasbackend.py

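import numpy as np
from keras.applications.xception import Xception
from keras.layers import Input, GlobalAveragePooling2D
from keras.models import Model

# Define the model that outputs the feature vectors
input_tensor = Input(shape=(299, 299, 3))
xception_model = Xception(include_top=False, weights='imagenet', input_tensor=input_tensor)
x = xception_model.output
outputs = GlobalAveragePooling2D()(x)
model = Model(inputs=input_tensor, outputs=outputs)

# Define placeholder representative vectors: one vector per class for 10 classes
# Here they are simply vectors filled with ones
master_vector = np.ones([10, 2048])

# Feed the model's output feature tensor and the representative vectors to the function
cosine_sim = cosine_similarity_eval([model.output, master_vector])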

This completes a TensorFlow computation graph that takes the model's input and outputs cosine_sim.
All that remains is to feed data into it with sess.run.
The full code is shown below.

cosine_simlarity_kerasbackend.py

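import os
import time

import numpy as np
from keras import backend as K
from keras.applications.xception import Xception
from keras.layers import Input, GlobalAveragePooling2D
from keras.models import Model
from keras.preprocessing.image import load_img, img_to_array

# Get the session
sess = K.get_session()

imgpath = "path to a directory containing several images"

# Define the model
# As a stand-in, build a model with GAP attached to Xception
# Output shape is (N, 2048)
input_tensor = Input(shape=(299, 299, 3))
xception_model = Xception(include_top=False, weights='imagenet', input_tensor=input_tensor)
x = xception_model.output
outputs = GlobalAveragePooling2D()(x)
model = Model(inputs=input_tensor, outputs=outputs)

# Define placeholder representative vectors: one vector per class for 10 classes
# Here they are simply vectors filled with ones
# (this must be defined before it is passed to cosine_similarity_eval)
master_vector = np.ones([10, 2048])

# Feed the model's output feature tensor and the representative vectors to
# cosine_similarity_eval, which is the function defined above
cosine_sim = cosine_similarity_eval([model.output, master_vector])

# List of image file names
# Using glob is fine too
img_name_list = os.listdir(imgpath)

# Collect the images in a list with a for loop, then convert to a numpy array
imgs = []
for i in img_name_list:
    absname = os.path.join(imgpath, i)
    img_rgb = load_img(absname, color_mode='rgb', target_size=(299,299))
    img_rgb = img_to_array(img_rgb)
    imgs.append(img_rgb)
imgs = np.array(imgs, np.float32)

# Run the graph in inference mode (learning_phase = 0)
cos_sim = sess.run(
            cosine_sim,
            feed_dict={
                model.input: imgs,
                K.learning_phase(): 0
            })
# Each image in the batch yields 10 cosine similarities, so argmax runs along axis=1
# This step could also be built into the model
pred_class = np.argmax(cos_sim, axis=1)
print(pred_class)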

Inference time for each method

These are included for reference.
The times cover everything from loading the images through inference.
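The article does not show how these times were measured. As a rough sketch of one way to take such a measurement, the hypothetical helper `seconds_per_100` below times a callable and scales the result to 100 images. `dummy_pipeline` is a made-up numpy stand-in for the real "load images and run inference" step, not the author's measurement code.

```python
import time

import numpy as np


def seconds_per_100(run_once, n_images, repeats=3):
    """Time `run_once()` several times and return the best run, scaled to 100 images."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        best = min(best, time.perf_counter() - start)
    return best / n_images * 100


def dummy_pipeline():
    # Stand-in workload: normalize 100 fake feature vectors and compare with 10 classes
    feats = np.random.rand(100, 2048)
    masters = np.ones((10, 2048))
    feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-10)
    masters = masters / (np.linalg.norm(masters, axis=1, keepdims=True) + 1e-10)
    return np.argmax(feats @ masters.T, axis=1)


elapsed = seconds_per_100(dummy_pipeline, n_images=100)
print("{:.4f} s / 100 images".format(elapsed))
```

Taking the best of a few repeats is one way to reduce the influence of one-off costs such as the first call on a GPU. Whether the figures in this article include such warm-up is not stated.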

Method	Time [s/100 images]
Inference one image at a time	1.37
Batch inference with numpy	0.72
Batch inference with keras	0.66

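Batching roughly halves the time compared with one image at a time, while moving the cosine similarity into Keras is only slightly faster than the numpy batch version (0.66 s versus 0.72 s).
The Keras backend approach used here can be applied in quite a few other places, so it is worth remembering.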

That is all.
If anything is unclear or looks wrong, please leave a comment.
