L2-constrained Softmax Loss for Discriminative Face Verification [2. Related Work] [Paper, DeepL translation]

Paper reading

Last updated at 2020-11-30, posted at 2020-11-30

This article is more or less a memo for my own use.
Most of it is a DeepL translation.
I would be glad if you pointed out any mistakes.

Source paper
L2-constrained Softmax Loss for Discriminative Face Verification
Rajeev Ranjan, Carlos D. Castillo, Rama Chellappa

Previous: [1. Introduction]
Next: [3. Motivation]

2. Related Work

Translation

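In recent years, the accuracy of face verification using deep learning methods has improved significantly [28, 30, 24, 27, 29, 33]. Most of these methods have even surpassed human performance on the LFW [14] dataset. All of them use DCNNs, but they differ in the type of loss function used for training. For face verification, it is essential that the features of a positive pair (two images of the same subject) lie close together and that the features of a negative pair (images of different subjects) lie far apart. Researchers have mainly taken two approaches to this problem.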

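In the first approach, pairs of face images are fed to the training algorithm to learn a feature embedding in which positive pairs are close and negative pairs are far apart. Along these lines, Chopra et al. [5] proposed siamese networks trained with a contrastive loss. Hu et al. [13] designed a discriminative deep metric with a margin between positive and negative face pairs. FaceNet [28] introduced the triplet loss to learn the metric from hard triplets of face samples.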
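The pair-based and triplet-based losses mentioned above can be sketched as follows. This is a minimal illustration using commonly cited textbook formulations, which are not necessarily the exact forms used in [5] or [28]. The function names and the default margins (`1.0` and `0.2`) are assumptions chosen for the example.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def contrastive_loss(u, v, same_identity, margin=1.0):
    """Contrastive loss for one pair (a common formulation, assumed here).

    Positive pairs are penalized by their squared distance.
    Negative pairs are penalized only while they are closer than `margin`.
    """
    d = math.sqrt(squared_distance(u, v))
    if same_identity:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss for one (anchor, positive, negative) triplet.

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (in squared distance).
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


if __name__ == "__main__":
    # A negative pair that is already far apart contributes no loss.
    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_identity=False))
    # A triplet whose negative is as close as the positive is still penalized.
    print(triplet_loss([0.0, 0.0], [1.0, 0.0], [0.0, 1.0]))
```

Both losses operate on pairs or triplets rather than on class labels, so which samples are selected matters a great deal. This is why the text mentions "hard" triplets for FaceNet.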

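In the second approach, face images and their subject labels are used to learn discriminative identification features in a classification framework. Most recent methods [29, 30, 24, 37] train a DCNN with softmax loss to learn these features, which are later used either to compute the similarity score of a face pair directly or to train a discriminative metric embedding [27, 3]. Another strategy is to train the network on a joint identification-verification task [29, 32, 33]. Xiong et al. [36] proposed Transferred Deep Feature Fusion (TDFF), which fuses features trained with different networks and datasets in two stages. Template adaptation [8] is applied to boost performance further.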

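A recent approach [33] introduced center loss to learn more discriminative face features. Our proposed method differs from center loss in the following respects. First, we use a single loss function ($L_2$-softmax loss), whereas [33] uses center loss jointly with softmax loss during training. Second, center loss introduces $C \times D$ additional parameters during training, where $C$ is the number of classes and $D$ is the feature dimension. In contrast, $L_2$-softmax loss introduces just one parameter, which defines the fixed $L_2$-norm of the features. Moreover, center loss can also be used together with $L_2$-softmax loss, and this combination performs better than center loss trained with regular softmax loss (see Section 5.1.4).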
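The parameter comparison above can be made concrete with a little arithmetic. This is a minimal sketch; the values `num_classes = 10000` and `feature_dim = 512` are hypothetical and are not taken from the paper.

```python
def center_loss_extra_params(num_classes, feature_dim):
    """Center loss keeps one D-dimensional center per class: C x D values."""
    return num_classes * feature_dim


def l2_softmax_extra_params():
    """L2-softmax adds a single scalar: the fixed L2-norm of the features."""
    return 1


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    num_classes, feature_dim = 10000, 512
    print(center_loss_extra_params(num_classes, feature_dim))  # 5120000
    print(l2_softmax_extra_params())  # 1
```

The count for center loss grows with the number of identities in the training set, whereas the single norm parameter does not.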

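Recently, a few algorithms have used feature normalization during training to improve performance. SphereFace [20] proposes an angular softmax (A-softmax) loss that enables DCNNs to learn angularly discriminative features. Another method, DeepVisage [10], uses a special case of batch normalization to normalize the feature descriptor before applying softmax loss. Our proposed method differs in that it applies an $L_2$-constraint to the feature descriptors, forcing them to lie on a hypersphere of a given radius.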
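Forcing a feature descriptor onto a hypersphere of a given radius amounts to rescaling it so that its $L_2$-norm equals that radius. The sketch below shows only this rescaling step, not the paper's full training setup. The function name `l2_constrain`, the parameter name `radius`, and the small `eps` guard against division by zero are assumptions made for the example.

```python
import math


def l2_constrain(features, radius, eps=1e-12):
    """Rescale a feature vector so that its L2-norm equals `radius`.

    `eps` (an assumption of this sketch) avoids dividing by zero
    when the input is the all-zero vector.
    """
    norm = math.sqrt(sum(x * x for x in features))
    scale = radius / max(norm, eps)
    return [scale * x for x in features]


if __name__ == "__main__":
    constrained = l2_constrain([3.0, 4.0], radius=10.0)
    print(constrained)
    # The direction is preserved; only the length changes.
    print(math.sqrt(sum(x * x for x in constrained)))
```

After this step every descriptor has the same norm, so two descriptors can differ only in direction.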

Original text

In recent years, there has been a significant improvement in the accuracy of face verification using deep learning methods [28, 30, 24, 27, 29, 33]. Most of these methods have even surpassed human performance on the LFW [14] dataset. Although these methods use DCNNs, they differ by the type of loss function they use for training. For face verification, it's essential for the positive subjects pair features to be closer and negative subjects pair features far apart. To solve this problem, researchers have adopted two major approaches.

In the first approach, pairs of face images are input to the training algorithm to learn a feature embedding where positive pairs are closer and negative pairs are far apart. In this direction, Chopra et al. [5] proposed siamese networks with contrastive loss for training. Hu et al. [13] designed a discriminative deep metric with a margin between positive and negative face pairs. FaceNet [28] introduces triplet loss to learn the metric using hard triplet face samples.

In the second approach, the face images along with their subject labels are used to learn discriminative identification features in a classification framework. Most of the recent methods [29, 30, 24, 37] train a DCNN with softmax loss to learn these features which are later used either to directly compute the similarity score for a pair of faces or to train a discriminative metric embedding [27, 3]. Another strategy is to train the network for joint identification-verification task [29, 32, 33]. Xiong et al. [36] proposed transferred deep feature fusion (TDFF) which involves two-stage fusion of features trained with different networks and datasets. Template adaptation [8] is applied to further boost the performance.

A recent approach [33] introduced center loss to learn better discriminative face features. Our proposed method is different from the center loss in the following aspects. First, we use one loss function (i.e., L2-softmax loss) whereas [33] uses center loss jointly with the softmax loss during training. Second, center loss introduces C × D additional parameters during training where C is the number of classes and D is the feature dimension. On the other hand, the L2-softmax loss introduces just a single parameter that defines the fixed L2-norm of the features. Moreover, the center loss can also be used in conjunction with L2-softmax loss, which performs better than center loss trained with regular softmax loss (see Section 5.1.4).

Recently, a few algorithms have used feature normalization during training to improve performance. SphereFace [20] proposes angular softmax (A-softmax) loss that enables DCNNs to learn angularly discriminative features. Another method called DeepVisage [10] uses a special case of batch normalization technique to normalize the feature descriptor before applying the softmax loss. Our proposed method is different as it applies an L2-constraint on the feature descriptors enforcing them to lie on a hypersphere of a given radius.
