
[DIGITS] Training DetectNet with multiple classes

Last updated at 2018-02-06, posted at 2017-11-08

In this article I show how to detect multiple object classes in your own data using DetectNet, a detection network developed by NVIDIA. As the training framework I use DIGITS (ver. 5.1), a wrapper around Caffe (ver. 0.16.2).

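Moreover, to improve detection accuracy, training is done in two stages using fine-tuning: the first stage trains a single-class detection network, and the resulting trained model is then fine-tuned for multiple classes.
For example, if you want to detect the faces of classes = {tanaka, nakajima, takahashi} individually, you first train once with classes = {face} and then fine-tune with classes = {tanaka, nakajima, takahashi}.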

Single-class training

Data preparation

train
|- images
|  |- image0001.png
|  |- ...
|- labels
   |- image0001.txt
   |- ...
validate
|- images
|  |- image1001.png
|  |- ...
|- labels
   |- image1001.txt
   |- ...

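Prepare the data in this layout, naming each label file after its image. For the contents of files such as image0001.txt, see the official documentation
(https://github.com/NVIDIA/DIGITS/blob/digits-4.0/digits/extensions/data/objectDetection/README.md#custom-class-mappings). At this stage each txt file contains only one class, i.e. lines of the form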

face 0.00 0 -10 496.00 222.00 599.00 328.00 -1 -1 -1 -1000 -1000 -1000 -10

One such line is written per object in the image.
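If you have many images, writing these lines by hand gets tedious. Here is a minimal Python sketch that formats such a line, assuming, as in the example above, that only the class name and the bounding box (left, top, right, bottom, in pixels) vary and every other field keeps the placeholder values shown. `kitti_line` and `write_label_file` are hypothetical helper names, not part of DIGITS.

```python
from pathlib import Path

# Placeholder values copied from the example line above:
# truncated, occluded, alpha come before the box;
# dimensions (3), location (3), rotation_y come after it.
PREFIX = "0.00 0 -10"
SUFFIX = "-1 -1 -1 -1000 -1000 -1000 -10"


def kitti_line(label, left, top, right, bottom):
    """Format one object as a KITTI-style label line (box in pixels)."""
    if right <= left or bottom <= top:
        raise ValueError("expected left < right and top < bottom")
    box = f"{left:.2f} {top:.2f} {right:.2f} {bottom:.2f}"
    return f"{label} {PREFIX} {box} {SUFFIX}"


def write_label_file(path, objects):
    """Write one line per (label, left, top, right, bottom) tuple."""
    lines = [kitti_line(*obj) for obj in objects]
    Path(path).write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    print(kitti_line("face", 496, 222, 599, 328))
```

With the box from the example, this prints exactly the face line shown above.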

Creating the dataset

Notes

Pad image: must be larger than every original image.
Resize image: set it to 1248x384. If you change it, you will later have to modify the .prototxt to match.
Custom classes: use the comma-separated form dontcare,some_label (see:
https://github.com/NVIDIA/DIGITS/blob/digits-4.0/digits/extensions/data/objectDetection/README.md#custom-class-mappings).
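A quick way to work out these settings, as a minimal Python sketch: `min_pad_size` returns the smallest width and height that no image exceeds (Pad image should be at least this large), and `custom_classes` builds the comma-separated string with dontcare first. Both are hypothetical helpers, and the image sizes in the usage example are made up.

```python
def min_pad_size(image_sizes):
    """Smallest (width, height) that covers every (width, height) given."""
    widths, heights = zip(*image_sizes)
    return max(widths), max(heights)


def custom_classes(labels):
    """Build the Custom classes string: dontcare first, then the real labels."""
    return ",".join(["dontcare", *labels])


if __name__ == "__main__":
    sizes = [(1242, 375), (1224, 370), (1238, 374)]  # hypothetical image sizes
    print(min_pad_size(sizes))        # (1242, 375)
    print(custom_classes(["face"]))   # dontcare,face
```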

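Training

Notes

Select Dataset: choose the dataset you just created (face_dataset in this example).
Epochs: use plenty. (mAP and precision can stay at 0.0 for up to about 20 epochs.)
Batch size: keep it small. Training rarely seems to stall because the batch is too small, whereas a large batch is more likely to run out of memory.
Subtract mean: Pixel is recommended. When I used Image, subtracting the mean image from the target image at inference time before feeding it to the network gave wildly wrong results (I haven't figured out why).
Custom network: copy and paste the text from here (https://github.com/NVIDIA/caffe/blob/caffe-0.15/examples/kitti/detectnet_network.prototxt).
At this stage, leave Pretrained model(s) empty.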
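For reference, a minimal pure-Python sketch of what Pixel-style mean subtraction amounts to at inference time, assuming the dataset's mean image is available as an H x W x C nested list and that the mean pixel is its per-channel spatial average. This illustrates the general technique only; it is not DIGITS's own code and does not explain the Image-mode problem described above. The function names are hypothetical.

```python
def mean_pixel(mean_image):
    """Per-channel average of an H x W x C mean image (nested lists)."""
    pixels = [px for row in mean_image for px in row]
    channels = len(pixels[0])
    return [sum(px[c] for px in pixels) / len(pixels) for c in range(channels)]


def subtract_mean_pixel(image, mean_px):
    """Subtract the same per-channel value from every pixel of an H x W x C image."""
    return [[[v - m for v, m in zip(px, mean_px)] for px in row] for row in image]


if __name__ == "__main__":
    mean_img = [[[100, 110, 120], [102, 112, 122]],
                [[98, 108, 118], [100, 110, 120]]]
    mp = mean_pixel(mean_img)                             # [100.0, 110.0, 120.0]
    print(subtract_mean_pixel([[[101, 111, 121]]], mp))   # [[[1.0, 1.0, 1.0]]]
```

One practical property of subtracting a single value per channel is that it does not require the input to match the mean image's resolution.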

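Start training

Did the accuracy come out properly?

Looks good.

If it didn't work:

Check that you haven't entered a wrong name in Custom classes.
Patiently try learning rates of {1e-3, 1e-4, 1e-5, 1e-6, 1e-7}.
Increase the epochs. The fewer training images you have, the fewer weight updates each epoch contains, so you need more epochs.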
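The epoch advice comes down to simple arithmetic: one epoch is roughly ceil(images / batch size) updates, so a small dataset gets few updates per epoch. Below is a minimal Python sketch of that arithmetic, plus a check for the first point (label names missing from Custom classes). The check assumes exact, case-sensitive name matching, which is an assumption not verified against DIGITS; all function names are hypothetical.

```python
import math

# Learning rates suggested above, to be tried one run at a time.
LEARNING_RATES = [1e-3, 1e-4, 1e-5, 1e-6, 1e-7]


def updates_per_run(num_images, batch_size, epochs):
    """Approximate number of weight updates (iterations) in a training run."""
    return math.ceil(num_images / batch_size) * epochs


def epochs_for_updates(num_images, batch_size, target_updates):
    """Epochs needed to reach at least target_updates updates."""
    per_epoch = math.ceil(num_images / batch_size)
    return math.ceil(target_updates / per_epoch)


def unknown_labels(label_lines, custom_classes):
    """Class names used in label lines but absent from the Custom classes string.

    Assumes exact, case-sensitive matching (not verified against DIGITS).
    """
    known = {name.strip() for name in custom_classes.split(",")}
    used = {line.split()[0] for line in label_lines if line.strip()}
    return sorted(used - known)


if __name__ == "__main__":
    print(updates_per_run(500, 2, 30))        # 7500
    print(epochs_for_updates(100, 2, 7500))   # 150
    lines = ["face 0.00 0 -10 1 2 3 4 -1 -1 -1 -1000 -1000 -1000 -10",
             "fase 0.00 0 -10 1 2 3 4 -1 -1 -1 -1000 -1000 -1000 -10"]
    print(unknown_labels(lines, "dontcare,face"))  # ['fase']
```

For example, 100 images need 150 epochs to get the same 7500 updates that 500 images get in 30.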

Multi-class training

Data preparation

Prepare the data the same way as before. This time it is multi-class, so the contents of image0001.txt look like this:

takahashi 0.00 0 -10 496.00 222.00 599.00 328.00 -1 -1 -1 -1000 -1000 -1000 -10
nakajima 0.00 0 -10 291.00 138.00 327.00 212.00 -1 -1 -1 -1000 -1000 -1000 -10

That is, one line per object, each beginning with that object's class name.
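To double-check multi-class label files before building the dataset, here is a minimal Python sketch that parses such lines and groups the boxes by class. It assumes exactly 15 whitespace-separated fields per line, as in the examples, with the box in fields 5 to 8; the function names are hypothetical.

```python
from collections import defaultdict


def parse_kitti_line(line):
    """Return (label, (left, top, right, bottom)) from one KITTI-style line."""
    fields = line.split()
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}: {line!r}")
    left, top, right, bottom = map(float, fields[4:8])
    return fields[0], (left, top, right, bottom)


def boxes_by_class(lines):
    """Group the boxes of all non-empty lines by class name."""
    grouped = defaultdict(list)
    for line in lines:
        if line.strip():
            label, box = parse_kitti_line(line)
            grouped[label].append(box)
    return dict(grouped)


if __name__ == "__main__":
    example = [
        "takahashi 0.00 0 -10 496.00 222.00 599.00 328.00 -1 -1 -1 -1000 -1000 -1000 -10",
        "nakajima 0.00 0 -10 291.00 138.00 327.00 212.00 -1 -1 -1 -1000 -1000 -1000 -10",
    ]
    print(boxes_by_class(example))
```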

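Creating the dataset

Notes

Custom classes now lists multiple classes (e.g. dontcare,tanaka,nakajima,takahashi).
Otherwise, take the same precautions as in the single-class case.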
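As I understand the DIGITS README linked above, each name in Custom classes maps to its zero-based position in the list, with dontcare at 0. A minimal Python sketch of that mapping (the helper name is hypothetical). These IDs appear to be what the src values in the network definition's object_class entries refer to, which is consistent with src: 1 for face in the single-class setup.

```python
def class_ids(custom_classes):
    """Map each Custom classes name to its zero-based position (dontcare -> 0)."""
    ids = {name.strip(): idx for idx, name in enumerate(custom_classes.split(","))}
    if ids.get("dontcare") != 0:
        raise ValueError("Custom classes should start with dontcare")
    return ids


if __name__ == "__main__":
    print(class_ids("dontcare,tanaka,nakajima,takahashi"))
    # {'dontcare': 0, 'tanaka': 1, 'nakajima': 2, 'takahashi': 3}
```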

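Training

The first half of the training settings follows the same notes as in the single-class case; what matters this time is the network definition.
Select the network you trained earlier, specify the epoch, and click Customize.

The Pretrained model(s) field below the network definition (the same definition as in the single-class case) is now filled in automatically with the path to the trained network!
However, if you run it as is, you will be training the single-class network, so it cannot learn multiple classes. We therefore change the network definition.

The network definition currently in the text box should be the one taken from here
(https://github.com/NVIDIA/caffe/blob/caffe-0.15/examples/kitti/detectnet_network.prototxt), but the same directory of that git repository actually also contains a network definition for 2 classes (https://github.com/NVIDIA/caffe/blob/caffe-0.15/examples/kitti/detectnet_network-2classes.prototxt).

By looking at the differences between the 1-class and 2-class definition files, you can also write a definition for 3 or more classes.

Let's take the diff with the Linux diff command.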

82c82,83
<     object_class: { src: 1 dst: 0} # obj class 1 -> cvg index 0
---
>     object_class: { src: 1 dst: 0} # cars -> 0
>     object_class: { src: 8 dst: 1} # pedestrians -> 1
121c122,123
<     object_class: { src: 1 dst: 0} # obj class 1 -> cvg index 0
---
>     object_class: { src: 1 dst: 0} # cars -> 0
>     object_class: { src: 8 dst: 1} # pedestrians -> 1
2386c2388
<     num_output: 1
---
>     num_output: 2
2500c2502,2503
<     top: 'bbox-list'
---
>     top: 'bbox-list-class0'
>     top: 'bbox-list-class1'
2504c2507
<         param_str : '1248, 352, 16, 0.6, 3, 0.02, 22, 1'
---
>         param_str : '1248, 352, 16, 0.6, 3, 0.02, 22, 2'
2515c2518,2519
<   top: 'bbox-list-label'
---
>   top: 'bbox-list-label-class0'
>   top: 'bbox-list-label-class1'
2519c2523
<       param_str : '1248, 352, 16, 1'
---
>       param_str : '1248, 352, 16, 2'
2525,2528c2529,2532
<     name: 'score'
<     bottom: 'bbox-list-label'
<     bottom: 'bbox-list'
<     top: 'bbox-list-scored'
---
>     name: 'score-class0'
>     bottom: 'bbox-list-label-class0'
>     bottom: 'bbox-list-class0'
>     top: 'bbox-list-scored-class0'
2537,2541c2541,2572
<     name: 'mAP'
<     bottom: 'bbox-list-scored'
<     top: 'mAP'
<     top: 'precision'
<     top: 'recall'
---
>     name: 'mAP-class0'
>     bottom: 'bbox-list-scored-class0'
>     top: 'mAP-class0'
>     top: 'precision-class0'
>     top: 'recall-class0'
>     python_param {
>         module: 'caffe.layers.detectnet.mean_ap'
>         layer: 'mAP'
>         param_str : '1248, 352, 16'
>     }
>     include: { phase: TEST stage: "val" }
> }
> 
> layer {
>     type: 'Python'
>     name: 'score-class1'
>     bottom: 'bbox-list-label-class1'
>     bottom: 'bbox-list-class1'
>     top: 'bbox-list-scored-class1'
>     python_param {
>         module: 'caffe.layers.detectnet.mean_ap'
>         layer: 'ScoreDetections'
>     }
>     include: { phase: TEST stage: "val" }
> }
> layer {
>     type: 'Python'
>     name: 'mAP-class1'
>     bottom: 'bbox-list-scored-class1'
>     top: 'mAP-class1'
>     top: 'precision-class1'
>     top: 'recall-class1'

That makes 9 changes in total.
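If diff isn't handy, Python's standard difflib can produce an equivalent comparison. Note that it outputs unified-diff format (--- / +++ / @@ hunks) rather than the normal diff format shown above. `text_diff` and `file_diff` are hypothetical helper names.

```python
import difflib
from pathlib import Path


def text_diff(old_text, new_text, old_name="old", new_name="new"):
    """Unified diff between two texts, returned as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    ))


def file_diff(old_path, new_path):
    """Unified diff between two files, e.g. the 1-class and 2-class prototxt."""
    return text_diff(Path(old_path).read_text(), Path(new_path).read_text(),
                     str(old_path), str(new_path))


if __name__ == "__main__":
    for line in text_diff("    num_output: 1\n", "    num_output: 2\n"):
        print(line, end="")
```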

For convenience, I've summarized the changes below for the 3-class case.

"train_transform" and "val_transform" layers

Change

object_class: {src: 1 dst: 0}

to

object_class: {src: 1 dst: 0}
object_class: {src: 2 dst: 1}
object_class: {src: 3 dst: 2}

src: 0 is dontcare, so, true to its name, you don't need to care about it.
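For N classes the pattern generalizes to src: i+1 to dst: i. A minimal Python sketch that generates these lines, assuming the class IDs are contiguous from 1 (i.e. dontcare followed by the N classes in Custom classes); the function name is hypothetical.

```python
def object_class_lines(num_classes):
    """object_class entries mapping class IDs 1..N to coverage indices 0..N-1."""
    return [f"object_class: {{src: {i + 1} dst: {i}}}" for i in range(num_classes)]


if __name__ == "__main__":
    print("\n".join(object_class_lines(3)))
```

For 3 classes this prints the three lines shown above; paste them into both the train_transform and val_transform layers.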

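"cvg/classifier" layer

Change

num_output: 1

to

num_output: 3

Also change

name: "cvg/classifier"

to

name: "cvg/classifier_for_3classes"

If you don't rename it, Caffe tries to load the pretrained weights into this layer as well and fails with a dimension-mismatch error.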
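Why renaming helps: when Caffe loads a pretrained model, it matches layers by name, copies weights into same-named layers (failing if the shapes differ), and leaves layers with no matching name at their fresh initialization. Below is a toy Python model of that behavior, not Caffe's code; the layer names other than the classifier's and all shapes are hypothetical, with C standing for the classifier's input channel count.

```python
def copy_trained_layers(new_net, pretrained):
    """Toy model of loading pretrained weights by layer name.

    new_net and pretrained map layer name -> weight shape (tuple).
    Returns (copied, freshly_initialized); raises on a shape mismatch.
    """
    copied, fresh = [], []
    for name, shape in new_net.items():
        if name not in pretrained:
            fresh.append(name)  # no match: keeps its fresh initialization
        elif pretrained[name] != shape:
            raise ValueError(f"{name}: shape {pretrained[name]} vs {shape}")
        else:
            copied.append(name)
    return copied, fresh


if __name__ == "__main__":
    C = 1024  # hypothetical number of input channels to the classifier
    pretrained = {"conv1": (64, 3, 7, 7), "cvg/classifier": (1, C, 1, 1)}
    renamed = {"conv1": (64, 3, 7, 7), "cvg/classifier_for_3classes": (3, C, 1, 1)}
    print(copy_trained_layers(renamed, pretrained))
    # (['conv1'], ['cvg/classifier_for_3classes'])
```

Keeping the old name with num_output: 3 would correspond to the shape-mismatch branch, i.e. the error described above.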

"cluster" layer

Change

    top: 'bbox-list'

to

    top: 'bbox-list-class0'
    top: 'bbox-list-class1'
    top: 'bbox-list-class2'

Also change

      param_str : '1248, 352, 16, 0.6, 3, 0.02, 22, 1'

to

      param_str : '1248, 352, 16, 0.6, 3, 0.02, 22, 3'

"cluster_gt" layer

Change

  top: 'bbox-list-label'

to

  top: 'bbox-list-label-class0'
  top: 'bbox-list-label-class1'
  top: 'bbox-list-label-class2'

Also change

      param_str : '1248, 352, 16, 1'

to

      param_str : '1248, 352, 16, 3'
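The cluster and cluster_gt edits follow the same rule: one top per class, and the class count at the end of param_str. A minimal Python sketch, assuming, as the diff suggests, that the last value of each param_str is the number of classes; the helper names are hypothetical.

```python
def per_class_tops(base, num_classes):
    """One top line per class, e.g. 'bbox-list' -> top: 'bbox-list-class0', ..."""
    return [f"top: '{base}-class{i}'" for i in range(num_classes)]


def with_num_classes(param_str, num_classes):
    """Replace the last comma-separated value of a param_str with num_classes."""
    values = [v.strip() for v in param_str.split(",")]
    values[-1] = str(num_classes)
    return ", ".join(values)


if __name__ == "__main__":
    # "cluster" layer
    print("\n".join(per_class_tops("bbox-list", 3)))
    print(with_num_classes("1248, 352, 16, 0.6, 3, 0.02, 22, 1", 3))
    # "cluster_gt" layer
    print("\n".join(per_class_tops("bbox-list-label", 3)))
    print(with_num_classes("1248, 352, 16, 1", 3))
```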

"score" and "mAP" layers

For these, it's enough to add one copy of each layer per class, appending "-class0", "-class1", "-class2" to the name, bottom, and top values.

It looks like this:

layer {
    type: 'Python'
    name: 'score-class0'
    bottom: 'bbox-list-label-class0'
    bottom: 'bbox-list-class0'
    top: 'bbox-list-scored-class0'
    python_param {
        module: 'caffe.layers.detectnet.mean_ap'
        layer: 'ScoreDetections'
    }
    include: { phase: TEST stage: "val" }
}
layer {
    type: 'Python'
    name: 'mAP-class0'
    bottom: 'bbox-list-scored-class0'
    top: 'mAP-class0'
    top: 'precision-class0'
    top: 'recall-class0'
    python_param {
        module: 'caffe.layers.detectnet.mean_ap'
        layer: 'mAP'
        param_str : '1248, 352, 16'
    }
    include: { phase: TEST stage: "val" }
}

layer {
    type: 'Python'
    name: 'score-class1'
    bottom: 'bbox-list-label-class1'
    bottom: 'bbox-list-class1'
    top: 'bbox-list-scored-class1'
    python_param {
        module: 'caffe.layers.detectnet.mean_ap'
        layer: 'ScoreDetections'
    }
    include: { phase: TEST stage: "val" }
}
layer {
    type: 'Python'
    name: 'mAP-class1'
    bottom: 'bbox-list-scored-class1'
    top: 'mAP-class1'
    top: 'precision-class1'
    top: 'recall-class1'
    python_param {
        module: 'caffe.layers.detectnet.mean_ap'
        layer: 'mAP'
        param_str : '1248, 352, 16'
    }
    include: { phase: TEST stage: "val" }
}

layer {
    type: 'Python'
    name: 'score-class2'
    bottom: 'bbox-list-label-class2'
    bottom: 'bbox-list-class2'
    top: 'bbox-list-scored-class2'
    python_param {
        module: 'caffe.layers.detectnet.mean_ap'
        layer: 'ScoreDetections'
    }
    include: { phase: TEST stage: "val" }
}

layer {
    type: 'Python'
    name: 'mAP-class2'
    bottom: 'bbox-list-scored-class2'
    top: 'mAP-class2'
    top: 'precision-class2'
    top: 'recall-class2'
    python_param {
        module: 'caffe.layers.detectnet.mean_ap'
        layer: 'mAP'
        param_str : '1248, 352, 16'
    }
    include: { phase: TEST stage: "val" }
}
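Typing these pairs by hand gets error-prone as the number of classes grows. A minimal Python sketch that generates them from the listing above, one score/mAP pair per class, keeping param_str '1248, 352, 16' unchanged as in the listing; `score_map_layers` is a hypothetical helper. Its output would replace the original single score/mAP pair.

```python
# One score/mAP pair, copied from the listing above with the class index templated.
# Literal braces are doubled because the template goes through str.format.
SCORE_MAP_TEMPLATE = """layer {{
    type: 'Python'
    name: 'score-class{i}'
    bottom: 'bbox-list-label-class{i}'
    bottom: 'bbox-list-class{i}'
    top: 'bbox-list-scored-class{i}'
    python_param {{
        module: 'caffe.layers.detectnet.mean_ap'
        layer: 'ScoreDetections'
    }}
    include: {{ phase: TEST stage: "val" }}
}}
layer {{
    type: 'Python'
    name: 'mAP-class{i}'
    bottom: 'bbox-list-scored-class{i}'
    top: 'mAP-class{i}'
    top: 'precision-class{i}'
    top: 'recall-class{i}'
    python_param {{
        module: 'caffe.layers.detectnet.mean_ap'
        layer: 'mAP'
        param_str : '1248, 352, 16'
    }}
    include: {{ phase: TEST stage: "val" }}
}}
"""


def score_map_layers(num_classes):
    """Return the score/mAP layer pairs for classes 0..num_classes-1."""
    return "\n".join(SCORE_MAP_TEMPLATE.format(i=i) for i in range(num_classes))


if __name__ == "__main__":
    print(score_map_layers(3))
```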

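That was long... Good work.

Now, start training!

A flood of metrics appears, with separate mAP, precision, and recall for each class.
All that's left is to wait and look forward to the results.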

Closing remarks

So, did your training go well?
Any comments are welcome.
Finally, note that changes to DIGITS may quickly make this information outdated, so please be careful.

DIGITS-5.1
Caffe-0.16.2

2017/11/9
