
Training an object detection model with the TensorFlow Object Detection API

Last updated at 2021-01-02 / Posted at 2021-01-02

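Steps

This article explains how to fully train an object detection model with the TensorFlow Object Detection API.

You can train from scratch
or use transfer learning.

If you only want a quick fine-tuning run, see this article:
Simple training of an object detection model with the TensorFlow Object Detection API

If you only want to run inference, see this article:
How to use the TensorFlow Object Detection API (inference, with a Colab sample)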

1. Install the TensorFlow Object Detection API and the required libraries

git clone https://github.com/tensorflow/models.git


cd models/research
# Compile protos.
protoc object_detection/protos/*.proto --python_out=.
# Install TensorFlow Object Detection API.
cp object_detection/packages/tf2/setup.py .
python -m pip install .

2. Prepare the training data, label map, and evaluation data

See also: Converting your own dataset to TFRecord format
Prepare them in the following directory structure.

.
├── data/
│   ├── eval-00000-of-00001.tfrecord  # evaluation data
│   ├── label_map.pbtxt               # label map
│   ├── train-00000-of-00002.tfrecord # training data
│   └── train-00001-of-00002.tfrecord # training data
└── models/
    └── my_model_dir/
        ├── eval/                          # generated by evaluation
        ├── my_model.config
        ├── ckpt-1.data-00000-of-00001     # generated by training
        ├── ckpt-1.index                   # generated by training
        └── checkpoint                     # generated by training

・What you need 1

Label map: label_map.pbtxt

Maps each label ID to its label name.

pet_label_map.pbtxt


item {
  id: 1
  name: 'Abyssinian'
}

item {
  id: 2
  name: 'american_bulldog'
}

item {
  id: 3
  name: 'american_pit_bull_terrier'
}

The repository includes a sample label map, so the easiest way to get the required format is to copy it and edit it.

cp object_detection/data/pet_label_map.pbtxt data/label_map.pbtxt
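With many classes, editing the file by hand gets tedious. The following is a minimal sketch that writes a label map in the same format as pet_label_map.pbtxt, assuming your class names are in a Python list in ID order. The function names `label_map_text` and `write_label_map` and the default output path are hypothetical. IDs start at 1, as in the sample, because the API reserves ID 0 for the background class.

```python
from pathlib import Path


def label_map_text(class_names):
    """Build label map text in the same format as pet_label_map.pbtxt.

    IDs are assigned from 1 in list order (ID 0 is reserved for background).
    """
    items = []
    for i, name in enumerate(class_names, start=1):
        # Escape backslashes and single quotes so the quoted name stays valid.
        safe = name.replace("\\", "\\\\").replace("'", "\\'")
        items.append(f"item {{\n  id: {i}\n  name: '{safe}'\n}}\n")
    return "\n".join(items)


def write_label_map(class_names, path="data/label_map.pbtxt"):
    """Write the label map file, creating the parent directory if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(label_map_text(class_names), encoding="utf-8")
    return out
```

For example, `write_label_map(["Abyssinian", "american_bulldog", "american_pit_bull_terrier"])` produces the three items shown above.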

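・What you need 2

Config: my_model.config

This file configures training.
Whatever you write, add, or change here (model architecture, inputs, training settings, and so on) is reflected in training.

The repository's object_detection/configs/tf2 directory contains a config file for each model; copy one and adapt it to your setup.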

pipeline.config


🐥 marks the minimum set of places you need to change.

model {
  ssd {
    num_classes: 90 # 🐥 change to the number of classes in your dataset
    image_resizer {
      fixed_shape_resizer {
        height: 640
        width: 640
      }
    }
    feature_extractor {
      type: "ssd_resnet50_v1_fpn_keras"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 0.00039999998989515007
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.029999999329447746
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.996999979019165
          scale: true
          epsilon: 0.0010000000474974513
        }
      }
      override_base_feature_extractor_hyperparams: true
      fpn {
        min_level: 3
        max_level: 7
      }
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 0.00039999998989515007
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.009999999776482582
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.996999979019165
            scale: true
            epsilon: 0.0010000000474974513
          }
        }
        depth: 256
        num_layers_before_predictor: 4
        kernel_size: 3
        class_prediction_bias_init: -4.599999904632568
      }
    }
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        scales_per_octave: 2
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 9.99999993922529e-09
        iou_threshold: 0.6000000238418579
        max_detections_per_class: 100
        max_total_detections: 100
        use_static_shapes: false
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 2.0
          alpha: 0.25
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
  }
}
train_config {
  batch_size: 64 # 🐥 reduce this if your environment does not have much memory
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_object_covered: 0.0
      min_aspect_ratio: 0.75
      max_aspect_ratio: 3.0
      min_area: 0.75
      max_area: 1.0
      overlap_thresh: 0.0
    }
  }
  sync_replicas: true
  optimizer {
    momentum_optimizer {
      learning_rate {
        cosine_decay_learning_rate {
          learning_rate_base: 0.03999999910593033
          total_steps: 25000
          warmup_learning_rate: 0.013333000242710114
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.8999999761581421
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "my_model_dir/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0" # 🐥 when fine-tuning a pretrained model, change to that model's checkpoint path
  num_steps: 25000
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "classification" # 🐥 change to "detection"
  use_bfloat16: true
  fine_tune_checkpoint_version: V2
}
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" # 🐥 change to the path of your label map .pbtxt
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/train_dataset.record-?????-of-0010.tfrecord" # 🐥 change to the path of your training TFRecord files
  }
}
eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}
eval_input_reader {
  label_map_path:  "PATH_TO_BE_CONFIGURED/label_map.txt" # 🐥 change to the path of your label map .pbtxt
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/eval_dataset.record-?????-of-0010.tfrecord" # 🐥 change to the path of your evaluation TFRecord files
  }
}

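If the batch size is too large, training crashes because it runs out of memory. In environments with limited memory, such as Colab, reduce the batch size.
For the checkpoint path, if the checkpoint consists of three files like this:

checkpoint/
├── checkpoint
├── ckpt-0.data-00000-of-00001
└── ckpt-0.index

specify

PATH/checkpoint/ckpt-0

If it is a single file named model.ckpt, specify

PATH/model.ckpt

In other words, give the common prefix without the .data or .index extension.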
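After several training runs you usually want the prefix of the newest checkpoint. In TensorFlow, `tf.train.latest_checkpoint(checkpoint_dir)` returns it by reading the `checkpoint` file. As a standard-library-only illustration, here is a hypothetical helper, `latest_ckpt_prefix`, that finds the same prefix by scanning `ckpt-N.index` files. It assumes the TF2 `ckpt-N` naming shown above and does not handle single-file `model.ckpt` checkpoints.

```python
import re
from pathlib import Path


def latest_ckpt_prefix(checkpoint_dir):
    """Return "<checkpoint_dir>/ckpt-N" for the largest N, or None.

    Looks for files named like ckpt-N.index and compares N numerically,
    so ckpt-10 counts as newer than ckpt-9.
    """
    best_step, best_prefix = -1, None
    for index_file in Path(checkpoint_dir).glob("ckpt-*.index"):
        match = re.fullmatch(r"ckpt-(\d+)\.index", index_file.name)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            # Drop the ".index" suffix to get the prefix TensorFlow expects.
            best_prefix = str(index_file.with_suffix(""))
    return best_prefix
```

The returned string can be used wherever a checkpoint prefix is expected, for example as the fine_tune_checkpoint value or as the argument to `ckpt.restore(...)`.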

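・What you need 3

A pretrained model

You can train a model from scratch, but that can take several days, so you can instead download a pretrained model from the TensorFlow 2 Detection Model Zoo and fine-tune it.
Download and extract it as follows.
The download URL for each model is linked from the Model Zoo page.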

wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz
tar -xf ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz

The extracted files include a checkpoint and a config file.
When you use a pretrained model, editing this config file gives you parameters that already suit the model.
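Editing by hand works, but the minimum changes marked with 🐥 in the config above can also be scripted. The following is a minimal sketch using regular expressions; the function names `_set_field` and `patch_pipeline_config` and every path in the example are hypothetical. It assumes each field sits on its own line, as in the downloaded pipeline.config, and that num_classes, batch_size, and fine_tune_checkpoint each appear only once. input_path appears in both readers, so the text is split at eval_input_reader.

```python
import re


def _set_field(text, field, value):
    """Replace the value on every `field: ...` line of a protobuf text config."""
    if isinstance(value, str):
        value = '"' + value + '"'  # string fields are written in double quotes
    pattern = re.compile(rf"^([ \t]*{field}:[ \t]*).*$", re.MULTILINE)
    # A function replacement avoids backslash-escaping issues in re.sub.
    return pattern.sub(lambda m: m.group(1) + str(value), text)


def patch_pipeline_config(text, num_classes, batch_size, checkpoint,
                          label_map, train_record, eval_record):
    """Apply the minimum 🐥 edits to pipeline.config text and return it."""
    text = _set_field(text, "num_classes", num_classes)
    text = _set_field(text, "batch_size", batch_size)
    text = _set_field(text, "fine_tune_checkpoint", checkpoint)
    text = _set_field(text, "fine_tune_checkpoint_type", "detection")
    text = _set_field(text, "label_map_path", label_map)  # both readers
    # Patch input_path before eval_input_reader with the training files
    # and after it with the evaluation files.
    head, sep, tail = text.partition("eval_input_reader")
    return (_set_field(head, "input_path", train_record) + sep
            + _set_field(tail, "input_path", eval_record))


# Example usage (all paths are hypothetical):
# from pathlib import Path
# cfg = Path("my_model_dir/pipeline.config")
# cfg.write_text(patch_pipeline_config(
#     cfg.read_text(), num_classes=3, batch_size=8,
#     checkpoint="my_model_dir/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0",
#     label_map="data/label_map.pbtxt",
#     train_record="data/train-?????-of-00002.tfrecord",
#     eval_record="data/eval-00000-of-00001.tfrecord"))
```

Because `fine_tune_checkpoint:` must be followed directly by a colon, the pattern does not touch the separate `fine_tune_checkpoint_type:` line.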

Start training


!python object_detection/model_main_tf2.py \
    --pipeline_config_path="my_model_dir/pipeline.config" \
    --model_dir="./my_model_dir" \
    --alsologtostderr

Once training has started, you will see output like this:

INFO:tensorflow:Step 100 per-step time 0.211s loss=35.350
I0102 05:04:25.884553 140388036892544 model_lib_v2.py:651] Step 100 per-step time 0.211s loss=35.350
INFO:tensorflow:Step 200 per-step time 0.218s loss=36.062
I0102 05:04:46.017316 140388036892544 model_lib_v2.py:651] Step 200 per-step time 0.218s loss=36.062
INFO:tensorflow:Step 300 per-step time 0.203s loss=35.008
I0102 05:05:06.347388 140388036892544 model_lib_v2.py:651] Step 300 per-step time 0.203s loss=35.008
INFO:tensorflow:Step 400 per-step time 0.219s loss=35.200
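To track the loss over time, the step lines of this log can be extracted. The following is a minimal sketch, assuming you saved the output to a text file (for example with a hypothetical `train.log`); the function name `parse_training_log` is also hypothetical. Each step appears twice in the log (the INFO:tensorflow line and the absl-style line), so duplicates are dropped.

```python
import re

# Matches the "Step N per-step time Xs loss=Y" part of the log lines above.
STEP_RE = re.compile(r"Step (\d+) per-step time ([\d.]+)s loss=(\S+)")


def parse_training_log(lines):
    """Return a list of (step, seconds_per_step, loss), one per logged step."""
    seen, rows = set(), []
    for line in lines:
        match = STEP_RE.search(line)
        if match:
            step = int(match.group(1))
            if step not in seen:  # skip the duplicate line for the same step
                seen.add(step)
                rows.append((step, float(match.group(2)), float(match.group(3))))
    return rows
```

For example, `with open("train.log") as f: rows = parse_training_log(f)`; the rows can then be passed to matplotlib to plot loss against step.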

Evaluation

If you pass a checkpoint path with the --checkpoint_dir argument, the script runs in evaluation mode.


# --checkpoint_dir: directory containing the checkpoints generated by training
python object_detection/model_main_tf2.py \
    --pipeline_config_path="my_model_dir/pipeline.config" \
    --model_dir="/content/models/my_model_dir" \
    --checkpoint_dir="/content/models/my_model_dir" \
    --alsologtostderr

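Saving the model

While training runs, a checkpoint is saved to the model directory (--model_dir) every 1000 steps.

my_model_dir/
├── checkpoint
├── ckpt-1.data-00000-of-00001
├── ckpt-1.index
├── ckpt-2.data-00000-of-00001
└── ckpt-2.index

The save interval and the number of checkpoints kept (by default, once seven have accumulated, the oldest are replaced first) can be adjusted through model_main_tf2.py; the interval, for example, with the --checkpoint_every_n argument. To be able to restore later, save the whole checkpoint directory.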
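If you only want to keep the files needed for restoring (for example, when copying to a mounted Google Drive), a minimal sketch could copy just the `checkpoint` file and the `ckpt-*` files; the function name `backup_checkpoints` and the destination path are hypothetical. The `checkpoint` file records which ckpt-N is the latest, so it is copied together with the data and index files. Copying the whole directory with `shutil.copytree` is the simpler alternative if you also want the summaries.

```python
import shutil
from pathlib import Path


def backup_checkpoints(model_dir, backup_dir):
    """Copy the `checkpoint` file and all ckpt-* files into backup_dir.

    Returns the copied file names in sorted order.
    """
    src, dst = Path(model_dir), Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.iterdir()):
        is_ckpt = path.name == "checkpoint" or path.name.startswith("ckpt-")
        if path.is_file() and is_ckpt:
            shutil.copy2(path, dst / path.name)  # copy2 keeps timestamps
            copied.append(path.name)
    return copied
```

For example, `backup_checkpoints("my_model_dir", "/content/drive/MyDrive/my_model_backup")` (paths hypothetical).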

Restoring the trained model

Import the modules

import matplotlib
import matplotlib.pyplot as plt
import os
import io
import numpy as np
from io import BytesIO  # standard library BytesIO (no need for six)
from PIL import Image, ImageDraw, ImageFont

import tensorflow as tf

from object_detection.utils import label_map_util
from object_detection.utils import config_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.builders import model_builder

%matplotlib inline

Pass the same pipeline.config used for training to the model builder to construct the model, then restore the weights from the trained checkpoint.


# Use the same pipeline.config that was used for training
pipeline_config = "my_model_dir/pipeline.config"
# Directory containing the checkpoints
model_dir = "my_model_dir"

# Load the model configuration
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
model_config = configs['model']

# Build the model from the loaded configuration
detection_model = model_builder.build(
      model_config=model_config, is_training=False)

# Restore the weights from the checkpoint
ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
ckpt.restore(os.path.join(model_dir, 'ckpt-20')).expect_partial()  # specify the checkpoint number to restore

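Running inference on a new image with the trained model

Prepare the inference function
This function takes an image, runs the model on it, and returns the results.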


def get_model_detection_function(model):
  """Get a tf.function for detection."""

  @tf.function
  def detect_fn(image):
    """Detect objects in image."""

    image, shapes = model.preprocess(image)
    prediction_dict = model.predict(image, shapes)
    detections = model.postprocess(prediction_dict, shapes)

    return detections, prediction_dict, tf.reshape(shapes, [-1])

  return detect_fn

detect_fn = get_model_detection_function(detection_model)

Prepare a dictionary that maps label IDs to label text,
using the label map file from training.


label_map_path = 'data/label_map.pbtxt'
label_map = label_map_util.load_labelmap(label_map_path)
categories = label_map_util.convert_label_map_to_categories(
    label_map,
    max_num_classes=label_map_util.get_max_label_map_index(label_map),
    use_display_name=True)
category_index = label_map_util.create_category_index(categories)
label_map_dict = label_map_util.get_label_map_dict(label_map, use_display_name=True)
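To make the structure of `category_index` concrete: it maps each label ID to a dict with `'id'` and `'name'` keys, for example `{1: {'id': 1, 'name': 'Abyssinian'}, ...}`. The following hypothetical, standard-library-only parser builds the same shape from the simple format used in this article. It assumes each item has `id` before a quoted `name`, and it is only an illustration: it does not replace label_map_util, which also handles display_name and validation.

```python
import re

# One `item { ... }` block with an id followed by a quoted name.
ITEM_RE = re.compile(
    r"item\s*\{[^}]*?\bid:\s*(\d+)[^}]*?\bname:\s*['\"]([^'\"]*)['\"][^}]*\}")


def parse_label_map(text):
    """Return {id: {'id': id, 'name': name}} for each item in the text."""
    index = {}
    for match in ITEM_RE.finditer(text):
        item_id = int(match.group(1))
        index[item_id] = {"id": item_id, "name": match.group(2)}
    return index
```

Because `\bname` requires a word boundary, a `display_name:` field is not mistaken for `name:`.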

Prepare a function that converts an image into a NumPy array


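def load_image_into_numpy_array(path):
  """Load an image file into a NumPy array.

  Puts the image into a NumPy array so it can be fed to the TensorFlow graph.
  By convention, the array has shape (height, width, channels).

  Args:
    path: path to the image file.

  Returns:
    uint8 NumPy array with shape (height, width, 3).
  """
  img_data = tf.io.gfile.GFile(path, 'rb').read()
  # Convert to RGB so grayscale or RGBA (e.g. PNG) images also yield 3 channels.
  image = Image.open(BytesIO(img_data)).convert('RGB')
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)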

def get_keypoint_tuples(eval_config):
  """Return a tuple list of keypoint edges from the eval config.
  
  Args:
    eval_config: an eval config containing the keypoint edges
  
  Returns:
    a list of edge tuples, each in the format (start, end)
  """
  tuple_list = []
  kp_list = eval_config.keypoint_edge
  for edge in kp_list:
    tuple_list.append((edge.start, edge.end))
  return tuple_list

Inference


image_dir = 'test_images'
image_path = os.path.join(image_dir, 'test_0000.jpg')
image_np = load_image_into_numpy_array(image_path)

# Things to try:
# Flip horizontally
# image_np = np.fliplr(image_np).copy()

# Convert image to grayscale
# image_np = np.tile(
#     np.mean(image_np, 2, keepdims=True), (1, 1, 3)).astype(np.uint8)

input_tensor = tf.convert_to_tensor(
    np.expand_dims(image_np, 0), dtype=tf.float32)
detections, predictions_dict, shapes = detect_fn(input_tensor)

label_id_offset = 1
image_np_with_detections = image_np.copy()

# Use keypoints if available in detections
keypoints, keypoint_scores = None, None
if 'detection_keypoints' in detections:
  keypoints = detections['detection_keypoints'][0].numpy()
  keypoint_scores = detections['detection_keypoint_scores'][0].numpy()

viz_utils.visualize_boxes_and_labels_on_image_array(
      image_np_with_detections,
      detections['detection_boxes'][0].numpy(),
      (detections['detection_classes'][0].numpy() + label_id_offset).astype(int),
      detections['detection_scores'][0].numpy(),
      category_index,
      use_normalized_coordinates=True,
      max_boxes_to_draw=200,
      min_score_thresh=.15,
      agnostic_mode=False,
      keypoints=keypoints,
      keypoint_scores=keypoint_scores,
      keypoint_edges=get_keypoint_tuples(configs['eval_config']))

plt.figure(figsize=(12,16))
plt.imshow(image_np_with_detections)
plt.show()

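The detections come back as 100 boxes, scores, and labels each,
and the image is displayed with only the boxes whose score exceeds 0.15, the min_score_thresh argument passed to the visualization function.
You can adjust the score threshold with that argument.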
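If you need the results as data rather than as a drawn image, you can apply the same kind of filtering yourself. The following is a minimal NumPy sketch; the function name `select_detections` is hypothetical. It assumes boxes are normalized `[ymin, xmin, ymax, xmax]` and classes are zero-based indices, which is why the code above adds `label_id_offset = 1` before looking up `category_index`. It approximates what is drawn, but ignores `max_boxes_to_draw`.

```python
import numpy as np


def select_detections(boxes, scores, classes, category_index,
                      image_height, image_width,
                      min_score=0.15, label_id_offset=1):
    """Return (label, score, (xmin, ymin, xmax, ymax)) for each kept detection.

    boxes:   (N, 4) normalized [ymin, xmin, ymax, xmax].
    scores:  (N,) confidence scores.
    classes: (N,) zero-based class indices; label_id_offset converts them
             to the label map IDs used as keys of category_index.
    """
    boxes = np.asarray(boxes, dtype=np.float32)
    scores = np.asarray(scores, dtype=np.float32)
    class_ids = np.asarray(classes).astype(int) + label_id_offset
    keep = scores > min_score  # only boxes whose score exceeds the threshold
    # Scale normalized coordinates to pixel coordinates.
    scale = np.array([image_height, image_width, image_height, image_width],
                     dtype=np.float32)
    results = []
    for box, score, class_id in zip(boxes[keep] * scale, scores[keep],
                                    class_ids[keep]):
        name = category_index.get(int(class_id), {}).get("name", "N/A")
        ymin, xmin, ymax, xmax = (int(round(float(v))) for v in box)
        results.append((name, float(score), (xmin, ymin, xmax, ymax)))
    return results
```

With the variables from the inference code, a call would look like `select_detections(detections['detection_boxes'][0].numpy(), detections['detection_scores'][0].numpy(), detections['detection_classes'][0].numpy(), category_index, *image_np.shape[:2])`.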

🐣

I'm a freelance engineer.
For work inquiries, please contact me at
rockyshikoku@gmail.com

I'm building apps that use Core ML.
I share information about machine learning.

Twitter
Medium
