Walking Through the Detectron2 Tutorial

Last updated at 2024-04-20, posted at 2024-04-17

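What is Detectron2?

Detectron2 is Facebook AI Research's next-generation library that provides state-of-the-art detection and segmentation algorithms. It is the successor to Detectron and maskrcnn-benchmark, two frameworks for high-performance object detection and segmentation in computer vision. It supports many computer vision research projects and production applications at Facebook. (From the official GitHub repository.)

Documentation for detectron2 is also available, so we will refer to it as we go.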

Installation

To install detectron2 into your own folder, run the following commands in a terminal.

Installation commands

git clone git@github.com:facebookresearch/detectron2.git
python -m pip install -e detectron2

This clones the official detectron2 repository into your folder and installs it from that local copy so that it can be imported.
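As a quick sanity check after installation, something like the following can confirm that the interpreter used by VS Code actually sees the package. This is only a minimal sketch: `is_installed` is a hypothetical helper name, not part of detectron2.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("detectron2"):
    import detectron2

    print("detectron2", detectron2.__version__)
else:
    print("detectron2 was not found; check which interpreter pip installed into")
```

If the second message appears, the usual cause is that the notebook kernel and the terminal use different Python environments.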

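Running the detectron2 tutorial in VS Code

The official detectron2 GitHub repository provides a Google Colab tutorial that covers the basic operations of detectron2, so I ran it myself.
The first two cells only set up the environment, so please adapt them to your own setup.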

Run a pre-trained detectron2 model

Preparing packages and modules

# Some basic setup:
# Setup detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import numpy as np
import os, json, cv2, random
import matplotlib.pyplot as plt
# from google.colab.patches import cv2_imshow

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog

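This cell imports all the required packages and modules (it does not install anything).
I added the import statements for matplotlib and PIL myself afterwards for use in a custom function; the helper shown below only needs matplotlib.
google.colab.patches is a Python module that is available only in Google Colab.
It cannot be imported in a local VS Code environment, so instead I created a new cell below this one and defined my own function for displaying images, as follows.

Custom function for displaying images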

def imshow(img):
    plt.imshow(img[:, :, [2, 1, 0]])
    plt.axis("off")
    plt.show()

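img[:, :, [2, 1, 0]] reorders the channels from the BGR order used by cv2.imread to the RGB order that matplotlib expects. plt.axis("off") hides the axes when the image is displayed.

Preparing the image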

!wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O input.jpg
im = cv2.imread("./input.jpg")
imshow(im)

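wget command: downloads the image file from the specified URL.
-q option: suppresses the download progress and log output.
-O option: specifies the file the download is written to. Here the image is saved as input.jpg in the current working directory.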
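On a machine where wget is not available (for example, a default Windows setup), the same download can be done from Python's standard library. The following is a minimal sketch; `download` is a hypothetical helper name introduced here for illustration.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest: str) -> Path:
    """Download url to dest; a relative dest resolves against the current directory."""
    dest_path = Path(dest)
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path.resolve()


# Usage, equivalent to the wget line in the cell above (not executed here):
# download("http://images.cocodataset.org/val2017/000000439715.jpg", "input.jpg")
```

Like `-O input.jpg`, passing a bare file name writes the file next to wherever the notebook is running.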

Setting up the config and running inference

cfg = get_cfg()
# add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
# Find a model from detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(im)

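Here we set up the detectron2 config and run inference, storing the result in a variable.
Jumping to the definition of get_cfg() shows that it returns a clone of an object called _C.
- _C holds a wide range of settings. It serves as the default, that is, the base config for detectron2's various tasks.
The merge_from_file() method then overwrites the base config with a task-specific config file.
- The config file to merge has to be changed to match your own task.
- If the file being merged introduces keys that do not exist in the base config, the merge can fail unless you first call set_new_allowed(is_new_allowed) on the config, for example cfg.set_new_allowed(True).
After that, a predictor object is created with the DefaultPredictor class and used to run inference on the image im.
Running print(outputs) shows a large amount of information about the detected objects.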

Inspecting the instance fields

# look at the outputs. See https://detectron2.readthedocs.io/tutorials/models.html#model-output-format for specification
print(outputs["instances"].pred_classes)
print(outputs["instances"].pred_boxes)

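This code prints selected fields of the inference result so that they can be inspected.
The meaning of each field is described in the model output format documentation linked in the code comment above.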
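The class ids printed by pred_classes are integers, so it can help to map them to names. The sketch below assumes the ids index into the dataset's thing_classes list; `summarize_classes` is a hypothetical helper, and the sample values are stand-ins so that the snippet runs without detectron2.

```python
from collections import Counter
from typing import Sequence


def summarize_classes(class_ids: Sequence[int], class_names: Sequence[str]) -> Counter:
    """Count detections per class name given integer class ids."""
    return Counter(class_names[i] for i in class_ids)


# With detectron2 available, the inputs could be obtained like this (not executed here):
#   class_ids = outputs["instances"].pred_classes.tolist()
#   class_names = MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes
# Hypothetical stand-in values:
class_names = ["person", "bicycle", "car"]
class_ids = [0, 0, 2, 0]
print(summarize_classes(class_ids, class_names))
```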

Visualizing the inference results

# We can use `Visualizer` to draw the predictions on the image.
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
imshow(out.get_image()[:, :, ::-1])

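Visualization uses the Visualizer class in utils/visualizer.py. The class defines many drawing functions, and the one that fits the task, here draw_instance_predictions, is picked from among them.

What each function draws is described in detail in the detectron2 documentation, so please refer to it.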

Train on a custom dataset

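From here on, the model is trained on a custom dataset and the results are visualized.
Parts that are the same as in the pre-trained model section are skipped; I will explain the cells for dataset registration, training, and the AP metric.

Registering the dataset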

# if your dataset is in COCO format, this cell can be replaced by the following three lines:
# from detectron2.data.datasets import register_coco_instances
# register_coco_instances("my_dataset_train", {}, "json_annotation_train.json", "path/to/image/dir")
# register_coco_instances("my_dataset_val", {}, "json_annotation_val.json", "path/to/image/dir")

from detectron2.structures import BoxMode

def get_balloon_dicts(img_dir):
    json_file = os.path.join(img_dir, "via_region_data.json")
    with open(json_file) as f:
        imgs_anns = json.load(f)

    dataset_dicts = []
    for idx, v in enumerate(imgs_anns.values()):
        record = {}

        filename = os.path.join(img_dir, v["filename"])
        height, width = cv2.imread(filename).shape[:2]

        record["file_name"] = filename
        record["image_id"] = idx
        record["height"] = height
        record["width"] = width

        annos = v["regions"]
        objs = []
        for _, anno in annos.items():
            assert not anno["region_attributes"]
            anno = anno["shape_attributes"]
            px = anno["all_points_x"]
            py = anno["all_points_y"]
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = [p for x in poly for p in x]

            obj = {
                "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": 0,
            }
            objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts
for d in ["train", "val"]:
    DatasetCatalog.register("balloon_" + d, lambda d=d: get_balloon_dicts("balloon/" + d))
    MetadataCatalog.get("balloon_" + d).set(thing_classes=["balloon"])
balloon_metadata = MetadataCatalog.get("balloon_train")

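In short, this cell converts the custom dataset into a form that detectron2 can use.
- For object detection and segmentation tasks, detectron2 expects a dataset as a list of dicts in its standard dataset format, which is close to COCO annotations. A dataset that is already in COCO format (a JSON file) can be registered directly with register_coco_instances, as the comment at the top of the cell shows.
The balloon dataset is not in COCO format, so get_balloon_dicts converts it. The function returns a list with one dict per image, each holding file_name, image_id, height, width, and annotations.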
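To make the per-image record concrete, here is a small sketch of how one polygon region becomes an annotation entry. It mirrors the logic of the tutorial cell in plain Python; `region_to_annotation`, the file path, and the triangle coordinates are hypothetical, and the string "XYXY_ABS" is only a placeholder for BoxMode.XYXY_ABS so that the snippet runs without detectron2.

```python
from typing import Dict, List


def region_to_annotation(xs: List[int], ys: List[int], category_id: int = 0) -> Dict:
    """Convert polygon vertices into a bbox and a flat [x0, y0, x1, y1, ...] polygon."""
    # Shift by 0.5 so each vertex refers to the pixel centre, as the tutorial code does.
    polygon = [coord for x, y in zip(xs, ys) for coord in (x + 0.5, y + 0.5)]
    return {
        "bbox": [min(xs), min(ys), max(xs), max(ys)],  # absolute (x1, y1, x2, y2)
        "bbox_mode": "XYXY_ABS",  # placeholder; the tutorial uses BoxMode.XYXY_ABS
        "segmentation": [polygon],
        "category_id": category_id,
    }


# One record per image; the dataset function returns a list of such dicts.
record = {
    "file_name": "balloon/train/example.jpg",  # hypothetical path
    "image_id": 0,
    "height": 100,
    "width": 200,
    "annotations": [region_to_annotation([10, 50, 30], [20, 20, 60])],
}
print(record["annotations"][0]["bbox"])          # [10, 20, 50, 60]
print(record["annotations"][0]["segmentation"])  # [[10.5, 20.5, 50.5, 20.5, 30.5, 60.5]]
```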

Train

from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("balloon_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2  # This is the real "batch size" commonly known to deep learning people
cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
cfg.SOLVER.MAX_ITER = 300    # 300 iterations seems good enough for this toy dataset; you will need to train longer for a practical dataset
cfg.SOLVER.STEPS = []        # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # The "RoIHead batch size". 128 is faster, and good enough for this toy dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (balloon). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

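With the pre-trained model no training was needed, but here the model has to be trained to adapt it to the custom dataset.
For that reason, cfg.DATASETS.TRAIN = ("balloon_train",) selects the training split registered earlier as the training data.
After that, a trainer object is created with the DefaultTrainer class and used to run training.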
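To get a feel for how long 300 iterations is, the iteration count can be converted into approximate passes over the training set. This is a back-of-the-envelope sketch: `approx_epochs` is a hypothetical helper, and the training-set size of 61 images is an assumption that should be replaced with the real count.

```python
def approx_epochs(max_iter: int, ims_per_batch: int, num_train_images: int) -> float:
    """Approximate number of passes over the training set."""
    return max_iter * ims_per_batch / num_train_images


# MAX_ITER and IMS_PER_BATCH come from the config above.
# 61 is an assumed dataset size; with detectron2 it could be read from
# len(DatasetCatalog.get("balloon_train")).
print(round(approx_epochs(max_iter=300, ims_per_batch=2, num_train_images=61), 1))  # 9.8
```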

AP metric

AP (Average Precision) is one of the standard evaluation metrics for object detection.
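As a rough illustration of what AP measures, the sketch below computes the IoU between two boxes and a non-interpolated average precision from a ranked list of detections. It is deliberately simplified: COCO-style evaluation additionally averages over IoU thresholds from 0.50 to 0.95 and uses interpolated precision, so these numbers will not match COCOEvaluator. Both function names are hypothetical.

```python
from typing import List, Sequence


def iou(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(is_true_positive: List[bool], num_ground_truth: int) -> float:
    """Non-interpolated AP: precision summed at each true-positive rank, divided by
    the number of ground-truth objects.

    is_true_positive lists detections sorted by descending confidence; a detection
    counts as a true positive when its IoU with an unmatched ground-truth box
    exceeds the chosen threshold.
    """
    if num_ground_truth == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, hit in enumerate(is_true_positive, start=1):
        if hit:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_ground_truth


print(round(iou((0, 0, 10, 10), (5, 0, 15, 10)), 3))         # 0.333
print(round(average_precision([True, False, True], 2), 3))   # 0.833
```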

from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader
evaluator = COCOEvaluator("balloon_val", output_dir="./output")
val_loader = build_detection_test_loader(cfg, "balloon_val")
print(inference_on_dataset(predictor.model, val_loader, evaluator))
# another equivalent way to evaluate the model is to use `trainer.test`

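First, an instance of the COCOEvaluator class is created; it implements the evaluation logic.
Then the evaluation data loader built with build_detection_test_loader is passed, together with the model and the evaluator, to inference_on_dataset, which runs inference over the validation set and returns the metrics. Note that predictor here has to be built from the trained weights (in the official tutorial, a DefaultPredictor created after pointing cfg.MODEL.WEIGHTS at model_final.pth in cfg.OUTPUT_DIR), not the pre-trained predictor from the first section.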
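Because the evaluation cell relies on a predictor that loads the trained checkpoint, the config has to be pointed at that checkpoint first. The sketch below shows that step under some assumptions: `point_cfg_at_trained_weights` is a hypothetical helper, the threshold of 0.7 is just an example value, and a SimpleNamespace stands in for the real config so the snippet runs without detectron2.

```python
import os
from types import SimpleNamespace


def point_cfg_at_trained_weights(cfg, score_thresh: float = 0.7):
    """Make cfg load the checkpoint written by training instead of model-zoo weights."""
    cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = score_thresh
    return cfg


# With detectron2 (not executed here):
#   cfg = point_cfg_at_trained_weights(cfg)
#   predictor = DefaultPredictor(cfg)

# Stand-in config object so the sketch runs on its own:
demo_cfg = SimpleNamespace(
    OUTPUT_DIR="./output",
    MODEL=SimpleNamespace(WEIGHTS="", ROI_HEADS=SimpleNamespace(SCORE_THRESH_TEST=0.05)),
)
print(point_cfg_at_trained_weights(demo_cfg).MODEL.WEIGHTS)
```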

Other types of builtin models

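In this section, other built-in models for different tasks are used by changing the yaml file that is specified.
Specifically, the following tasks are run:

Keypoint detection (pose estimation)
Panoptic segmentation

As shown, at the tutorial level a variety of tasks can be run simply by changing the yaml file.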
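Switching tasks by config file could be wrapped in a small helper like the one below. Treat it as a sketch: `BUILTIN_CONFIGS` and `build_predictor` are hypothetical names, the two config paths are the ones I believe the official notebook uses and should be checked against the model zoo, and the score threshold is an example value. The detectron2 imports are placed inside the function so the mapping can be inspected even where detectron2 is not installed.

```python
# Assumed model-zoo config paths for the two tasks; verify against the model zoo.
BUILTIN_CONFIGS = {
    "keypoint": "COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml",
    "panoptic": "COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml",
}


def build_predictor(task: str, score_thresh: float = 0.7):
    """Build a DefaultPredictor for the given task (requires detectron2)."""
    # Imported lazily so this snippet can be loaded without detectron2 installed.
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    config_path = BUILTIN_CONFIGS[task]
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(config_path))
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = score_thresh
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_path)
    return DefaultPredictor(cfg)


# Usage (not executed here): outputs = build_predictor("keypoint")(im)
```

Only the config path differs between tasks; the rest of the setup is the same as in the instance segmentation example.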

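Summary

In this article I walked through the tutorial for detectron2, a library that handles a variety of computer vision tasks.
Since switching models and tasks is mostly a matter of changing the config, detectron2 can be a convenient library depending on how you use it.
I hope this article is of some help to you.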
