[Digging into Detectron2] Computing and logging evaluation metrics on the val dataset during training

Posted at 2024-04-03

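Detectron2 is a library that makes it easy to implement many handy features for object detection and segmentation.
Once you start using it, though, you find a few things it does not provide out of the box for building models, so I dug in and implemented them myself.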

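Last time, I implemented computing and logging the loss on the validation (val) dataset.
This time, I implemented computing and logging other evaluation metrics, such as Average Precision, on the val dataset. The dataset used here is in COCO format.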

Previous article
[Digging into Detectron2] Computing and logging val loss

Environment

Python 3.10.13
CUDA 11.7
pytorch==1.13.1
detectron2==0.6

Implementation

1. Set up the config

from detectron2.config import get_cfg
 
cfg = get_cfg()
cfg.DATASETS.TEST = ("your_validation_dataset",)
cfg.TEST.EVAL_PERIOD = 100

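In this example, evaluation runs on the validation dataset once every 100 iterations.
When TEST.EVAL_PERIOD is set to a positive value, the evaluator is called on the entire validation dataset at that interval and the results are written to the trainer's storage.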
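To make the schedule concrete, here is a minimal sketch of when evaluation fires for a given period. It does not call Detectron2; `eval_iterations` is a hypothetical helper, and the rule it encodes (evaluate after every `eval_period` iterations, plus once at the end of training) is my reading of how the default trainer's evaluation hook behaves, so treat it as an assumption.

```python
def eval_iterations(eval_period, max_iter):
    """Return the iteration counts after which evaluation would run.

    Assumption: evaluation runs after every `eval_period` iterations and
    once more after the final iteration. A period of 0 leaves only the
    final evaluation.
    """
    if eval_period <= 0:
        return [max_iter]
    # Periodic evaluations strictly before the end of training.
    points = list(range(eval_period, max_iter, eval_period))
    # The final evaluation always runs once training has finished.
    points.append(max_iter)
    return points


print(eval_iterations(100, 300))  # [100, 200, 300]
```

Because each evaluation covers the whole val dataset, a small period on a large dataset can dominate training time, so it is worth picking the period with that cost in mind.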

2. Implement the evaluator

The implementation is copied verbatim from build_evaluator in the official Detectron2 GitHub repository.

Define build_evaluator on your trainer.

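import os

from detectron2.data import MetadataCatalog
from detectron2.engine import DefaultTrainer
from detectron2.evaluation import (
    CityscapesInstanceEvaluator,
    CityscapesSemSegEvaluator,
    COCOEvaluator,
    COCOPanopticEvaluator,
    DatasetEvaluators,
    LVISEvaluator,
    PascalVOCDetectionEvaluator,
    SemSegEvaluator,
)


class Trainer(DefaultTrainer):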
    ...

    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        """
        Create evaluator(s) for a given dataset.
        This uses the special metadata "evaluator_type" associated with each builtin dataset.
        For your own dataset, you can simply create an evaluator manually in your
        script and do not have to worry about the hacky if-else logic here.
        """
        if output_folder is None:
            output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
        evaluator_list = []
        evaluator_type = MetadataCatalog.get(dataset_name).evaluator_type
        if evaluator_type in ["sem_seg", "coco_panoptic_seg"]:
            evaluator_list.append(
                SemSegEvaluator(
                    dataset_name,
                    distributed=True,
                    output_dir=output_folder,
                )
            )
        if evaluator_type in ["coco", "coco_panoptic_seg"]:
            evaluator_list.append(COCOEvaluator(dataset_name, output_dir=output_folder))
        if evaluator_type == "coco_panoptic_seg":
            evaluator_list.append(COCOPanopticEvaluator(dataset_name, output_folder))
        if evaluator_type == "cityscapes_instance":
            return CityscapesInstanceEvaluator(dataset_name)
        if evaluator_type == "cityscapes_sem_seg":
            return CityscapesSemSegEvaluator(dataset_name)
        elif evaluator_type == "pascal_voc":
            return PascalVOCDetectionEvaluator(dataset_name)
        elif evaluator_type == "lvis":
            return LVISEvaluator(dataset_name, output_dir=output_folder)
        if len(evaluator_list) == 0:
            raise NotImplementedError(
                "no Evaluator for the dataset {} with the type {}".format(
                    dataset_name, evaluator_type
                )
            )
        elif len(evaluator_list) == 1:
            return evaluator_list[0]
        return DatasetEvaluators(evaluator_list)
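The if-else chain above is easier to follow when reduced to a mapping from evaluator_type to evaluator class names. The sketch below mirrors that branching in plain Python without importing Detectron2; `select_evaluators` is a hypothetical helper written only for illustration, and it returns names rather than evaluator instances.

```python
def select_evaluators(evaluator_type):
    """Mirror the branching of build_evaluator, returning class names only."""
    names = []
    if evaluator_type in ("sem_seg", "coco_panoptic_seg"):
        names.append("SemSegEvaluator")
    if evaluator_type in ("coco", "coco_panoptic_seg"):
        names.append("COCOEvaluator")
    if evaluator_type == "coco_panoptic_seg":
        names.append("COCOPanopticEvaluator")

    # These dataset types return a single evaluator right away.
    single = {
        "cityscapes_instance": "CityscapesInstanceEvaluator",
        "cityscapes_sem_seg": "CityscapesSemSegEvaluator",
        "pascal_voc": "PascalVOCDetectionEvaluator",
        "lvis": "LVISEvaluator",
    }
    if evaluator_type in single:
        return [single[evaluator_type]]

    if not names:
        raise NotImplementedError(
            f"no Evaluator for the type {evaluator_type}"
        )
    return names


print(select_evaluators("coco"))  # ['COCOEvaluator']
```

For a COCO-format dataset only the "coco" branch applies, assuming the dataset's metadata carries evaluator_type "coco" (as far as I know, register_coco_instances sets this). In that case, as the docstring suggests, build_evaluator could be reduced to returning COCOEvaluator(dataset_name, output_dir=output_folder).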

Train

As training proceeds, the evaluation metrics are computed on the val dataset every cfg.TEST.EVAL_PERIOD iterations, producing output like the following.

Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.847
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.851
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.880
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.880
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.880
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.880
[11/14 19:05:02 d2.evaluation.coco_evaluation]: Evaluation results for bbox: 
|   AP   |  AP50   |  AP75   |  APs  |  APm  |  APl   |
|:------:|:-------:|:-------:|:-----:|:-----:|:------:|
| 84.719 | 100.000 | 100.000 |  nan  |  nan  | 85.059 |
[11/14 19:05:02 d2.evaluation.coco_evaluation]: Some metrics cannot be computed and is shown as NaN.
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.908
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.908
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.920
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.920
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.920
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.920
[11/14 19:05:02 d2.evaluation.coco_evaluation]: Evaluation results for segm: 
|   AP   |  AP50   |  AP75   |  APs  |  APm  |  APl   |
|:------:|:-------:|:-------:|:-----:|:-----:|:------:|
| 90.759 | 100.000 | 100.000 |  nan  |  nan  | 90.759 |
[11/14 19:05:02 d2.evaluation.coco_evaluation]: Some metrics cannot be computed and is shown as NaN.
[11/14 19:05:02 d2.engine.defaults]: Evaluation results for balloon_val in csv format:
[11/14 19:05:02 d2.evaluation.testing]: copypaste: Task: bbox
[11/14 19:05:02 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[11/14 19:05:02 d2.evaluation.testing]: copypaste: 84.7195,100.0000,100.0000,nan,nan,85.0589
[11/14 19:05:02 d2.evaluation.testing]: copypaste: Task: segm
[11/14 19:05:02 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[11/14 19:05:02 d2.evaluation.testing]: copypaste: 90.7591,100.0000,100.0000,nan,nan,90.7591
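The copypaste lines at the end of the log are in CSV format, so they are easy to pull into a dict if you want the numbers outside of Detectron2. The sketch below does that; `parse_copypaste` is a hypothetical helper, and it assumes the line layout shown above (a Task line, then a header line, then a value line). The nan entries correspond to the -1.000 rows in the COCO summary, which, as far as I understand, appear when no ground-truth objects fall in that size range.

```python
def parse_copypaste(log_text):
    """Collect 'copypaste:' log lines into {task: {metric: value}}."""
    results = {}
    task, header = None, None
    for line in log_text.splitlines():
        if "copypaste: " not in line:
            continue
        payload = line.split("copypaste: ", 1)[1].strip()
        if payload.startswith("Task: "):
            # A new task block starts; the next line is its header.
            task, header = payload[len("Task: "):], None
        elif header is None:
            header = payload.split(",")
        else:
            # float() also accepts the literal "nan".
            values = [float(v) for v in payload.split(",")]
            results[task] = dict(zip(header, values))
            header = None
    return results


sample = """\
[11/14 19:05:02 d2.evaluation.testing]: copypaste: Task: bbox
[11/14 19:05:02 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[11/14 19:05:02 d2.evaluation.testing]: copypaste: 84.7195,100.0000,100.0000,nan,nan,85.0589
[11/14 19:05:02 d2.evaluation.testing]: copypaste: Task: segm
[11/14 19:05:02 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[11/14 19:05:02 d2.evaluation.testing]: copypaste: 90.7591,100.0000,100.0000,nan,nan,90.7591
"""

metrics = parse_copypaste(sample)
print(metrics["bbox"]["AP"])  # 84.7195
```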

Below is the result of logging these metrics with MLflow.
[Digging into Detectron2] Experiment tracking with MLflow
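When sending these metrics to a tracker, it helps to flatten the nested results (task, then metric) into names like bbox/AP, which is also the form in which I believe Detectron2 writes them to its storage. The sketch below is plain Python with a hypothetical helper `flatten_results`. Dropping NaN values is my own choice for cleaner charts, not something Detectron2 or MLflow requires.

```python
import math


def flatten_results(results, drop_nan=True):
    """Flatten {task: {metric: value}} into {"task/metric": value}."""
    flat = {}
    for task, task_metrics in results.items():
        for name, value in task_metrics.items():
            value = float(value)
            if drop_nan and math.isnan(value):
                # Skip metrics that could not be computed (e.g. APs, APm).
                continue
            flat[f"{task}/{name}"] = value
    return flat


results = {
    "bbox": {"AP": 84.7195, "AP50": 100.0, "APs": float("nan")},
    "segm": {"AP": 90.7591, "AP50": 100.0, "APs": float("nan")},
}
flat = flatten_results(results)
print(sorted(flat))  # ['bbox/AP', 'bbox/AP50', 'segm/AP', 'segm/AP50']

# Assumption: with MLflow installed and a run active, the flattened dict
# could be logged with mlflow.log_metrics(flat, step=iteration).
```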

As you can see, this takes very little code to implement.
