DETR (End-to-End Object Detection with Transformers) Object Detection (Fine-Tuning on Your Own Data)

Last updated at 2024-04-12, posted at 2024-01-08

The programs used in this video are reproduced in this article.

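Article overview

This article adds the labeling step (labelme) and the label-checking step (coco-viewer) for images you prepare yourself, and then walks through fine-tuning on that data.

The article linked below covers the same procedure with open-source training data.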

Environment

OS: Windows 11
GPU: GeForce RTX 4090
CPU: i9-13900KF
Memory: 64 GB
Python: 3.10.10
PyTorch: 2.0.1
CUDA: 11.8
cuDNN: 8.8

Environment setup

Open a Command Prompt and run the following commands to set up the environment.

# Do all of the work inside the following folder.
mkdir detr_own_data
cd detr_own_data

# Create and activate the virtual environment
python -m venv detr_own_data_env
cd detr_own_data_env\Scripts
activate
cd ..

# Install the required libraries
git clone https://github.com/EscVM/OIDv4_ToolKit.git
pip install urllib3==1.25.11 folium==0.2.1
pip install -r OIDv4_ToolKit/requirements.txt

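Data preparation policy

Fine-tuning an object detector needs 1,000 to 5,000 training samples, so prepare 1,000 images for each class to be detected. In this article the images are generated with Stable Diffusion.

※ The data used in this article can be downloaded from the link above. (miharu.zip/nemuru.zip)
※ The archives contain the image data and the annotation data.
※ The labeling described in this article is already done; the steps from "Processing the training and annotation data" onward have not been applied.
※ The total size is more than 3.5 GB, so take care when downloading.
※ Please do not use the data for other purposes without permission.

To cut the labeling work through data augmentation, the 500 images per class prepared with Stable Diffusion are flipped horizontally, which gives 1,000 training samples per class.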

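Storing the image data

Create an "image_data" folder inside the "detr_own_data" folder.
※ The "labelme" folder is created later, but the "detr_own_data" folder should be laid out as follows.

■ Inside the detr_own_data folder

Inside the "image_data" folder, create one folder per detection target and store the image data there.
In this case the model should detect "miharu" and "nemuru" as two different people, so a folder is created for each and the images are stored in it.

■ Inside the image_data folder

■ miharu folder

■ nemuru folder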

Installing labelme

Label the image data (create the annotation data).
Labeling is done with labelme, an open-source tool.
Open another Command Prompt and set up the labelme environment with the following commands.

# Open the "detr_own_data" folder in a Command Prompt (a separate window)
# cd XXXXX\detr_own_data (move to detr_own_data)

# Set up the labelme environment
python -m venv labelme_env
labelme_env\Scripts\activate
pip install labelme
labelme

※ If the "labelme" command fails with the error below, run "python -m venv labelme_env" in a folder whose path contains no full-width characters.

If the commands ran correctly, labelme starts as shown below.

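Configuring labelme

Click "Open Dir" and select the folder that holds the image data.
(In this case, the "miharu" folder.)

Select the output folder with "File > Change Output Dir".
Here, an "annotation" folder is created beforehand inside the image data folder and selected as the output folder.
(This way, already-labeled files can be recognized when labelme is opened again.)

Enable "Save Automatically" so that the JSON file is saved automatically after labeling.

Once these settings are in place, the window looks like the following.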

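Using labelme (labeling)

Right-click and choose "Create Rectangle".

In the image below, left-click at the green dot in the upper left and then at the green dot in the lower right; a rectangular frame appears as shown. This frame is the annotation data.

A window like the following appears; enter the label name ("miharu" in this case) and click "OK".

The label name should be added to the "Label List".

Repeat the steps above for every prepared image of each label ("miharu" and "nemuru" here).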
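
For reference, each JSON file that labelme saves holds the image size and a `shapes` list, and a rectangle is stored as two opposite corner points. The sketch below uses a hand-written stand-in for such a file (the file name and coordinates are made up for illustration, and only the fields used later in this article are shown) and derives a box in [x, y, width, height] form from it.

```python
# Hand-written stand-in for one labelme JSON file (values are illustrative).
sample = {
    "shapes": [
        {
            "label": "miharu",
            "points": [[120.0, 80.0], [360.0, 400.0]],  # two opposite corners
            "shape_type": "rectangle",
        }
    ],
    "imagePath": "..\\miharu_0001.png",  # hypothetical file name
    "imageHeight": 512,
    "imageWidth": 512,
}


def rectangle_to_xywh(points):
    """Turn corner points into [x_min, y_min, width, height]."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


for shape in sample["shapes"]:
    print(shape["label"], rectangle_to_xywh(shape["points"]))
```

Taking the minimum and maximum of the coordinates means the result does not depend on which corner was clicked first.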

Processing the training and annotation data

Open a Command Prompt in the "image_data" folder and run the following commands.
They copy the image data and annotation data of all labels into one folder.

cd XXXX\detr_own_data\image_data
mkdir all_data
xcopy miharu\* all_data\ /E /I
xcopy nemuru\* all_data\ /E /I
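
Before going further it may help to confirm that every image has an annotation file and vice versa, because the later steps rely on that pairing. The following is a minimal sketch, assuming the images are PNG files directly under `all_data` and the labelme JSON files are under `all_data\annotation`; the helper name `find_unpaired` is introduced here for illustration.

```python
from pathlib import Path


def find_unpaired(image_dir, json_dir, image_ext=".png"):
    """Return (images without a JSON, JSON files without an image) by base name."""
    image_stems = {p.stem for p in Path(image_dir).glob(f"*{image_ext}")}
    json_stems = {p.stem for p in Path(json_dir).glob("*.json")}
    return sorted(image_stems - json_stems), sorted(json_stems - image_stems)


# Example call from the image_data folder (layout is an assumption):
# missing_json, missing_image = find_unpaired("all_data", "all_data/annotation")
# print("images without annotation:", missing_json)
# print("annotations without image:", missing_image)
```

If either list is non-empty, labeling the missing images (or removing the stray files) first avoids mismatches later.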

Run the following program in the "image_data" folder to augment the labelme data. (It creates horizontally flipped copies.)

import json
import cv2
import os
import glob
import shutil

def flip_annotation(input_json_path, output_json_path, flipped_image_filename):
    with open(input_json_path, 'r') as f:
        data = json.load(f)
    
    img_width = data['imageWidth']
    
    for shape in data['shapes']:
        for point in shape['points']:
            point[0] = img_width - point[0]
    
    data['imagePath'] = flipped_image_filename
    
    with open(output_json_path, 'w') as f:
        json.dump(data, f)

def flip_image(input_image_path, output_image_path):
    img = cv2.imread(input_image_path)
    flipped_img = cv2.flip(img, 1)
    cv2.imwrite(output_image_path, flipped_img)

def clear_directory(dir_path):
    for file_name in os.listdir(dir_path):
        file_path = os.path.join(dir_path, file_name)
        if os.path.isfile(file_path):
            os.unlink(file_path)

def process_files(input_image_dir, input_json_dir, output_image_dir, output_json_dir):
    # Ensure output directories exist and are empty
    os.makedirs(output_image_dir, exist_ok=True)
    os.makedirs(output_json_dir, exist_ok=True)
    clear_directory(output_image_dir)
    clear_directory(output_json_dir)
    
    # Pair each annotation with the image that has the same base name.
    # Pairing two sorted lists by position shifts every later pair as soon as
    # one image has no annotation.
    for json_file in sorted(glob.glob(os.path.join(input_json_dir, "*.json"))):
        stem = os.path.splitext(os.path.basename(json_file))[0]
        image_file = os.path.join(input_image_dir, stem + ".png")
        if not os.path.isfile(image_file):
            continue  # annotation without a matching image
        
        output_image_file = os.path.join(output_image_dir, "flipped_" + os.path.basename(image_file))
        output_json_file = os.path.join(output_json_dir, "flipped_" + os.path.basename(json_file))
        
        flip_image(image_file, output_image_file)
        flipped_image_filename = os.path.basename(output_image_file)
        flip_annotation(json_file, output_json_file, flipped_image_filename)
        
    # Move flipped files to input directories
    for output_image_file in glob.glob(os.path.join(output_image_dir, "*.png")):
        shutil.move(output_image_file, input_image_dir)
    for output_json_file in glob.glob(os.path.join(output_json_dir, "*.json")):
        shutil.move(output_json_file, input_json_dir)

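# Example usage:
# Directories containing the image and annotation files.
# Assumption: after the xcopy step, the images sit directly in all_data and
# the labelme JSON files in all_data\annotation.
input_image_dir = '.\\all_data'
input_json_dir = '.\\all_data\\annotation'

# Temporary output directories (the flipped files are moved back afterwards)
output_image_dir = '.\\all_data\\output_images'
output_json_dir = '.\\all_data\\annotation\\path_to_output_annotations'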

process_files(input_image_dir, input_json_dir, output_image_dir, output_json_dir)
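
The coordinate change behind the flip can be checked by hand: a horizontal flip maps each x to `image_width - x` and leaves y untouched, so the two corners of a rectangle swap sides. A small sketch with made-up numbers (the helper `flip_rectangle` is not part of the program above):

```python
def flip_rectangle(points, image_width):
    """Mirror a two-point rectangle horizontally.

    Returns [[x_min, y_min], [x_max, y_max]] so the corner order stays
    top-left / bottom-right after the flip.
    """
    flipped_xs = [image_width - x for x, _ in points]
    ys = [y for _, y in points]
    return [[min(flipped_xs), min(ys)], [max(flipped_xs), max(ys)]]


# A box spanning x = 100..300 in a 512-pixel-wide image
print(flip_rectangle([[100, 50], [300, 400]], image_width=512))
```

The box keeps its width (200 pixels here) and its vertical position; only its horizontal position is mirrored.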

Run the following program in the "image_data" folder to convert the data from labelme format to COCO format.
(The label names must be entered in "categories"; here "miharu" and "nemuru" are registered.)

import os
import json
import cv2
import random
from pathlib import Path

def generate_coco_format(json_files, start_image_id=1, start_annotation_id=1):
    coco_format = {
        "images": [],
        "annotations": [],
        "categories": []
    }
    
    image_id = start_image_id
    annotation_id = start_annotation_id
    
    for json_file in json_files:
        with open(json_file) as f:
            data = json.load(f)
        
        # Add the image information
        image_info = {
            "file_name": data['imagePath'].replace('..\\', ''),
            "height": data['imageHeight'],
            "width": data['imageWidth'],
            "id": image_id
        }
        coco_format['images'].append(image_info)
        
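        # Add the annotation information
        for shape in data['shapes']:
            label = shape['label']
            points = shape['points']
            category_id = categories[label]
            
            # Compute the bounding box
            xs = [point[0] for point in points]
            ys = [point[1] for point in points]
            min_x = min(xs)
            min_y = min(ys)
            max_x = max(xs)
            max_y = max(ys)
            width = max_x - min_x
            height = max_y - min_y
            
            # Convert the shape to a COCO polygon. A labelme rectangle has only
            # two corner points, which is not a valid polygon, so expand it to
            # its four corners; other shapes are flattened as they are.
            if len(points) == 2:
                segmentation = [[min_x, min_y, max_x, min_y, max_x, max_y, min_x, max_y]]
            else:
                segmentation = [list(sum(points, []))]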
            
            annotation_info = {
                "id": annotation_id,
                "image_id": image_id,
                "category_id": category_id,
                "segmentation": segmentation,
                "bbox": [min_x, min_y, width, height],
                "area": width * height,
                "iscrowd": 0,
            }
            coco_format['annotations'].append(annotation_info)
            annotation_id += 1
        
        image_id += 1
    
    return coco_format

input_dir = '.\\all_data\\annotation'  # directory holding the labelme annotations
output_dir = '.\\all_data\\output'  # directory for the COCO-format JSON files
os.makedirs(output_dir, exist_ok=True)  # create it if it does not exist yet
train_file = os.path.join(output_dir, 'custom_train.json')
val_file = os.path.join(output_dir, 'custom_val.json')

categories = {
    "miharu": 1,
    "nemuru": 2,
    # Add other categories here
}
coco_format_base = {
    "categories": [{'id': id, 'name': name} for name, id in categories.items()]
}

all_json_files = list(Path(input_dir).glob('*.json'))
random.shuffle(all_json_files)

split = int(0.98 * len(all_json_files))
train_files = all_json_files[:split]
val_files = all_json_files[split:]

coco_train = generate_coco_format(train_files)
coco_train["categories"] = coco_format_base["categories"]
with open(train_file, 'w') as f:
    json.dump(coco_train, f)

coco_val = generate_coco_format(val_files, start_image_id=len(train_files) + 1, start_annotation_id=len(coco_train['annotations']) + 1)
coco_val["categories"] = coco_format_base["categories"]
with open(val_file, 'w') as f:
    json.dump(coco_val, f)

cocoviewer

Use cocoviewer to check that the chain of annotation with labelme → data augmentation → format conversion worked.
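
Alongside the visual check, a few structural problems can be caught programmatically. The sketch below is one possible sanity check, not part of cocoviewer: it assumes the COCO dictionary has the `images`, `annotations` and `categories` keys written by the conversion program, and the function name `check_coco` is introduced here for illustration.

```python
import json


def check_coco(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    images = {img["id"]: img for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    for ann in coco["annotations"]:
        img = images.get(ann["image_id"])
        if img is None:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
            continue
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        x, y, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann['id']}: empty box {ann['bbox']}")
        elif x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            problems.append(f"annotation {ann['id']}: box outside image {img['file_name']}")
    return problems


# Example call from the image_data folder (path is an assumption):
# with open("all_data/output/custom_train.json", encoding="utf-8") as f:
#     for line in check_coco(json.load(f)):
#         print(line)
```

An empty result only means the file is structurally consistent; whether a box actually sits on the right person still has to be checked by eye.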

Installing cocoviewer

Open a Command Prompt in the detr_own_data folder and set up the cocoviewer environment with the following commands.

# Open the "detr_own_data" folder in a Command Prompt (a separate window)
# cd XXXXX\detr_own_data (move to detr_own_data)

# Install cocoviewer with the following commands.
git clone https://github.com/trsvchn/coco-viewer
cd coco-viewer

# If Pillow does not work, downgrade it
# pip uninstall Pillow
# pip install Pillow==9.5.0

Start cocoviewer with the following command.

python cocoviewer.py -i "..\\image_data\\all_data" -a "..\\image_data\\all_data\\output\\custom_train.json"

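If the following window opens, the setup is complete.

Pressing the "→" key and similar keys shows that the other images are annotated as well.
The augmented (flipped) images are also labeled as expected.

※ (A few annotations are misaligned, so there seems to be a bug somewhere.)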

Fine-tuning DETR

Open a Command Prompt and set up the environment with the following commands.

# Open the "detr_own_data" folder in a Command Prompt (a separate window)
# cd XXXXX\detr_own_data (move to detr_own_data)

# Activate the virtual environment created at the start
cd detr_own_data_env\Scripts
activate
cd ..

pip install torch torchvision torchtext -f https://download.pytorch.org/whl/cu118/torch_stable.html
pip install torch torchvision torchtext
pip install matplotlib
pip install pycocotools
pip install scipy

# Run the following commands inside the detr_own_data_env folder
rd /s detr
git clone https://github.com/woctezuma/detr.git

# Switch branches
cd detr/
git checkout finetune
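
After the installs above, a quick check that PyTorch is importable and sees the GPU can save a long wait later. This is a minimal sketch; the function name `environment_report` is introduced here, and the reported values depend entirely on the machine.

```python
import platform


def environment_report():
    """Collect the Python version and, if PyTorch is installed, its CUDA status."""
    report = {"python": platform.python_version(), "torch": None}
    try:
        import torch
    except ImportError:
        return report  # PyTorch is not installed in this environment
    report["torch"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    report["cuda_version"] = torch.version.cuda  # None on CPU-only builds
    return report


print(environment_report())
```

If `cuda_available` is False, training falls back to the CPU and will take far longer than the time quoted in this article.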

Run the following program to save the pretrained model.

import torch, torchvision
import torchvision.transforms as T
import matplotlib.pyplot as plt
from PIL import Image
import requests

# Fetch the pretrained model
checkpoint = torch.hub.load_state_dict_from_url(
    url='https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth',
    map_location='cpu',
    check_hash=True
)

# Remove the classification head
del checkpoint['model']['class_embed.weight']
del checkpoint['model']['class_embed.bias']

# Save
torch.save(checkpoint, 'detr-r50_no-class-head.pth')
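
The classification head is removed because its size depends on the number of classes: DETR's class layer has one output per class id plus one for "no object", so the COCO weights (91 class ids) do not fit a model built for the custom labels. The sketch below illustrates this with plain dictionaries holding shapes instead of tensors; the helper names are introduced here, and only a few representative keys are listed.

```python
def strip_prefix(state_dict, prefix="class_embed."):
    """Return a copy of state_dict without the entries whose key starts with prefix."""
    return {key: value for key, value in state_dict.items() if not key.startswith(prefix)}


def class_head_rows(num_classes):
    """Outputs of the class layer: one per class id plus one for 'no object'."""
    return num_classes + 1


# Stand-in for checkpoint['model'], with shapes in place of tensors.
coco_weights = {
    "class_embed.weight": (class_head_rows(91), 256),
    "class_embed.bias": (class_head_rows(91),),
    "query_embed.weight": (100, 256),
}

remaining = strip_prefix(coco_weights)
print(sorted(remaining))                        # the class head is gone
print(class_head_rows(91), class_head_rows(3))  # head sizes: COCO vs. this article
```

With the head removed, the remaining weights load into the new model and only the class layer starts from a fresh initialization.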

Placing the image data and annotation data

# Create the required folders
mkdir ..\data
mkdir ..\data\custom
mkdir ..\data\custom\annotations
mkdir ..\data\custom\train2017
mkdir ..\data\custom\val2017

# Copy the JSON data
xcopy /E /I ..\..\image_data\all_data\output ..\data\custom\annotations

# Copy the PNG data into the train2017 and val2017 folders created above
xcopy /E /I ..\..\image_data\all_data\*.png ..\data\custom\train2017
xcopy /E /I ..\..\image_data\all_data\*.png ..\data\custom\val2017
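
The xcopy commands place every PNG in both image folders. Since a COCO-style loader only reads the files listed in each JSON, that is sufficient, but the copy can also be limited to the images each split actually references. The following is a sketch of that alternative, assuming the `train2017` and `val2017` folders created above as destinations; `copy_split_images` is a helper introduced here for illustration.

```python
import json
import shutil
from pathlib import Path


def copy_split_images(coco_json, image_dir, dest_dir):
    """Copy only the images listed in coco_json from image_dir to dest_dir.

    Returns (number of files copied, list of file names that were not found).
    """
    with open(coco_json, encoding="utf-8") as f:
        coco = json.load(f)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied, missing = 0, []
    for image in coco["images"]:
        src = Path(image_dir) / image["file_name"]
        if src.is_file():
            shutil.copy2(src, dest / src.name)
            copied += 1
        else:
            missing.append(image["file_name"])
    return copied, missing


# Example calls from the detr folder (paths are assumptions based on the layout above):
# copy_split_images("../data/custom/annotations/custom_train.json",
#                   "../../image_data/all_data", "../data/custom/train2017")
# copy_split_images("../data/custom/annotations/custom_val.json",
#                   "../../image_data/all_data", "../data/custom/val2017")
```

A non-empty list of missing files points to an annotation whose image name does not match any file, which is worth fixing before training.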

Create the output folder for the fine-tuned model with the following commands.

rd /s outputs
mkdir outputs

Run fine-tuning with the following command.
It takes 15 hours on a GeForce RTX 4090.

python main.py --dataset_file "custom" --coco_path "..\\data\\custom\\" --output_dir "outputs" --resume "detr-r50_no-class-head.pth" --num_classes 3 --epochs 200
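
The value 3 passed to `--num_classes` is not the number of labels. As far as the DETR source describes it, the parameter corresponds to the largest category id plus one, and the ids in this article start at 1. A small sketch of that rule, using the categories registered during the COCO conversion (the function name is introduced here):

```python
def detr_num_classes(categories):
    """num_classes in the DETR convention: largest category id + 1."""
    return max(category["id"] for category in categories) + 1


categories = [{"id": 1, "name": "miharu"}, {"id": 2, "name": "nemuru"}]
print(detr_num_classes(categories))  # 3
```

Adding a third label with id 3 would therefore call for `--num_classes 4`.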

Run the following program to check the training results.

import torch, torchvision
import torchvision.transforms as T
import matplotlib.pyplot as plt
# from util.plot_utils import plot_logs
from pathlib import Path
from io import BytesIO
from PIL import Image
import requests

log_directory = [Path('outputs')]

# Solid line ... training result (train_loss)
# Dashed line ... validation result (val_loss)
fields_of_interest = (
    'loss',
    'mAP',
)
# plot_logs(log_directory, fields_of_interest)

finetuned_model = torch.hub.load('facebookresearch/detr',
                       'detr_resnet50',
                       pretrained=False,
                       num_classes=3)
checkpoint = torch.load('.\\outputs\\checkpoint.pth',
                        map_location='cpu')
finetuned_model.load_state_dict(checkpoint['model'], strict=False)
finetuned_model.eval()

original_model = torch.hub.load('facebookresearch/detr', 'detr_resnet50_dc5', pretrained=True)
original_model.eval()

# Class labels for visualization
oid_labels = [
  'N/A',
  'miharu',
  'nemuru',
]
coco_labels = [
    'N/A', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A',
    'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',
    'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack',
    'umbrella', 'N/A', 'N/A', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
    'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
    'skateboard', 'surfboard', 'tennis racket', 'bottle', 'N/A', 'wine glass',
    'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
    'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake',
    'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', 'N/A',
    'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
    'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A',
    'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier',
    'toothbrush'
]
# Colors for visualization
COLORS = [[0.000, 0.447, 0.741], [0.850, 0.325, 0.098], [0.929, 0.694, 0.125],
          [0.494, 0.184, 0.556], [0.466, 0.674, 0.188], [0.301, 0.745, 0.933]]

# Standard PyTorch mean-std normalization of the input image
transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

def box_cxcywh_to_xyxy(x):
    """
    Convert coordinates from (center_x, center_y, width, height) to (xmin, ymin, xmax, ymax)
    """
    # unbind(1) splits the tensor along dimension 1:
    # N rows of (center_x, center_y, width, height) → four tensors of length N
    x_c, y_c, w, h = x.unbind(1)
    b = [(x_c - 0.5 * w), (y_c - 0.5 * h), (x_c + 0.5 * w), (y_c + 0.5 * h)]
    # Stack the results back into N rows of (xmin, ymin, xmax, ymax)
    return torch.stack(b, dim=1)

def rescale_bboxes(out_bbox, size):
    """
    Rescale the bounding boxes
    """
    img_w, img_h = size
    b = box_cxcywh_to_xyxy(out_bbox)
    # Rescale the boxes from [0, 1] to the size of the original image
    b = b * torch.tensor([img_w, img_h, img_w, img_h], dtype=torch.float32)
    return b

def filter_bboxes_from_outputs(outputs, image_size, threshold=0.7):
    # Keep only the predictions whose confidence exceeds the threshold
    probas = outputs['pred_logits'].softmax(-1)[0, :, :-1]
    keep = probas.max(-1).values > threshold
    probas_to_keep = probas[keep]
    # Convert the [0, 1] boxes to the scale of the image
    bboxes_scaled = rescale_bboxes(outputs['pred_boxes'][0, keep], image_size)
    return probas_to_keep, bboxes_scaled

# Display the results
def plot_finetuned_results(pil_img, prob=None, boxes=None, labels=None):
    plt.figure(figsize=(16, 10))
    plt.imshow(pil_img)
    ax = plt.gca()
    colors = COLORS * 100
    if prob is not None and boxes is not None:
        for p, (xmin, ymin, xmax, ymax), c in zip(prob, boxes.tolist(), colors):
            ax.add_patch(plt.Rectangle((xmin, ymin), xmax-xmin, ymax-ymin,
                                                                 fill=False, color=c, linewidth=3))
            cl = p.argmax()
            print(labels, p)
            text = f'{labels[cl]}: {p[cl]:0.2f}'
            ax.text(xmin, ymin, text, fontsize=15,
                            bbox=dict(facecolor='yellow', alpha=0.5))
    plt.axis('off')
    plt.show()

# Object detection
def run_workflow(my_image, my_model, labels, threshold=0.7):
    # Normalize the input image (batch size: 1)
    img = transform(my_image).unsqueeze(0)
    
    # Run the model; gradients are not needed for inference
    with torch.no_grad():
        outputs = my_model(img)
    # Pass the size of the image being processed instead of relying on a global
    probas_to_keep, bboxes_scaled = filter_bboxes_from_outputs(outputs, my_image.size, threshold=threshold)
    plot_finetuned_results(my_image, probas_to_keep, bboxes_scaled, labels)

im = Image.open("..\\..\\miharu-01.png").convert("RGB")
# im = Image.open("..\\..\\miharu-02.png").convert("RGB")
# im = Image.open("..\\..\\miharu-03.png").convert("RGB")
# im = Image.open("..\\..\\nemuru-01.png").convert("RGB")
# im = Image.open("..\\..\\nemuru-02.png").convert("RGB")
# im = Image.open("..\\..\\nemuru-03.png").convert("RGB")

run_workflow(im, finetuned_model, oid_labels, 0.9)

Running the program above should display the image with bounding boxes drawn on it, as shown below.
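
The post-processing in the program can be followed by hand for a single prediction. The sketch below redoes it in plain Python with made-up numbers: softmax over the class scores, dropping the last entry (the "no object" class), applying the threshold, and converting the normalized (center_x, center_y, width, height) box to pixel corners. The function names and the sample values are introduced here for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def postprocess_query(logits, box_cxcywh, image_size, threshold=0.7):
    """Return (class_index, score, [xmin, ymin, xmax, ymax]) or None if filtered out."""
    probs = softmax(logits)[:-1]  # the last entry is the "no object" class
    score = max(probs)
    if score <= threshold:
        return None
    class_index = probs.index(score)
    img_w, img_h = image_size
    cx, cy, w, h = box_cxcywh
    box = [(cx - 0.5 * w) * img_w, (cy - 0.5 * h) * img_h,
           (cx + 0.5 * w) * img_w, (cy + 0.5 * h) * img_h]
    return class_index, score, box


# Scores for (N/A, miharu, nemuru, no object) and a box centered in an 800x600 image
result = postprocess_query([0.0, 6.0, 0.0, 0.0], (0.5, 0.5, 0.2, 0.4), (800, 600), threshold=0.9)
print(result)
```

A prediction dominated by the "no object" score returns None, which is how most of DETR's queries are discarded for a typical image.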

The validation images are available at the location below.
(miharu-01.png/miharu-02.png/miharu-03.png/nemuru-01.png/nemuru-02.png/nemuru-03.png/イラスト.png)
※ Please do not use them for other purposes without permission.
