Training the object detection AI YOLOv11 on custom images with Python: a detailed walkthrough from annotation through training to inference

Last updated at 2024-12-22 / Posted at 2024-12-22

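Introduction

In this article, I will show how to train YOLOv11 on your own images to build a custom object detection model.
The object I trained it on this time is our family cat, Da-chan.
I collected 150 images, annotated them, and then ran the training.
Recognizing custom images turned out to be surprisingly easy, so please take a look!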

Environment

Windows 10
Python 3.11.8
GPU: NVIDIA GeForce RTX 3060
VSCode

Libraries used

ultralytics 8.3.11
labelImg 1.8.6

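#Installing YOLOv11:
Install it by following the steps described on the Ultralytics GitHub page.
The prerequisites are Python 3.8 or later and PyTorch 1.8 or later.
After installing PyTorch, install YOLOv11 with the following pip command.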

pip install ultralytics
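
To confirm the prerequisites before moving on, a quick check along these lines may help. It is only a sketch using the Python standard library: the helper names `installed_version` and `check_environment` are made up for illustration, and `torch` and `ultralytics` are assumed to be the package names as installed by pip.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Collect the prerequisite information mentioned above into a dict."""
    return {
        "python_ok": sys.version_info >= (3, 8),      # Python 3.8 or later
        "torch": installed_version("torch"),          # None if PyTorch is missing
        "ultralytics": installed_version("ultralytics"),
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `torch` or `ultralytics` prints `None`, the corresponding install step has probably not completed in the active Python environment.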

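YouTube walkthrough:

The following steps are explained in detail on YouTube, so please take a look.
1. Collecting training images
2. Annotating with the LabelImg annotation tool
3. Training
4. Inference with a webcam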

#Program to extract images from a video
Each time you press the space key, the frame currently shown is saved from the video file as an image.

video_to_img.py

import os
import sys

import cv2

# Path to the input video file and the output directory
video_path = "input_video.mp4"
output_dir = "output_images"

# Create the output directory if it does not exist
os.makedirs(output_dir, exist_ok=True)

# Open the video file
cap = cv2.VideoCapture(video_path)

if not cap.isOpened():
    print("Could not open the video file. Please check the path.")
    sys.exit(1)

frame_count = 0

while True:
    ret, frame = cap.read()

    # Leave the loop once there are no more frames to read
    if not ret:
        print("Reached the end of the video.")
        break

    # Show the frame
    cv2.imshow('Video Frame', frame)

    # Wait for a key press (1 ms)
    key = cv2.waitKey(1) & 0xFF

    # Space key: save the current frame as an image
    if key == ord(' '):
        frame_filename = os.path.join(output_dir, f"frame_{frame_count:04d}.png")
        cv2.imwrite(frame_filename, frame)
        print(f"Saved frame: {frame_filename}")

    # 'q' key: quit
    if key == ord('q'):
        print("Exiting.")
        break

    frame_count += 1

# Release resources
cap.release()
cv2.destroyAllWindows()
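
If pressing the space key for every one of the 150 images is tedious, one alternative is to save frames at regular intervals. The sketch below only computes which frame indices to keep; `evenly_spaced_indices` is a hypothetical helper and not part of the original program.

```python
def evenly_spaced_indices(total_frames, n_samples):
    """Return up to n_samples frame indices spread evenly over the video."""
    if total_frames <= 0 or n_samples <= 0:
        return []
    if n_samples >= total_frames:
        # Fewer frames than requested samples: use every frame
        return list(range(total_frames))
    # Integer arithmetic avoids floating-point rounding surprises
    return [i * total_frames // n_samples for i in range(n_samples)]


if __name__ == "__main__":
    # Example: a 300-frame video sampled down to 150 images
    indices = evenly_spaced_indices(300, 150)
    print(indices[:5], len(indices))
```

In video_to_img.py, the total could be read with `cap.get(cv2.CAP_PROP_FRAME_COUNT)` (which may be approximate for some files), and a frame saved whenever `frame_count` is in the returned list. Frames picked this way can look very similar to each other, so choosing them by hand may still give a more varied dataset.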

Installing the annotation tool (LabelImg)

pip install labelimg

Launching LabelImg: run the following from the Command Prompt

labelimg
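
Assuming LabelImg is set to save in YOLO format (the format Ultralytics reads), each image gets a `.txt` file with one line per box: `class_id x_center y_center width height`, where the four coordinates are normalized to the range 0 to 1 by the image size. The sketch below converts such a line back to pixel coordinates as a way to sanity-check annotations; `yolo_line_to_pixel_box` is a hypothetical helper name.

```python
def yolo_line_to_pixel_box(line, img_width, img_height):
    """Convert 'class x_center y_center width height' (normalized) to pixels.

    Returns (class_id, x_min, y_min, x_max, y_max) with integer pixel values.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x_c, y_c, w, h = (float(v) for v in parts[1:])
    # The box centre and size are fractions of the image width and height
    x_min = round((x_c - w / 2) * img_width)
    y_min = round((y_c - h / 2) * img_height)
    x_max = round((x_c + w / 2) * img_width)
    y_max = round((y_c + h / 2) * img_height)
    return class_id, x_min, y_min, x_max, y_max


if __name__ == "__main__":
    # A box centred in a 640x480 image, 20% wide and 40% tall
    print(yolo_line_to_pixel_box("0 0.5 0.5 0.2 0.4", 640, 480))
```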

Save the training images and label files in the following structure

dataset/
├── images/
│   ├── train/
│   └── val/
└── labels/
    ├── train/
    └── val/
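
Sorting 150 files into these four folders by hand is error-prone, so a small script can do it. This is a minimal sketch under some assumptions not stated in the article: the images and their `.txt` label files sit together in one flat source folder, 20% of the pairs go to `val`, and `split_dataset` is a hypothetical helper name.

```python
import random
import shutil
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def split_dataset(src_dir, dst_dir, val_ratio=0.2, seed=0):
    """Copy image/label pairs from src_dir into dst_dir/{images,labels}/{train,val}.

    Images without a matching .txt label file are skipped.
    Returns (number of train pairs, number of val pairs).
    """
    src, dst = Path(src_dir), Path(dst_dir)
    # Create the four target folders up front, even if one ends up empty
    for kind in ("images", "labels"):
        for subset in ("train", "val"):
            (dst / kind / subset).mkdir(parents=True, exist_ok=True)

    images = sorted(
        p for p in src.iterdir()
        if p.suffix.lower() in IMAGE_EXTS and p.with_suffix(".txt").exists()
    )
    # A fixed seed keeps the split reproducible between runs
    random.Random(seed).shuffle(images)
    n_val = int(len(images) * val_ratio)

    counts = {"train": 0, "val": 0}
    for i, img in enumerate(images):
        subset = "val" if i < n_val else "train"
        label = img.with_suffix(".txt")
        shutil.copy2(img, dst / "images" / subset / img.name)
        shutil.copy2(label, dst / "labels" / subset / label.name)
        counts[subset] += 1
    return counts["train"], counts["val"]
```

For example, `split_dataset("output_images", "dataset")` would fill the layout above from the folder written by video_to_img.py, provided the label files were saved next to the images.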

Training

This is the source code of the training program shown in the YouTube video.

yolo_train_da.py

from ultralytics import YOLO

if __name__ == "__main__":
    model = YOLO('yolo11s.pt')  # change the model as needed
    model.train(data='./data.yaml', epochs=64, imgsz=640)
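
With default settings, Ultralytics typically writes training outputs under a `runs/detect/` folder, with the best checkpoint at `weights/best.pt` inside each run directory. The exact location is printed at the end of training, so treat the path used here as an assumption. The sketch below finds the most recently modified `best.pt`; `find_latest_best` is a hypothetical helper.

```python
from pathlib import Path


def find_latest_best(runs_dir="runs/detect"):
    """Return the newest '<run>/weights/best.pt' under runs_dir, or None if absent."""
    root = Path(runs_dir)
    if not root.is_dir():
        return None
    candidates = list(root.glob("*/weights/best.pt"))
    if not candidates:
        return None
    # The most recently modified checkpoint belongs to the latest training run
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(find_latest_best())
```

The file it returns is the one to copy next to the inference script, or to pass to `YOLO(...)` directly.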

YAML file

This file specifies the paths to the training images and label files, as shown in the YouTube video.

data.yaml

train: E:\300_Python\120_yolo_train/datasets/images/train
val: E:\300_Python\120_yolo_train/datasets/images/val

nc: 1  # number of classes (the classes present in the annotations)
names: ['da']  # class names
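
Note that data.yaml lists only the image folders. Ultralytics locates each label file by swapping the `images` part of the path for `labels` and the extension for `.txt`, which is why the folder layout matters. A check like the following sketch can catch images whose label file is missing before training starts; `find_unlabeled_images` is a hypothetical helper, and the extension list is an assumption.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def find_unlabeled_images(images_dir):
    """Return sorted names of images with no matching .txt in the labels folder."""
    images_dir = Path(images_dir)
    parts = list(images_dir.parts)
    if "images" not in parts:
        raise ValueError(f"no 'images' component in path: {images_dir}")
    # Mirror the convention: .../images/train -> .../labels/train
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "labels"
    labels_dir = Path(*parts)

    missing = []
    for img in images_dir.iterdir():
        if img.suffix.lower() not in IMAGE_EXTS:
            continue
        if not (labels_dir / (img.stem + ".txt")).exists():
            missing.append(img.name)
    return sorted(missing)
```

An image without a label file is not necessarily an error, since it may be intended as a background example with no objects, but an unexpectedly long list usually means the labels were saved in the wrong folder.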

Inference

Run inference using a webcam.

yolo_test_video.py

import cv2
from ultralytics import YOLO

# Load the YOLO model
model = YOLO("best.pt")

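# Open the video source: 0 selects the default webcam;
# set a file path such as "cat.mp4" to run on a video file instead
video_path = 0
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
    raise SystemExit("Could not open the video source.")

# Get the original video width, height, and frame rate
original_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
original_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
# Some webcams report 0 FPS; fall back to 30 (an assumed default) so VideoWriter gets a valid rate
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0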

# Define the output video size (1080p)
output_height = 1080
output_width = int((output_height / original_height) * original_width)

# Define the codec and create VideoWriter object to save the output video
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # Codec for MP4
out = cv2.VideoWriter('out2.mp4', fourcc, fps, (output_width, output_height))

# Loop through the video frames
while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        # Resize the frame to 1080p
        resized_frame = cv2.resize(frame, (output_width, output_height))

        # Run YOLO inference on the resized frame
        results = model(resized_frame, conf=0.8, iou=0.3)

        # Visualize the results on the frame
        annotated_frame = results[0].plot()

        # Write the annotated frame to the output video
        out.write(annotated_frame)

        # Display the annotated frame
        cv2.imshow("YOLO Inference", annotated_frame)

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        # Break the loop if the end of the video is reached
        break

# Release the video capture and writer objects and close the display window
cap.release()
out.release()
cv2.destroyAllWindows()
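
In the script above, `conf=0.8` discards detections with a confidence below 0.8, and `iou=0.3` sets the IoU threshold for non-maximum suppression: when two predicted boxes overlap by more than this ratio, the lower-confidence one is dropped. As an illustration of what that ratio measures, here is a plain-Python sketch (not Ultralytics' internal code; the function name `iou` is chosen for this example):

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle, if any
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 10x10 boxes offset by 5 pixels in each direction
    print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

A low threshold such as 0.3 removes overlapping duplicates aggressively, which suits a scene with a single cat; with several objects close together, a higher value may keep more of them.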

Conclusion:

In this article, I showed how to train the object detection model YOLOv11 on custom images.
