Trying object detection on a video file with MediaPipe

Last updated at 2025-02-03 / Posted at 2025-02-03

Introduction

This time I wanted to try detecting objects in a video file with MediaPipe, so I'm writing up the results here as a personal memo.

Preparation

Google provides pre-trained models in the Object detection task guide, so I used one of those. Under "EfficientDet-Lite0 model (recommended)", I picked EfficientDet-Lite0 (float 32).
Google also provides sample code, which I used as a reference.

Implementation

I implemented this in Python on Google Colaboratory.
First, specify the path to the EfficientDet-Lite0 (float 32) model downloaded in the preparation step.

import cv2
import numpy as np
import mediapipe as mp
from google.colab.patches import cv2_imshow

# EfficientDet-Lite0 (float 32) model downloaded in the preparation step
model_path = '/content/drive/MyDrive/efficientdet_lite0.tflite'

BaseOptions = mp.tasks.BaseOptions
ObjectDetector = mp.tasks.vision.ObjectDetector
ObjectDetectorOptions = mp.tasks.vision.ObjectDetectorOptions
VisionRunningMode = mp.tasks.vision.RunningMode

# VIDEO mode: frames are passed one by one together with a timestamp
options = ObjectDetectorOptions(
    base_options=BaseOptions(model_asset_path=model_path),
    max_results=5,
    running_mode=VisionRunningMode.VIDEO)

cap = cv2.VideoCapture('/content/drive/MyDrive/test.MP4')

# cv2.CAP_PROP_FPS is only a property ID; cap.get() returns the actual frame rate
video_file_fps = cap.get(cv2.CAP_PROP_FPS)
with ObjectDetector.create_from_options(options) as detector:
  ret, frame = cap.read()

  while ret:
    # Current position in the video (ms); detect_for_video needs an int timestamp
    frame_timestamp_ms = cap.get(cv2.CAP_PROP_POS_MSEC)
    # OpenCV returns BGR frames, while ImageFormat.SRGB expects RGB
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_frame)
    detection_result = detector.detect_for_video(mp_image, int(frame_timestamp_ms))

    # visualize() comes from Google's sample code on GitHub (not defined here);
    # it draws on the BGR frame, which is what cv2_imshow expects
    annotated_image = visualize(np.array(frame).astype(np.uint8), detection_result)
    cv2_imshow(annotated_image)

    ret, frame = cap.read()

cap.release()
cv2.destroyAllWindows()
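In VIDEO mode, `detect_for_video` expects a timestamp in milliseconds that strictly increases from one frame to the next. The loop above reads it from `CAP_PROP_POS_MSEC`. As a sketch of an alternative, assuming the video has a constant frame rate that `cap.get(cv2.CAP_PROP_FPS)` reports correctly, the timestamp can also be derived from a frame counter. The helpers `frame_timestamp_ms` and `is_strictly_increasing` are hypothetical and only do the arithmetic; they are not part of MediaPipe or OpenCV.

```python
def frame_timestamp_ms(frame_index: int, fps: float) -> int:
    """Hypothetical helper: timestamp (ms) of a frame from its index and a constant fps."""
    if fps <= 0:
        # cap.get(cv2.CAP_PROP_FPS) may return 0 if the property cannot be read
        raise ValueError("fps must be positive")
    return int(round(frame_index * 1000.0 / fps))


def is_strictly_increasing(timestamps: list[int]) -> bool:
    """True if every timestamp is larger than the one before it."""
    return all(a < b for a, b in zip(timestamps, timestamps[1:]))


if __name__ == "__main__":
    fps = 29.97  # example frame rate, not taken from the article's video
    stamps = [frame_timestamp_ms(i, fps) for i in range(5)]
    print(stamps)
    print(is_strictly_increasing(stamps))
```

Inside the detection loop you would keep a counter that starts at 0, pass `frame_timestamp_ms(counter, video_file_fps)` to `detect_for_video`, and increment the counter after each `cap.read()`.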

For visualize(), I used the code from the GitHub sample as-is.
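The GitHub visualize() is not reproduced here. To give a rough idea of what such a function does with the result, below is a hypothetical, simplified stand-in, `draw_detections`, that uses only NumPy. Unlike the GitHub version, which also draws text with OpenCV, it only draws box outlines and returns the labels as strings such as `person (0.52)`, so it returns `(annotated_image, labels)` rather than just an image. It assumes the field names of MediaPipe's object detector result: each entry in `detections` has a `bounding_box` with `origin_x`, `origin_y`, `width`, `height` in pixels, and a `categories` list whose items have `category_name` and `score`.

```python
from types import SimpleNamespace

import numpy as np


def draw_detections(image, detection_result, color=(0, 0, 255), thickness=2):
    """Hypothetical NumPy-only stand-in for visualize().

    Draws each bounding-box outline on a copy of `image` (H x W x 3, uint8)
    and returns (annotated_image, labels). The default color (0, 0, 255)
    is red in OpenCV's BGR order.
    """
    annotated = image.copy()
    h, w = annotated.shape[:2]
    labels = []
    for detection in detection_result.detections:
        box = detection.bounding_box
        # Clip the box to the frame so that slicing stays inside the image
        x0, y0 = max(0, box.origin_x), max(0, box.origin_y)
        x1 = min(w, box.origin_x + box.width)
        y1 = min(h, box.origin_y + box.height)
        if x0 >= x1 or y0 >= y1:
            continue  # box lies entirely outside the frame
        t = thickness
        annotated[y0:min(y1, y0 + t), x0:x1] = color  # top edge
        annotated[max(y0, y1 - t):y1, x0:x1] = color  # bottom edge
        annotated[y0:y1, x0:min(x1, x0 + t)] = color  # left edge
        annotated[y0:y1, max(x0, x1 - t):x1] = color  # right edge
        # Use the first category of each detection for the label
        category = detection.categories[0]
        labels.append(f"{category.category_name} ({category.score:.2f})")
    return annotated, labels


if __name__ == "__main__":
    # Fake result that mimics the attribute names of MediaPipe's detection result
    fake_result = SimpleNamespace(detections=[
        SimpleNamespace(
            bounding_box=SimpleNamespace(origin_x=10, origin_y=20, width=30, height=40),
            categories=[SimpleNamespace(category_name="person", score=0.52)],
        )
    ])
    frame = np.zeros((100, 100, 3), dtype=np.uint8)
    annotated, labels = draw_detections(frame, fake_result)
    print(labels)
```

Copying the frame first keeps the original image untouched, which matters if the same frame is also passed on elsewhere.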

Results

For the input, I used a video recorded when I took part in ET Robocon (ETロボコン) some time ago.

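A person (person) and a chair (chair) are detected. The detection scores are only around 50%, which seems a bit low. Since the model I used is not a particularly accurate one and the video is low resolution, this is roughly what I expected.
I think using a more accurate model or a higher-resolution video would improve the results further.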
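To put a number on "around 50%" when comparing models or resolutions, one option is to collect the per-frame results and average the score per category. The function `mean_scores_by_category` below is a hypothetical sketch: it assumes you append each frame's `detection_result` to a list inside the loop, and it reads the same `detections`, `categories[0].category_name` and `.score` fields. Separately, MediaPipe's `ObjectDetectorOptions` also accepts a `score_threshold` parameter that drops low-scoring detections before they are returned.

```python
from collections import defaultdict
from statistics import mean
from types import SimpleNamespace


def mean_scores_by_category(results):
    """Hypothetical helper: average score per category over per-frame results.

    Returns {category_name: (mean_score, number_of_detections)}.
    """
    scores = defaultdict(list)
    for result in results:
        for detection in result.detections:
            category = detection.categories[0]  # first category of each detection
            scores[category.category_name].append(category.score)
    return {name: (mean(values), len(values)) for name, values in scores.items()}


if __name__ == "__main__":
    def fake_frame(*pairs):
        # Mimics one frame's detection result with (category_name, score) pairs
        return SimpleNamespace(detections=[
            SimpleNamespace(categories=[SimpleNamespace(category_name=n, score=s)])
            for n, s in pairs
        ])

    frames = [fake_frame(("person", 0.5), ("chair", 0.25)), fake_frame(("person", 0.75))]
    print(mean_scores_by_category(frames))
```

Running the same video through two models and comparing these averages gives a more concrete basis for "more accurate" than eyeballing individual frames.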
