Learning Image Recognition with Cats and J-Rock (Diary)

Last updated at 2025-04-19, posted at 2025-04-19

Introduction

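As a way of combining J-rock with image recognition, I took "cat cover" albums by bands I like and used YOLOv5 to detect the cats automatically.
The test images are album covers by きのこ帝国 (Kinoko Teikoku), 3markets[ ], スピッツ (Spitz), and カネコアヤノ (Kaneko Ayano), which I used to try out how object detection works.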

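YOLOv5, the model used in this project, is an AI model that quickly finds what is in an image in a single pass. Here it is used to locate the cat on each album cover automatically and draw a box around it.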

How does it find the cats?

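Convolutional neural network (CNN)
This is the part that extracts features from the image.
It picks up patterns in the picture, such as the shape of a cat's ears or the position of its eyes, and uses them to decide "this looks like a cat!"
Bounding box regression (finding the location)
The model does not just find the cat; it also predicts where the cat is.
To do so, it computes the coordinates of a region from its top-left corner to its bottom-right corner and encloses that region in a rectangular frame (a bounding box).
Classification (deciding what it is)
Based on what it learned from its training data, the model decides whether the detected object is a cat, a person, a chair, and so on.
Preprocessing (adjusting the image)
So that the model can judge correctly, the image is resized to a fixed size and its brightness and color values are normalized (resizing, normalization, etc.).
Non-maximum suppression (reducing duplicates)
The same cat can be detected several times, so only the most confident detection is kept and the overlapping ones are discarded.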
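
To make the last step above more concrete, here is a minimal sketch of IoU and greedy non-maximum suppression in plain Python. It is not YOLOv5's own implementation: the function names `iou` and `nms`, the threshold of 0.5, and the sample boxes and scores are all assumptions made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes get an intersection of 0.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return the indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two overlapping boxes on one cat, one box elsewhere.
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 260, 260)]
scores = [0.90, 0.75, 0.60]
kept = nms(boxes, scores)
print(kept)
```

With these made-up values, the second box overlaps the first heavily (IoU of about 0.68) and is dropped, so the script prints `[0, 2]`.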

Reference

# ===============================================
# Program Name: yolov5_detect_dogs_cats_people.py
# Purpose: Detect and count 'dog', 'cat', and 'person' in an uploaded image using YOLOv5, and display the image with bounding boxes
# ===============================================

# --- Set up the YOLOv5 repository (notebook cell: "!" and "%" are Colab/Jupyter commands) ---
!git clone https://github.com/ultralytics/yolov5.git  # Clone the YOLOv5 repo from GitHub
%cd yolov5
!pip install -r requirements.txt  # Install the required dependencies

# --- Upload an image using the Colab widget ---
from google.colab import files
uploaded = files.upload()  # Prompt the user to upload an image

# --- Get the uploaded file name (the first one if several were uploaded) ---
img_path = list(uploaded.keys())[0]

# --- Load the pretrained YOLOv5x model (high accuracy) ---
import torch
model = torch.hub.load('ultralytics/yolov5', 'yolov5x', pretrained=True)

# --- Run inference on the uploaded image ---
results = model(img_path)

# --- Get the detection results as a pandas DataFrame ---
detected = results.pandas().xyxy[0]  # Each row is one detected object

# --- Keep only dog, cat, and person detections ---
filtered = detected[detected['name'].isin(['dog', 'cat', 'person'])]

# --- Draw the bounding boxes on the image ---
# Note: render() draws every detection, not only the filtered classes.
results.render()  # The rendered image is stored in results.ims[0]

# --- Display the rendered image with detections ---
from IPython.display import display
import PIL.Image
display(PIL.Image.fromarray(results.ims[0]))

# --- Count the number of dogs, cats, and people ---
num_dogs = (filtered['name'] == 'dog').sum()
num_cats = (filtered['name'] == 'cat').sum()
num_persons = (filtered['name'] == 'person').sum()

# --- Print the detection counts ---
print(f"🐶 Number of dogs: {num_dogs}")
print(f"🐱 Number of cats: {num_cats}")
print(f"🧍‍♀️ Number of people: {num_persons}")

# -----------------------------
#  Technical Background:
# -----------------------------
# YOLO (You Only Look Once) is a real-time object detection algorithm that passes the image through the CNN only once and detects all objects at the same time.

# yolov5x is the largest and most accurate of the standard YOLOv5 models (n, s, m, l, x), offering high recognition performance at the cost of model size and compute.

# -----------------------------
#  Mathematical Concepts:
# -----------------------------
# ■ Convolution Operation:
#   Extracts spatial features by computing inner products between a kernel and local image regions. The basic building block of a CNN.

# ■ Bounding Box:
#   A rectangular region defined in the form [x_min, y_min, x_max, y_max].

# ■ IoU (Intersection over Union):
#   Measures how much two boxes overlap. Used in NMS (Non-Max Suppression) to remove redundant detections of the same object.

# ■ Confidence Score:
#   The model's estimate (0 to 1) of how likely the detection is correct.

# ■ Classification:
#   The CNN predicts a class label for each detected object. The pretrained model covers the 80 COCO classes; this script keeps only dog, cat, and person.

# ■ DataFrame Filtering (pandas):
#   Selects rows by filtering on the "name" column of the pandas DataFrame.
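
The convolution described in the notes above can be tried by hand on a tiny example. The following is a minimal sketch in plain Python, not how YOLOv5 or PyTorch computes it internally: the function name `conv2d_valid`, the 4x4 image, and the edge-detecting kernel are all hypothetical values chosen for illustration, whereas a real CNN learns its kernels during training.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and take the
    inner product at each position. Like most CNN libraries, this does not
    flip the kernel, so strictly speaking it is cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Inner product of the kernel and the image patch at (r, c).
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 grayscale image: dark left half, bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-written kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

Every row of the resulting 3x3 feature map is `[0, 18, 0]`: the response is large only where the kernel sits on the boundary between the dark and bright halves, which is the sense in which a convolution "extracts" a feature.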

Images used

Results
