Let's Just Try It (Real-Time Motion Detection)

Posted at 2025-08-20, last updated at 2025-10-07

Introduction

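Lately I keep hearing the term "motion analysis," but honestly I had no idea where to start.
This is a record of me getting my hands a little dirty with it.
Setting the hard stuff aside, I'm writing this in a "just tried it" spirit.
I hope it helps anyone who is likewise curious but hasn't gotten around to trying it!
This time I start with hand detection and work up to whole-body detection.
(The code was written around May 2025.)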

Preparation

Install the following libraries into your Python environment beforehand.

Installing the libraries needed this time

pip install opencv-python

pip install mediapipe
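
Note that OpenCV is imported as `cv2` but published on PyPI as `opencv-python`. To confirm that both libraries are visible to the interpreter you plan to use, a small check like the following may help. This is a minimal sketch; `missing_packages` is a hypothetical helper name, not part of either library.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # "cv2" is the import name provided by the opencv-python package.
    missing = missing_packages(["cv2", "mediapipe"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required libraries are available.")
```

If anything is reported as missing, it is worth checking that `pip` and `python` point to the same environment.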

What I did

I proceeded in the following order.

Hand detection.

Hand and face detection.

Whole-body detection.

Full code

To start, here is the full code that detects the hands, face, and whole body, for anyone who just wants the complete script.

import cv2
import mediapipe as mp
import sys

# Initialize MediaPipe
mp_face_mesh = mp.solutions.face_mesh
mp_hands = mp.solutions.hands
mp_pose = mp.solutions.pose

face_mesh = mp_face_mesh.FaceMesh(static_image_mode=False, max_num_faces=1, min_detection_confidence=0.5)
hands = mp_hands.Hands(min_detection_confidence=0.5, min_tracking_confidence=0.5)
pose = mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5)

# Open the camera
cap = cv2.VideoCapture(0)

if not cap.isOpened():
    print("Error: could not open the camera.")
    sys.exit(1)

try:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

        # Face mesh detection
        face_results = face_mesh.process(rgb_frame)
        if face_results.multi_face_landmarks:
            for face_landmarks in face_results.multi_face_landmarks:
                for landmark in face_landmarks.landmark:
                    x = int(landmark.x * frame.shape[1])
                    y = int(landmark.y * frame.shape[0])
                    cv2.circle(frame, (x, y), 1, (0, 0, 255), -1)

        # Hand detection
        hand_results = hands.process(rgb_frame)
        if hand_results.multi_hand_landmarks:
            for hand_landmarks in hand_results.multi_hand_landmarks:
                for landmark in hand_landmarks.landmark:
                    x = int(landmark.x * frame.shape[1])
                    y = int(landmark.y * frame.shape[0])
                    cv2.circle(frame, (x, y), 5, (0, 255, 0), -1)

        # Whole-body pose detection
        pose_results = pose.process(rgb_frame)
        if pose_results.pose_landmarks:
            for landmark in pose_results.pose_landmarks.landmark:
                x = int(landmark.x * frame.shape[1])
                y = int(landmark.y * frame.shape[0])
                cv2.circle(frame, (x, y), 4, (255, 255, 0), -1)

        # Display
        frame_resized = cv2.resize(frame, (800, 600))
        cv2.imshow('Camera Analysis Only', frame_resized)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

except Exception as e:
    print(f"Error: {e}")
    sys.exit(1)

finally:
    cap.release()
    cv2.destroyAllWindows()
    face_mesh.close()
    hands.close()
    pose.close()
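
Because this loop runs three models on every frame, it can be useful to check how many frames per second it actually achieves. The following is a minimal sketch of a rolling frame-rate counter. `FpsCounter` is a hypothetical helper that is not in the original code, and the window of 30 frames is an arbitrary assumption.

```python
import time
from collections import deque


class FpsCounter:
    """Rolling frames-per-second estimate over the last `window` ticks."""

    def __init__(self, window=30):
        self.timestamps = deque(maxlen=window)

    def tick(self, now=None):
        """Record one frame and return the current FPS estimate.

        `now` can be passed explicitly (in seconds) for testing;
        otherwise time.perf_counter() is used.
        """
        self.timestamps.append(time.perf_counter() if now is None else now)
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        if elapsed <= 0:
            return 0.0
        # N timestamps span N - 1 frame intervals.
        return (len(self.timestamps) - 1) / elapsed
```

One way to use it would be to create the counter before the `while` loop and call `tick()` once per iteration, printing or drawing the returned value.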

From here on, let's go through it step by step, starting with hand detection!

Hand detection

First, we handle the hands (Hand Tracking).

        hand_results = hands.process(rgb_frame)
        if hand_results.multi_hand_landmarks:
            for hand_landmarks in hand_results.multi_hand_landmarks:
                for landmark in hand_landmarks.landmark:
                    x = int(landmark.x * frame.shape[1])
                    y = int(landmark.y * frame.shape[0])
                    cv2.circle(frame, (x, y), 5, (0, 255, 0), -1)
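
MediaPipe reports each landmark's `x` and `y` as values normalized by the image width and height, which is why the snippet multiplies them by `frame.shape[1]` and `frame.shape[0]`. Landmarks near or beyond the edge of the frame may fall slightly outside the 0 to 1 range, so clamping can keep the drawn points inside the image. A minimal sketch, where `to_pixel` is a hypothetical helper name and the clamping is my own addition rather than something the original code does:

```python
def to_pixel(norm_x, norm_y, width, height):
    """Convert normalized landmark coordinates to clamped pixel coordinates."""
    x = min(max(int(norm_x * width), 0), width - 1)
    y = min(max(int(norm_y * height), 0), height - 1)
    return x, y


# Example with a 640x480 frame: the center of the image.
center = to_pixel(0.5, 0.5, 640, 480)
```

In the loop, this would be called as `to_pixel(landmark.x, landmark.y, frame.shape[1], frame.shape[0])`.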

Hand and face detection

Next, we add face detection (Face Mesh) alongside the hands.

        face_results = face_mesh.process(rgb_frame)
        if face_results.multi_face_landmarks:
            for face_landmarks in face_results.multi_face_landmarks:
                for landmark in face_landmarks.landmark:
                    x = int(landmark.x * frame.shape[1])
                    y = int(landmark.y * frame.shape[0])
                    cv2.circle(frame, (x, y), 1, (0, 0, 255), -1)
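
Face Mesh returns several hundred landmarks per face (468 with the default settings), so the display can get crowded. If a lighter overlay is enough, one option is to draw only every few landmarks. A minimal sketch, where `thin_landmarks` and the step of 4 are hypothetical choices of mine:

```python
def thin_landmarks(landmarks, step=4):
    """Return every `step`-th landmark, starting with the first one."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return list(landmarks)[::step]
```

In the loop above, this would mean iterating over `thin_landmarks(face_landmarks.landmark)` instead of `face_landmarks.landmark`.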

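Whole-body detection

Finally, we add whole-body detection. Note that the code uses MediaPipe's Pose module (mp.solutions.pose), not the separate Holistic module.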

        pose_results = pose.process(rgb_frame)
        if pose_results.pose_landmarks:
            for landmark in pose_results.pose_landmarks.landmark:
                x = int(landmark.x * frame.shape[1])
                y = int(landmark.y * frame.shape[0])
                cv2.circle(frame, (x, y), 4, (255, 255, 0), -1)
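
Pose landmarks also carry a `visibility` value, which indicates how likely the point is to be visible in the frame. Skipping low-visibility points can reduce stray dots for body parts that are hidden or outside the frame. A minimal sketch, where the `Landmark` dataclass is a hypothetical stand-in for MediaPipe's landmark objects and the 0.5 threshold is an assumption:

```python
from dataclasses import dataclass


@dataclass
class Landmark:
    """Hypothetical stand-in for a MediaPipe pose landmark."""
    x: float
    y: float
    visibility: float


def visible_landmarks(landmarks, threshold=0.5):
    """Keep only landmarks whose visibility is at least `threshold`."""
    return [lm for lm in landmarks if lm.visibility >= threshold]


points = [Landmark(0.1, 0.2, 0.9), Landmark(0.3, 0.4, 0.2)]
kept = visible_landmarks(points)  # only the first point passes 0.5
```

With the real objects, the same function could be applied to `pose_results.pose_landmarks.landmark` before drawing.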

Summary and next steps

Results

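I did this in three stages.
(Even I wonder whether splitting it up was really necessary.)
Which module to use depends on what you want to do, so pick only the ones you need.
There are other modules as well, so look into them if you are curious.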
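
One way to make that choice at run time is to select the detectors with command-line flags. The following is a minimal sketch using the standard `argparse` module. The flag names `--hands`, `--face`, and `--pose` and the function `parse_modules` are hypothetical and not part of the original script.

```python
import argparse


def parse_modules(argv=None):
    """Return the set of detector names enabled on the command line."""
    parser = argparse.ArgumentParser(description="Choose which detectors to run.")
    parser.add_argument("--hands", action="store_true", help="enable hand detection")
    parser.add_argument("--face", action="store_true", help="enable face mesh detection")
    parser.add_argument("--pose", action="store_true", help="enable pose detection")
    args = parser.parse_args(argv)
    return {name for name in ("hands", "face", "pose") if getattr(args, name)}


# Example: equivalent to running the script with "--hands --pose".
enabled = parse_modules(["--hands", "--pose"])
```

The main loop could then create and call only the detectors whose names are in `enabled`, which avoids running models that are not needed.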

Things I want to try next
Posture analysis using Pose Estimation.
Drawing on images or video files instead of analyzing in real time.
Recording the real-time analysis as it runs.
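
As a first step toward posture analysis, a common building block is the angle at a joint, for example the elbow angle computed from the shoulder, elbow, and wrist points. The following is a minimal sketch, assuming the three points have already been converted to pixel coordinates. `joint_angle` is a hypothetical helper name, and this is only one of several ways to compute such an angle.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at point b between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("points must not coincide with the joint")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against tiny floating-point overshoot before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))
```

Pixel coordinates are assumed here because normalized coordinates are scaled differently in x and y when the frame is not square, which would distort the angle.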
