
Saving hand landmark (skeleton) coordinates obtained with MediaPipe to CSV

Last updated at 2023-09-13, posted at 2021-12-30

MediaPipe

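MediaPipe is a cross-platform ML solution framework published by Google.
It offers solutions such as face detection and hand landmark estimation that can be used with very little code, and it runs fast.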

Note: the available solutions differ depending on the platform you use.
See the official documentation for details.

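In this article I use Python to

obtain landmark coordinates from hand images with MediaPipe
save the obtained coordinates to a CSV file
and, while at it, save an image with the landmarks drawn on it

and I am leaving the steps here as a memo.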

Installation

Install it with pip install.

$ pip install mediapipe
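
To confirm that the install succeeded, the installed version can be looked up from the package metadata. This is a minimal sketch using only the standard library (Python 3.8 or later); installed_version is a hypothetical helper name, not something from MediaPipe.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version if mediapipe is installed, otherwise None.
print("mediapipe:", installed_version("mediapipe"))
```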

Preparing the data

For this article, put a few hand images (.jpg) under ./data.
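
The script in this article collects files with the pattern *.jpg, which misses names such as photo.JPG or photo.png on case-sensitive file systems. If the folder mixes formats, something like the following sketch could be used instead. The list_images helper and the extension set are assumptions for illustration, not part of the original script.

```python
from pathlib import Path

# Assumed set of extensions; adjust to whatever the data folder contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_images(resource_dir):
    """Return sorted paths of image files directly under resource_dir."""
    root = Path(resource_dir)
    if not root.is_dir():
        return []
    return sorted(
        str(p) for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


file_list = list_images("./data")
print(len(file_list), "image(s) found")
```

Sorting the list keeps the row order of the resulting CSV stable between runs.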

Code

First, here is the whole script.
Reference: MediaPipe Hands

import cv2
import glob
import os
import csv
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands

def fields_name():
    # Build the CSV header: file_name plus x, y, z for each of the 21 landmarks
    fields = []
    fields.append('file_name')
    for i in range(21):
        fields.append(str(i)+'_x')
        fields.append(str(i)+'_y')
        fields.append(str(i)+'_z')
    return fields

if __name__ == '__main__':
    # Directory that holds the source images
    resource_dir = r'./data'
    # Collect the list of target images
    file_list = glob.glob(os.path.join(resource_dir, "*.jpg"))

    # Prepare the output locations
    save_csv_dir = './result/csv'
    os.makedirs(save_csv_dir, exist_ok=True)
    save_csv_name = 'landmark.csv'
    save_image_dir = 'result/image'
    os.makedirs(save_image_dir, exist_ok=True)

    with mp_hands.Hands(static_image_mode=True,
            max_num_hands=1, # maximum number of hands to detect (the default is 2)
            min_detection_confidence=0.5) as hands, \
        open(os.path.join(save_csv_dir, save_csv_name),
            'w', encoding='utf-8', newline="") as f:

        # Prepare the csv writer
        writer = csv.DictWriter(f, fieldnames=fields_name())
        writer.writeheader()

        for file_path in file_list:
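            # Load the image (cv2.imread returns None if the file cannot be read)
            image = cv2.imread(file_path)
            if image is None:
                continue

            # Flip horizontally so the image is processed as a mirror image
            image = cv2.flip(image, 1)

            # OpenCV and MediaPipe use different channel orders,
            # so convert before processing.
            # OpenCV: BGR -> MediaPipe: RGB
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            image.flags.writeable = False

            # Run inference
            results = hands.process(image)

            # Undo the preprocessing conversion.
            image.flags.writeable = True
            image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

            if not results.multi_hand_landmarks:
                # Skip this image if no hand was detected
                continue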

            # Landmark coordinates of the first detected hand
            landmarks = results.multi_hand_landmarks[0]

            # Write to the CSV (coordinates are relative to the mirrored image)
            record = {}
            record["file_name"] = os.path.basename(file_path)
            for i, landmark in enumerate(landmarks.landmark):
                record[str(i) + '_x'] = landmark.x
                record[str(i) + '_y'] = landmark.y
                record[str(i) + '_z'] = landmark.z
            writer.writerow(record)

            # Draw the landmarks on the original image
            mp_drawing.draw_landmarks(
                image,
                landmarks,
                mp_hands.HAND_CONNECTIONS,
                mp_drawing_styles.get_default_hand_landmarks_style(),
                mp_drawing_styles.get_default_hand_connections_style())
            # Save the image (flipped back to the original orientation)
            cv2.imwrite(
                os.path.join(save_image_dir, os.path.basename(file_path)),
                cv2.flip(image, 1))

There is not much to explain, but here are the key points.

Hand landmark detection

with mp_hands.Hands(static_image_mode=True,
        max_num_hands=1, # maximum number of hands to detect (the default is 2)
        min_detection_confidence=0.5) as hands

mp.solutions.hands.Hands creates the instance that performs hand landmark estimation.

image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image.flags.writeable = False

# Run inference
results = hands.process(image)

# Undo the preprocessing conversion.
image.flags.writeable = True
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

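OpenCV and MediaPipe use different color channel orders.

OpenCV: BGR
MediaPipe: RGB

Therefore, before running inference with MediaPipe, the channel order has to be converted with cv2.cvtColor(image, cv2.COLOR_BGR2RGB).
Note: if you reuse the image object afterwards (for example to draw on it and save it with OpenCV), convert it back with cv2.cvtColor(image, cv2.COLOR_RGB2BGR).

Once the image is in RGB order, hands.process(image) runs the estimation.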

Writing to CSV

# Prepare the csv writer
writer = csv.DictWriter(f, fieldnames=fields_name())
writer.writeheader()

# -- omitted --

# Landmark coordinates of the first detected hand
landmarks = results.multi_hand_landmarks[0]

# Write to the CSV
record = {}
record["file_name"] = os.path.basename(file_path)
for i, landmark in enumerate(landmarks.landmark):
    record[str(i) + '_x'] = landmark.x
    record[str(i) + '_y'] = landmark.y
    record[str(i) + '_z'] = landmark.z
writer.writerow(record)

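The obtained landmark coordinates are written to the CSV.
As shown in the figure below, the position of each joint is identified by a fixed index (0 to 20).

[Figure: hand landmark index diagram, quoted from the official documentation]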
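
The index numbers correspond to the named members of mp_hands.HandLandmark (for example, WRIST is 0 and INDEX_FINGER_TIP is 8). If named columns are easier to work with than 0_x ... 20_z, the header could be built from names instead. The sketch below writes the names out by hand so that it runs without MediaPipe installed; named_fields is a hypothetical alternative to fields_name, not part of the original script.

```python
# Landmark names in index order (0 to 20), written out by hand to mirror
# the members of mp_hands.HandLandmark.
HAND_LANDMARK_NAMES = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]


def named_fields():
    """Build a CSV header such as file_name, wrist_x, wrist_y, wrist_z, ..."""
    fields = ["file_name"]
    for name in HAND_LANDMARK_NAMES:
        key = name.lower()
        fields.extend([f"{key}_x", f"{key}_y", f"{key}_z"])
    return fields


print(named_fields()[:4])
```

With this header, the record keys in the write loop would have to use the same names, for example HAND_LANDMARK_NAMES[i].lower() + '_x' in place of str(i) + '_x'.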

landmarks holds the coordinates of each landmark in the following form.

... omitted ...
landmark {
  x: 0.6645678281784058
  y: 0.6872593760490417
  z: -0.061526086181402206
}
landmark {
  x: 0.722145676612854
  y: 0.690102219581604
  z: -0.07218331098556519
}
landmark {
  x: 0.7754498720169067
  y: 0.688714325428009
  z: -0.08117059618234634
}
... omitted ...

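One thing to watch out for:
x and y are normalized to the range 0 to 1 (relative to the image width and height, respectively).
To map them onto actual pixel coordinates in the image, multiply them by the image size (width and height) as follows.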

image_height, image_width, _ = image.shape
x = landmark.x * image_width
y = landmark.y * image_height
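
The conversion above can be wrapped in a small helper that also returns integers, since drawing functions generally expect integer pixel positions. This is a sketch under the assumption that values slightly outside 0 to 1 should be clamped to the image border (this can happen when part of the hand is outside the frame); to_pixel is a hypothetical name.

```python
def to_pixel(x_norm, y_norm, image_width, image_height):
    """Convert normalized (x, y) to integer pixel coordinates inside the image."""
    px = int(x_norm * image_width)
    py = int(y_norm * image_height)
    # Clamp so that the result is always a valid pixel index.
    px = min(max(px, 0), image_width - 1)
    py = min(max(py, 0), image_height - 1)
    return px, py


# Example for a 640x480 image.
print(to_pixel(0.5, 0.5, 640, 480))
```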

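z is also a relative value rather than an absolute depth. According to the MediaPipe Hands documentation, it is the depth relative to the wrist (landmark 0), with smaller values closer to the camera, on roughly the same scale as x; it is not guaranteed to stay within -1 to 1.
Note: I have not looked into the details beyond that.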

Running the script should save a CSV like the following under result/csv.

file_name	0_x	0_y	0_z	...	20_x	20_y	20_z
hogehoge.jpg	0.xxxx	0.xxxx	0.xxxx	...	0.xxxx	0.xxxx	0.xxxx
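
To use the saved coordinates later, the CSV can be read back with csv.DictReader. The following is a minimal sketch that assumes the header layout produced by the script (file_name followed by 0_x, 0_y, 0_z through 20_z); load_landmarks is a hypothetical helper.

```python
import csv


def load_landmarks(csv_path):
    """Return {file_name: [(x, y, z), ...]} with 21 tuples per file."""
    table = {}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            table[row["file_name"]] = [
                (float(row[f"{i}_x"]), float(row[f"{i}_y"]), float(row[f"{i}_z"]))
                for i in range(21)
            ]
    return table
```

For example, load_landmarks('./result/csv/landmark.csv')['hogehoge.jpg'][8] would give the (x, y, z) of landmark 8 for that image, provided the file was processed successfully.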

Drawing the landmarks

As a bonus, the landmarks are drawn on the original image and saved.

mp_drawing.draw_landmarks(
    image,
    landmarks,
    mp_hands.HAND_CONNECTIONS,
    mp_drawing_styles.get_default_hand_landmarks_style(),
    mp_drawing_styles.get_default_hand_connections_style())
cv2.imwrite(
    os.path.join(save_image_dir, os.path.basename(file_path)),
    cv2.flip(image, 1))

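It is of course possible to draw the lines yourself from the coordinates, but MediaPipe ships drawing functions, so it is easy to try.

The result is an image of the hand with the landmarks and their connections drawn on top.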
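
For comparison, drawing the lines yourself comes down to converting each landmark to pixel coordinates and connecting the right pairs of indexes. The sketch below only computes the line segments, so it runs without OpenCV. The CONNECTIONS list is written by hand and is meant to mirror mp_hands.HAND_CONNECTIONS; in real code it is safer to use that constant directly. connection_segments is a hypothetical helper.

```python
# Hand-written index pairs (assumed to match mp_hands.HAND_CONNECTIONS).
CONNECTIONS = [
    (0, 1), (1, 2), (2, 3), (3, 4),          # thumb
    (5, 6), (6, 7), (7, 8),                  # index finger
    (9, 10), (10, 11), (11, 12),             # middle finger
    (13, 14), (14, 15), (15, 16),            # ring finger
    (17, 18), (18, 19), (19, 20),            # pinky
    (0, 5), (5, 9), (9, 13), (13, 17), (0, 17),  # palm
]


def connection_segments(points_norm, image_width, image_height):
    """Turn 21 normalized (x, y) points into pixel line segments."""
    pixels = [
        (int(x * image_width), int(y * image_height)) for x, y in points_norm
    ]
    return [(pixels[a], pixels[b]) for a, b in CONNECTIONS]


# Each (start, end) pair could then be drawn with
# cv2.line(image, start, end, (0, 255, 0), 2).
```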

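Conclusion

This time I ran the estimation on still images of hands, but it is light enough to obtain coordinates in real time from the video of a connected camera.
Besides hand estimation there are other solutions such as pose estimation, and it is a lot of fun to play with, so I recommend it.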
