
I controlled Music (iTunes) with hand gestures.

Posted at 2020-05-19

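Introduction

This is my first Qiita article, so there are probably some mistakes.
I am also a Python beginner who has only played with machine learning libraries such as Keras, so parts of the code could surely be written better.
I would be glad to get advice in the comments along the lines of "you should do it this way here."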

What I made

As the title says, I wrote a program that controls Music (iTunes) from Python using hand gestures.

(Sorry, it is a bit hard to see.)

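Hand recognition and gestures

For hand recognition I used wolterlw's program, following metalwhale's GitHub repository.
According to its description, it apparently makes the hand recognition from Google's mediapipe usable from Python.
(I was surprised that, unlike face detectors, there are hardly any hand detectors...)

To use it you need the following modules and hand_tracker.py, so please install them.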

numpy
opencv
tensorflow

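For the gestures, I referred to a program TheJLifeX wrote in C that determines the gesture from the positions of the hand landmarks, and rewrote it in Python so that it works with either hand and also detects whether the hand points up or down.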

landmark.py

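def finger(landmark):
    """Classify a hand pose from 21 (x, y) landmarks.

    Returns (hand, Hand_direction_x, Hand_direction_y); hand is None
    when the finger combination matches no known gesture.
    """
    x = 0  # index of the x coordinate in each landmark
    y = 1  # index of the y coordinate in each landmark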
    thumbFinger = False
    firstFinger = False
    secondFinger = False
    thirdFinger = False
    fourthFinger = False

    if landmark[9][y] < landmark[0][y]:
        Hand_direction_y = 'up'
    else:
        Hand_direction_y = 'down'

    landmark_point = landmark[2][x]
    if landmark[5][x] < landmark[17][x]:
        if landmark[3][x] < landmark_point and landmark[4][x] < landmark_point:
            thumbFinger = True
        Hand_direction_x = 'right'
    else:
        if landmark[3][x] > landmark_point and landmark[4][x] > landmark_point:
            thumbFinger = True
        Hand_direction_x = 'left'

    landmark_point = landmark[6][y]
    if landmark[7][y] < landmark_point and landmark[8][y] < landmark_point:
        firstFinger = True

    landmark_point = landmark[10][y]
    if landmark[11][y] < landmark_point and landmark[12][y] < landmark_point:
        secondFinger = True

    landmark_point = landmark[14][y]
    if landmark[15][y] < landmark_point and landmark[16][y] < landmark_point:
        thirdFinger = True

    landmark_point = landmark[18][y]
    if landmark[19][y] < landmark_point and landmark[20][y] < landmark_point:
        fourthFinger = True

    if thumbFinger and firstFinger and secondFinger and thirdFinger and fourthFinger:
        hand = 'five'
    elif not thumbFinger and firstFinger and secondFinger and thirdFinger and fourthFinger:
        hand = 'four'
    elif not thumbFinger and firstFinger and secondFinger and thirdFinger and not fourthFinger:
        hand = 'three'
    elif not thumbFinger and firstFinger and secondFinger and not thirdFinger and not fourthFinger:
        hand = 'two'
    elif not thumbFinger and firstFinger and not secondFinger and not thirdFinger and not fourthFinger:
        hand = 'one'
    elif not thumbFinger and not firstFinger and not secondFinger and not thirdFinger and not fourthFinger:
        hand = 'zero'
    elif thumbFinger and not firstFinger and not secondFinger and not thirdFinger and fourthFinger:
        hand = 'aloha'
    elif not thumbFinger and firstFinger and not secondFinger and not thirdFinger and fourthFinger:
        hand = 'fox'
    elif thumbFinger and firstFinger and not secondFinger and not thirdFinger and not fourthFinger:
        hand = 'up'
    elif thumbFinger and firstFinger and not secondFinger and not thirdFinger and fourthFinger:
        hand = 'RankaLee'
    else:
        hand = None

    return hand, Hand_direction_x, Hand_direction_y
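The finger test and the long if/elif chain in landmark.py can also be written as a lookup table. The following is only a minimal sketch of that idea, not the code used in this article: `GESTURES`, `is_extended`, `classify`, and `make_hand` are hypothetical names, the three-finger gesture is spelled "three" here, and the thumb state is passed in as a flag because it depends on which way the hand faces. The landmark indices assume the 21-point layout used above (6, 10, 14, 18 are the middle joints of the four fingers).

```python
# Key order: (thumb, first, second, third, fourth).
# Gesture names follow landmark.py; the table itself is illustrative.
GESTURES = {
    (True, True, True, True, True): "five",
    (False, True, True, True, True): "four",
    (False, True, True, True, False): "three",
    (False, True, True, False, False): "two",
    (False, True, False, False, False): "one",
    (False, False, False, False, False): "zero",
    (True, False, False, False, True): "aloha",
    (False, True, False, False, True): "fox",
    (True, True, False, False, False): "up",
    (True, True, False, False, True): "RankaLee",
}

FINGER_PIPS = (6, 10, 14, 18)  # middle joints of the four non-thumb fingers


def is_extended(landmark, pip):
    """True when both joints after `pip` are above it.

    Image y grows downward, so "above" means a smaller y value.
    """
    pip_y = landmark[pip][1]
    return landmark[pip + 1][1] < pip_y and landmark[pip + 2][1] < pip_y


def classify(landmark, thumb_extended):
    """Return the gesture name, or None for an unknown finger combination."""
    fingers = tuple(is_extended(landmark, pip) for pip in FINGER_PIPS)
    return GESTURES.get((thumb_extended,) + fingers)


def make_hand(raised):
    """Build synthetic landmarks (test data only).

    `raised` holds the PIP indices of the fingers that should be extended.
    """
    points = [[0.0, 100.0] for _ in range(21)]
    for pip in FINGER_PIPS:
        points[pip] = [0.0, 50.0]
        offsets = (-10.0, -20.0) if pip in raised else (10.0, 20.0)
        points[pip + 1] = [0.0, 50.0 + offsets[0]]
        points[pip + 2] = [0.0, 50.0 + offsets[1]]
    return points


peace = make_hand({6, 10})  # first and second finger raised
print(classify(peace, thumb_extended=False))  # two
```

With a table like this, adding a gesture means adding one entry instead of another elif branch.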

Controlling Music

To control Music I used Control-itunes-from-python, which I wrote myself.
It does nothing more than run osascript through subprocess.

(As the GitHub README says, it only runs on macOS 10.15 Catalina or later.)
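To give an idea of what "running osascript through subprocess" looks like, here is a rough sketch. It is not the actual code of Control-itunes-from-python: `music_command` and `run_music` are hypothetical names, and the sketch assumes the Music app accepts AppleScript commands such as `playpause` and `next track`.

```python
import subprocess


def music_command(action):
    """Build the osascript argument list for a Music action such as 'next track'."""
    script = f'tell application "Music" to {action}'
    return ["osascript", "-e", script]


def run_music(action):
    """Run the command. Only works on macOS with the Music app available."""
    return subprocess.run(
        music_command(action), capture_output=True, text=True, check=False
    )


# Building the command is safe on any OS; run_music() is what actually
# talks to Music, so it is not called here.
print(music_command("playpause"))
```

Passing the arguments as a list avoids having to quote the AppleScript for a shell.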

The code

This is the code I wrote this time.
It captures video, runs hand recognition and gesture classification, and then controls Music based on the result.

main.py

# coding=utf-8
import cv2
import numpy as np
from hand_tracker import HandTracker
from control import Control,Get_data
from landmark import finger
import subprocess
import shlex

hand_detector = HandTracker("./files/palm_detection_without_custom_op.tflite","./files/hand_landmark.tflite","./files/anchors.csv")
cap = cv2.VideoCapture(0)
before = None

def music(hand, Hand_direction_x, Hand_direction_y, before):
    if hand != before:
        if hand == 'two' and Hand_direction_x == 'right' and Hand_direction_y == 'up':
            Control.next_track()
            return 'Next track'
        elif hand == 'two' and Hand_direction_x == 'left' and Hand_direction_y == 'up':
            Control.back_track()
            return 'Back track'
        elif hand == 'RankaLee' and Hand_direction_x == 'left' and Hand_direction_y == 'up':
            Control.previous_track()
            return 'Previous track'
        elif hand == 'aloha' and Hand_direction_x == 'right' and Hand_direction_y == 'up':
            Control.playpause()
            return 'Playpause'
        elif hand == 'fox' and Hand_direction_x == 'right' and Hand_direction_y == 'up':
            display = f'osascript -e \'display notification \"{Get_data.current_track_artist()}\" with title \"{Get_data.current_track()}\"\''
            subprocess.Popen(shlex.split(display),stdout=subprocess.PIPE)
            return 'display'

while True:
    ret, frame = cap.read()
    if ret:
        frame = cv2.flip(frame, 1)
        landmark, _ = hand_detector(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if landmark is not None:
            hand, Hand_direction_x, Hand_direction_y = finger(landmark)
            control = music(hand, Hand_direction_x, Hand_direction_y, before)
            before = hand
            for landmark_point in landmark:
                x, y = landmark_point
                cv2.circle(frame, (int(x), int(y)), 2, (255, 0, 0), 4)
        cv2.imshow('Hand', frame)
        if cv2.waitKey(10) == 27:
            cap.release()
            cv2.destroyAllWindows()
            break

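The hand_detector = HandTracker call around line 10 sets the files needed for hand recognition.
This time I used the models in metalwhale's hand_tracking repository.

def music on line 14 is the function that controls Music based on the classified gesture.
If you rewrite it, you can control other things instead.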
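One way to make that swap easier is to keep the gesture-to-action mapping in a table. The sketch below is an assumption about how that could look, not the code in main.py: `GestureDispatcher`, `lights_on`, and `lights_off` are hypothetical names. Like `music`, it only fires when the gesture differs from the previous frame, so holding a pose does not repeat the action.

```python
class GestureDispatcher:
    """Fire an action once each time the recognised gesture changes."""

    def __init__(self, bindings):
        # bindings maps (hand, direction_x, direction_y) to a callable
        self.bindings = bindings
        self.before = None

    def update(self, hand, direction_x, direction_y):
        """Call the bound action if the gesture changed; return its result or None."""
        changed = hand != self.before
        self.before = hand
        if not changed:
            return None
        action = self.bindings.get((hand, direction_x, direction_y))
        return action() if action else None


# Hypothetical actions standing in for whatever you want to control.
def lights_on():
    return "lights on"


def lights_off():
    return "lights off"


dispatcher = GestureDispatcher({
    ("one", "right", "up"): lights_on,
    ("zero", "right", "up"): lights_off,
})

print(dispatcher.update("one", "right", "up"))   # lights on
print(dispatcher.update("one", "right", "up"))   # None (same gesture held)
print(dispatcher.update("zero", "right", "up"))  # lights off
```

In the main loop you would call `dispatcher.update(...)` with the three values returned by `finger(landmark)`.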

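In closing

The hand recognition and the landmarks were more accurate than I expected, which surprised me.
I used them to control music this time, but I think they can be put to other uses too, so I would be glad if you try something fun and write about it in the comments!

I have put the code on GitHub, so feel free to use it (you only need to download hand_tracker.py and the models yourself).

This is my first Qiita article, so the explanations are probably lacking and some of the wording may be off.
If you spot anything, please let me know through an edit request or a comment.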
