
Hand detection with MediaPipe + object tracking with StrongSORT

Last updated at 2023-01-31, posted at 2022-07-25

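The previous article covered "face tracking with OpenCV's face detection model" and "how to track with other object detection models".

This article presents an implementation that follows the "how to track with other object detection models" approach from that article. Specifically, MediaPipe Hands detects the hands, and StrongSORT then tracks each detected hand.

The explanation is based on the GitHub repository ysenkun/hands-detection-strongsort (https://github.com/ysenkun/hands-detection-strongsort).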

Environment

Google Colaboratory

Running it

Clone the GitHub repository used in this article.

$ git clone https://github.com/ysenkun/hands-detection-strongsort.git

Install the required libraries with pip (run this from the cloned repository's directory).

$ pip3 install -r requirements.txt

Now run the tracker. Pass the path of the hand video you want to track and execute the command below. The tracking result is written to output.mp4.

$ python3 track.py --source vid.mp4 # video path

Results

Code walkthrough

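The walkthrough follows the "how to track with other object detection models" section of the previous article.
The __init__ function gains the settings for MediaPipe Hands. self.which_hand stores the classes as a dictionary, and max_num_hands sets the maximum number of hands that can be detected.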

def __init__(self, arg):
    #Hands detection using MediaPipe Hands
    self.mp_hands = mp.solutions.hands
    self.mp_drawing = mp.solutions.drawing_utils
    self.mp_drawing_styles = mp.solutions.drawing_styles
    self.which_hand = {0:'Right',1:'Left'}
    self.max_num_hands = arg.max_hands
    self.min_detection_confidence = arg.min_confidence
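
The arg object above carries max_hands and min_confidence attributes. As a minimal sketch of how such an object could be built with argparse: only --source appears in the command shown earlier, so the other two flag names and all default values below are assumptions, not the repository's actual definitions.

```python
import argparse


def parse_args(argv=None):
    """Parse the options read by the tracker's __init__.

    Only --source is taken from the article's command line. The flag names
    --max_hands and --min_confidence and every default are assumptions.
    """
    parser = argparse.ArgumentParser(description="Hand tracking sketch")
    parser.add_argument("--source", type=str, default="vid.mp4",
                        help="path to the input video")
    parser.add_argument("--max_hands", type=int, default=2,
                        help="maximum number of hands to detect (assumed default)")
    parser.add_argument("--min_confidence", type=float, default=0.5,
                        help="minimum detection confidence (assumed default)")
    return parser.parse_args(argv)


# Passing a list instead of reading sys.argv keeps the example testable.
args = parse_args(["--source", "vid.mp4", "--max_hands", "4"])
print(args.source, args.max_hands, args.min_confidence)
```

The resulting args.max_hands and args.min_confidence are the values that __init__ copies into self.max_num_hands and self.min_detection_confidence.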

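The any_model function performs the object detection. Because it also runs hand landmark (skeleton) estimation, the code is slightly more involved.
hands_list = [] collects, for each hand, the bounding box coordinates ([max_x,min_y,min_x,max_y]) computed from the values produced by the landmark estimation. These coordinates are then converted to [x_center,y_center,width,height] for StrongSORT. clss holds the class from the landmark estimation as a number, so that it matches the dictionary set in __init__.
Because the landmark annotations are drawn onto the frame as well, image (frame) is added to the return values.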

def any_model(self,frame):
    image = frame
    img_height, img_width, _ = image.shape
    outputs = []
    confs = []

    # Run MediaPipe Hands.
    with self.mp_hands.Hands(
        static_image_mode=True,
        max_num_hands=self.max_num_hands,
        min_detection_confidence=self.min_detection_confidence) as hands:

        # Convert the BGR image to RGB, flip the image around y-axis for correct 
        # handedness output and process it with MediaPipe Hands.
        results = hands.process(cv2.flip(cv2.cvtColor(image, cv2.COLOR_BGR2RGB), 1))

        if results.multi_hand_landmarks:
            score_list = [results.multi_handedness[i].classification[0].score 
                for i in range(len(results.multi_handedness))]
            label_list = [results.multi_handedness[i].classification[0].label 
                for i in range(len(results.multi_handedness))]
            # Print handedness (left v.s. right hand).
            #print(results.multi_handedness)

            annotated_image = cv2.flip(image.copy(), 1)

            hands_list = []
            for hand_landmarks in results.multi_hand_landmarks:
                hand_list = []

                self.mp_drawing.draw_landmarks(
                    annotated_image,
                    hand_landmarks,
                    self.mp_hands.HAND_CONNECTIONS,
                    self.mp_drawing_styles.get_default_hand_landmarks_style(),
                    self.mp_drawing_styles.get_default_hand_connections_style())

                # Landmark coordinates are normalized and measured on the
                # flipped image, so (1 - x) maps them back to the original frame.
                lm_xlist = [lm.x for lm in hand_landmarks.landmark]
                lm_ylist = [lm.y for lm in hand_landmarks.landmark]
                hand_list.append((1-min(lm_xlist)) * img_width)   # max_x
                hand_list.append(min(lm_ylist) * img_height)      # min_y
                hand_list.append((1-max(lm_xlist)) * img_width)   # min_x
                hand_list.append(max(lm_ylist) * img_height)      # max_y

                hands_list.append(hand_list)
                
            # Convert [max_x, min_y, min_x, max_y] to
            # [x_center, y_center, width, height] for StrongSORT
            x = torch.tensor(hands_list)
            xywhs = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
            xywhs[:, 0] = (x[:, 0] + x[:, 2]) / 2  # x center
            xywhs[:, 1] = (x[:, 1] + x[:, 3]) / 2  # y center
            xywhs[:, 2] = x[:, 0] - x[:, 2]  # width
            xywhs[:, 3] = x[:, 3] - x[:, 1]  # height

            confs = torch.tensor(score_list)
            clss = [0 if 'Right'==label  else  1 for label in label_list]
            clss = torch.tensor(clss)
            image = cv2.flip(annotated_image, 1)

            #Run StrongSORT
            outputs = self.strongsort.update(xywhs.cpu(), confs.cpu(), clss.cpu(), image)         
    return outputs,confs, image
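
The coordinate handling in any_model can be checked in isolation. The following is a minimal sketch in plain Python (no MediaPipe or torch) that mirrors the two steps: building [max_x,min_y,min_x,max_y] from normalized landmarks measured on the flipped image, then converting to [x_center,y_center,width,height]. The function names and the sample landmark values are made up for illustration.

```python
def landmarks_to_box(xs, ys, img_width, img_height):
    """Return [max_x, min_y, min_x, max_y] in pixels of the original frame.

    xs and ys are normalized landmark coordinates measured on a horizontally
    flipped image, so (1 - x) maps each x back to the unflipped frame.
    """
    return [
        (1 - min(xs)) * img_width,   # max_x after un-flipping
        min(ys) * img_height,        # min_y
        (1 - max(xs)) * img_width,   # min_x after un-flipping
        max(ys) * img_height,        # max_y
    ]


def box_to_xywh(box):
    """Convert [max_x, min_y, min_x, max_y] to [x_center, y_center, w, h]."""
    max_x, min_y, min_x, max_y = box
    return [
        (max_x + min_x) / 2,  # x center
        (min_y + max_y) / 2,  # y center
        max_x - min_x,        # width
        max_y - min_y,        # height
    ]


# Three made-up landmarks on a 640x480 frame.
xs = [0.25, 0.5, 0.75]
ys = [0.25, 0.5, 0.75]
box = landmarks_to_box(xs, ys, 640, 480)
print(box)               # [480.0, 120.0, 160.0, 360.0]
print(box_to_xywh(box))  # [320.0, 240.0, 320.0, 240.0]
```

Note that the width is max_x - min_x, which corresponds to x[:, 0] - x[:, 2] in the code above because the flip puts the larger x value first.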

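In the annotation function, the step that assigns a label according to the class was rewritten. clss holds the class as a number, and label is set by looking that number up in self.which_hand, which was defined in __init__.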

def annotation(self, frame, output, conf):
    bboxes = output[0:4]
    id = int(output[4])
    clss = int(output[5])
    label = self.which_hand[clss] #Make the object name change to match the clss number
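
The round trip between the handedness label and the class number can be sketched as follows. The dictionary and the row layout (box in the first four values, then id and class) follow the code above. The helper names and the caption format are hypothetical, and the annotation function in the repository may draw something different.

```python
WHICH_HAND = {0: "Right", 1: "Left"}  # same mapping as self.which_hand


def label_to_class(label):
    """Encode a handedness label as the class number used in any_model."""
    return 0 if label == "Right" else 1


def describe_track(output, conf):
    """Build a caption from one tracker row.

    The row holds the box in its first four values, followed by the track
    id and the class number. The caption format is hypothetical.
    """
    track_id = int(output[4])
    clss = int(output[5])
    return f"{track_id} {WHICH_HAND[clss]} {conf:.2f}"


# A made-up row: box values, track id 3, class encoded from "Left".
row = [100, 50, 220, 190, 3, label_to_class("Left")]
print(describe_track(row, 0.9))
```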

Conclusion

That concludes this article.

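As a side note, rewriting the body of the any_model function appropriately should be enough to make most other detectors work.
The most important point is to convert the detected bounding boxes to [x_center,y_center,width,height] before passing them to StrongSORT.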
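
For a different detector, that conversion might look like the sketch below. It assumes, purely for illustration, that the detector returns each detection as (x1, y1, x2, y2, score, class_id) with top-left and bottom-right corners in pixels. The function name is hypothetical and a real detector's output format may differ.

```python
def to_strongsort_inputs(detections):
    """Split detections into xywh boxes, confidences, and class ids.

    Each detection is assumed to be (x1, y1, x2, y2, score, class_id) with
    (x1, y1) the top-left and (x2, y2) the bottom-right corner in pixels.
    """
    xywhs, confs, clss = [], [], []
    for x1, y1, x2, y2, score, class_id in detections:
        # Corner format -> [x_center, y_center, width, height]
        xywhs.append([(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1])
        confs.append(score)
        clss.append(class_id)
    return xywhs, confs, clss


# Two made-up detections.
detections = [(10, 20, 110, 220, 0.9, 0), (300, 40, 360, 100, 0.6, 1)]
xywhs, confs, clss = to_strongsort_inputs(detections)
print(xywhs)  # [[60.0, 120.0, 100, 200], [330.0, 70.0, 60, 60]]
```

In the article's code, the three lists would then be wrapped with torch.tensor and passed to self.strongsort.update together with the frame, as any_model does.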

The approach carries over to other object detectors, so I hope you find it useful.
Thank you for reading to the end.
