
Getting Pythonista3 and the machine-learning (Core ML) Vision Framework to follow your hand

Last updated at 2022-12-22 / Posted at 2022-12-22

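This article is the day 23 entry of the Pythonista3 Advent Calendar 2022.

I'll be introducing Pythonista3 from a one-sided, biased point of view.

I code on my iPhone (in Pythonista3) almost every day. Nice to meet you.

Below is my environment as of December 2022.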

sysInfo.log

--- SYSTEM INFORMATION ---
* Pythonista 3.3 (330025), Default interpreter 3.6.1
* iOS 16.1.1, model iPhone12,1, resolution (portrait) 828.0 x 1792.0 @ 2.0

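In other environments (iPad, other devices, other iOS versions), the code may behave in unintended ways or raise errors. Please bear that in mind.

Incidentally, model iPhone12,1 is the iPhone 11.

What this article covers

Showing the live camera feed in a Pythonista3 View
Detecting hand information with the Vision Framework
Tracking parts of the hand

Machine learning is amazing, and so is the iPhone

Last time, we used machine learning to detect faces in a still image.

Core ML models that are prepared in advance. The Vision Framework, which is comparatively easy to build on. And the dear little iPhone, which processes it all in an instant.

Every one of them is amazing.

The power of science is amazing!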

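Before any machine learning: getting the camera feed into a View

This may be the most tedious part of the whole article. We display what the camera captures, in real time, inside a View.

With ARKit, ARSCNView (a subclass of SceneKit's SCNView) took care of this nicely.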

ARSCNView | Apple Developer Documentation

Vision provides nothing equivalent, so we have to build it ourselves.

The bare minimum: just streaming the camera feed

from objc_util import ObjCClass
import ui

# import pdbg  # debugging helper, unused in this listing

AVCaptureVideoPreviewLayer = ObjCClass('AVCaptureVideoPreviewLayer')
AVCaptureSession = ObjCClass('AVCaptureSession')
AVCaptureDevice = ObjCClass('AVCaptureDevice')
AVCaptureDeviceInput = ObjCClass('AVCaptureDeviceInput')
AVCaptureVideoDataOutput = ObjCClass('AVCaptureVideoDataOutput')


class CameraView(ui.View):
 def __init__(self, *args, **kwargs):
   ui.View.__init__(self, *args, **kwargs)
   self.bg_color = 'green'
   self.flex = 'WH'
   self.layer = self.objc_instance.layer()

   self.previewLayer: AVCaptureVideoPreviewLayer
   self.init()

 def layout(self):
   self.previewLayer.frame = self.objc_instance.bounds()

 def init(self):
   previewLayer = AVCaptureVideoPreviewLayer.new()
   self.layer.addSublayer_(previewLayer)
   self.previewLayer = previewLayer


class CameraViewController:
 def __init__(self):
   self.cameraView = CameraView()

   self.cameraFeedSession: AVCaptureSession
   self.viewDidLoad()
   self.viewDidAppear()

 def viewDidLoad(self):
   pass

 def viewDidAppear(self):
   _resizeAspectFill = 'AVLayerVideoGravityResizeAspectFill'

   self.cameraView.previewLayer.videoGravity = _resizeAspectFill
   self.setupAVSession()
   self.cameraView.previewLayer.session = self.cameraFeedSession

   self.cameraFeedSession.startRunning()

 def viewWillDisappear(self):
   self.cameraFeedSession.stopRunning()

 def setupAVSession(self):
   _builtInWideAngleCamera = 'AVCaptureDeviceTypeBuiltInWideAngleCamera'
   _video = 'vide'
   _front = 2
   _back = 1

   videoDevice = AVCaptureDevice.defaultDeviceWithDeviceType_mediaType_position_(
     _builtInWideAngleCamera, _video, _back)

   deviceInput = AVCaptureDeviceInput.deviceInputWithDevice_error_(
     videoDevice, None)

   session = AVCaptureSession.new()
   session.beginConfiguration()
   _Preset_high = 'AVCaptureSessionPresetHigh'
   session.setSessionPreset_(_Preset_high)

   if session.canAddInput_(deviceInput):
     session.addInput_(deviceInput)
   else:
     raise RuntimeError('could not add the camera input to the session')

   dataOutput = AVCaptureVideoDataOutput.new()
   if session.canAddOutput_(dataOutput):
     session.addOutput_(dataOutput)
     dataOutput.alwaysDiscardsLateVideoFrames = True
   else:
     raise RuntimeError('could not add the video data output to the session')
   session.commitConfiguration()
   self.cameraFeedSession = session


class View(ui.View):
 def __init__(self, *args, **kwargs):
   ui.View.__init__(self, *args, **kwargs)
   self.bg_color = 'maroon'
   self.cvc = CameraViewController()
   self.add_subview(self.cvc.cameraView)

 def will_close(self):
   self.cvc.viewWillDisappear()


if __name__ == '__main__':
 view = View()
 view.present(style='fullscreen', orientations=['portrait'])

At this point it is a plain video preview with no recording capability.

The AVCapture classes divide the work as follows:

Choosing which camera to use
- AVCaptureDevice | Apple Developer Documentation
Starting up the camera
- AVCaptureSession | Apple Developer Documentation
Camera input and output
- AVCaptureDeviceInput | Apple Developer Documentation
- AVCaptureVideoDataOutput | Apple Developer Documentation
Drawing onto the View's layer
- AVCaptureVideoPreviewLayer | Apple Developer Documentation

Only with all of this configured can we finally show the camera feed in a View.

Setting a delegate so we can receive and work with the camera frames

def create_sampleBufferDelegate(self):
  # --- /delegate
  def captureOutput_didOutputSampleBuffer_fromConnection_(
      _self, _cmd, _output, _sampleBuffer, _connection):

    sampleBuffer = ObjCInstance(_sampleBuffer)
    print('o')

  def captureOutput_didDropSampleBuffer_fromConnection_(
      _self, _cmd, _output, _sampleBuffer, _connection):
    ObjCInstance(_sampleBuffer)  # todo: only wraps the buffer, nothing else
    print('d')

    # --- delegate/

  _methods = [
    captureOutput_didOutputSampleBuffer_fromConnection_,
    captureOutput_didDropSampleBuffer_fromConnection_,
  ]

  _protocols = ['AVCaptureVideoDataOutputSampleBufferDelegate']

  sampleBufferDelegate = create_objc_class(
    'sampleBufferDelegate', methods=_methods, protocols=_protocols)
  return sampleBufferDelegate.new()

The delegate is implemented inside the CameraViewController class using create_objc_class.
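The long function names are not arbitrary. As far as I understand objc_util, it derives the Objective-C selector from the Python function name by turning each underscore into a colon. A minimal pure-Python sketch of that naming rule follows; selector_for and expected_arg_count are hypothetical helpers written for illustration, not part of objc_util:

```python
def selector_for(py_name: str) -> str:
    """Map a Python-style method name to an Objective-C selector string."""
    return py_name.replace('_', ':')


def expected_arg_count(py_name: str) -> int:
    """Arguments the Python function takes: self, _cmd, plus one per colon."""
    return 2 + selector_for(py_name).count(':')


name = 'captureOutput_didOutputSampleBuffer_fromConnection_'
print(selector_for(name))
print(expected_arg_count(name))
```

The count of five matches the parameter list used in the delegate functions above (_self, _cmd, _output, _sampleBuffer, _connection).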

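(I only have a rough understanding of this, but) a dispatch queue appears to be required when registering the delegate, so we prepare one through a helper function.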

Dispatch Queue | Apple Developer Documentation

dispatch_queue_create is a C function rather than a method, so we look it up through objc_util.c and declare its argument and return types before calling it.

import ctypes

from objc_util import c, ObjCInstance

def dispatch_queue_create(_name, parent):
  _func = c.dispatch_queue_create
  _func.argtypes = [ctypes.c_char_p, ctypes.c_void_p]
  _func.restype = ctypes.c_void_p
  name = _name.encode('ascii')
  return ObjCInstance(_func(name, parent))

dispatch_queue_create | Apple Developer Documentation
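Declaring argtypes and restype is the general ctypes pattern for calling any C function. As a sketch that can be tried outside Pythonista, here is the same pattern applied to the C library's strlen. Loading the library with ctypes.CDLL(None) is an assumption that holds on typical Linux and macOS Python builds, not on Windows:

```python
import ctypes

# Load the symbols already linked into the running process (POSIX only).
libc = ctypes.CDLL(None)

strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]  # const char *
strlen.restype = ctypes.c_size_t     # size_t

# C strings are bytes, which is why the queue label is encoded first.
label = 'imageDispatch'.encode('ascii')
print(strlen(label))
```

The restype declaration matters in dispatch_queue_create as well: without it ctypes assumes the function returns a C int, which is too small to hold a 64-bit pointer.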

With the delegate and the dispatch queue wired in, the code becomes:

import ctypes

from objc_util import c, ObjCClass, ObjCInstance, create_objc_class
from objc_util import UIBezierPath
import ui

# import pdbg  # debugging helper, unused in this listing

AVCaptureVideoPreviewLayer = ObjCClass('AVCaptureVideoPreviewLayer')
AVCaptureSession = ObjCClass('AVCaptureSession')
AVCaptureDevice = ObjCClass('AVCaptureDevice')
AVCaptureDeviceInput = ObjCClass('AVCaptureDeviceInput')
AVCaptureVideoDataOutput = ObjCClass('AVCaptureVideoDataOutput')

CAShapeLayer = ObjCClass('CAShapeLayer')


def dispatch_queue_create(_name, parent):
  _func = c.dispatch_queue_create
  _func.argtypes = [ctypes.c_char_p, ctypes.c_void_p]
  _func.restype = ctypes.c_void_p
  name = _name.encode('ascii')
  return ObjCInstance(_func(name, parent))


class CameraView(ui.View):
  def __init__(self, *args, **kwargs):
    ui.View.__init__(self, *args, **kwargs)
    self.bg_color = 'green'
    self.flex = 'WH'
    self.layer = self.objc_instance.layer()

    self.previewLayer: AVCaptureVideoPreviewLayer
    self.overlayLayer: CAShapeLayer
    self.init()

  def layout(self):
    self.previewLayer.frame = self.objc_instance.bounds()
    self.overlayLayer.frame = self.objc_instance.bounds()

  def init(self):
    previewLayer = AVCaptureVideoPreviewLayer.new()
    overlayLayer = CAShapeLayer.new()

    self.previewLayer = previewLayer
    self.overlayLayer = overlayLayer
    self.layer.addSublayer_(self.previewLayer)
    self.setupOverlay()

  def setupOverlay(self):
    self.previewLayer.addSublayer_(self.overlayLayer)


class CameraViewController:
  def __init__(self):
    self.cameraView = CameraView()
    self.videoDataOutputQueue = dispatch_queue_create('imageDispatch', None)
    self.delegate = self.create_sampleBufferDelegate()

    self.cameraFeedSession: AVCaptureSession
    self.viewDidLoad()
    self.viewDidAppear()

  def viewDidLoad(self):
    pass

  def viewDidAppear(self):
    _resizeAspectFill = 'AVLayerVideoGravityResizeAspectFill'

    self.cameraView.previewLayer.videoGravity = _resizeAspectFill
    self.setupAVSession()
    self.cameraView.previewLayer.session = self.cameraFeedSession

    self.cameraFeedSession.startRunning()

  def viewWillDisappear(self):
    self.cameraFeedSession.stopRunning()

  def setupAVSession(self):
    _builtInWideAngleCamera = 'AVCaptureDeviceTypeBuiltInWideAngleCamera'
    _video = 'vide'
    _front = 2
    _back = 1

    videoDevice = AVCaptureDevice.defaultDeviceWithDeviceType_mediaType_position_(
      _builtInWideAngleCamera, _video, _back)

    deviceInput = AVCaptureDeviceInput.deviceInputWithDevice_error_(
      videoDevice, None)

    session = AVCaptureSession.new()
    session.beginConfiguration()
    _Preset_high = 'AVCaptureSessionPresetHigh'
    session.setSessionPreset_(_Preset_high)

    if session.canAddInput_(deviceInput):
      session.addInput_(deviceInput)
    else:
      raise RuntimeError('could not add the camera input to the session')

    dataOutput = AVCaptureVideoDataOutput.new()
    if session.canAddOutput_(dataOutput):
      session.addOutput_(dataOutput)
      dataOutput.alwaysDiscardsLateVideoFrames = True
      dataOutput.setSampleBufferDelegate_queue_(self.delegate,
                                                self.videoDataOutputQueue)
    else:
      raise RuntimeError('could not add the video data output to the session')
    session.commitConfiguration()
    self.cameraFeedSession = session

  def create_sampleBufferDelegate(self):
    # --- /delegate
    def captureOutput_didOutputSampleBuffer_fromConnection_(
        _self, _cmd, _output, _sampleBuffer, _connection):

      sampleBuffer = ObjCInstance(_sampleBuffer)
      print('didOutputSampleBuffer')

    def captureOutput_didDropSampleBuffer_fromConnection_(
        _self, _cmd, _output, _sampleBuffer, _connection):
      ObjCInstance(_sampleBuffer)  # todo: only wraps the buffer, nothing else
      print('didDropSampleBuffer')

      # --- delegate/

    _methods = [
      captureOutput_didOutputSampleBuffer_fromConnection_,
      captureOutput_didDropSampleBuffer_fromConnection_,
    ]

    _protocols = ['AVCaptureVideoDataOutputSampleBufferDelegate']

    sampleBufferDelegate = create_objc_class(
      'sampleBufferDelegate', methods=_methods, protocols=_protocols)
    return sampleBufferDelegate.new()


class View(ui.View):
  def __init__(self, *args, **kwargs):
    ui.View.__init__(self, *args, **kwargs)
    self.bg_color = 'maroon'
    self.cvc = CameraViewController()
    self.add_subview(self.cvc.cameraView)

  def will_close(self):
    self.cvc.viewWillDisappear()


if __name__ == '__main__':
  view = View()
  view.present(style='fullscreen', orientations=['portrait'])

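Keeping the `delegate` firing indefinitely

Two methods are registered: captureOutput_didOutputSampleBuffer_fromConnection_ and captureOutput_didDropSampleBuffer_fromConnection_.

Handling Output alone looks like it should be enough, but with only Output the callback fires for roughly the first 50 to 60 frames(?) and then stops.

When Drop is handled as well (as little more than a no-op), the calls keep coming indefinitely, in the order Output -> Drop -> Output -> Drop -> ...

I don't really understand why this is 😇

The camera side is ready

Because the CameraView class inherits from ui.View, the work can be split between

objc_util side (ui.View.objc_instance)
- operations such as managing the layer the camera feed is drawn on
Python (Pythonista3) side
- filling the screen with .flex

so the two sides can cooperate flexibly.

It will also hold the layer used to visualize the detected information later on.

In the CameraViewController class, the delegate now gives us the capture buffer for every camera frame.

With that, the flow of throwing frames at the Vision Framework and getting machine learning results back is finally in place.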

Hand tracking with `VNDetectHumanHandPoseRequest` ✋

We track the detected tip of the index finger, and print the rest of the hand information as text on the View.

from math import pi
import ctypes

from objc_util import c, ObjCClass, ObjCInstance, create_objc_class, on_main_thread
from objc_util import UIBezierPath, UIColor, CGRect
import ui

# import pdbg  # debugging helper, unused in this listing

VNDetectHumanHandPoseRequest = ObjCClass('VNDetectHumanHandPoseRequest')
VNSequenceRequestHandler = ObjCClass('VNSequenceRequestHandler')

AVCaptureVideoPreviewLayer = ObjCClass('AVCaptureVideoPreviewLayer')
AVCaptureSession = ObjCClass('AVCaptureSession')
AVCaptureDevice = ObjCClass('AVCaptureDevice')
AVCaptureDeviceInput = ObjCClass('AVCaptureDeviceInput')
AVCaptureVideoDataOutput = ObjCClass('AVCaptureVideoDataOutput')

CAShapeLayer = ObjCClass('CAShapeLayer')


def dispatch_queue_create(_name, parent):
  _func = c.dispatch_queue_create
  _func.argtypes = [ctypes.c_char_p, ctypes.c_void_p]
  _func.restype = ctypes.c_void_p
  name = _name.encode('ascii')
  return ObjCInstance(_func(name, parent))


def parseCGRect(cg_rect: CGRect) -> tuple:
  origin, size = [cg_rect.origin, cg_rect.size]
  return (origin.x, origin.y, size.width, size.height)


class CameraView(ui.View):
  def __init__(self, *args, **kwargs):
    ui.View.__init__(self, *args, **kwargs)
    self.bg_color = 'green'
    self.flex = 'WH'
    self.log_area = ui.TextView()
    self.log_area.editable = False
    self.log_area.flex = 'WH'
    self.log_area.font = ('Inconsolata', 10)
    self.log_area.bg_color = (0.0, 0.0, 0.0, 0.0)
    self.layer = self.objc_instance.layer()

    self.previewLayer: AVCaptureVideoPreviewLayer
    self.overlayLayer: CAShapeLayer
    self.init()

    self.log_area.text = ''
    # must be added after the layers are stacked, or it ends up hidden
    self.add_subview(self.log_area)

  def layout(self):
    self.previewLayer.frame = self.objc_instance.bounds()
    self.overlayLayer.frame = self.objc_instance.bounds()

  def update_log_area(self, text):
    self.log_area.text = f'{text}'

  def init(self):
    previewLayer = AVCaptureVideoPreviewLayer.new()
    overlayLayer = CAShapeLayer.new()

    self.layer.addSublayer_(previewLayer)
    self.previewLayer = previewLayer
    self.overlayLayer = overlayLayer
    self.setupOverlay()

  def setupOverlay(self):
    self.previewLayer.addSublayer_(self.overlayLayer)
    self.setCAShapeLayer()

  def setCAShapeLayer(self):
    _blueColor = UIColor.blueColor().cgColor()
    _cyanColor = UIColor.cyanColor().cgColor()

    self.overlayLayer.setLineWidth_(2.0)
    self.overlayLayer.setStrokeColor_(_blueColor)
    self.overlayLayer.setFillColor_(_cyanColor)
    self.previewLayer.addSublayer_(self.overlayLayer)

  @on_main_thread
  def showPoints(self, _x, _y):
    _, _, _width, _height = parseCGRect(self.overlayLayer.frame())
    x = _width - (_width * (1 - _x))
    y = _height - (_height * _y)

    radius = 8.0
    startAngle = 0.0
    endAngle = pi * 2.0

    arc = UIBezierPath.new()
    arc.addArcWithCenter_radius_startAngle_endAngle_clockwise_(
      (x, y), radius, startAngle, endAngle, True)

    self.overlayLayer.setPath_(arc.CGPath())


class CameraViewController:
  def __init__(self):
    self.cameraView = CameraView()
    _name = 'CameraFeedDataOutput'
    self.videoDataOutputQueue = dispatch_queue_create(_name, None)
    self.delegate = self.create_sampleBufferDelegate()

    self.cameraFeedSession: AVCaptureSession
    self.handPoseRequest: VNDetectHumanHandPoseRequest
    self.viewDidLoad()
    self.viewDidAppear()

  def viewDidLoad(self):
    handPoseRequest = VNDetectHumanHandPoseRequest.new()
    handPoseRequest.maximumHandCount = 1

    self.handPoseRequest = handPoseRequest

  def viewDidAppear(self):
    _resizeAspectFill = 'AVLayerVideoGravityResizeAspectFill'

    self.cameraView.previewLayer.videoGravity = _resizeAspectFill
    self.setupAVSession()
    self.cameraView.previewLayer.session = self.cameraFeedSession

    self.cameraFeedSession.startRunning()

  def viewWillDisappear(self):
    self.cameraFeedSession.stopRunning()

  def setupAVSession(self):
    _builtInWideAngleCamera = 'AVCaptureDeviceTypeBuiltInWideAngleCamera'
    _video = 'vide'
    _front = 2
    _back = 1

    videoDevice = AVCaptureDevice.defaultDeviceWithDeviceType_mediaType_position_(
      _builtInWideAngleCamera, _video, _back)

    deviceInput = AVCaptureDeviceInput.deviceInputWithDevice_error_(
      videoDevice, None)

    session = AVCaptureSession.new()
    session.beginConfiguration()
    _Preset_high = 'AVCaptureSessionPresetHigh'
    session.setSessionPreset_(_Preset_high)

    if session.canAddInput_(deviceInput):
      session.addInput_(deviceInput)
    else:
      raise RuntimeError('could not add the camera input to the session')

    dataOutput = AVCaptureVideoDataOutput.new()
    if session.canAddOutput_(dataOutput):
      session.addOutput_(dataOutput)
      dataOutput.alwaysDiscardsLateVideoFrames = True
      dataOutput.setSampleBufferDelegate_queue_(self.delegate,
                                                self.videoDataOutputQueue)
    else:
      raise RuntimeError('could not add the video data output to the session')
    session.commitConfiguration()
    self.cameraFeedSession = session

  def detectedHandPose_request(self, request_list):
    _all = 'VNIPOAll'  # VNHumanHandPoseObservationJointsGroupNameAll
    _point = 'VNHLKITIP'  # tip of the index finger
    for result in request_list:
      handParts = result.recognizedPointsForJointsGroupName_error_(_all, None)

      self.cameraView.update_log_area(f'{handParts}')

      recognizedPoint = handParts[_point]
      x_point = recognizedPoint.x()
      y_point = recognizedPoint.y()
      self.cameraView.showPoints(x_point, y_point)

  def create_sampleBufferDelegate(self):
    sequenceHandler = VNSequenceRequestHandler.new()
    _right = 6  # kCGImagePropertyOrientationRight

    # --- /delegate
    def captureOutput_didOutputSampleBuffer_fromConnection_(
        _self, _cmd, _output, _sampleBuffer, _connection):
      sampleBuffer = ObjCInstance(_sampleBuffer)
      sequenceHandler.performRequests_onCMSampleBuffer_orientation_error_(
        [self.handPoseRequest], sampleBuffer, _right, None)

      observation_array = self.handPoseRequest.results()
      if observation_array:
        self.detectedHandPose_request(observation_array)

    def captureOutput_didDropSampleBuffer_fromConnection_(
        _self, _cmd, _output, _sampleBuffer, _connection):
      ObjCInstance(_sampleBuffer)  # todo: only wraps the buffer, nothing else

    # --- delegate/

    _methods = [
      captureOutput_didOutputSampleBuffer_fromConnection_,
      captureOutput_didDropSampleBuffer_fromConnection_,
    ]

    _protocols = ['AVCaptureVideoDataOutputSampleBufferDelegate']

    sampleBufferDelegate = create_objc_class(
      'sampleBufferDelegate', methods=_methods, protocols=_protocols)
    return sampleBufferDelegate.new()


class View(ui.View):
  def __init__(self, *args, **kwargs):
    ui.View.__init__(self, *args, **kwargs)
    self.bg_color = 'maroon'
    self.cvc = CameraViewController()
    self.add_subview(self.cvc.cameraView)

  def will_close(self):
    self.cvc.viewWillDisappear()


if __name__ == '__main__':
  view = View()
  view.present(style='fullscreen', orientations=['portrait'])

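This is an application of the still-image detection implemented last time.

It takes less code (and is comparatively easier) than getting the camera onto a View.

Requests on video

We use VNDetectHumanHandPoseRequest and VNSequenceRequestHandler.

Following last time's face detection, it is nice that hand detection also comes ready-made as a class.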

VNDetectHumanHandPoseRequest | Apple Developer Documentation

VNSequenceRequestHandler | Apple Developer Documentation

VNImageRequestHandler did not work for me, so VNSequenceRequestHandler is used as the handler.

(Possibly my thread handling around dispatch_queue is not quite right.)

VNImageRequestHandler | Apple Developer Documentation

Detecting a single hand is enough here, so the maximum count is set to 1:

handPoseRequest = VNDetectHumanHandPoseRequest.new()
handPoseRequest.maximumHandCount = 1

The VNSequenceRequestHandler is created inside the method that builds the delegate.

The request is then performed on every call, inside the method that actually runs as the delegate:

def create_sampleBufferDelegate(self):
  sequenceHandler = VNSequenceRequestHandler.new()
  _right = 6  # kCGImagePropertyOrientationRight

  # --- /delegate
  def captureOutput_didOutputSampleBuffer_fromConnection_(
      _self, _cmd, _output, _sampleBuffer, _connection):
    sampleBuffer = ObjCInstance(_sampleBuffer)
    sequenceHandler.performRequests_onCMSampleBuffer_orientation_error_(
      [self.handPoseRequest], sampleBuffer, _right, None)

    observation_array = self.handPoseRequest.results()
    if observation_array:
      self.detectedHandPose_request(observation_array)

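When a hand is detected, the results are stored in observation_array.

Once we have them, they are passed to the detectedHandPose_request method, which handles them appropriately on the View.

The Vision Framework detection itself is really no different from the still-image case (except that it is being run at high speed).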

`VNHumanHandPoseObservation` and `VNRecognizedPoint`

The array of detected hands (just one here, since maximumHandCount = 1) contains the information for each part of the hand:

observation_array = self.handPoseRequest.results()

Each hand is returned as a VNHumanHandPoseObservation.

VNHumanHandPoseObservation | Apple Developer Documentation

Fetching everything

We want the individual points from the VNHumanHandPoseObservation, so:

_all = 'VNIPOAll'  # VNHumanHandPoseObservationJointsGroupNameAll

handParts = result.recognizedPointsForJointsGroupName_error_(_all, None)

VNHumanHandPoseObservationJointsGroupNameAll fetches all of the parts at once.

VNHumanHandPoseObservationJointsGroupNameAll | Apple Developer Documentation

How `VNHumanHandPoseObservationJointsGroupNameAll` turned out to be `VNIPOAll`

In Swift or Objective-C you would write .all or VNHumanHandPoseObservationJointsGroupNameAll, but there are occasionally names that cannot be reached from objc_util.

This is especially true of predefined global variables (constants).

Here is how I tracked down VNIPOAll:

for result in request_list:
  print(result.availableJointsGroupNames())

That is, I checked the available group names first:

(
    VNHLRKT,    <- thumb?
    VNHLRKM,    <- middle finger?
    VNHLRKI,    <- index finger?
    VNHLRKR,    <- ring finger?
    VNHLRKP,    <- little finger?
    VNIPOAll    <- this looks like it!
)

VNHumanHandPoseObservationJointsGroupName | Apple Developer Documentation

VNHumanHandPoseObservationJointName | Apple Developer Documentation

Passing in 'VNIPOAll', presumably the equivalent of .all:

_all = 'VNIPOAll'  # VNHumanHandPoseObservationJointsGroupNameAll

for result in request_list:
  handParts = result.recognizedPointsForJointsGroupName_error_(_all, None)
  print(handParts)

The output was 21 names, each with a pair of values:

{
    VNHLKIDIP = "[0.485334; 0.583736]";
    VNHLKIMCP = "[0.166403; 0.488322]";
    VNHLKIPIP = "[0.342590; 0.580520]";
    VNHLKITIP = "[0.569792; 0.570696]";
    VNHLKMDIP = "[0.403824; 0.585380]";
    VNHLKMMCP = "[0.139485; 0.497076]";
    VNHLKMPIP = "[0.296893; 0.583691]";
    VNHLKMTIP = "[0.477001; 0.572641]";
    VNHLKPDIP = "[0.377817; 0.574937]";
    VNHLKPMCP = "[0.244330; 0.498787]";
    VNHLKPPIP = "[0.299134; 0.569371]";
    VNHLKPTIP = "[0.428175; 0.559813]";
    VNHLKRDIP = "[0.373153; 0.578908]";
    VNHLKRMCP = "[0.167453; 0.499612]";
    VNHLKRPIP = "[0.283478; 0.573761]";
    VNHLKRTIP = "[0.436168; 0.565271]";
    VNHLKTCMC = "[0.512299; 0.336474]";
    VNHLKTIP = "[0.606151; 0.494821]";
    VNHLKTMP = "[0.598214; 0.402705]";
    VNHLKTTIP = "[0.589989; 0.569710]";
    VNHLKWRI = "[0.241853; 0.266611]";
}

VNRecognizedPoint | Apple Developer Documentation

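According to the documentation as well, there are four points on each of the five fingers plus the wrist, for a total of 21, so everything appears to have been retrieved.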
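The count can be cross-checked with a few lines of plain Python. This sketch rebuilds the key names from the pattern visible in the printed dictionary (four joints for each of five fingers, plus the wrist). How the letters are read is my guess, not something taken from Apple's headers:

```python
PREFIX = 'VNHLK'

# Joint suffixes as they appear in the printed dictionary.
FINGER_JOINTS = ('MCP', 'PIP', 'DIP', 'TIP')  # index, middle, ring, little
THUMB_JOINTS = ('CMC', 'MP', 'IP', 'TIP')     # the thumb has its own set

keys = {PREFIX + 'T' + joint for joint in THUMB_JOINTS}
for finger in 'IMRP':
    keys |= {PREFIX + finger + joint for joint in FINGER_JOINTS}
keys.add(PREFIX + 'WRI')

print(len(keys))
print(sorted(keys))
```

The generated set contains the same 21 names as the output above, including the easily confused pair VNHLKTIP (thumb, IP joint) and VNHLKITIP (index finger, tip).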

The abbreviations are extremely terse, so I worked out what they mean by comparing them against the documentation and references such as the following.

Body Anatomy: Upper Extremity Joints | The Hand Society

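As a pattern, every name starts with VNHLK. The following is a guess pieced together from various sources:

VN
- Vision
H
- Hand
L
- Landmark
K
- Key

Each finger is represented by a single letter:

I: index finger
- DIP
  - first joint (nearest the fingertip)
- MCP
  - base of the finger (knuckle)
- PIP
  - second joint
- TIP
  - fingertip
WRI
- wrist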
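The naming pattern above can be turned into a small decoder. This is a sketch based on the guessed reading of the letters; decode_joint_key and the FINGERS table are hypothetical names introduced here, not Vision API:

```python
FINGERS = {'T': 'thumb', 'I': 'index', 'M': 'middle', 'R': 'ring', 'P': 'little'}


def decode_joint_key(key: str) -> tuple:
    """Split a key such as 'VNHLKITIP' into (finger, joint)."""
    prefix = 'VNHLK'
    if not key.startswith(prefix):
        raise ValueError(f'unexpected key: {key}')
    body = key[len(prefix):]
    if body == 'WRI':  # the wrist has no finger letter
        return ('wrist', 'WRI')
    return (FINGERS[body[0]], body[1:])


for key in ('VNHLKITIP', 'VNHLKTIP', 'VNHLKWRI'):
    print(key, decode_joint_key(key))
```

Reading the first letter after the prefix as the finger is what separates VNHLKITIP (index, TIP) from VNHLKTIP (thumb, IP).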

Let's pick out the tip of the index finger:

recognizedPoint = handParts[_point]
      
x_point = recognizedPoint.x()
y_point = recognizedPoint.y()
self.cameraView.showPoints(x_point, y_point)
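Each VNRecognizedPoint also carries a confidence value, which the code in this article does not use. As a hedged sketch, unreliable points could be skipped before drawing. The Point tuple is a stand-in so the logic runs outside Pythonista, and both the 0.3 threshold and the confidence numbers in the sample are made up for illustration:

```python
from collections import namedtuple

# Stand-in for VNRecognizedPoint; only the fields used here.
Point = namedtuple('Point', 'x y confidence')


def pick_point(hand_parts: dict, key: str, min_confidence: float = 0.3):
    """Return (x, y) for key, or None when it is missing or unreliable."""
    point = hand_parts.get(key)
    if point is None or point.confidence < min_confidence:
        return None
    return (point.x, point.y)


parts = {
    'VNHLKITIP': Point(0.569792, 0.570696, 0.9),  # confidence is invented
    'VNHLKWRI': Point(0.241853, 0.266611, 0.1),   # confidence is invented
}
print(pick_point(parts, 'VNHLKITIP'))
print(pick_point(parts, 'VNHLKWRI'))
```

In the real delegate the same check would read the values through objc_util (for example recognizedPoint.confidence()), which I have not verified on device.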

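Visualizing the information

There is a lot of data available, so the whole 'VNIPOAll' result is poured into a ui.TextView and shown as text.

The tip of the index finger is what gets tracked.

Handling the `ui.TextView`

All it needs is a string, so the update_log_area method serves as the entry point, and the text refreshes by itself whenever the delegate runs.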

class CameraView(ui.View):
  def __init__(self, *args, **kwargs):
    self.log_area = ui.TextView()
    self.log_area.editable = False
    self.log_area.flex = 'WH'
    self.log_area.font = ('Inconsolata', 10)
    self.log_area.bg_color = (0.0, 0.0, 0.0, 0.0)
    
    # must be added after the layers are stacked, or it ends up hidden
    self.add_subview(self.log_area)
    
  def update_log_area(self, text):
    self.log_area.text = f'{text}'

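Adjusting the height to the amount of text was a hassle, so the settings assume full screen. The background color is transparent so that the camera view stays visible.

If reading the log matters more, raising the background's alpha (making it less transparent) should make the text easier to read.

Also, the position of the self.add_subview(self.log_area) call matters: if it is called too early, the camera's view layer ends up on top of the text, so watch the order.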
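update_log_area currently receives the raw dictionary description. If a tidier log is wanted, the points could be formatted into aligned lines first. A sketch in plain Python, assuming the points have already been pulled out into a dict of name to (x, y) tuples; that conversion step is not shown, and format_points is a hypothetical helper:

```python
def format_points(points: dict) -> str:
    """Render {name: (x, y)} as one sorted, aligned line per joint."""
    width = max((len(name) for name in points), default=0)
    lines = [
        f'{name:<{width}}  x={x:.3f}  y={y:.3f}'
        for name, (x, y) in sorted(points.items())
    ]
    return '\n'.join(lines)


sample = {
    'VNHLKWRI': (0.241853, 0.266611),
    'VNHLKITIP': (0.569792, 0.570696),
}
print(format_points(sample))
```

The returned string could then be handed to update_log_area in place of the raw dictionary.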

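Drawing on the `overlayLayer`

The idea was that it is fine as long as the drawing gives visible feedback, so the setup is rough.

I still don't feel at home with CAShapeLayer and UIBezierPath...

Pass in the values, and a dot is drawn at that position.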

@on_main_thread
def showPoints(self, _x, _y):
  _, _, _width, _height = parseCGRect(self.overlayLayer.frame())
  x = _width - (_width * (1 - _x))
  y = _height - (_height * _y)

  radius = 8.0
  startAngle = 0.0
  endAngle = pi * 2.0

  arc = UIBezierPath.new()
  arc.addArcWithCenter_radius_startAngle_endAngle_clockwise_(
    (x, y), radius, startAngle, endAngle, True)

  self.overlayLayer.setPath_(arc.CGPath())
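The two arithmetic lines convert Vision's normalized coordinates (0.0 to 1.0, origin at the bottom-left) into layer coordinates (points, origin at the top-left). Note that _width - (_width * (1 - _x)) is algebraically just _width * _x. A standalone sketch of the same conversion with the layer size passed in explicitly; to_layer_point is a hypothetical helper:

```python
def to_layer_point(nx: float, ny: float, width: float, height: float) -> tuple:
    """Convert a normalized Vision point to layer coordinates."""
    x = width * nx           # same as width - width * (1 - nx)
    y = height * (1.0 - ny)  # flip the vertical axis
    return (x, y)


# 414 x 896 points is the portrait size of an 828 x 1792 @ 2.0 screen.
print(to_layer_point(0.5, 0.5, 414.0, 896.0))
print(to_layer_point(0.0, 1.0, 414.0, 896.0))
```

Like the original, this ignores the cropping that the aspect-fill video gravity applies to the preview, so the dot position is approximate near the edges.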

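The key is the @on_main_thread decorator.

The delegate callback runs on the dispatch queue created earlier (videoDataOutputQueue), separately from the ui.View processing, so without the decorator nothing is drawn.

self.log_area, on the other hand, is a ui.TextView, and its values update even without the decorator, perhaps because it runs on the ui.View thread.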

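Next time

It is surprising how smoothly the finger gets tracked.

Making good use of UIBezierPath should allow for some interesting visuals.

Now that real-time detection works, detection on video files should not be much harder.

I only recently started using the Vision Framework, and many parts of this article are not as well organized as I would like.

But when I investigated and worked out the name of each finger point, I couldn't help thinking

"Can researching the Vision Framework really get this pointless?"

and I simply had to share it.

Trivia that is only useful in Pythonista3...

This concludes the Vision Framework series. From the next article, we head into the final chapter of the Pythonista3 Advent Calendar 2022.

We'll cover WebView. I hope you'll join me.

Thank you for reading this far.

Promotion

Discord

There is a Japanese-language Pythonista3 community. Everyone is kind and will patiently help with anything you don't understand, so take this opportunity to have a look.

Books

The ultimate books for programming on the iPhone/iPad.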

Python や Jupyter で iPhone/iPad 先端機能を簡単･自由にプログラミング！「活用篇」：hirax

Python や Jupyter で iPhone/iPad 先端機能を簡単･自由にプログラミング！「土台篇」：hirax

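Other

Sample code

There is a repository that collects the code from the Pythonista3 Advent Calendar 2022.

If you find errors, oddities, or possible improvements in the code, corrections and PRs are welcome.

Twitter

My feed is rather chaotic, but feel free to say hello.

It's not a question of whether I can or can't. I'm doing it, and I'll never run out of things to introduce and explain, but can I actually meet the deadlines? That's the question! (I'm hopeless)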

Pythonista3 Advent Calendar 2022 https://t.co/JKUxA525Pt #Qiita
— pome-ta (@pome_ta93) November 4, 2022

GitHub

I generally push my code to GitHub, so you can see there what I'm currently hooked on and what I'm implementing.
