Feeding camera video to the Human Body Pose API to calculate height

Posted at 2025-03-21

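I was playing around with Apple's Human Body Pose API and got completely stuck on the height calculation, so here are my notes for future reference.

The problem

Capture video with the camera
Call the Human Body Pose API and receive the result

Checking the returned value, it was fixed at 180 cm
(the person actually in front of the camera was a little over 160 cm)

※ Using the value of VNHumanBodyPose3DObservation.bodyHeight (reported in meters, so 1.8)

Device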

iPhone 16 Pro Max
iOS:18.3.1

What I found while investigating

VNHumanBodyPose3DObservation.HeightEstimation was returning reference
※ The height was an estimate based on a reference value, so the LiDAR was not being put to use

case measured
A technique that uses LiDAR depth data to measure body height, in meters.
case reference
A technique that uses a reference height.
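To tell the two cases apart at runtime, it helps to check heightEstimation whenever bodyHeight is read. The following is a minimal sketch: the helper name heightLabel and its output format are made up for illustration, while bodyHeight and heightEstimation are the properties discussed above.

```swift
import Foundation
import Vision

/// Hypothetical helper: formats a height given in meters as centimeters
/// and notes how the value was obtained.
func heightLabel(meters: Float, isMeasured: Bool) -> String {
    let centimeters = Double(meters) * 100
    let source = isMeasured ? "measured with depth data" : "reference value, not measured"
    return "\(String(format: "%.1f", centimeters)) cm (\(source))"
}

/// Convenience overload that reads both values from a Vision observation.
func heightLabel(for observation: VNHumanBodyPose3DObservation) -> String {
    heightLabel(
        meters: observation.bodyHeight,
        isMeasured: observation.heightEstimation == .measured
    )
}
```

If the label keeps showing the reference case, the request most likely did not receive any depth data, which is the situation described in this post.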

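Investigation

When you don't know something, start by reading the docs and rewatching the WWDC videos!

Human Body Pose is covered in WWDC23, by the way.

While watching the video, there was a part that made me think "this is probably it".
※ Around 10:50

That is where I realized my code was not including any depth information when calling the Human Body Pose API.

For reference, this is roughly what I had written.
I took the frame received from ARKit as a CVPixelBuffer and performed the request on it.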

.swift

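func processFrame(_ frame: ARFrame) {
    // capturedImage is already a non-optional CVPixelBuffer, so no cast or guard is needed.
    let pixelBuffer = frame.capturedImage

    // Only the image is used here; no depth data accompanies the request.
    let request = VNDetectHumanBodyPose3DRequest(completionHandler: bodyPoseHandler)
    ... etc
}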

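From the WWDC session:
It says that if the file contains depth information, Vision will use it even when you call the API as-is, but

since I was getting frames from live video rather than from an image file, I figured the depth information probably wasn't included.

My existing implementation received its video data from ARKit, so
I was left wondering where on earth I was supposed to get an AVDepthData from...??

ARKit does have a way to get depth data.

But the type is different.

ARKit gives you ARDepthData

AVFoundation gives you AVDepthData

So the camera capture code needed a major rewrite... ()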
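To make the type mismatch concrete, here is a minimal sketch of how depth is read on the ARKit side. The function names are hypothetical; frameSemantics, sceneDepth, and depthMap are ARKit's own API. The point is that the value is an ARDepthData wrapping a CVPixelBuffer, whereas the VNImageRequestHandler initializer used later in this post takes an AVDepthData.

```swift
import ARKit

/// Hypothetical helper: enables scene depth when the device supports it
/// (scene depth requires a LiDAR scanner).
func makeDepthConfiguration() -> ARWorldTrackingConfiguration {
    let configuration = ARWorldTrackingConfiguration()
    if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
        configuration.frameSemantics.insert(.sceneDepth)
    }
    return configuration
}

/// Hypothetical helper: returns the size of the depth map for a frame.
/// Note the type: ARDepthData, not AVDepthData.
func depthMapSize(of frame: ARFrame) -> (width: Int, height: Int)? {
    guard let sceneDepth: ARDepthData = frame.sceneDepth else {
        return nil  // Depth is unavailable or was not enabled.
    }
    let depthMap: CVPixelBuffer = sceneDepth.depthMap
    return (CVPixelBufferGetWidth(depthMap), CVPixelBufferGetHeight(depthMap))
}
```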

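Switching from the ARKit implementation to camera capture with AVFoundation

As for the implementation, ~~it's a hassle, so~~ let's grab some similar sample code.

Apple provides a sample for Animal Body Pose, so grab that and

rewrite the parts that need changing.

1. Create HumanBodyPose3DDetector, using AnimalPoseDetector as a reference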

.swift

import AVFoundation
import Vision

// The Vision part.
class HumanBodyPose3DDetector: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate, ObservableObject {
    // Recognized 3D joints of the detected human body, keyed by joint name.
    @Published var humanBodyParts = [VNHumanBodyPose3DObservation.JointName : VNHumanBodyRecognizedPoint3D]()
    @Published var humanBodyPoseResult: VNHumanBodyPose3DObservation?
    var lastDepthData: AVDepthData?
    
    // Notify the delegate that a sample buffer was written.
    func captureOutput(
        _ output: AVCaptureOutput,
        didOutput sampleBuffer: CMSampleBuffer,
        from connection: AVCaptureConnection
    ) {
        // Create a new request to recognize a human body pose in 3D.
        let humanBodyPoseRequest = VNDetectHumanBodyPose3DRequest(completionHandler: detectedHumanBodyPose)
        guard let cvPixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            return
        }
        guard let depthData = self.lastDepthData else {
            print("lastDepthData is nil")
            return
        }
        // Create a new request handler.
        let imageRequestHandler = VNImageRequestHandler(
            cvPixelBuffer: cvPixelBuffer,
            depthData: depthData,
            orientation: .up,
            options: [:]
        )
        do {
            try imageRequestHandler.perform([humanBodyPoseRequest])
        } catch {
            print("Unable to perform the request: \(error).")
        }
    }

    func detectedHumanBodyPose(request: VNRequest, error: Error?) {
        // Get the results as VNHumanBodyPose3DObservation values.
        guard let humanBodyPoseResults = request.results as? [VNHumanBodyPose3DObservation] else {
            return
        }
        guard let firstResult = humanBodyPoseResults.first else {
            return
        }
        // Get the recognized 3D points of the human body for the .all group.
        guard let humanBodyAllParts = try? firstResult.recognizedPoints(.all) else {
            return
        }
        // Publish on the main queue, since @Published properties drive UI updates.
        DispatchQueue.main.async {
            self.humanBodyPoseResult = firstResult
            self.humanBodyParts = humanBodyAllParts
        }
    }
    
    // Receive the LiDAR depth data and keep the latest value.
    func depthDataOutput(
        _ output: AVCaptureDepthDataOutput,
        didOutput depthData: AVDepthData,
        timestamp: CMTime,
        connection: AVCaptureConnection
    ) {
        DispatchQueue.main.async {
            self.lastDepthData = depthData
        }
    }
}

2. Receive the depth information in CameraViewController and pass it to HumanBodyPose3DDetector

.swift


// Handle the LiDAR depth data.
extension CameraViewController: AVCaptureDepthDataOutputDelegate {
    func depthDataOutput(
        _ output: AVCaptureDepthDataOutput,
        didOutput depthData: AVDepthData,
        timestamp: CMTime,
        connection: AVCaptureConnection
    ) {
        (delegate as? HumanBodyPose3DDetector)?.depthDataOutput(output, didOutput: depthData, timestamp: timestamp, connection: connection)
    }
}

struct HumanDisplayView: UIViewControllerRepresentable {
    var humanJoint: HumanBodyPose3DDetector
    func makeUIViewController(context: Context) -> some UIViewController {
        let cameraViewController = CameraViewController()
        cameraViewController.delegate = humanJoint
        return cameraViewController
    }
    func updateUIViewController(_ uiViewController: UIViewControllerType, context: Context) {
    }
}

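Once the depth data is handed over like this, VNHumanBodyPose3DObservation.HeightEstimation comes back as measured, and the height calculation became more accurate!
※ For the finer implementation details, adjust as needed while referring to the AnimalBodyPose sample code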
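The capture session setup is not shown in this post, so here is a minimal sketch of one way it could look, assuming the back LiDAR depth camera is used. The function makeDepthCaptureSession and the DepthCaptureError type are hypothetical names, and this is not taken from Apple's sample. Choosing a specific depth format is left out, which a real app may need to do.

```swift
import AVFoundation

/// Hypothetical error type for this sketch.
enum DepthCaptureError: Error {
    case noLiDARCamera, cannotAddInput, cannotAddOutput
}

/// Builds a session that delivers both video frames and depth data.
/// Both delegates are called on the same serial queue, so the latest depth
/// value and the video frames are handled on one thread.
func makeDepthCaptureSession(
    videoDelegate: AVCaptureVideoDataOutputSampleBufferDelegate,
    depthDelegate: AVCaptureDepthDataOutputDelegate,
    queue: DispatchQueue
) throws -> AVCaptureSession {
    // The LiDAR depth camera is only present on LiDAR-equipped devices.
    guard let device = AVCaptureDevice.default(
        .builtInLiDARDepthCamera, for: .video, position: .back
    ) else {
        throw DepthCaptureError.noLiDARCamera
    }

    let session = AVCaptureSession()
    session.beginConfiguration()
    defer { session.commitConfiguration() }

    let input = try AVCaptureDeviceInput(device: device)
    guard session.canAddInput(input) else { throw DepthCaptureError.cannotAddInput }
    session.addInput(input)

    // Video frames go to captureOutput(_:didOutput:from:).
    let videoOutput = AVCaptureVideoDataOutput()
    videoOutput.setSampleBufferDelegate(videoDelegate, queue: queue)
    guard session.canAddOutput(videoOutput) else { throw DepthCaptureError.cannotAddOutput }
    session.addOutput(videoOutput)

    // Depth data goes to depthDataOutput(_:didOutput:timestamp:connection:).
    let depthOutput = AVCaptureDepthDataOutput()
    depthOutput.isFilteringEnabled = true
    depthOutput.setDelegate(depthDelegate, callbackQueue: queue)
    guard session.canAddOutput(depthOutput) else { throw DepthCaptureError.cannotAddOutput }
    session.addOutput(depthOutput)

    return session
}
```

The returned session still has to be started with startRunning(), ideally off the main thread, and the app needs camera permission.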
