Trying Apple's DepthPro locally with a laptop webcam

Last updated at 2024-10-16 / Posted at 2024-10-09

Real-time inference with Apple's DepthPro on a webcam

First things first: I want a [RealSense RealSense RealSense RealSense RealSense RealSense RealSense RealSense]!!!!!
Jokes aside, this post shows how to run real-time depth estimation with Apple's DepthPro model using a laptop's built-in webcam.

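DepthPro, released by Apple on October 4, 2024, is a model that estimates depth from a single camera image (monocular depth estimation). I tried it right after the release, but I really wanted to see the results in real time from a webcam. So I spent some spare time over the past few days on it, and got it working!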

Below are the details, the code, and a video.

Official links:

For reference, here is the official setup procedure.

First, clone the GitHub repository:

git clone https://github.com/apple/ml-depth-pro.git

Next, create main.py and set up a virtual environment. (main.py is not part of the official instructions; I created it only to run the model on the webcam.)

cd ml-depth-pro
touch main.py
python3 -m venv .venv 
source .venv/bin/activate

Install the required libraries and download the model checkpoint:

pip install huggingface-hub
huggingface-cli download --local-dir checkpoints apple/DepthPro
pip install -e .
pip install opencv-python

The official sample can be run as follows:

# Run depth estimation on a single image
depth-pro-run -i ./data/example.jpg
# Run depth-pro-run -h to see all available options

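To use the webcam, run the following command from the ml-depth-pro directory with the virtual environment activated (the model loads its checkpoint from ./checkpoints by default):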

python3 main.py

Custom code for webcam support

Using apple/depth_pro/ml-depth-pro/src/depth_pro/cli/run.py as a reference, I wrote code for real-time depth estimation from a webcam. GPT helped me with most of it. GPT really is reliable!

Environment

OS: Ubuntu 24.04
GPU: GeForce RTX 3080 laptop (8GB)
Memory: 40GB

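Device selection

The get_torch_device() function below picks an available device automatically: a CUDA-capable GPU if one exists, "mps" on Apple Silicon, and the CPU otherwise. In theory it should also work on a Mac, but I have not tested it there.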

import torch


def get_torch_device() -> torch.device:
    """Get the Torch device."""
    if torch.cuda.is_available():
        device = torch.device("cuda:0")
        print("Using GPU (CUDA)")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
        print("Using GPU (MPS)")
    else:
        device = torch.device("cpu")
        print("Using CPU")
    return device
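Because the decision depends only on two availability checks, the same priority order can be sketched without importing torch, which makes it easy to test on a machine without a GPU. This is a minimal sketch: choose_device_name is a hypothetical helper that is not in the original code, and in real code the two flags would come from torch.cuda.is_available() and torch.backends.mps.is_available().

```python
def choose_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Return a device string using the same priority as get_torch_device().

    Hypothetical helper: CUDA is preferred, then Apple Silicon's MPS backend,
    and the CPU is the final fallback.
    """
    if cuda_available:
        return "cuda:0"
    if mps_available:
        return "mps"
    return "cpu"


if __name__ == "__main__":
    # Simulate an Apple Silicon machine without CUDA.
    print(choose_device_name(cuda_available=False, mps_available=True))
```

The returned string can be passed straight to torch.device(...), so the torch-specific part shrinks to the two availability calls.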

Here's the video. Please subscribe to my channel lol

Full code for the webcam

Here is the complete code for real-time depth estimation with a webcam:

#!/usr/bin/env python3
"""Modified script to run DepthPro with webcam input."""

import logging
import cv2
import torch
from matplotlib import pyplot as plt
from PIL import Image
from depth_pro import create_model_and_transforms
import numpy as np

LOGGER = logging.getLogger(__name__)

def get_torch_device() -> torch.device:
    """Get the Torch device."""
    if torch.cuda.is_available():
        device = torch.device("cuda:0")
        print("Using GPU (CUDA)")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
        print("Using GPU (MPS)")
    else:
        device = torch.device("cpu")
        print("Using CPU")
    return device


def run_webcam():
    """Run Depth Pro on webcam input in real-time."""
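    # Load model. Half precision is used on GPU (CUDA/MPS); fall back to float32
    # on CPU, where float16 kernels may be slow or unsupported.
    device = get_torch_device()
    model, transform = create_model_and_transforms(
        device=device,
        precision=torch.half if device.type != "cpu" else torch.float32,
    )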
    model.eval()

    cap = cv2.VideoCapture(0)  # Open webcam, 0 is usually the default webcam ID

    if not cap.isOpened():
        LOGGER.error("Error: Could not open webcam.")
        return

    plt.ion()
    fig = plt.figure()
    ax_rgb = fig.add_subplot(121)
    ax_disp = fig.add_subplot(122)

    while True:
        ret, frame = cap.read()
        if not ret:
            LOGGER.error("Error: Could not read frame from webcam.")
            break

        # Convert OpenCV image (BGR) to PIL Image (RGB)
        image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

        # Run prediction
        prediction = model.infer(transform(image), f_px=None)

        # Extract the depth map
        depth = prediction["depth"].detach().cpu().numpy().squeeze()

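        # Display the original frame and depth map in the Matplotlib figure.
        # Clear the axes first so image artists do not pile up frame after frame.
        ax_rgb.clear()
        ax_disp.clear()
        ax_rgb.imshow(image)
        ax_disp.imshow(depth, cmap="turbo_r")
        fig.canvas.draw()
        fig.canvas.flush_events()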

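        # Also show the frame and a colorized depth map in OpenCV windows.
        # The small epsilon avoids division by zero if the depth map is constant.
        normalized_depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-6)
        color_depth = (plt.get_cmap("turbo_r")(normalized_depth)[..., :3] * 255).astype(np.uint8)
        cv2.imshow("Webcam", frame)
        # Matplotlib colormaps return RGB, but cv2.imshow expects BGR.
        cv2.imshow("Depth Map", cv2.cvtColor(color_depth, cv2.COLOR_RGB2BGR))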

        # Exit when 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    run_webcam()
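The loop above does not report how fast it runs. To quantify the frame rate, a small moving-average meter could be called once per iteration. This is a minimal sketch; FpsMeter and its window size are hypothetical choices, not part of DepthPro or the original code.

```python
import time
from collections import deque
from typing import Deque, Optional


class FpsMeter:
    """Moving-average frames-per-second over the most recent frames (hypothetical helper)."""

    def __init__(self, window: int = 30) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # window + 1 timestamps give `window` frame intervals.
        self._stamps: Deque[float] = deque(maxlen=window + 1)

    def tick(self, now: Optional[float] = None) -> float:
        """Record one frame and return the current FPS (0.0 until two frames are seen)."""
        self._stamps.append(time.perf_counter() if now is None else now)
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

For example, create meter = FpsMeter() before the while True: loop and call print(f"{meter.tick():.1f} FPS") after each prediction. Averaging over recent frames smooths out the jitter of single-frame timings, and the optional now argument lets the meter be tested with fixed timestamps.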

Summary

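I want a [RealSense RealSense RealSense RealSense RealSense RealSense RealSense RealSense]!!!
Running inference on a live camera feed inevitably lowers the FPS, but adjusting the camera resolution or optimizing the processing might improve it. I haven't tried that yet, but with some tweaking it could get better.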
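One way to try the resolution idea is to shrink each frame before inference. The sketch below only computes a target size that caps the longer side while keeping the aspect ratio; capped_size and its 640-pixel default are my own assumptions, not values from DepthPro. Note that DepthPro's network itself runs at a fixed internal resolution (1536×1536 according to its paper), so a smaller frame mainly saves capture, color conversion, and display time rather than network compute.

```python
from typing import Tuple


def capped_size(width: int, height: int, max_side: int = 640) -> Tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side.

    Hypothetical helper: the aspect ratio is preserved, and frames that are
    already small enough are returned unchanged.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height and max_side must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to the nearest pixel but never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(capped_size(1920, 1080))  # Full HD frame capped to a 640-pixel long side
```

In the webcam loop, the result could be passed to OpenCV right after cap.read(), e.g. frame = cv2.resize(frame, capped_size(frame.shape[1], frame.shape[0]), interpolation=cv2.INTER_AREA). Alternatively, cap.set(cv2.CAP_PROP_FRAME_WIDTH, ...) asks the camera for a lower resolution directly, although not every camera honors the request.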

Thanks for reading to the end. Please share this post!
