
Monocular Depth Estimation on a RasPi

Last updated at 2022-01-14 / Posted at 2021-10-01

1. Introduction

This is the story of following in the footsteps of the gods and meeting a new god.

It all started with this tweet.

Ideally you'd feed it a landscape-oriented video, but...
First up, Lite-HR-Depth👀
For its speed, the edges and depth come out pretty solid🤔? pic.twitter.com/2LgFv4B7gu
— 高橋かずひと@孫請級プログラマー🦔 (@KzhtTkhs) September 28, 2021

He ran a model converted by the god PINTO and kindly uploaded the Python code for the demo. Truly a god.

So I'll take the Raspberry Pi on which, guided by the gods, I installed TensorFlow Lite in my previous article, put this model and the demo code on it, and run it.

2. Copying the Model

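PINTO's repository (PINTO_model_zoo, 158_HR-Depth) provides the various models along with download scripts, so let's run one.
This time I used the 192x640 lite model, which looked like the lightest. Its download script is download_saved_model_lite_hr_depth_k_t_encoder_depth_192x640.sh.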

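I'm such a monkey that I couldn't figure out how to download the script file, so I copy-pasted it into vim and ran it. I hope to evolve into a proper primate soon. Why does the file's content look like HTML when I wget it?
Failing that, you can simply run the script's contents one line at a time.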

A god descended into the comments and delivered an oracle: download and run the script with the following commands.

$ wget https://raw.githubusercontent.com/PINTO0309/PINTO_model_zoo/main/158_HR-Depth/download_saved_model_lite_hr_depth_k_t_encoder_depth_192x640.sh
$ chmod 755 download_saved_model_lite_hr_depth_k_t_encoder_depth_192x640.sh
$ ./download_saved_model_lite_hr_depth_k_t_encoder_depth_192x640.sh
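On the HTML question above: a likely cause is that a `github.com/.../blob/...` URL points to GitHub's HTML page for the file rather than the file itself, while the `raw.githubusercontent.com` host used in the commands serves the raw bytes. The hypothetical helper `to_raw_url` below sketches that conversion, assuming the simple `github.com/<owner>/<repo>/blob/<branch>/<path>` layout (branch names containing `/` are not handled).

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_url(blob_url: str) -> str:
    """Turn https://github.com/<owner>/<repo>/blob/<branch>/<path>
    into https://raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>.
    Hypothetical helper; assumes the simple 'blob' URL layout."""
    parts = urlsplit(blob_url)
    if parts.netloc != "github.com":
        raise ValueError("not a github.com URL: " + blob_url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: [owner, repo, "blob", branch, *path]
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError("not a github.com blob URL: " + blob_url)
    raw_path = "/" + "/".join(segments[:2] + segments[3:])  # drop "blob"
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


blob = ("https://github.com/PINTO0309/PINTO_model_zoo/blob/main/158_HR-Depth/"
        "download_saved_model_lite_hr_depth_k_t_encoder_depth_192x640.sh")
print(to_raw_url(blob))
```

The printed URL is the same one passed to `wget` in the first command.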

For reference, here is the content of the script as well.

curl -sc /tmp/cookie "https://drive.google.com/uc?export=download&id=1J4FANOXevJd-egbhGVeK9gzFQtqJamw4" > /dev/null
CODE="$(awk '/_warning_/ {print $NF}' /tmp/cookie)"
curl -Lb /tmp/cookie "https://drive.google.com/uc?export=download&confirm=${CODE}&id=1J4FANOXevJd-egbhGVeK9gzFQtqJamw4" -o resources.tar.gz
tar -zxvf resources.tar.gz
rm resources.tar.gz
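The first two lines implement the confirmation step Google Drive used for large files (the "can't scan this file for viruses" warning): the first `curl` saves cookies to `/tmp/cookie`, and `awk '/_warning_/ {print $NF}'` prints the last field (the cookie value) of each line containing `_warning_`, which the second `curl` passes as `confirm=`. The Python below only mirrors that parsing and URL-building on cookie-jar text; `confirm_codes`, `confirm_url`, and the sample cookie line are made up for illustration, and Google Drive's download flow may behave differently today.

```python
from typing import List
from urllib.parse import urlencode


def confirm_codes(cookie_jar_text: str) -> List[str]:
    """Mirror awk '/_warning_/ {print $NF}': for every line containing
    '_warning_', collect its last whitespace-separated field."""
    codes = []
    for line in cookie_jar_text.splitlines():
        if "_warning_" in line:
            fields = line.split()
            if fields:
                codes.append(fields[-1])
    return codes


def confirm_url(file_id: str, code: str) -> str:
    """Build the second curl's URL (same query parameters, same order)."""
    query = urlencode({"export": "download", "confirm": code, "id": file_id})
    return "https://drive.google.com/uc?" + query


# Made-up cookie jar in Netscape format (tab-separated), for illustration only
sample = (
    "# Netscape HTTP Cookie File\n"
    ".drive.google.com\tTRUE\t/uc\tTRUE\t0\tdownload_warning_0000_abcd\tXyZ1\n"
)
codes = confirm_codes(sample)
print(codes)
print(confirm_url("1J4FANOXevJd-egbhGVeK9gzFQtqJamw4", codes[0]))
```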

This should leave a directory named saved_model_lite_hr_depth_k_t_encoder_depth_192x640 in the current directory.

With the trained model in place, let's run it.
This time I'll run depth estimation on images captured with a USB camera.

For the program, I borrowed Takahashi-sama's demo code and changed it just slightly. The changed lines are marked with comments.

*The code I first uploaded had a landmine in it: full-width spaces. This has been fixed. Sorry about that.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import copy
import time
import argparse

import cv2 as cv
import numpy as np
from tflite_runtime.interpreter import Interpreter

def run_inference(interpreter, input_size, image):

    # Pre process:Resize, BGR->RGB, Transpose, float32 cast
    input_image = cv.resize(image, dsize=(input_size[1], input_size[0]))
    input_image = cv.cvtColor(input_image, cv.COLOR_BGR2RGB)
    input_image = np.expand_dims(input_image, axis=0)
#    input_image = tf.cast(input_image, dtype=tf.float32)  # tflite_runtime has no cast method, so convert with NumPy instead
    input_image = np.array(input_image, dtype="float32")
    input_image = input_image / 255.0
    # Inference
    input_details = interpreter.get_input_details()
    interpreter.set_tensor(input_details[0]['index'], input_image)
    interpreter.invoke()

    output_details = interpreter.get_output_details()
    depth_map = interpreter.get_tensor(output_details[3]['index'])

    # Post process
    depth_map = np.squeeze(depth_map)
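    d_min = np.min(depth_map)
    d_max = np.max(depth_map)
    # Guard against a flat map (d_max == d_min), which would give 0/0 = NaN
    d_range = d_max - d_min
    depth_map = (depth_map - d_min) / d_range if d_range > 0 else np.zeros_like(depth_map)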
    depth_map = depth_map * 255.0
    depth_map = np.asarray(depth_map, dtype="uint8")

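    # Needed for model_integer_quant (48x160 output); has no effect for models that already output 192x640.
    # cv.resize takes dsize as (width, height), i.e. (640, 192)
    depth_map = cv.resize(depth_map, (input_size[1], input_size[0]))
#    depth_map = depth_map.reshape(input_size[0], input_size[1])  # not needed: the resize above already gives (192, 640)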

    return depth_map

def main():
    
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model",
        type=str,
        default='model_float16_quant.tflite',             # The five TFLite models are listed here; comment out all but the one you use.
#        default='model_dynamic_range_quant.tflite',
#        default='model_float32.tflite',
#        default='model_integer_quant.tflite',
#        default='model_weight_quant.tflite',
    )
    parser.add_argument(
        "--input_size",
        type=str,
        default='192,640',
    )

    args = parser.parse_args()
    model_path = args.model
    input_size = args.input_size

    input_size = [int(i) for i in input_size.split(',')]

    # Initialize video capture
    cap = cv.VideoCapture(0)

    # Load model
    interpreter = Interpreter(model_path=model_path, num_threads=4)  # uses the tflite_runtime module built by PINTO
    interpreter.allocate_tensors()
    start_time = time.time()     # FPS timing moved: elapsed now covers one whole loop iteration

    while True:
        #start_time = time.time()    # commented out because the FPS timing point was moved
        # Capture read
        ret, frame = cap.read()
        if not ret:
            break
        frame = frame[144:336, 0:640]    # crop the captured image (assumed 640x480) to 192x640
        debug_image = copy.deepcopy(frame)

        # Inference execution
        depth_map = run_inference(
            interpreter,
            input_size,
            frame,
        )

        elapsed_time = time.time() - start_time
        start_time = time.time()     # FPS timing moved

        # Draw
        debug_image, depth_image = draw_debug(
            debug_image,
            elapsed_time,
            depth_map,
        )

        key = cv.waitKey(1)
        if key == 27:  # ESC
            break

        join_image = cv.vconcat([debug_image, depth_image])  # stack the RGB and depth images vertically
        cv.imshow("HR-Depth Demo", join_image)
#        cv.imshow('HR-Depth RGB Demo', debug_image)       # commented out: the combined image is shown instead
#        cv.imshow('HR-Depth Depth Demo', depth_image)     # commented out: the combined image is shown instead
        cv.imwrite(str(time.time()) + ".jpg", join_image)   # save each combined frame as a JPEG
    
    cap.release()
    cv.destroyAllWindows()


def draw_debug(image, elapsed_time, depth_map):
    image_width, image_height = image.shape[1], image.shape[0]
    debug_image = copy.deepcopy(image)

    # Apply ColorMap
    depth_image = cv.applyColorMap(depth_map, cv.COLORMAP_JET)
    depth_image = cv.resize(depth_image, dsize=(image_width, image_height))

    # Inference elapsed time
    cv.putText(debug_image,
             "Elapsed Time : " + '{:.1f}'.format(elapsed_time * 1000) + "ms" + " / " + '{:.2f}'.format(1 / elapsed_time) + "fps" ,
             (10, 30), cv.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2,
             cv.LINE_AA)                        # added FPS display
    return debug_image, depth_image


if __name__ == '__main__':
    main()
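The crop `frame[144:336, 0:640]` assumes the USB camera delivers 640x480 frames: (480 - 192) / 2 = 144, so rows 144 through 335 are the vertically centered 192-row strip matching the model's 192x640 input. The hypothetical helper `center_crop` below computes the same slice for other capture sizes, using NumPy only.

```python
import numpy as np


def center_crop(frame: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Return the centered out_h x out_w region of an H x W (x C) image.
    Hypothetical helper; raises if the frame is smaller than the target."""
    h, w = frame.shape[:2]
    if h < out_h or w < out_w:
        raise ValueError(f"frame {h}x{w} is smaller than {out_h}x{out_w}")
    top = (h - out_h) // 2
    left = (w - out_w) // 2
    return frame[top:top + out_h, left:left + out_w]


# A dummy 480x640 3-channel frame: the crop equals frame[144:336, 0:640]
frame = np.arange(480 * 640 * 3, dtype=np.uint32).reshape(480, 640, 3)
crop = center_crop(frame, 192, 640)
print(crop.shape)                                   # (192, 640, 3)
print(np.array_equal(crop, frame[144:336, 0:640]))  # True
```

Unlike the original slice, which always starts at column 0, this also centers horizontally when the frame is wider than 640.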

And here is the resulting image.

In that image the background was out of focus, so I shifted the focus toward the back of the scene.

That improved the depth resolution in the background, so this model seems to work best with pan-focus images (everything in focus).

Incidentally, I wanted to save the output as a video file but couldn't get it to work. I thought I'd heard that OpenCV on the RasPi can't write mp4 but can write avi...

Addendum (10/1): Speed-up experiments

Thank you for the comment, PINTO-san.
I immediately tested other models and higher thread counts.
I tried every combination of 1 to 4 threads × three models: model_float16_quant.tflite, model_float32.tflite, and model_dynamic_range_quant.tflite.
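The measurement code isn't shown in the post. As a sketch of one way to get an average FPS over n=10 frames for each (model, thread count) pair: call a frame-processing function n times and average the per-call FPS, i.e. the same 1 / elapsed the demo prints. `average_fps` and its injectable `clock` are hypothetical; averaging per-frame FPS (rather than n / total time) is an assumption, and this times only `step()`, whereas the modified demo times the whole loop.

```python
import time
from typing import Callable, List, Optional


def average_fps(step: Callable[[], None], n: int = 10,
                clock: Optional[Callable[[], float]] = None) -> float:
    """Call step() n times and return the mean of the per-call FPS (1 / elapsed).
    Hypothetical helper; 'clock' can be replaced for deterministic testing."""
    if clock is None:
        clock = time.perf_counter
    fps_values: List[float] = []
    for _ in range(n):
        start = clock()
        step()
        elapsed = clock() - start
        fps_values.append(1.0 / elapsed)
    return sum(fps_values) / len(fps_values)


# In the demo this could wrap run_inference, e.g. (not run here):
#   for k in range(1, 5):
#       interpreter = Interpreter(model_path=model_path, num_threads=k)
#       interpreter.allocate_tensors()
#       print(k, average_fps(lambda: run_inference(interpreter, input_size, frame)))

# Deterministic check with a fake clock: calls take 0.5 s and 1.0 s -> (2 + 1) / 2 fps
ticks = iter([0.0, 0.5, 1.0, 2.0])
print(average_fps(lambda: None, n=2, clock=lambda: next(ticks)))  # 1.5
```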

model_integer_quant produced the following error, so I'll sit down and think it over calmly later.

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
DEMO.py:34: RuntimeWarning: invalid value encountered in true_divide
  depth_map = (depth_map - d_min) / (d_max - d_min)
Traceback (most recent call last):
  File "DEMO.py", line 134, in <module>
    main()
  File "DEMO.py", line 90, in main
    frame,
  File "DEMO.py", line 38, in run_inference
    depth_map = depth_map.reshape(input_size[0], input_size[1])
ValueError: cannot reshape array of size 7680 into shape (192,640)

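==============
*Addendum (10/3)
Looking into the error, the cause was that depth_map was output as 48x160 rather than 192x640. Enlarging it with OpenCV's resize made the error go away, but no depth information seemed to come out.
I also tried model_weight_quant: about 1 FPS with 4 threads and about 0.65 FPS with 1 thread.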
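Two details in the traceback fit the 10/3 note: 7680 = 48 x 160, so the integer-quantized model's output at index 3 is a quarter of 192x640 along each axis, and `invalid value encountered in true_divide` is NumPy's warning for results like 0/0, which is what `(depth_map - d_min) / (d_max - d_min)` yields when the map is completely flat. A flat output would also explain why resizing removed the error but no depth appeared; I haven't confirmed this, and printing each entry of `interpreter.get_output_details()` (its `name` and `shape`) would be a reasonable first check. The NumPy-only sketch below shows a normalization that returns zeros instead of NaN for a flat map, and a nearest-neighbor 4x upscale with `np.repeat` standing in for `cv.resize`; `normalize_depth` and `upscale_nearest` are hypothetical names.

```python
import numpy as np


def normalize_depth(depth: np.ndarray) -> np.ndarray:
    """Min-max scale a depth map to uint8 [0, 255].
    A flat map (max == min) returns all zeros instead of NaN."""
    depth = depth.astype(np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    d_range = d_max - d_min
    if d_range == 0.0:
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / d_range * 255.0
    return np.clip(scaled, 0, 255).astype(np.uint8)


def upscale_nearest(img: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbor upscale by an integer factor along both axes."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)


small = np.random.rand(48, 160).astype(np.float32)  # stand-in for a 48x160 output
print(small.size)                                    # 7680
big = upscale_nearest(normalize_depth(small), 4)
print(big.shape)                                     # (192, 640)
print(normalize_depth(np.full((48, 160), 0.3)).max())  # 0, and no NaN warning
```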

The processing speed for each model at each thread count was as follows.
*Average FPS (n=10)

num_thread	float16_quant	float32	dynamic_range_quant
1	0.78	0.75	0.59
2	1.27	1.27	0.93
3	1.45	1.46	0.94
4	1.65	1.64	1.00
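To put a number on the thread-count effect, dividing each model's 4-thread average by its 1-thread average gives the speedup. A small calculation using only the values in the table:

```python
# Average FPS (n=10) from the table above, for 1, 2, 3, and 4 threads
fps = {
    "float16_quant": [0.78, 1.27, 1.45, 1.65],
    "float32": [0.75, 1.27, 1.46, 1.64],
    "dynamic_range_quant": [0.59, 0.93, 0.94, 1.00],
}

# Speedup of 4 threads over 1 thread for each model
speedups = {model: values[3] / values[0] for model, values in fps.items()}
for model, s in speedups.items():
    print(f"{model}: {s:.2f}x")
```

This prints 2.12x for float16_quant, 2.19x for float32, and 1.69x for dynamic_range_quant; for the last one most of the gain already appears at 2 threads (0.93 / 0.59 ≈ 1.58x).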

By model: float16 ≈ float32 > dynamic_range_quant.
By thread count: for float16 and float32, 4 > 3 > 2 >> 1;
for dynamic_range_quant, 4 ≈ 3 ≈ 2 > 1.

I don't understand the model internals, so I can't explain why, but with the float models, using 4 threads roughly doubled the speed.

3. Summary

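I had been interested in monocular depth estimation for a while, but didn't know where to start.
Thanks to the gods, though, I managed to get it running, even without fully understanding it.
Shifting the focus, changing the lighting, and otherwise varying the image conditions gives me a vague sense of the characteristics of monocular depth estimation, but I don't have the knowledge for a detailed analysis, so I'll stop here for now.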

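Once again, thanks to the gods, I've added to my sense of accomplishment.
Honestly, PINTO-sama's PINTO_model_zoo is seriously amazing. I expect to spend a while running interesting models and reporting on them.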
