Converting monocular camera images received from a Tello into depth maps in real time

Posted on 2021-08-06 (last updated on 2021-08-06)

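What I did

While flying a Tello drone from a MacBook keyboard, I converted the image stream sent from the Tello's built-in monocular camera into a "depth map" and displayed it in real time, with no lag behind the Tello's movements.

A "depth map" is an image that shows how near or far the objects and background in front of the camera are, rendered as a heatmap. For this I used the PyTorch implementation of the __MonoDepth2 model__.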
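
Strictly speaking, the heatmap in this article is drawn from the network's disparity output (`outputs[("disp", 0)]` in the script), where larger values mean nearer surfaces. As a minimal sketch of how such a value relates to a depth, the function below is modeled on MonoDepth2's `disp_to_depth` helper. The name `disparity_to_depth` and the 0.1 and 100 bounds are assumptions chosen for illustration, and the result is a relative depth, not a calibrated distance in meters.

```python
def disparity_to_depth(disp, min_depth=0.1, max_depth=100.0):
    """Map a disparity in [0, 1] to a relative depth.

    min_depth and max_depth are illustrative assumptions.
    """
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    depth = 1.0 / scaled_disp
    return scaled_disp, depth


_, near = disparity_to_depth(1.0)  # largest disparity gives the smallest depth
_, far = disparity_to_depth(0.0)   # smallest disparity gives the largest depth
```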

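Images received from the Tello at almost the same moment while hovering

These images were received at almost the same moment from the Tello while it was hovering in the air (press "i" to take off, then "r" to climb, then hover).

**(Depth map image by MonoDepth2)**

(Image after color inversion with cv2.bitwise_not)

(Image after edge detection with cv2.Canny)

Relationship to my earlier articles

I combined what I had gotten working in two earlier articles.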

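Messages printed to the terminal during execution

The Japanese lines in the log are the script's own prompts: "0.1秒以内に操作コマンドを入力して下さい" means "Enter a control command within 0.1 seconds", "操作コマンド入力時間切れ。次のフレーム画像を読み込みます。" means "Control command input timed out. Loading the next frame image.", and "操作コマンド： i を受信しました。" means "Control command: i received."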

Terminal

electron@diynoMacBook-Pro DJITelloPy_copy % pwd                                                     
/Users/electron/Desktop/DJITelloPy_copy
electron@diynoMacBook-Pro DJITelloPy_copy %  python3 keyboard-control-multi_window_including_depth.py
[INFO] tello.py - 107 - Tello instance was initialized. Host: '192.168.10.1'. Port: '8889'.
[INFO] tello.py - 422 - Send command: 'command'
[INFO] tello.py - 446 - Response command: 'ok'
[INFO] tello.py - 422 - Send command: 'streamon'
[INFO] tello.py - 446 - Response streamon: 'ok'
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] decode_slice_header error
[h264 @ 0x7fbf00009400] no frame!
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] decode_slice_header error
[h264 @ 0x7fbf00009400] no frame!
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] decode_slice_header error
[h264 @ 0x7fbf00009400] no frame!
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] decode_slice_header error
[h264 @ 0x7fbf00009400] no frame!
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] decode_slice_header error
[h264 @ 0x7fbf00009400] no frame!
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] decode_slice_header error
[h264 @ 0x7fbf00009400] no frame!
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] decode_slice_header error
[h264 @ 0x7fbf00009400] no frame!
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] decode_slice_header error
[h264 @ 0x7fbf00009400] no frame!
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] decode_slice_header error
[h264 @ 0x7fbf00009400] no frame!
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] decode_slice_header error
[h264 @ 0x7fbf00009400] no frame!
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] decode_slice_header error
[h264 @ 0x7fbf00009400] no frame!
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] decode_slice_header error
[h264 @ 0x7fbf00009400] no frame!
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] decode_slice_header error
[h264 @ 0x7fbf00009400] no frame!
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] non-existing PPS 0 referenced
[h264 @ 0x7fbf00009400] decode_slice_header error
[h264 @ 0x7fbf00009400] no frame!
[h264 @ 0x7fbefc653400] error while decoding MB 10 42, bytestream -5
[h264 @ 0x7fbefc653400] error while decoding MB 15 42, bytestream -7
[h264 @ 0x7fbefc653a00] error while decoding MB 57 37, bytestream -6
[h264 @ 0x7fbefc5e1e00] left block unavailable for requested intra mode
[h264 @ 0x7fbefc5e1e00] error while decoding MB 0 35, bytestream 1545
-> Loading model from  models/mono+stereo_640x192
   Loading pretrained encoder
   Loading pretrained decoder

0.1秒以内に操作コマンドを入力して下さい :
操作コマンド入力時間切れ。次のフレーム画像を読み込みます。

-> Loading model from  models/mono+stereo_640x192
   Loading pretrained encoder
   Loading pretrained decoder

0.1秒以内に操作コマンドを入力して下さい :
操作コマンド入力時間切れ。次のフレーム画像を読み込みます。

-> Loading model from  models/mono+stereo_640x192
   Loading pretrained encoder
   Loading pretrained decoder

0.1秒以内に操作コマンドを入力して下さい :
操作コマンド：　i を受信しました。

[INFO] tello.py - 422 - Send command: 'off'
g
g
[INFO] tello.py - 446 - Response off: 'ok'
-> Loading model from  models/mono+stereo_640x192
   Loading pretrained encoder
   Loading pretrained decoder

0.1秒以内に操作コマンドを入力して下さい :
操作コマンド：　g を受信しました。

[INFO] tello.py - 422 - Send command: 'land'
[INFO] tello.py - 446 - Response land: 'ok'
-> Loading model from  models/mono+stereo_640x192
   Loading pretrained encoder
   Loading pretrained decoder

0.1秒以内に操作コマンドを入力して下さい :
操作コマンド：　g を受信しました。

The script file I wrote

keyboard-control-multi_window_including_depth.py

from __future__ import absolute_import, division, print_function
from timeout_decorator import timeout, TimeoutError
from djitellopy import Tello
import cv2, math, time
import os
import sys
import glob
import argparse
import numpy as np
import PIL.Image as pil
import matplotlib as mpl
import matplotlib.cm as cm
import torch
from torchvision import transforms, datasets

# MonoDepth2 model code starts here. The networks directory and the other MonoDepth2 resources must sit directly under the directory that holds this script.
import networks
from layers import disp_to_depth
from utils import download_model_if_doesnt_exist
from evaluate_depth import STEREO_SCALE_FACTOR

model_name = "mono+stereo_640x192"

def mono_depth2(image):
    # args is never defined in this script (nothing parses argparse options), so only CUDA availability is checked
    if torch.cuda.is_available():
        device = torch.device("cuda")
    else:
        device = torch.device("cpu")

    download_model_if_doesnt_exist(model_name)
    model_path = os.path.join("models", model_name)
    print("-> Loading model from ", model_path)
    encoder_path = os.path.join(model_path, "encoder.pth")
    depth_decoder_path = os.path.join(model_path, "depth.pth")

    # LOADING PRETRAINED MODEL
    print("   Loading pretrained encoder")
    encoder = networks.ResnetEncoder(18, False)
    loaded_dict_enc = torch.load(encoder_path, map_location=device)

    # extract the height and width of image that this model was trained with
    feed_height = loaded_dict_enc['height']
    feed_width = loaded_dict_enc['width']
    filtered_dict_enc = {k: v for k, v in loaded_dict_enc.items() if k in encoder.state_dict()}
    encoder.load_state_dict(filtered_dict_enc)
    encoder.to(device)
    encoder.eval()

    print("   Loading pretrained decoder")
    depth_decoder = networks.DepthDecoder(
        num_ch_enc=encoder.num_ch_enc, scales=range(4))

    loaded_dict = torch.load(depth_decoder_path, map_location=device)
    depth_decoder.load_state_dict(loaded_dict)

    depth_decoder.to(device)
    depth_decoder.eval()

    # PREDICTING ON EACH IMAGE IN TURN
    with torch.no_grad():
        # Load image and preprocess
        #input_image = pil.open(image).convert('RGB')
        #https://imagingsolution.net/program/python/numpy/python_numpy_pillow_image_convert/
        input_image = pil.fromarray(image)
        original_width, original_height = input_image.size
        input_image = input_image.resize((feed_width, feed_height), pil.LANCZOS)
        input_image = transforms.ToTensor()(input_image).unsqueeze(0)
        
        # PREDICTION
        input_image = input_image.to(device)
        features = encoder(input_image)
        outputs = depth_decoder(features)
        
        disp = outputs[("disp", 0)]
        disp_resized = torch.nn.functional.interpolate(
            disp, (original_height, original_width), mode="bilinear", align_corners=False)
            
        # return colormapped depth image
        disp_resized_np = disp_resized.squeeze().cpu().numpy()
        vmax = np.percentile(disp_resized_np, 95)
        normalizer = mpl.colors.Normalize(vmin=disp_resized_np.min(), vmax=vmax)
        mapper = cm.ScalarMappable(norm=normalizer, cmap='magma')
        colormapped_im = (mapper.to_rgba(disp_resized_np)[:, :, :3] * 255).astype(np.uint8)
        #im = pil.fromarray(colormapped_im)
        
        return colormapped_im
        # The code that receives this return value does the following:
        # window output
        #cv2.imshow("MonoDepth2", im)
# End of the MonoDepth2 model code

TIMEOUT_SEC = 0.1

@timeout(TIMEOUT_SEC)
def input_with_timeout(msg=None):
   return input(msg)


tello = Tello()
tello.connect()

tello.streamon()
frame_read = tello.get_frame_read()

# tello.takeoff()

while True:
    # In reality you want to display frames in a separate thread. Otherwise
    #  they will freeze while the drone moves.
    img = frame_read.frame
    cv2.imshow("drone", img)
    cv2.imshow('Canny', cv2.Canny(img, 100, 200))
    bitwised_img = cv2.bitwise_not(img)
    cv2.imshow('Bitwised', bitwised_img)
    
    # MonoDepth2 model
    depth_image = mono_depth2(img)
    cv2.imshow('Depth', depth_image)
    # Removing the next line (key = cv2. ...) stops the images from appearing: cv2.imshow only refreshes its windows inside waitKey.
    key = cv2.waitKey(1) & 0xff
    
    try:
        msg = input_with_timeout('\n{}秒以内に操作コマンドを入力して下さい :'.format(TIMEOUT_SEC))
        print('\n操作コマンド：　{} を受信しました。\n'.format(msg))
        if msg == "i":
            tello.takeoff()
        elif msg == "w":
            tello.move_forward(30)
        elif msg == "s":
            tello.move_back(30)
        elif msg == "a":
            tello.move_left(30)
        elif msg == "d":
            tello.move_right(30)
        elif msg == "e":
            tello.rotate_clockwise(30)
        elif msg == "q":
            tello.rotate_counter_clockwise(30)
        elif msg == "r":
            tello.move_up(30)
        elif msg == "f":
            tello.move_down(30)
        elif msg == "g":
            tello.land()
    except TimeoutError:
        print('\n操作コマンド入力時間切れ。次のフレーム画像を読み込みます。\n')

tello.land()
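
As the terminal log shows, "Loading model" is printed on every pass through the loop, because `mono_depth2()` reads the weights from disk each time it is called. One possible way to avoid this, sketched below under the assumption that the loading code is moved into its own function, is to cache the loaded model with `functools.lru_cache`. `load_depth_model` is a hypothetical stand-in that returns a dummy object; in the real script it would build and return the encoder and decoder.

```python
from functools import lru_cache

load_count = 0  # counts real loads, only to demonstrate the caching


def load_depth_model(model_name):
    """Hypothetical stand-in for the encoder/decoder loading in mono_depth2()."""
    global load_count
    load_count += 1
    return {"name": model_name}  # the real code would return the loaded networks


@lru_cache(maxsize=None)
def get_depth_model(model_name):
    """Load the model on the first call and return the cached object afterwards."""
    return load_depth_model(model_name)


for _ in range(3):  # stands in for three iterations of the frame loop
    model = get_depth_model("mono+stereo_640x192")
```

With this arrangement the frame loop would call `get_depth_model()` on every frame but pay the loading cost only once.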

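Directory layout

I took the script file I had already written (for an earlier article) inside the directory cloned from repository 1 below, and ported into it the code of __test_simple.py__ obtained by cloning repository 2, rewriting it where needed.

1. The full contents of the official __DJITelloPy__ GitHub repository
2. The full contents of the official __MonoDepth2__ GitHub repository

At the top of the file, __test_simple.py__ imports modules from the locally cloned repository 2.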

# MonoDepth2 model code starts here. The networks directory and the other MonoDepth2 resources must sit directly under the directory that holds this script.
import networks
from layers import disp_to_depth
from utils import download_model_if_doesnt_exist

For that reason, I copied every file from repository 2 into the directory of repository 1.

terminal

/Users/electron/Desktop/DJITelloPy_copy

terminal

electron@diynoMacBook-Pro DJITelloPy_copy % tree
.
├── LICENSE
├── LICENSE.txt
├── README.md
├── __pycache__
│   ├── evaluate_depth.cpython-39.pyc
│   ├── kitti_utils.cpython-39.pyc
│   ├── layers.cpython-39.pyc
│   ├── options.cpython-39.pyc
│   └── utils.cpython-39.pyc
├── assets
│   ├── copyright_notice.txt
│   ├── mountain.jpg
│   ├── mountain_disp.jpeg
│   ├── mountain_disp.npy
│   ├── takeoff.jpg
│   ├── takeoff_disp.jpeg
│   ├── takeoff_disp.npy
│   ├── teaser.gif
│   ├── test_image.jpg
│   ├── test_image_disp.jpeg
│   └── test_image_disp.npy
├── datasets
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-39.pyc
│   │   ├── kitti_dataset.cpython-39.pyc
│   │   └── mono_dataset.cpython-39.pyc
│   ├── kitti_dataset.py
│   └── mono_dataset.py
├── depth_prediction_example.ipynb
├── djitellopy
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-39.pyc
│   │   ├── enforce_types.cpython-39.pyc
│   │   ├── swarm.cpython-39.pyc
│   │   └── tello.cpython-39.pyc
│   ├── enforce_types.py
│   ├── swarm.py
│   └── tello.py
├── doc-requirements.txt
├── docs
│   ├── index.md
│   ├── swarm.md
│   └── tello.md
├── evaluate_depth.py
├── evaluate_pose.py
├── examples
│   ├── keyboard-control-movie.py
│   ├── keyboard-control-multi_window.py
│   ├── keyboard-control-multi_window_2_Aug6.py
│   ├── keyboard-control-multi_window_input_text.py
│   ├── keyboard-control-multi_window_input_text_2.py
│   ├── manual-control-opencv.py
│   ├── manual-control-opencv_2.py
│   ├── manual-control-opencv_3.py
│   ├── manual-control-pygame.py
│   ├── mission-pads.py
│   ├── record-video.py
│   ├── simple-swarm.py
│   ├── simple.py
│   └── take-picture.py
├── experiments
│   ├── mono+stereo_experiments.sh
│   ├── mono_experiments.sh
│   ├── odom_experiments.sh
│   └── stereo_experiments.sh
├── export_gt_depth.py
├── keyboard-control-multi_window_including_depth.py
├── kitti_utils.py
├── layers.py
├── mkdocs.yml
├── models
│   ├── mono+stereo_640x192
│   │   ├── depth.pth
│   │   ├── encoder.pth
│   │   ├── pose.pth
│   │   ├── pose_encoder.pth
│   │   └── poses.npy
│   └── mono+stereo_640x192.zip
├── networks
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-39.pyc
│   │   ├── depth_decoder.cpython-39.pyc
│   │   ├── pose_cnn.cpython-39.pyc
│   │   ├── pose_decoder.cpython-39.pyc
│   │   └── resnet_encoder.cpython-39.pyc
│   ├── depth_decoder.py
│   ├── pose_cnn.py
│   ├── pose_decoder.py
│   └── resnet_encoder.py
├── options.py
├── requirements.txt
├── setup.cfg
├── setup.py
├── splits
│   ├── benchmark
│   │   ├── eigen_to_benchmark_ids.npy
│   │   ├── test_files.txt
│   │   ├── train_files.txt
│   │   └── val_files.txt
│   ├── eigen
│   │   └── test_files.txt
│   ├── eigen_benchmark
│   │   └── test_files.txt
│   ├── eigen_full
│   │   ├── train_files.txt
│   │   └── val_files.txt
│   ├── eigen_zhou
│   │   ├── train_files.txt
│   │   └── val_files.txt
│   ├── kitti_archives_to_download.txt
│   └── odom
│       ├── test_files_09.txt
│       ├── test_files_10.txt
│       ├── train_files.txt
│       └── val_files.txt
├── test_simple.py
├── train.py
├── trainer.py
└── utils.py

20 directories, 102 files
electron@diynoMacBook-Pro DJITelloPy_copy %

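Key code changes

I modified the following parts of __test_simple.py__, the sample script in the MonoDepth2 GitHub repository, and ported them into the script that flies the Tello with DJITelloPy, which I had written for the article listed below.

(Source of the code, extracted with some modifications)

The __test_simple.py__ file in the following repository.

(GitHub) nianticlabs/monodepth2

(Destination script file)

Modifying TelloPy's video_effect2.py to apply various image-processing effects to the Tello camera image and show the results in multiple windows in real time

The changes to __test_simple.py__ come down to the following three points.

(Change 1) Turn the code into a method that takes the Tello's monocular camera image (NumPy format) as an argument

Removed the block that receives the image file name as a command-line argument (argparse)
Removed the conditional block that automatically loads multiple files when the argument is a directory rather than a file

(Commented out to disable)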

input_image = pil.open(image).convert('RGB')

Commented out the step that opens a file with the open method of the Pillow module
Switched to the object received through the method argument image (below)

def mono_depth2(image):
    # args is never defined in this script, so only CUDA availability is checked
    if torch.cuda.is_available():
        device = torch.device("cuda")
    else:
        device = torch.device("cpu")

I then passed the NumPy image object to the fromarray() method of PIL.Image, converting it to a type that PIL can handle.

For this part I referred to the following site.

[Python] Converting image data between NumPy and Pillow (PIL)

import PIL.Image as pil

#https://imagingsolution.net/program/python/numpy/python_numpy_pillow_image_convert/
input_image = pil.fromarray(image)

The method argument image above is the image data produced by the following part of the DJITelloPy-side script.

tello.streamon()
frame_read = tello.get_frame_read()

while True:
    img = frame_read.frame

    # MonoDepth2 model
    depth_image = mono_depth2(img)
    cv2.imshow('Depth', depth_image)
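
One thing worth checking at this hand-off is channel order. OpenCV works with BGR arrays, while `pil.fromarray()` and MonoDepth2 treat a three-channel array as RGB. Whether `frame_read.frame` is BGR or RGB depends on the djitellopy version, so the sketch below simply assumes a BGR frame and reverses the last axis before the conversion. `bgr_to_rgb` is a name made up for this example.

```python
import numpy as np


def bgr_to_rgb(frame):
    """Return a copy of an H x W x 3 image with the channel order reversed."""
    return np.ascontiguousarray(frame[:, :, ::-1])


# A 1 x 1 image holding pure blue in BGR order.
bgr_pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)
rgb_pixel = bgr_to_rgb(bgr_pixel)
```

The same reversal applies in the other direction: the colormapped array returned by `mono_depth2()` is in RGB order, so reversing its channels before `cv2.imshow()` would show the magma colors as matplotlib defines them.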

(Change 2) Return the depth-estimation map image data as a NumPy object with a return statement

Removed the step that writes the image to a file

return colormapped_im

The return statement above is the relevant part.

        # return colormapped depth image
        disp_resized_np = disp_resized.squeeze().cpu().numpy()
        vmax = np.percentile(disp_resized_np, 95)
        normalizer = mpl.colors.Normalize(vmin=disp_resized_np.min(), vmax=vmax)
        mapper = cm.ScalarMappable(norm=normalizer, cmap='magma')
        colormapped_im = (mapper.to_rgba(disp_resized_np)[:, :, :3] * 255).astype(np.uint8)
        #im = pil.fromarray(colormapped_im)

        return colormapped_im
        # The code that receives this return value does the following:
        # window output
        #cv2.imshow("MonoDepth2", im)

In the original code, the line __im = pil.fromarray(colormapped_im)__ converts the intermediate NumPy object into a Pillow object. Here it is commented out and not executed.
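
The normalization step in the code above sets the top of the color scale to the 95th percentile of the disparity values, so a few very near pixels do not flatten the contrast of the rest of the image. The sketch below reproduces that scaling with NumPy alone. The explicit clipping stands in for the way the colormap saturates out-of-range values, and `normalize_disparity` is a name introduced for this example.

```python
import numpy as np


def normalize_disparity(disp, upper_percentile=95):
    """Scale disparity to [0, 1], saturating values above the given percentile."""
    vmin = disp.min()
    vmax = np.percentile(disp, upper_percentile)
    span = max(vmax - vmin, 1e-12)  # avoid dividing by zero on a flat image
    return np.clip((disp - vmin) / span, 0.0, 1.0)


# 99 ordinary values plus one extreme outlier.
disp = np.append(np.linspace(0.0, 1.0, 99), 50.0)
scaled = normalize_disparity(disp)
```

Had the maximum been used as the upper bound instead, the single outlier would have squeezed all the ordinary values into the bottom 2% of the color range.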

(Change 3) Display the NumPy object in a window with cv2.imshow()

    # MonoDepth2 model
    depth_image = mono_depth2(img)
    cv2.imshow('Depth', depth_image)
    # Removing the next line (key = cv2. ...) stops the images from appearing: cv2.imshow only refreshes its windows inside waitKey.
    key = cv2.waitKey(1) & 0xff
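
Because `cv2.waitKey(1)` already returns the code of any key pressed while an OpenCV window has focus, that value could also drive the drone instead of a separate `input()` prompt. The following is only a sketch of that idea: the `KEY_ACTIONS` table and the `dispatch_key` function are hypothetical, `FakeTello` is a recording stand-in so the example runs without a drone, and the method names are the djitellopy calls already used in the script.

```python
# Key -> (method name, arguments), mirroring the if/elif chain in the script.
KEY_ACTIONS = {
    "i": ("takeoff", ()),
    "g": ("land", ()),
    "w": ("move_forward", (30,)),
    "s": ("move_back", (30,)),
    "a": ("move_left", (30,)),
    "d": ("move_right", (30,)),
    "r": ("move_up", (30,)),
    "f": ("move_down", (30,)),
    "e": ("rotate_clockwise", (30,)),
    "q": ("rotate_counter_clockwise", (30,)),
}

NO_KEY = 255  # cv2.waitKey() returns -1 when no key is pressed; -1 & 0xff == 255


def dispatch_key(drone, key_code):
    """Call the drone method mapped to key_code; return its name, or None."""
    if key_code == NO_KEY:
        return None
    action = KEY_ACTIONS.get(chr(key_code))
    if action is None:
        return None
    method_name, args = action
    getattr(drone, method_name)(*args)
    return method_name


class FakeTello:
    """Records calls instead of sending commands (for illustration only)."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        return lambda *args: self.calls.append((name, args))


drone = FakeTello()
dispatch_key(drone, ord("i"))
dispatch_key(drone, ord("w"))
dispatch_key(drone, NO_KEY)
```

In the real loop the call would be `dispatch_key(tello, key)`, using the `key` value already computed by the `cv2.waitKey(1) & 0xff` line.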
