1.Introduction
In the previous article, I deployed the same model across three Coral Edge TPU Accelerators and had a great time marveling at the (admittedly superficial) blazing inference speed: [[150 FPS ++] Coral Edge TPU Accelerator を3本突き刺して並列推論し超速のPosenet性能を獲得する ー無駄な高性能の極みへー](https://qiita.com/PINTO/items/e969fa7601d0868e451f)
This time, I deploy a different model on each of the three TPUs and run capture and inference asynchronously: simultaneous inference with three models. The published logic is a very rough implementation with no structuring or class design, but it fits in just two program files. A clean implementation would define proper base classes and interfaces to improve extensibility. It is also quick-and-dirty code with almost no exception handling, so please tidy it up yourself before using it.
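The capture process and the three inference processes only exchange frames through queues. The key trick is a depth-1 buffer where the capture side drops the stale frame when a consumer falls behind, so inference always sees the newest input. A minimal sketch of that pattern (using the thread-safe `queue.Queue` here for brevity; the script itself uses `multiprocessing.Queue`, which has the same interface, and the names below are illustrative):

```python
from queue import Queue  # same interface as the multiprocessing.Queue used in the script

# Depth-1 frame buffer: when an inference process falls behind, the capture
# side discards the stale frame so inference always sees the newest input.
buf = Queue(maxsize=1)

def put_latest(q, frame):
    if q.full():
        q.get()          # drop the frame nobody picked up in time
    q.put(frame)

put_latest(buf, "frame-1")
put_latest(buf, "frame-2")   # consumer was busy: frame-1 is silently dropped
print(buf.get())             # prints "frame-2"
```

This is why the slow DeeplabV3 process does not stall the camera loop: it simply processes fewer, but always current, frames.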
The video below shows the result of actually running the three models simultaneously. It ran fairly smoothly. Since the USB camera's capture frame rate is capped at 30 FPS, Posenet and MobileNet-SSD run at roughly 30 FPS, while DeeplabV3 runs at 9-10 FPS. If you don't mind the investment, all sorts of things become possible. Incidentally, by changing just one line of the program you can also assign multiple models to a single TPU, but I won't cover that here. Note that forcing this to run on a RaspberryPi3 leads to an insufficient supply voltage.
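For the curious, the "one line" in question is the device path handed to each inference process: the script enumerates unassigned TPUs with `edgetpu_utils.ListEdgeTpuPaths()` and passes `devices[0]`, `devices[1]`, `devices[2]` to the three inferencers, so reusing an index pins several models to one accelerator. A hypothetical sketch of the idea (the device strings are stand-ins, not real paths):

```python
# Hypothetical sketch: real device paths come from
# edgetpu_utils.ListEdgeTpuPaths(EDGE_TPU_STATE_UNASSIGNED).
devices = ["usb:0", "usb:1"]  # stand-in paths; pretend only two TPUs are attached

models = ["posenet", "deeplab", "ssd"]
# Round-robin assignment: with fewer TPUs than models, one TPU hosts
# more than one model (at the cost of contention on that accelerator).
assignment = {m: devices[i % len(devices)] for i, m in enumerate(models)}
print(assignment)  # {'posenet': 'usb:0', 'deeplab': 'usb:1', 'ssd': 'usb:0'}
```

With three TPUs attached, each model simply gets its own device and no sharing occurs.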
Finished implementing 3-model parallel inference on multiple TPUs. It's a quick-and-dirty build combining off-the-shelf Posenet + DeeplabV3 + MobileNet-SSD v2. All of them infer asynchronously and simultaneously on the same input. As expected, DeeplabV3 is about twice as heavy as the others. https://t.co/gIxtcmDVOq
— Super PINTO (@PINTO03091) August 25, 2019
2.Environment
- Ubuntu 16.04 x86_64, USB 3.0
- Coral Edge TPU Accelerator x 3
- Python 3.5.3
- [Model.1] posenet_mobilenet_v1_075_481_641_quant_decoder_edgetpu.tflite
- [Model.2] deeplabv3_mnv2_dm05_pascal_trainaug_edgetpu.tflite
- [Model.3] mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite
- edgetpu runtime 2.11.1
- OpenCV 4.1.1-openvino
- USB Camera (Playstationeye, 640x480, 30 FPS)
- Self-powered USB3.0 HUB
- PoseEngine from google-coral/project-posenet, partially modified
- Color map image for semantic segmentation: colorpalette.png
Only the color palette recorded inside the PNG file is extracted and used to colorize the semantic segmentation output. It may seem cold, but the cozy family scene drawn in the image itself is completely ignored.
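To illustrate the trick, here is a minimal sketch that builds a palette-mode image in memory (standing in for colorpalette.png, whose actual contents are not reproduced here) and applies its palette to a class-index map, just as the script does for the DeeplabV3 output:

```python
import numpy as np
from PIL import Image

# Build a small palette-mode image in memory (stand-in for colorpalette.png).
pal_img = Image.new("P", (16, 16))
flat_palette = [0, 0, 0, 128, 0, 0, 0, 128, 0] + [0] * (256 * 3 - 9)
pal_img.putpalette(flat_palette)

# getpalette() returns the flat [R, G, B, R, G, B, ...] list; this is all
# the script takes from the PNG.
palette = pal_img.getpalette()

# Apply the palette to a class-index map, as done for the DeeplabV3 result.
class_map = np.array([[0, 1], [2, 1]], dtype=np.uint8)
seg = Image.fromarray(class_map, mode="P")
seg.putpalette(palette)
rgb = np.asarray(seg.convert("RGB"))
print(rgb[0, 1])  # class index 1 -> (128, 0, 0)
```

The real script then converts this RGB array to BGR and alpha-blends it over the camera frame with `cv2.addWeighted`.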
3.Implementation
import sys
import argparse
import numpy as np
import cv2
import time
from PIL import Image
from time import sleep
import multiprocessing as mp
from edgetpu.basic import edgetpu_utils
from pose_engine import PoseEngine
from edgetpu.basic.basic_engine import BasicEngine
from edgetpu.detection.engine import DetectionEngine
pose_lastresults = None
deep_lastresults = None
ssd_lastresults = None
processes = []
pose_frameBuffer = None
deep_frameBuffer = None
ssd_frameBuffer = None
pose_results = None
deep_results = None
ssd_results = None
fps = ""
pose_detectfps = ""
deep_detectfps = ""
ssd_detectfps = ""
framecount = 0
pose_detectframecount = 0
deep_detectframecount = 0
ssd_detectframecount = 0
time1 = 0
time2 = 0
box_color = (255, 128, 0)
box_thickness = 1
label_background_color = (125, 175, 75)
label_text_color = (255, 255, 255)
percentage = 0.0
# COCO Labels
SSD_LABELS = ['person','bicycle','car','motorcycle','airplane','bus','train','truck','boat','',
'traffic light','fire hydrant','stop sign','parking meter','bench','bird','cat','dog','horse','sheep',
'cow','elephant','bear','','zebra','giraffe','backpack','umbrella','','',
'handbag','tie','suitcase','frisbee','skis','snowboard','sports ball','kite','baseball bat','baseball glove',
'skateboard','surfboard','tennis racket','bottle','','wine glass','cup','fork','knife','spoon',
'bowl','banana','apple','sandwich','orange','broccoli','carrot','hot dog','pizza','donut',
'cake','chair','couch','potted plant','bed','','dining table','','','toilet',
'','tv','laptop','mouse','remote','keyboard','cell phone','microwave','oven','toaster',
'sink','refrigerator','','book','clock','vase','scissors','teddy bear','hair drier','toothbrush']
# Deeplab color palettes
DEEPLAB_PALETTE = Image.open("colorpalette.png").getpalette()
# Posenet Edges
EDGES = (
('nose', 'left eye'),
('nose', 'right eye'),
('nose', 'left ear'),
('nose', 'right ear'),
('left ear', 'left eye'),
('right ear', 'right eye'),
('left eye', 'right eye'),
('left shoulder', 'right shoulder'),
('left shoulder', 'left elbow'),
('left shoulder', 'left hip'),
('right shoulder', 'right elbow'),
('right shoulder', 'right hip'),
('left elbow', 'left wrist'),
('right elbow', 'right wrist'),
('left hip', 'right hip'),
('left hip', 'left knee'),
('right hip', 'right knee'),
('left knee', 'left ankle'),
('right knee', 'right ankle'),
)
def camThread(pose_results, deep_results, ssd_results,
pose_frameBuffer, deep_frameBuffer, ssd_frameBuffer,
camera_width, camera_height, vidfps, usbcamno, videofile):
global fps
global pose_detectfps
global deep_detectfps
global ssd_detectfps
global framecount
global pose_detectframecount
global deep_detectframecount
global ssd_detectframecount
global time1
global time2
global pose_lastresults
global deep_lastresults
global ssd_lastresults
global cam
global window_name
global waittime
if videofile == "":
cam = cv2.VideoCapture(usbcamno)
cam.set(cv2.CAP_PROP_FPS, vidfps)
cam.set(cv2.CAP_PROP_FRAME_WIDTH, camera_width)
cam.set(cv2.CAP_PROP_FRAME_HEIGHT, camera_height)
waittime = 1
window_name = "USB Camera"
else:
cam = cv2.VideoCapture(videofile)
waittime = vidfps
window_name = "Movie File"
cv2.namedWindow(window_name, cv2.WINDOW_AUTOSIZE)
while True:
t1 = time.perf_counter()
ret, color_image = cam.read()
if not ret:
continue
if pose_frameBuffer.full():
pose_frameBuffer.get()
if deep_frameBuffer.full():
deep_frameBuffer.get()
if ssd_frameBuffer.full():
ssd_frameBuffer.get()
frames = cv2.resize(color_image, (camera_width, camera_height)).copy()
pose_frameBuffer.put(cv2.resize(color_image, (640, 480)).copy())
deep_frameBuffer.put(cv2.resize(color_image, (513, 513)).copy())
ssd_frameBuffer.put(cv2.resize(color_image, (640, 480)).copy())
res = None
# Posenet
if not pose_results.empty():
res = pose_results.get(False)
pose_detectframecount += 1
imdraw = pose_overlay_on_image(frames, res)
pose_lastresults = res
else:
imdraw = pose_overlay_on_image(frames, pose_lastresults)
# MobileNet-SSD
if not ssd_results.empty():
res = ssd_results.get(False)
ssd_detectframecount += 1
imdraw = ssd_overlay_on_image(imdraw, res)
ssd_lastresults = res
else:
imdraw = ssd_overlay_on_image(imdraw, ssd_lastresults)
# Deeplabv3
if not deep_results.empty():
res = deep_results.get(False)
deep_detectframecount += 1
imdraw = deep_overlay_on_image(imdraw, res, camera_width, camera_height)
deep_lastresults = res
else:
imdraw = deep_overlay_on_image(imdraw, deep_lastresults, camera_width, camera_height)
cv2.putText(imdraw, fps, (camera_width-170,15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (38,0,255), 1, cv2.LINE_AA)
cv2.putText(imdraw, pose_detectfps, (camera_width-170,30), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (38,0,255), 1, cv2.LINE_AA)
cv2.putText(imdraw, deep_detectfps, (camera_width-170,45), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (38,0,255), 1, cv2.LINE_AA)
cv2.putText(imdraw, ssd_detectfps, (camera_width-170,60), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (38,0,255), 1, cv2.LINE_AA)
cv2.imshow(window_name, imdraw)
if cv2.waitKey(waittime)&0xFF == ord('q'):
break
# FPS calculation
framecount += 1
# Posenet, DeeplabV3, MobileNet-SSD v2
if framecount >= 15:
fps = "(Playback) {:.1f} FPS".format(time1/15)
pose_detectfps = "(Posenet) {:.1f} FPS".format(pose_detectframecount/time2)
deep_detectfps = "(Deeplab) {:.1f} FPS".format(deep_detectframecount/time2)
ssd_detectfps = "(SSD) {:.1f} FPS".format(ssd_detectframecount/time2)
framecount = 0
pose_detectframecount = 0
deep_detectframecount = 0
ssd_detectframecount = 0
time1 = 0
time2 = 0
t2 = time.perf_counter()
elapsedTime = t2-t1
time1 += 1/elapsedTime
time2 += elapsedTime
def pose_inferencer(results, frameBuffer, model, device):
pose_engine = None
pose_engine = PoseEngine(model, device)
print("Loaded Graphs!!! (Posenet)")
while True:
if frameBuffer.empty():
continue
# Run inference.
color_image = frameBuffer.get()
prepimg_pose = color_image[:, :, ::-1].copy()
tinf = time.perf_counter()
result_pose, inference_time = pose_engine.DetectPosesInImage(prepimg_pose)
print(time.perf_counter() - tinf, "sec (Posenet)")
results.put(result_pose)
def deep_inferencer(results, frameBuffer, model, device):
deep_engine = None
deep_engine = BasicEngine(model, device)
print("Loaded Graphs!!! (Deeplab)")
while True:
if frameBuffer.empty():
continue
# Run inference.
color_image = frameBuffer.get()
prepimg_deep = color_image[:, :, ::-1].copy()
prepimg_deep = prepimg_deep.flatten()
tinf = time.perf_counter()
latency, result_deep = deep_engine.RunInference(prepimg_deep)
print(time.perf_counter() - tinf, "sec (Deeplab)")
results.put(result_deep)
def ssd_inferencer(results, frameBuffer, model, device):
ssd_engine = None
ssd_engine = DetectionEngine(model, device)
print("Loaded Graphs!!! (SSD)")
while True:
if frameBuffer.empty():
continue
# Run inference.
color_image = frameBuffer.get()
prepimg_ssd = color_image[:, :, ::-1].copy()
prepimg_ssd = Image.fromarray(prepimg_ssd)
tinf = time.perf_counter()
result_ssd = ssd_engine.DetectWithImage(prepimg_ssd, threshold=0.5, keep_aspect_ratio=True, relative_coord=False, top_k=10)
print(time.perf_counter() - tinf, "sec (SSD)")
results.put(result_ssd)
def draw_pose(img, pose, threshold=0.2):
xys = {}
for label, keypoint in pose.keypoints.items():
if keypoint.score < threshold: continue
xys[label] = (int(keypoint.yx[1]), int(keypoint.yx[0]))
img = cv2.circle(img, (int(keypoint.yx[1]), int(keypoint.yx[0])), 5, (0, 255, 0), -1)
for a, b in EDGES:
if a not in xys or b not in xys: continue
ax, ay = xys[a]
bx, by = xys[b]
img = cv2.line(img, (ax, ay), (bx, by), (0, 255, 255), 2)
def pose_overlay_on_image(frames, result):
color_image = frames
if isinstance(result, type(None)):
return color_image
img_cp = color_image.copy()
for pose in result:
draw_pose(img_cp, pose)
return img_cp
def deep_overlay_on_image(frames, result, width, height):
color_image = frames
if isinstance(result, type(None)):
return color_image
img_cp = color_image.copy()
outputimg = np.reshape(np.uint8(result), (513, 513))
outputimg = cv2.resize(outputimg, (width, height))
outputimg = Image.fromarray(outputimg, mode="P")
outputimg.putpalette(DEEPLAB_PALETTE)
outputimg = outputimg.convert("RGB")
outputimg = np.asarray(outputimg)
outputimg = cv2.cvtColor(outputimg, cv2.COLOR_RGB2BGR)
img_cp = cv2.addWeighted(img_cp, 1.0, outputimg, 0.9, 0)
return img_cp
def ssd_overlay_on_image(frames, result):
color_image = frames
if isinstance(result, type(None)):
return color_image
img_cp = color_image.copy()
for obj in result:
box = obj.bounding_box.flatten().tolist()
box_left = int(box[0])
box_top = int(box[1])
box_right = int(box[2])
box_bottom = int(box[3])
cv2.rectangle(img_cp, (box_left, box_top), (box_right, box_bottom), box_color, box_thickness)
percentage = int(obj.score * 100)
label_text = SSD_LABELS[obj.label_id] + " (" + str(percentage) + "%)"
label_size = cv2.getTextSize(label_text, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)[0]
label_left = box_left
label_top = box_top - label_size[1]
if (label_top < 1):
label_top = 1
label_right = label_left + label_size[0]
label_bottom = label_top + label_size[1]
cv2.rectangle(img_cp, (label_left - 1, label_top - 1), (label_right + 1, label_bottom + 1), label_background_color, -1)
cv2.putText(img_cp, label_text, (label_left, label_bottom), cv2.FONT_HERSHEY_SIMPLEX, 0.5, label_text_color, 1)
return img_cp
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument("--pose_model", default="models/posenet_mobilenet_v1_075_481_641_quant_decoder_edgetpu.tflite", help="Path of the posenet model.")
parser.add_argument("--deep_model", default="models/deeplabv3_mnv2_dm05_pascal_trainaug_edgetpu.tflite", help="Path of the deeplabv3 model.")
parser.add_argument("--ssd_model", default="models/mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite", help="Path of the mobilenet-ssd model.")
parser.add_argument("--usbcamno", type=int, default=0, help="USB Camera number.")
parser.add_argument('--videofile', default="", help='Path to input video file. (Default="")')
parser.add_argument('--vidfps', type=int, default=30, help='FPS of Video. (Default=30)')
parser.add_argument('--camera_width', type=int, default=640, help='USB Camera resolution (width). (Default=640)')
parser.add_argument('--camera_height', type=int, default=480, help='USB Camera resolution (height). (Default=480)')
args = parser.parse_args()
pose_model = args.pose_model
deep_model = args.deep_model
ssd_model = args.ssd_model
usbcamno = args.usbcamno
vidfps = args.vidfps
videofile = args.videofile
camera_width = args.camera_width
camera_height = args.camera_height
try:
mp.set_start_method('forkserver')
pose_frameBuffer = mp.Queue(1)
deep_frameBuffer = mp.Queue(1)
ssd_frameBuffer = mp.Queue(1)
pose_results = mp.Queue()
deep_results = mp.Queue()
ssd_results = mp.Queue()
# Start streaming
p = mp.Process(target=camThread,
args=(pose_results, deep_results, ssd_results,
pose_frameBuffer, deep_frameBuffer, ssd_frameBuffer,
camera_width, camera_height, vidfps, usbcamno, videofile),
daemon=True)
p.start()
processes.append(p)
# Activation of inferencer
devices = edgetpu_utils.ListEdgeTpuPaths(edgetpu_utils.EDGE_TPU_STATE_UNASSIGNED)
print(devices)
# Posenet
if len(devices) >= 1:
p = mp.Process(target=pose_inferencer,
args=(pose_results, pose_frameBuffer, pose_model, devices[0]),
daemon=True)
p.start()
processes.append(p)
# MobileNet-SSD v2
if len(devices) >= 2:
p = mp.Process(target=ssd_inferencer,
args=(ssd_results, ssd_frameBuffer, ssd_model, devices[1]),
daemon=True)
p.start()
processes.append(p)
# DeeplabV3
if len(devices) >= 3:
p = mp.Process(target=deep_inferencer,
args=(deep_results, deep_frameBuffer, deep_model, devices[2]),
daemon=True)
p.start()
processes.append(p)
while True:
sleep(1)
finally:
for proc in processes:
proc.terminate()
pose_engine.py (the PoseEngine from google-coral/project-posenet, partially modified):
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import collections
import math
import numpy as np
from pkg_resources import parse_version
from edgetpu import __version__ as edgetpu_version
assert parse_version(edgetpu_version) >= parse_version('2.11.1'), \
'This demo requires Edge TPU version >= 2.11.1'
from edgetpu.basic.basic_engine import BasicEngine
from edgetpu.utils import image_processing
from PIL import Image
KEYPOINTS = (
'nose',
'left eye',
'right eye',
'left ear',
'right ear',
'left shoulder',
'right shoulder',
'left elbow',
'right elbow',
'left wrist',
'right wrist',
'left hip',
'right hip',
'left knee',
'right knee',
'left ankle',
'right ankle'
)
class Keypoint:
__slots__ = ['k', 'yx', 'score']
def __init__(self, k, yx, score=None):
self.k = k
self.yx = yx
self.score = score
def __repr__(self):
return 'Keypoint(<{}>, {}, {})'.format(KEYPOINTS[self.k], self.yx, self.score)
class Pose:
__slots__ = ['keypoints', 'score']
def __init__(self, keypoints, score=None):
assert len(keypoints) == len(KEYPOINTS)
self.keypoints = keypoints
self.score = score
def __repr__(self):
return 'Pose({}, {})'.format(self.keypoints, self.score)
class PoseEngine(BasicEngine):
"""Engine used for pose tasks."""
def __init__(self, model_path, device, mirror=False):
"""Creates a PoseEngine with given model.
Args:
model_path: String, path to TF-Lite Flatbuffer file.
mirror: Flip keypoints horizontally
Raises:
ValueError: An error occurred when model output is invalid.
"""
BasicEngine.__init__(self, model_path, device)
self._mirror = mirror
self._input_tensor_shape = self.get_input_tensor_shape()
if (self._input_tensor_shape.size != 4 or
self._input_tensor_shape[3] != 3 or
self._input_tensor_shape[0] != 1):
raise ValueError(
('Image model should have input shape [1, height, width, 3]!'
' This model has {}.'.format(self._input_tensor_shape)))
_, self.image_height, self.image_width, self.image_depth = self.get_input_tensor_shape()
# The API returns all the output tensors flattened and concatenated. We
# have to figure out the boundaries from the tensor shapes & sizes.
offset = 0
self._output_offsets = [0]
for size in self.get_all_output_tensors_sizes():
offset += size
self._output_offsets.append(offset)
def DetectPosesInImage(self, img):
"""Detects poses in a given image.
For ideal results make sure the image fed to this function is close to the
expected input size - it is the caller's responsibility to resize the
image accordingly.
Args:
img: numpy array containing image
"""
# Extend or crop the input to match the input shape of the network.
if img.shape[0] < self.image_height or img.shape[1] < self.image_width:
img = np.pad(img, [[0, max(0, self.image_height - img.shape[0])],
[0, max(0, self.image_width - img.shape[1])], [0, 0]],
mode='constant')
img = img[0:self.image_height, 0:self.image_width]
assert (img.shape == tuple(self._input_tensor_shape[1:]))
# Run the inference (API expects the data to be flattened)
inference_time, output = self.RunInference(img.flatten())
outputs = [output[i:j] for i, j in zip(self._output_offsets, self._output_offsets[1:])]
keypoints = outputs[0].reshape(-1, len(KEYPOINTS), 2)
keypoint_scores = outputs[1].reshape(-1, len(KEYPOINTS))
pose_scores = outputs[2]
nposes = int(outputs[3][0])
assert nposes < outputs[0].shape[0]
# Convert the poses to a friendlier format of keypoints with associated
# scores.
poses = []
for pose_i in range(nposes):
keypoint_dict = {}
for point_i, point in enumerate(keypoints[pose_i]):
keypoint = Keypoint(KEYPOINTS[point_i], point,
keypoint_scores[pose_i, point_i])
if self._mirror: keypoint.yx[1] = self.image_width - keypoint.yx[1]
keypoint_dict[KEYPOINTS[point_i]] = keypoint
poses.append(Pose(keypoint_dict, pose_scores[pose_i]))
return poses, inference_time
Running the script with the -h option prints the following usage:
usage: ssd-deeplab-posenet.py [-h] [--pose_model POSE_MODEL]
[--deep_model DEEP_MODEL]
[--ssd_model SSD_MODEL] [--usbcamno USBCAMNO]
[--videofile VIDEOFILE] [--vidfps VIDFPS]
[--camera_width CAMERA_WIDTH]
[--camera_height CAMERA_HEIGHT]
optional arguments:
-h, --help show this help message and exit
--pose_model POSE_MODEL
Path of the posenet model.
--deep_model DEEP_MODEL
Path of the deeplabv3 model.
--ssd_model SSD_MODEL
Path of the mobilenet-ssd model.
--usbcamno USBCAMNO USB Camera number.
--videofile VIDEOFILE
Path to input video file. (Default="")
--vidfps VIDFPS FPS of Video. (Default=30)
--camera_width CAMERA_WIDTH
USB Camera resolution (width). (Default=640)
--camera_height CAMERA_HEIGHT
USB Camera resolution (height). (Default=480)
4.Finally
Next time, I plan to reuse the implementation from PINTO0309/MobileNet-SSD-RealSense - (Example5) To prevent thermal runaway, simple clustering function (2 Stick = 1 Cluster) to cluster multiple Edge TPU Accelerators and improve their tolerance to heat.