
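Introduction

OpenPose is the best-known 2D pose estimation model, but the state of the art is updated every year, and lightweight commercial models that can run on a Jetson now seem to be available. While building an application, I wanted to know how fast current models actually run when measured, so I looked into it. Model compute cost is often published these days, but ultimately I want to check measured numbers on a Jetson Nano (this time, I used a server).

The approach here is a quick evaluation using the sample code bundled with each model, so the comparison may not be entirely fair, but I hope it is useful as a reference. Corrections are welcome.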

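Benchmark conditions

Every image in COCO val2017 (5,000 images) is run through each benchmarked model, and FPS is defined as the number of images divided by the total elapsed time. Assuming input from a camera, inference runs sequentially in a single process (in practice, the stages could be overlapped).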
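The measurement described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the benchmark: `measure_fps`, `infer`, and `sync` are hypothetical names, and `infer` stands in for whatever single-image inference call a given model exposes. With PyTorch on a GPU, one would probably pass `torch.cuda.synchronize` as `sync`, so that queued GPU work finishes before the clock stops.

```python
import time
from typing import Callable, Iterable, Optional


def measure_fps(
    images: Iterable,
    infer: Callable[[object], object],
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Run `infer` on each image in turn and return images per second.

    `infer` and `sync` are placeholders (assumptions), not part of any
    of the benchmarked repositories.
    """
    count = 0
    start = time.perf_counter()
    for image in images:
        infer(image)  # one image at a time: no batching, no overlap
        count += 1
    if sync is not None:
        sync()  # wait for pending asynchronous work before stopping the clock
    elapsed = time.perf_counter() - start
    if count == 0 or elapsed <= 0.0:
        raise ValueError("need at least one image and a positive elapsed time")
    return count / elapsed  # FPS = number of images / total time
```

Timing the whole loop rather than each call keeps the result close to the "total time over 5,000 images" definition, at the cost of including data-loading time in the figure.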

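Top-Down methods take person bounding boxes (BBoxes) as input, so the time needed to detect people is not included. On the assumption that person detection and pose estimation can run as a pipeline, FPS is computed from the pose-estimation inference time alone. For Bottom-Up methods, on the other hand, FPS is computed from the inference time per image.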
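To compare the two kinds of figures, it can help to convert a Top-Down number into an approximate per-image rate. The sketch below assumes that the Top-Down figure is a per-BBox rate and that the detector and the pose model run as a two-stage pipeline, so throughput is limited by the slower stage. The function name and all input values are hypothetical, not measurements from this benchmark.

```python
def pipelined_image_fps(
    detector_fps: float,
    per_bbox_fps: float,
    persons_per_image: float,
) -> float:
    """Estimate images per second for a detector + Top-Down pose pipeline.

    Assumes the pose model handles one BBox per forward pass and that the
    two stages overlap perfectly; both are simplifying assumptions.
    """
    if detector_fps <= 0 or per_bbox_fps <= 0 or persons_per_image <= 0:
        raise ValueError("all inputs must be positive")
    # The pose stage needs `persons_per_image` forward passes per image.
    pose_stage_fps = per_bbox_fps / persons_per_image
    # A pipeline runs at the speed of its slowest stage.
    return min(detector_fps, pose_stage_fps)


# Illustrative values only: a 60 BBox/s pose model with 3 people per image
# sustains at most 20 images/s, even behind a 30 FPS detector.
print(pipelined_image_fps(detector_fps=30.0, per_bbox_fps=60.0, persons_per_image=3))
```

A Bottom-Up figure needs no such conversion, since it is already measured per image.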

Environment

Hardware

NVIDIA GTX 1080 Ti (11GB)
Intel Core i7 8700K
DDR4-2666 48GB

Software

Ubuntu 18.04
Python 3.6
CUDA 10.2
PyTorch 1.9.0

Dataset

COCO val2017

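Models evaluated

I benchmarked eight recent 2D pose estimation models (including OpenPose). For each model, I picked, somewhat arbitrarily, the configuration that looked lightest and the one that looked most accurate. Since the goal is to find lightweight models, the accurate configurations are for reference only.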

Top-Down methods

Deep High-Resolution-Representation-Net (CVPR 2019)
- (a) DHRRN pose_hrnet_w32 (256x192) with detected bbox
- (b) DHRRN pose_hrnet_w32 (256x192) with ground truth bbox
- (c) DHRRN pose_hrnet_w48 (384x288) with detected bbox
- (d) DHRRN pose_hrnet_w48 (384x288) with ground truth bbox
Fast Human Pose Estimation (CVPR 2019)
- (e) FHPE pose_hrnet_w32_student_FPD with detected bbox
- (f) FHPE pose_hrnet_w32_student_FPD with ground truth bbox
DarkPose (ICCV 2019)
- (g) DarkPose HRNet-W32 (256x192) with detected bbox
- (h) DarkPose HRNet-W32 (256x192) with ground truth bbox
- (i) DarkPose HRNet-W48 (384x288) with detected bbox
- (j) DarkPose HRNet-W48 (384x288) with ground truth bbox
TransPose (Dec. 2020)
- (k) TransPose R-A3 with detected bbox
- (l) TransPose R-A3 with ground truth bbox
- (m) TransPose H-A6 with detected bbox
- (n) TransPose H-A6 with ground truth bbox
Lite-HRNet (CVPR 2021)
- (o) Naive Lite-HRNet-18 256x192 with detected bbox
- (p) Naive Lite-HRNet-18 256x192 with ground truth bbox
- (q) Lite-HRNet-18 256x192 with detected bbox
- (r) Lite-HRNet-18 256x192 with ground truth bbox

Bottom-Up methods

OpenPose (CVPR 2017)
- (s) OpenPose multi thread disabled
HigherHRNet (CVPR 2020)
- (t) HHRNet w32-512 without multi-scale
- (u) HHRNet w32-512 with multi-scale
HRNet-DEKR (CVPR 2021)
- (v) DEKR pose_hrnet_w32 without multi-scale
- (w) DEKR pose_hrnet_w32 with multi-scale
- (x) DEKR pose_hrnet_w48 without multi-scale
- (y) DEKR pose_hrnet_w48 with multi-scale

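Results

I plotted AP on the horizontal axis and FPS on the vertical axis. Partly because these models are designed for accuracy, the FPS results are fairly poor. In particular, the compute cost of Top-Down methods grows in proportion to the number of people in the image, so the design will need to be tailored to the application.

In terms of accuracy, DarkPose did well, and TransPose was the fastest on a per-BBox basis. Among Bottom-Up methods, everything except OpenPose has low FPS, which seems to limit their use cases.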

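Top-Down methods

(Figure: AP vs. FPS plot for the Top-Down models)

Bottom-Up methods

(Figure: AP vs. FPS plot for the Bottom-Up models)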

Summary

I benchmarked recent 2D pose estimation models. Next time, I would like to set accuracy aside and benchmark models with fast inference, and also look into ways to speed up inference.

Appendix: notes on how to run

Top-Down methods

Deep High-Resolution-Representation-Net

DHRRN pose_hrnet_w32 (256x192) with detected bbox
DHRRN pose_hrnet_w32 (256x192) with ground truth bbox
DHRRN pose_hrnet_w48 (384x288) with detected bbox
DHRRN pose_hrnet_w48 (384x288) with ground truth bbox

$ python3 tools/test.py --cfg experiments/coco/hrnet/w32_256x192_adam_lr1e-3.yaml \
TEST.MODEL_FILE models/pose_hrnet_w32_256x192.pth \
TEST.FLIP_TEST False \
TEST.USE_GT_BBOX False \
TEST.BATCH_SIZE_PER_GPU 1 \
PRINT_FREQ 1000000 \
WORKERS 4 GPUS '(0,)'

$ python3 tools/test.py --cfg experiments/coco/hrnet/w32_256x192_adam_lr1e-3.yaml \
TEST.MODEL_FILE models/pose_hrnet_w32_256x192.pth \
TEST.FLIP_TEST False \
TEST.USE_GT_BBOX True \
TEST.BATCH_SIZE_PER_GPU 1 \
PRINT_FREQ 1000000 \
WORKERS 4 GPUS '(0,)'

$ python3 tools/test.py --cfg experiments/coco/hrnet/w48_384x288_adam_lr1e-3.yaml \
TEST.MODEL_FILE models/pose_hrnet_w48_384x288.pth \
TEST.FLIP_TEST False \
TEST.USE_GT_BBOX False \
TEST.BATCH_SIZE_PER_GPU 1 \
PRINT_FREQ 1000000 \
WORKERS 4 GPUS '(0,)'

$ python3 tools/test.py --cfg experiments/coco/hrnet/w48_384x288_adam_lr1e-3.yaml \
TEST.MODEL_FILE models/pose_hrnet_w48_384x288.pth \
TEST.FLIP_TEST False \
TEST.USE_GT_BBOX True \
TEST.BATCH_SIZE_PER_GPU 1 \
PRINT_FREQ 1000000 \
WORKERS 4 GPUS '(0,)'

Fast Human Pose Estimation

FHPE pose_hrnet_w32_student_FPD with detected bbox
FHPE pose_hrnet_w32_student_FPD with ground truth bbox

Notes:

In requirements.txt, change torch==1.0 to torch==1.9.
Unless you delete data/cache/coco_cached_val2017_db.pkl, USE_GT_BBOX is ignored.
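Since a leftover cache file silently overrides the BBox setting, it may be worth clearing it before every run. The sketch below does only that. The function name is hypothetical, and the path is the one given in the note above, relative to the repository root.

```python
import os

# Cache file named in the note above (relative to the repository root).
CACHE_PATH = "data/cache/coco_cached_val2017_db.pkl"


def remove_stale_cache(path=CACHE_PATH):
    """Delete the cached dataset file; return True if a file was removed."""
    try:
        os.remove(path)
    except FileNotFoundError:
        return False  # nothing cached yet, so nothing to do
    return True
```

Calling `remove_stale_cache()` before switching between detected and ground-truth BBoxes forces the dataset to be rebuilt with the new setting.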

$ python3 tools/test.py --cfg experiments/fpd_coco/hrnet/w32_256x192_adam_lr1e-3.yaml \
TEST.MODEL_FILE models/pose_hrnet_w32_student_FPD.pth \
TEST.FLIP_TEST False \
TEST.USE_GT_BBOX False \
TEST.BATCH_SIZE_PER_GPU 1 \
PRINT_FREQ 100000 \
WORKERS 4 GPUS '(0,)'

$ python3 tools/test.py --cfg experiments/fpd_coco/hrnet/w32_256x192_adam_lr1e-3.yaml \
TEST.MODEL_FILE models/pose_hrnet_w32_student_FPD.pth \
TEST.FLIP_TEST False \
TEST.USE_GT_BBOX True \
TEST.BATCH_SIZE_PER_GPU 1 \
PRINT_FREQ 100000 \
WORKERS 4 GPUS '(0,)'

DarkPose

DarkPose HRNet-W32 (256x192) with detected bbox
DarkPose HRNet-W32 (256x192) with ground truth bbox
DarkPose HRNet-W48 (384x288) with detected bbox
DarkPose HRNet-W48 (384x288) with ground truth bbox

$ python3 tools/test.py --cfg experiments/coco/hrnet/w32_256x192_adam_lr1e-3.yaml \
TEST.MODEL_FILE models/w32_256×192.pth \
TEST.FLIP_TEST False \
TEST.USE_GT_BBOX False \
TEST.BATCH_SIZE_PER_GPU 1 \
PRINT_FREQ 1000000 \
WORKERS 4 \
GPUS '(0,)'

$ python3 tools/test.py --cfg experiments/coco/hrnet/w32_256x192_adam_lr1e-3.yaml \
TEST.MODEL_FILE models/w32_256×192.pth \
TEST.FLIP_TEST False \
TEST.USE_GT_BBOX True \
TEST.BATCH_SIZE_PER_GPU 1 \
PRINT_FREQ 1000000 \
WORKERS 4 \
GPUS '(0,)'

$ python3 tools/test.py --cfg experiments/coco/hrnet/w48_384x288_adam_lr1e-3.yaml \
TEST.MODEL_FILE models/w48_384×288.pth \
TEST.FLIP_TEST False \
TEST.USE_GT_BBOX False \
TEST.BATCH_SIZE_PER_GPU 1 \
PRINT_FREQ 1000000 \
WORKERS 4 \
GPUS '(0,)'

$ python3 tools/test.py --cfg experiments/coco/hrnet/w48_384x288_adam_lr1e-3.yaml \
TEST.MODEL_FILE models/w48_384×288.pth \
TEST.FLIP_TEST False \
TEST.USE_GT_BBOX True \
TEST.BATCH_SIZE_PER_GPU 1 \
PRINT_FREQ 1000000 \
WORKERS 4 \
GPUS '(0,)'

TransPose

TransPose R-A3 with detected bbox
TransPose R-A3 with ground truth bbox
TransPose H-A6 with detected bbox
TransPose H-A6 with ground truth bbox

$ python3 tools/test.py --cfg experiments/coco/transpose_r/TP_R_256x192_d256_h1024_enc3_mh8.yaml \
TEST.FLIP_TEST False \
TEST.USE_GT_BBOX False \
TEST.BATCH_SIZE_PER_GPU 1 \
PRINT_FREQ 1000000 \
WORKERS 4

$ python3 tools/test.py --cfg experiments/coco/transpose_r/TP_R_256x192_d256_h1024_enc3_mh8.yaml \
TEST.FLIP_TEST False \
TEST.USE_GT_BBOX True \
TEST.BATCH_SIZE_PER_GPU 1 \
PRINT_FREQ 1000000 \
WORKERS 4

$ python3 tools/test.py --cfg experiments/coco/transpose_h/TP_H_w48_256x192_stage3_1_4_d96_h192_relu_enc6_mh1.yaml \
TEST.FLIP_TEST False \
TEST.USE_GT_BBOX False \
TEST.BATCH_SIZE_PER_GPU 1 \
PRINT_FREQ 1000000 \
WORKERS 4 \
GPUS '(0,)'

$ python3 tools/test.py --cfg experiments/coco/transpose_h/TP_H_w48_256x192_stage3_1_4_d96_h192_relu_enc6_mh1.yaml \
TEST.FLIP_TEST False \
TEST.USE_GT_BBOX True \
TEST.BATCH_SIZE_PER_GPU 1 \
PRINT_FREQ 1000000 \
WORKERS 4 \
GPUS '(0,)'

Lite-HRNet

Naive Lite-HRNet-18 256x192 with detected bbox
Naive Lite-HRNet-18 256x192 with ground truth bbox
Lite-HRNet-18 256x192 with detected bbox
Lite-HRNet-18 256x192 with ground truth bbox

Notes:

In naive_litehrnet_18_coco_256x192.py, change in_channels of keypoint_head from 40 to 30 (to match the pre-trained model).
use_gt_bbox is set in the config file.
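The two config edits can be pictured as changes to a nested dictionary. The sketch below assumes the config has a `model` dict containing `keypoint_head`, and a `data_cfg` dict containing `use_gt_bbox`. That layout is an assumption to check against the actual config file, and the function name is hypothetical.

```python
import copy


def patch_naive_litehrnet_config(cfg, use_gt_bbox):
    """Return a copy of `cfg` with the two edits from the notes applied.

    The key layout (model.keypoint_head.in_channels and
    data_cfg.use_gt_bbox) is assumed, not taken from the repository.
    """
    patched = copy.deepcopy(cfg)  # leave the caller's dict untouched
    patched["model"]["keypoint_head"]["in_channels"] = 30  # was 40
    patched["data_cfg"]["use_gt_bbox"] = use_gt_bbox
    return patched


# Stand-in config containing only the fields this sketch touches.
example = {
    "model": {"keypoint_head": {"in_channels": 40}},
    "data_cfg": {"use_gt_bbox": False},
}
print(patch_naive_litehrnet_config(example, use_gt_bbox=True))
```

In practice the same two values would be edited directly in the `.py` config file before running tools/test.py.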

$ python3 tools/test.py configs/top_down/naive_litehrnet/coco/naive_litehrnet_18_coco_256x192.py \
pre-trained/naive_litehrnet_18_coco_256x192.pth

$ python3 tools/test.py configs/top_down/lite_hrnet/coco/litehrnet_18_coco_256x192.py \
pre-trained/litehrnet_18_coco_256x192.pth

Bottom-Up methods

OpenPose

$ time ./build/examples/openpose/openpose.bin --image_dir data/coco/images/val2017 \
--display 0 --num_gpu 1 --write_json result.json --render_pose 0 --logging_level 2 --disable_multi_thread
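Because OpenPose is timed with the shell's `time`, the FPS has to be derived from the reported wall-clock time. The sketch below parses the `real` line in bash's default format (for example `real 1m40.000s`) and divides the image count by it. The function names and the sample text are hypothetical, and the sample is not a measured result. Wall-clock time also includes start-up and model loading, so this slightly underestimates steady-state FPS.

```python
import re

# Matches bash's default `time` output line, e.g. "real\t1m40.000s".
_REAL_LINE = re.compile(r"^real\s+(?:(\d+)m)?([\d.]+)s\s*$", re.MULTILINE)


def real_seconds(time_output):
    """Extract the wall-clock ("real") time in seconds from `time` output."""
    match = _REAL_LINE.search(time_output)
    if match is None:
        raise ValueError("no 'real' line found in the given text")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def fps_from_time(time_output, num_images=5000):
    """FPS = number of images / wall-clock seconds."""
    return num_images / real_seconds(time_output)


# Made-up sample text, used only to show the expected format.
sample = "real\t1m40.000s\nuser\t1m0.000s\nsys\t0m5.000s\n"
print(fps_from_time(sample))
```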

HigherHRNet

HHRNet-w32-512 without multi-scale
HHRNet-w32-512 with multi-scale

$ python3 tools/valid.py --cfg experiments/coco/higher_hrnet/w32_512_adam_lr1e-3.yaml \
TEST.MODEL_FILE models/pytorch/pose_coco/pose_higher_hrnet_w32_512.pth \
TEST.FLIP_TEST False \
TEST.LOG_PROGRESS True

$ python3 tools/valid.py --cfg experiments/coco/higher_hrnet/w32_512_adam_lr1e-3.yaml \
TEST.MODEL_FILE models/pytorch/pose_coco/pose_higher_hrnet_w32_512.pth \
TEST.SCALE_FACTOR '[0.5, 1.0, 2.0]' \
TEST.FLIP_TEST False \
TEST.LOG_PROGRESS True

HRNet-DEKR

DEKR pose_hrnet_w32 without multi-scale
DEKR pose_hrnet_w32 with multi-scale
DEKR pose_hrnet_w48 without multi-scale
DEKR pose_hrnet_w48 with multi-scale

$ python3 tools/valid.py --cfg experiments/coco/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml \
TEST.MODEL_FILE model/pose_coco/pose_dekr_hrnetw32_coco.pth \
TEST.LOG_PROGRESS True \
TEST.FLIP_TEST False \
PRINT_FREQ 10000

$ python3 tools/valid.py --cfg experiments/coco/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml \
TEST.MODEL_FILE model/pose_coco/pose_dekr_hrnetw32_coco.pth \
TEST.LOG_PROGRESS True \
TEST.FLIP_TEST False \
TEST.NMS_THRE 0.15 \
TEST.SCALE_FACTOR 0.5,1.0,2.0 \
PRINT_FREQ 10000

$ python3 tools/valid.py --cfg experiments/coco/w48/w48_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml \
TEST.MODEL_FILE model/pose_coco/pose_dekr_hrnetw48_coco.pth \
TEST.LOG_PROGRESS True \
TEST.FLIP_TEST False \
PRINT_FREQ 10000

$ python3 tools/valid.py --cfg experiments/coco/w48/w48_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml \
TEST.MODEL_FILE model/pose_coco/pose_dekr_hrnetw48_coco.pth \
TEST.LOG_PROGRESS True \
TEST.FLIP_TEST False \
TEST.NMS_THRE 0.15 \
TEST.SCALE_FACTOR 0.5,1.0,2.0 \
PRINT_FREQ 10000
