I Got YOLOv7 Running in Ruby

Last updated at 2022-08-09. Posted at 2022-07-24.

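Young people these days don't bother with Ruby.

So here is an article in which an old guy who understands nothing about math or machine learning tries YOLOv7 in Ruby.
Thanks to ONNX Runtime, pretty much anything will run as long as you only need inference.

(I wrote an article about running YOLOv3 with onnxruntime about three years ago, so I'm reusing it.)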

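Go by way of this page (though you don't have to),

then download the ONNX model from this project. Its maintainer is @PINTO.

He links to Qiita from his GitHub, which makes him a model Qiita user. Thank you.

The key point here is to use post-process_merged and download a model that already has the post-processing built in.
Otherwise you have to implement the post-processing yourself, which is a pain.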

git clone https://github.com/PINTO0309/PINTO_model_zoo
cd PINTO_model_zoo/307_YOLOv7
chmod +x download_single_batch_post-process_merged.sh
./download_single_batch_post-process_merged.sh

That downloads the models. This time I use yolov7_post_640x640.onnx, which looks like the main one.
Next, install the Ruby version of onnxruntime.

gem install onnxruntime

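That's all it takes. If you want to use a GPU, though, you have to set that up yourself.

Have Netron ready for inspecting the ONNX model.

It's a good idea to open the model in Netron and at least check what the inputs and outputs look like.

I don't understand it. The only thing this old guy can make out is the shapes of the input and output tensors.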
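
If you would rather print those shapes from Ruby than read them off Netron, a small formatter along these lines might help. This is only a sketch: describe_io is a name made up here, and it assumes each entry is a Hash with :name, :type and :shape keys, which is the form I would expect the onnxruntime gem's model.inputs and model.outputs to return. The sample entry is filled in by hand from what yolo.rb feeds the model, not read from the file.

```ruby
# Format the I/O metadata of an ONNX model as readable lines.
# Assumption: each entry is a Hash with :name, :type and :shape keys,
# the form expected from the onnxruntime gem's Model#inputs / Model#outputs.
def describe_io(entries)
  entries.map do |e|
    format('%-28s %-16s %s', e[:name], e[:type], e[:shape].inspect)
  end
end

# Hand-written sample entry for illustration (input name and shape as
# used by yolo.rb); with the gem loaded you would pass model.inputs here.
sample_inputs = [
  { name: 'images', type: 'tensor(float)', shape: [1, 3, 640, 640] }
]

puts describe_io(sample_inputs)
```

With a loaded model, the same helper could be called as describe_io(model.inputs) and describe_io(model.outputs).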

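This old guy is lazy, so I reuse the previous script and stick with mini_magick, though I hear rmagick, which Watson has been maintaining lately, is also good.

When you actually run it, mini_magick is slower than the machine learning inference, so another tool might be a better choice. If you felt like it, you could also capture frames from a webcam and display them in real time with Ruby/Tk or Ruby/GTK. You could, but it's a hassle, so I won't. Once you're an old guy, everything becomes a hassle. One of these days even my attachment to Ruby may become a hassle and I might start using Python. (It has been nearly ten years since I first thought that, so the day I make Python my main language will probably never come.)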

yolo.rb

require 'mini_magick'
require 'numo/narray'
require 'onnxruntime'

SFloat = Numo::SFloat

input_path = ARGV[0]
output_path = ARGV[1]

model = OnnxRuntime::Model.new('yolov7_post_640x640.onnx')

labels = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
          'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter',
          'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
          'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
          'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
          'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
          'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana',
          'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
          'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table',
          'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
          'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock',
          'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']

# preprocessing
img = MiniMagick::Image.open(input_path)
img.combine_options do |b|
  b.resize '640x640!'
  b.gravity 'center'
  b.background 'transparent'
# b.extent '640x640' # too much hassle, so the aspect ratio is ignored
end
img_data = SFloat.cast(img.get_pixels)
img_data /= 255.0
image_data = img_data.transpose(2, 0, 1)
                     .expand_dims(0)
                     .to_a # NArray -> Array

# inference
output = model.predict({ images: image_data })

# postprocessing
scores, indices = output.values

# visualization
img = MiniMagick::Image.open(input_path)
img.colorspace 'gray'
scores.zip(indices).each do |score, i|
  cl = i[1] # cl is the class ID
  hue = cl * 100 / 80.0
  label = labels[cl]
  score = score[0]
  p "draw box"
  y1 = i[2] * img.height / 640
  x1 = i[3] * img.width / 640
  y2 = i[4] * img.height / 640
  x2 = i[5] * img.width / 640
  img.combine_options do |c|
    c.draw        "rectangle #{x1}, #{y1}, #{x2}, #{y2}"
    c.fill        "hsla(#{hue}%, 20%, 80%, 0.25)"
    c.stroke      "hsla(#{hue}%, 70%, 60%, 1.0)"
    c.strokewidth (score * 3).to_s
  end
  # draw text
  img.combine_options do |c|
    c.draw "text #{x1}, #{y1 - 5} \"#{label}\""
    c.fill 'white'
    c.pointsize 18
  end
end

img.write output_path
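
The four coordinate lines in the visualization loop map boxes from the model's 640x640 input back onto the original image. Because the image was stretched to 640x640 without preserving its aspect ratio, x and y are scaled independently. Pulled out into a helper, the arithmetic might look like this. It is a sketch only: scale_box and MODEL_SIZE are names invented for this illustration, and the box values are made up.

```ruby
# Side length of the square model input used in yolo.rb.
MODEL_SIZE = 640

# Map a box from model input space back to the original image size.
# `box` is [y1, x1, y2, x2], the same order yolo.rb reads from the output.
# Integer inputs give integer (truncated) results, as in yolo.rb.
def scale_box(box, width, height)
  y1, x1, y2, x2 = box
  {
    x1: x1 * width / MODEL_SIZE, y1: y1 * height / MODEL_SIZE,
    x2: x2 * width / MODEL_SIZE, y2: y2 * height / MODEL_SIZE
  }
end

# Made-up box on a hypothetical 1280x720 source image.
p scale_box([160, 320, 480, 640], 1280, 720)
```

If the preprocessing were changed to pad the image instead of stretching it, this mapping would need a single shared scale factor plus an offset.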

With this, you can run something like ruby yolo.rb A.png B.png. The NArray that shows up briefly along the way is Ruby's counterpart to NumPy.
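
For anyone unsure what the transpose and expand_dims calls in the preprocessing step do, here is a rough plain-Ruby sketch of the same rearrangement on nested Arrays. hwc_to_nchw is a name made up for this illustration and is not part of Numo or onnxruntime. It also uses ordinary 64-bit Ruby floats rather than SFloat.

```ruby
# Convert pixels from [height][width][channel] (integers 0..255) into a
# [1][channel][height][width] nested Array of floats in 0.0..1.0.
def hwc_to_nchw(pixels)
  channels = pixels[0][0].size
  chw = Array.new(channels) do |c|
    # Pick channel c out of every pixel and normalize it.
    pixels.map { |row| row.map { |px| px[c] / 255.0 } }
  end
  [chw] # wrap in a leading batch dimension of size 1
end

# A tiny 1x2 "image" with two RGB pixels.
tiny = [[[255, 0, 0], [0, 255, 51]]]
p hwc_to_nchw(tiny)
```

On a real 640x640 image NArray does this far faster; the loop version is only meant to show where each value ends up.
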
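I feed it an image picked up from Wikipedia.

↓ It comes out like this.

You can see that accuracy has improved even further since the YOLOv3 days.
mini_magick is far slower than YOLO; that is how fast YOLO is. Any language with working ONNX bindings, however obscure, should be able to do the same thing.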
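
To check the speed claim on your own machine, the stages can be timed with Benchmark from the standard library. The sketch below is only a harness: time_stages is a made-up helper, and the two sleep calls are placeholders standing in for the mini_magick preprocessing and model.predict, so the printed numbers mean nothing until real work is substituted.

```ruby
require 'benchmark'

# Run each named stage once and return { name => elapsed seconds }.
def time_stages(stages)
  stages.transform_values { |job| Benchmark.realtime { job.call } }
end

# Placeholder workloads (assumptions): in yolo.rb these lambdas would wrap
# the mini_magick preprocessing and the model.predict call respectively.
timings = time_stages({
  'preprocess' => -> { sleep 0.02 },
  'inference'  => -> { sleep 0.01 }
})

timings.each { |name, sec| puts format('%-10s %.3f s', name, sec) }
```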

That's all for this article.
