Using YOLO V11 in ONNX Format

Posted at 2024-11-24

Basic workflow

Model preparation
Preprocessing
Inference
Postprocessing

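Model preparation

A model saved in PyTorch format has to be exported to an ONNX model.
Change path/to/weight.pt to the path where your weights are stored.
To export with dynamic=True, onnxslim must be installed first (pip install onnxslim).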

from ultralytics import YOLO
model = YOLO("path/to/weight.pt")
model.export(format="onnx", imgsz=640, optimize=True, dynamic=False)

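Preprocessing

The model expects input in the following shape.
From left to right, the dimensions are batch size, number of channels, height, and width.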

input : (N, 3, 640, 640)

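The following steps generally produce an input the model can accept.

Channel conversion
Resizing
Normalization
Transposing the array
Adding the batch dimension

Sample code for these steps follows.
Set the path in cv2.imread("path/to/img") to your image. Frames read from a video work just as well.

Note that this resize does not apply letterboxing, which preserves the original aspect ratio. If you need it, apply letterboxing in the resize step.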

import cv2
import numpy as np

image = cv2.imread("path/to/img")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # channel conversion (BGR -> RGB)
image_height, image_width = image.shape[:2]  # original size, used in postprocessing
image = cv2.resize(image, dsize=(640, 640))  # resize to the model input size
input_image = image / 255  # normalize to [0, 1]
input_image = input_image.transpose(2, 0, 1)  # (height, width, channel) -> (channel, height, width)
input_image = np.expand_dims(input_image, axis=0).astype("float32")  # (channel, height, width) -> (batch, channel, height, width)

Inference

The following code runs inference.
If you run inference repeatedly, create yolo_sess once up front and reuse it.

import onnxruntime

yolo_sess = onnxruntime.InferenceSession("path/to/weight.onnx")
yolo_input_name = yolo_sess.get_inputs()[0].name
yolo_output_name = yolo_sess.get_outputs()[0].name
yolo_result = yolo_sess.run([yolo_output_name], {yolo_input_name: input_image})[0]
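
One way to create the session only once is to hold it in a small class, as in the sketch below. The class name YoloOnnxRunner and its from_path helper are hypothetical names chosen for this example. Only InferenceSession, get_inputs, get_outputs, and run come from ONNX Runtime.

```python
class YoloOnnxRunner:
    """Wraps one ONNX Runtime session so that it is created only once."""

    def __init__(self, session):
        # `session` is expected to behave like onnxruntime.InferenceSession.
        self.session = session
        self.input_name = session.get_inputs()[0].name
        self.output_name = session.get_outputs()[0].name

    @classmethod
    def from_path(cls, model_path):
        # Imported lazily so the class can also be exercised with a stand-in session.
        import onnxruntime
        return cls(onnxruntime.InferenceSession(model_path))

    def __call__(self, input_image):
        # Returns the first model output for the given preprocessed batch.
        return self.session.run([self.output_name], {self.input_name: input_image})[0]
```

Usage would look like runner = YoloOnnxRunner.from_path("path/to/weight.onnx") once at startup, followed by yolo_result = runner(input_image) for every frame.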

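Postprocessing

The model output has the following shape.
From left to right, the dimensions are batch size, values per detection, and number of candidate detections.

output : (N, 4 + number of classes, 8400)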

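Each detection stores its values in the following order:
the bounding box center, the bounding box width and height, and the class scores.
Each class score is the probability that the detection belongs to that class, with one score per class.

[x_center, y_center, width, height, class0_score, class1_score, ..., classN_score]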
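
For a model with more than one class, the layout above could be decoded roughly as in the following sketch. The function name decode_output and the default threshold of 0.5 are assumptions for illustration. The sketch takes the highest class score of each candidate as its confidence.

```python
import numpy as np


def decode_output(yolo_result, score_threshold=0.5):
    """Split one image's raw output into boxes, scores, and class ids.

    yolo_result has shape (1, 4 + number of classes, number of candidates).
    """
    predictions = yolo_result[0].T           # (candidates, 4 + classes)
    boxes = predictions[:, :4]               # x_center, y_center, width, height
    class_scores = predictions[:, 4:]        # one column per class
    class_ids = class_scores.argmax(axis=1)  # best class for each candidate
    scores = class_scores.max(axis=1)        # score of that best class
    keep = scores >= score_threshold         # drop low-confidence candidates
    return boxes[keep], scores[keep], class_ids[keep]
```

The returned boxes are still in center format on the model input scale, so they need the same coordinate conversion and NMS as in the single-class case.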

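Here we assume a single class and a single object to detect, and return the top-left and bottom-right corners of the bbox.
The following steps generally achieve this.

Transposing the array
Converting bbox coordinates and rescaling to the original image size
NMS
Converting bbox coordinates to corner format

Sample code for these steps follows.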

# Drop the batch dimension and transpose: (5, 8400) -> (8400, 5)
yolo_result = yolo_result[0]
yolo_result = yolo_result.transpose(1, 0)
yolo_result = np.ascontiguousarray(yolo_result)

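# Convert bbox coordinates and rescale from the 640x640 model input to the original image size
x_factor = image_width / 640  # image_width and image_height come from preprocessing
y_factor = image_height / 640
x_center, y_center, width, height, score = yolo_result[:, 0], yolo_result[:, 1], yolo_result[:, 2], yolo_result[:, 3], yolo_result[:, 4]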

x_min = (x_center - width / 2) * x_factor
y_min = (y_center - height / 2) * y_factor
width = width * x_factor
height = height * y_factor

bbox = np.stack((x_min, y_min, width, height), axis=1)

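# NMS (score threshold 0.5, NMS threshold 0.5); boxes are passed as (x_min, y_min, width, height)
nms_indice = cv2.dnn.NMSBoxes(bbox.tolist(), score.ravel().tolist(), 0.5, 0.5, top_k=1)

# Convert the kept bbox to top-left and bottom-right corner coordinates
if len(nms_indice) != 0:
    best = int(np.array(nms_indice).flatten()[0])  # some OpenCV versions return nested indices
    bbox = bbox[best]
    score = score[best]
    x_min = int(bbox[0])
    y_min = int(bbox[1])
    x_max = int(bbox[0] + bbox[2])
    y_max = int(bbox[1] + bbox[3])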
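
To make the NMS step less of a black box, the following is a minimal NumPy sketch of greedy non-maximum suppression over boxes in (x_min, y_min, width, height) format. It is not the implementation used by cv2.dnn.NMSBoxes, and it leaves out the score threshold and top_k handling. The function name nms_xywh is hypothetical.

```python
import numpy as np


def nms_xywh(boxes, scores, iou_threshold=0.5):
    """Greedy NMS over (x_min, y_min, width, height) boxes; returns kept indices."""
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    x1, y1 = boxes[:, 0], boxes[:, 1]
    x2, y2 = x1 + boxes[:, 2], y1 + boxes[:, 3]
    areas = boxes[:, 2] * boxes[:, 3]

    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Intersection of the best box with every remaining box.
        inter_w = np.clip(np.minimum(x2[best], x2[rest]) - np.maximum(x1[best], x1[rest]), 0, None)
        inter_h = np.clip(np.minimum(y2[best], y2[rest]) - np.maximum(y1[best], y1[rest]), 0, None)
        inter = inter_w * inter_h
        union = np.maximum(areas[best] + areas[rest] - inter, 1e-9)  # avoid division by zero
        iou = inter / union
        order = rest[iou <= iou_threshold]  # discard boxes that overlap the best box too much
    return keep
```

With a single object to detect, the first returned index plays the same role as nms_indice[0] above.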
