M5Stack Module LLM Advent Calendar 2024

Converting Models for the Module-LLM NPU (YOLOv9 Edition)

Last updated at 2024-12-19, posted at 2024-12-19

Purpose

This article explains how to run a YOLOv9 model on the NPU of Module-LLM.
To run a model quickly on the Module-LLM NPU, you must quantize it to INT8 with a tool called Pulsar2, which also reduces the model size.
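To make the quantization step concrete, here is a minimal pure-Python sketch of per-tensor affine INT8 quantization. The scale and zero point are derived from the minimum and maximum of the data, which is the same idea as the MinMax calibration method selected in the Pulsar2 config later in this article. It is a simplified illustration and not Pulsar2's actual implementation. The function names are hypothetical.

```python
def minmax_qparams(values, qmin=-128, qmax=127):
    """Derive an affine INT8 scale and zero point from the min/max of the data."""
    lo = min(min(values), 0.0)  # the range must contain 0 so that 0.0 stays exact
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to clamped INT8 integers."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(qvalues, scale, zero_point):
    """Map INT8 integers back to approximate floats."""
    return [(q - zero_point) * scale for q in qvalues]


if __name__ == "__main__":
    weights = [-0.8, -0.1, 0.0, 0.25, 0.6, 1.2]
    scale, zp = minmax_qparams(weights)
    q = quantize(weights, scale, zp)
    restored = dequantize(q, scale, zp)
    print("scale:", scale, "zero_point:", zp)
    print("int8:", q)
    # Rounding error for in-range values is bounded by about scale / 2
    print("max abs error:", max(abs(a - b) for a, b in zip(weights, restored)))
    # FP32 needs 4 bytes per value, INT8 needs 1 byte
    print("bytes fp32 vs int8:", 4 * len(weights), len(weights))
```

Each value shrinks from 4 bytes (FP32) to 1 byte (INT8), which accounts for the roughly 4x size reduction. The trade-off is a rounding error of up to about half a scale step.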

What is YOLOv9

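YOLOv9 was developed by Chien-Yao Wang and I-Hau Yeh. Its authors do not include Joseph Redmon, who created the YOLO series. YOLOv9 introduces two new designs: Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN). PGI refines conventional backpropagation so the network can learn more efficiently. GELAN optimizes how information flows through the network, which makes feature extraction more efficient.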

About the YOLOv9 models

There are two implementations of YOLOv9: the original model published by Chien-Yao Wang et al. and the model published by Ultralytics.

Exporting YOLOv9 (original model) to ONNX

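AXERA-TECH provides a repository based on the original YOLOv9 by Chien-Yao Wang et al. It contains small modifications for conversion with Pulsar2 that do not affect accuracy. Use this repository to export the model to ONNX.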

$ git clone https://github.com/AXERA-TECH/yolov9.git
$ cd yolov9
$ pip install -r requirements.txt
$ wget https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-c.pt
$ python export.py --weights yolov9-c.pt --include onnx
$ onnxsim yolov9-c.onnx yolov9-c.onnx

These steps produce the yolov9-c.onnx model.

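The onnxsim command simplifies the computation graph of yolov9-c.onnx, which makes the model more efficient to deploy.
Next, run yolov9-cut.py. It extracts the subgraph from the images input to the three Concat outputs of the detection head (model.38) and writes the result to yolov9-c-cut.onnx.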

$ python yolov9-cut.py

yolov9-cut.py

import onnx

# Keep only the subgraph from "images" to the three detection-head Concat outputs (model.38)
input_path = "yolov9-c.onnx"
output_path = "yolov9-c-cut.onnx"
input_names = ["images"]
output_names = [
    "/model.38/Concat_output_0",
    "/model.38/Concat_1_output_0",
    "/model.38/Concat_2_output_0",
]
onnx.utils.extract_model(input_path, output_path, input_names, output_names)
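Cutting the graph at these Concat outputs removes the box decoding that normally follows them, so that step must run on the host side. The pure-Python sketch below decodes a single grid cell. It assumes the Ultralytics-style head layout with reg_max = 16: the first 64 channels hold four 16-bin distributions (left, top, right, bottom distances, side-major), followed by one logit per class. It is an assumption that the original yolov9-c head at model.38 uses exactly this layout, so check it against your exported model. The function names are hypothetical.

```python
import math

REG_MAX = 16  # assumed number of DFL bins per box side


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dfl_expectation(bins):
    """Expected distance in grid units from one DFL distribution."""
    return sum(i * p for i, p in enumerate(softmax(bins)))


def decode_cell(channels, gx, gy, stride, num_classes):
    """Decode one cell into (x1, y1, x2, y2, class_id, score) in input pixels.

    channels: 4 * REG_MAX box values (side-major) followed by num_classes logits.
    gx, gy: cell indices in the feature map; stride: pixels per cell (8, 16 or 32).
    """
    assert len(channels) == 4 * REG_MAX + num_classes
    left, top, right, bottom = (
        dfl_expectation(channels[s * REG_MAX:(s + 1) * REG_MAX]) for s in range(4)
    )
    cx, cy = gx + 0.5, gy + 0.5  # anchor point at the cell centre
    box = ((cx - left) * stride, (cy - top) * stride,
           (cx + right) * stride, (cy + bottom) * stride)
    logits = channels[4 * REG_MAX:]
    class_id = max(range(num_classes), key=lambda c: logits[c])
    score = 1.0 / (1.0 + math.exp(-logits[class_id]))  # sigmoid of the best logit
    return (*box, class_id, score)
```

A full post-process would loop over every cell of all three outputs and keep detections above a score threshold. It would then apply non-maximum suppression. Those steps are omitted here.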

With the steps above, YOLOv9 is converted to ONNX and its graph is optimized as needed.

Exporting YOLOv9 (Ultralytics model) to ONNX

Install PyTorch and Ultralytics. This example installs the CPU build of PyTorch.

$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
$ pip install ultralytics

Use a Python script with the Ultralytics API to download the YOLOv9 model and export it to ONNX.

$ python yolov9_download.py

yolov9_download.py

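import os
from ultralytics import YOLO

# Work in ./model so the downloaded weights and the exported ONNX file end up there
os.makedirs('./model', exist_ok=True)
os.chdir('./model')

# Load the model (weights are downloaded automatically) and print a summary
model = YOLO("yolov9s.pt")
model.info()

# Export to ONNX with graph simplification, using opset 17
model.export(format='onnx', simplify=True, opset=17)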

Next, cut the exported model at the three Concat outputs of the detection head (model.22).

$ python yolov9_ultralytics_cut-onnx.py

yolov9_ultralytics_cut-onnx.py

import onnx
import os


def extract_onnx_model(input_path, output_path):
    """Keep only the subgraph from "images" to the detection-head Concat outputs."""
    input_names = ["images"]
    output_names = [
        "/model.22/Concat_output_0",
        "/model.22/Concat_1_output_0",
        "/model.22/Concat_2_output_0",
    ]
    onnx.utils.extract_model(input_path, output_path, input_names, output_names)


# yolov9s.onnx was exported into ./model by yolov9_download.py
os.chdir('./model')
extract_onnx_model("yolov9s.onnx", "yolov9s-cut.onnx")
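As a sanity check for the cut model, the sketch below computes the shapes that the three outputs should have. It assumes a 640x640 input, strides of 8, 16 and 32, 80 COCO classes and reg_max = 16. These are assumptions about the default yolov9s export, not values read from the file. The sketch also applies the dst_perm [0, 2, 3, 1] used in the Pulsar2 config below, which reorders NCHW into NHWC.

```python
def head_output_shapes(img_size=640, strides=(8, 16, 32), num_classes=80, reg_max=16):
    """NCHW shapes of the three detection-head Concat outputs (assumed layout)."""
    channels = 4 * reg_max + num_classes  # box distributions + class logits
    return [(1, channels, img_size // s, img_size // s) for s in strides]


def permute(shape, perm):
    """Reorder a shape the same way a transpose with `perm` would."""
    return tuple(shape[i] for i in perm)


if __name__ == "__main__":
    for shape in head_output_shapes():
        print(shape, "->", permute(shape, [0, 2, 3, 1]))
```

You can compare these shapes with the outputs of yolov9s-cut.onnx in a viewer such as Netron before you run Pulsar2.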

Installing Pulsar2

Install Pulsar2 by following the instructions here.

Downloading quick_start_example

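The original models, data, images and simulation tools needed to compile models and run simulations are in the quick_start_example folder of the archive at the link below.
Click the link to download the sample files and extract the archive. Then copy its contents to the /data path used by Docker, which is the directory mounted with -v $PWD:/data.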

quick_start_example.zip

root@xxx:~/data# ls
config  dataset  model  output  pulsar2-run-helper

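# model: stores the original ONNX models (already optimized with onnxsim)
# dataset: stores the compressed dataset for offline quantization calibration (PTQ calibration); common formats such as tar, tar.gz and gz are supported
# config: stores the config.json configuration file required for the run
# output: stores the output results
# pulsar2-run-helper: supports simulated execution of axmodels in an x86 environment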
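You can also build the calibration archive from your own images. Below is a minimal sketch that uses only the standard library. It assumes the images are .jpg, .jpeg or .png files in one folder and that Pulsar2 accepts a flat tar of image files. Compare the result with the bundled coco_4.tar to confirm this. make_calibration_tar is a hypothetical helper name.

```python
import tarfile
from pathlib import Path

# Image extensions to include (an assumption; adjust to your data)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def make_calibration_tar(image_dir, out_path, limit=32):
    """Pack up to `limit` images from image_dir into an uncompressed tar.

    Hypothetical helper: files are stored flat at the archive root under their
    own names, in sorted order. Returns the list of archived file names.
    """
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )[:limit]
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(out_path), "w") as tar:
        for p in images:
            tar.add(str(p), arcname=p.name)
    return [p.name for p in images]
```

For example, make_calibration_tar("calib_images", "dataset/my_calib.tar") packs up to 32 images, matching the calibration_size of 32 in the config. You would then set calibration_dataset to the new archive.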

Copy yolov9s-cut.onnx into the model folder.

Converting the YOLOv9 model to an AX model

Start the Docker container in which Pulsar2 is installed.

$ sudo docker run -it --net host --rm -v $PWD:/data pulsar2:temp-58aa62e4

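The pulsar2 build command converts an ONNX model into an axmodel for the NPU of Module-LLM (AX630C). The target hardware is given as AX620E, which is the chip family the AX630C belongs to.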

# pulsar2 build --input model/yolov9s-cut.onnx --output_dir output --config config/yolov9.json --target_hardware AX620E
# cp output/compiled.axmodel model/yolov9.axmodel
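If you convert more than one model (for example, both yolov9-c-cut.onnx and yolov9s-cut.onnx), you can script the two commands above. The following is a minimal sketch using subprocess, intended to run inside the Pulsar2 container. It passes only the options that appear in the manual command. The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pulsar2_build_cmd(onnx_path, config_path, output_dir="output", target_hardware="AX620E"):
    """Assemble the same `pulsar2 build` call as the manual command (same options only)."""
    return [
        "pulsar2", "build",
        "--input", str(onnx_path),
        "--output_dir", str(output_dir),
        "--config", str(config_path),
        "--target_hardware", target_hardware,
    ]


def build_axmodel(onnx_path, config_path, dest, output_dir="output", target_hardware="AX620E"):
    """Run pulsar2 build, then copy output_dir/compiled.axmodel to dest (hypothetical helper)."""
    subprocess.run(
        pulsar2_build_cmd(onnx_path, config_path, output_dir, target_hardware),
        check=True,  # raise CalledProcessError if the build fails
    )
    shutil.copy(Path(output_dir) / "compiled.axmodel", dest)
```

For example, build_axmodel("model/yolov9s-cut.onnx", "config/yolov9.json", "model/yolov9.axmodel") reproduces the two commands above.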

yolov9.json is the configuration file that pulsar2 uses for the model conversion. The settings used here are as follows.

yolov9.json

{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "images",
        "calibration_dataset": "./dataset/coco_4.tar",
        "calibration_size": 32,
        "calibration_mean": [0, 0, 0],
        "calibration_std": [255.0, 255.0, 255.0]
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": true,
    "precision_analysis_method": "EndToEnd"
  },
  "input_processors": [
    {
      "tensor_name": "images",
      "tensor_format": "BGR",
      "src_format": "BGR",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    }
  ],
  "output_processors": [
    {
      "tensor_name": "/model.22/Concat_output_0",
      "dst_perm": [0, 2, 3, 1]
    },
    {
      "tensor_name": "/model.22/Concat_1_output_0",
      "dst_perm": [0, 2, 3, 1]
    },
    {
      "tensor_name": "/model.22/Concat_2_output_0",
      "dst_perm": [0, 2, 3, 1]
    }
  ],
  "compiler": {
    "check": 0
  }
}
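The output tensor names differ between the two exports: /model.38/... for the original yolov9-c and /model.22/... for the Ultralytics yolov9s. For that reason, it can help to generate the config with a script. The sketch below rebuilds the JSON above with the head prefix as a parameter. make_pulsar2_config is a hypothetical name. Using the /model.38 variant assumes the remaining settings also suit yolov9-c. The normalization comment reflects the usual reading of mean/std as (pixel - mean) / std.

```python
import json


def make_pulsar2_config(head_prefix, calib_tar="./dataset/coco_4.tar", calib_size=32):
    """Build a Pulsar2 config like yolov9.json for the given detection-head prefix."""
    outputs = [f"{head_prefix}/Concat_output_0",
               f"{head_prefix}/Concat_1_output_0",
               f"{head_prefix}/Concat_2_output_0"]
    return {
        "model_type": "ONNX",
        "npu_mode": "NPU1",
        "quant": {
            "input_configs": [{
                "tensor_name": "images",
                "calibration_dataset": calib_tar,
                "calibration_size": calib_size,
                # (pixel - 0) / 255 maps U8 pixels into [0, 1]
                "calibration_mean": [0, 0, 0],
                "calibration_std": [255.0, 255.0, 255.0],
            }],
            "calibration_method": "MinMax",
            "precision_analysis": True,
            "precision_analysis_method": "EndToEnd",
        },
        "input_processors": [{
            "tensor_name": "images",
            "tensor_format": "BGR",
            "src_format": "BGR",
            "src_dtype": "U8",
            "src_layout": "NHWC",
        }],
        # dst_perm [0, 2, 3, 1] turns each NCHW head output into NHWC
        "output_processors": [{"tensor_name": n, "dst_perm": [0, 2, 3, 1]} for n in outputs],
        "compiler": {"check": 0},
    }


if __name__ == "__main__":
    print(json.dumps(make_pulsar2_config("/model.22"), indent=2))
```

Write the result into the config folder with json.dump(cfg, f, indent=2) and pass that file to --config.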

Check that the model has been generated.

$ ls model/*.axmodel
model/yolov9.axmodel

Reference links

@nnn112358/M5_LLM_Module_Report
https://github.com/nnn112358/M5_LLM_Module_Report

pulsar2-docs
https://pulsar2-docs.readthedocs.io/en/latest/index.html
https://axera-pi-zero-docs-cn.readthedocs.io/zh-cn/latest/doc_guide_algorithm.html
