M5Stack Module LLM Advent Calendar 2024

Converting a Model for the Module-LLM NPU (Swin Transformer Edition)

Last updated at 2024-12-22. Posted at 2024-12-22.

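Purpose

This article walks through the steps for running a Swin Transformer model on the Module-LLM.
To run a Swin Transformer model at high speed on the Module-LLM's NPU, the model has to be quantized to INT8 format with a tool called Pulsar2, which also reduces the model size.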
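To give a feel for what INT8 quantization means, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is only an illustration: the function names are hypothetical, and the scheme Pulsar2 actually applies (calibration data, per-channel scales, and so on) may well differ.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One shared scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.031, 1.0]  # toy values, not real model weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

print("quantized:", q)
print("restored :", restored)
```

Each value now fits in one byte instead of the four bytes of an FP32 weight, at the cost of a rounding error of at most half a quantization step per value.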

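Exporting the Swin Transformer model to ONNX

1. Environment setup

Get the model from the official repository. The Python script below loads the PyTorch-trained Swin Transformer weights through the transformers library and converts them to ONNX format.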

$ pip install onnx torch requests onnxsim pillow transformers
$ python swin_transformer_export.py

swin_transformer_export.py

import onnx
import torch
import requests
from onnxsim import simplify
from PIL import Image
from transformers import AutoFeatureExtractor, SwinForImageClassification

def download_swin_model(model_name):
    prefix = "microsoft"
    model_id = f"{prefix}/{model_name}"
    
    # Download and prepare test image
    url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
    image = Image.open(requests.get(url, stream=True).raw)
    
    # Load model and feature extractor
    feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
    model = SwinForImageClassification.from_pretrained(model_id)
    model.eval()
    
    # Prepare inputs
    inputs = feature_extractor(images=image, return_tensors="pt")
    
    # Test inference
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        predicted_class_idx = logits.argmax(-1).item()
        print("Predicted class:", model.config.id2label[predicted_class_idx])
    
    # Export to ONNX with fixed input shape
    model_path = f"{model_name}.onnx"
    
    torch.onnx.export(
        model,
        tuple(inputs.values()),
        f=model_path,
        do_constant_folding=True,
        input_names=["input"],
        output_names=["output"],
        opset_version=13,
        export_params=True
    )
    
    # Simplify the exported graph with onnxsim. The unsimplified file written by
    # torch.onnx.export above stays on disk as a fallback.
    try:
        onnx_model = onnx.load(model_path)
        model_simp, check = simplify(
            onnx_model,
            # skip_shape_inference=True  # skip only shape inference
        )

        if check:
            simp_path = f"{model_name}_sim.onnx"
            onnx.save(model_simp, simp_path)
            print(f"Successfully saved simplified model to {simp_path}")
        else:
            print("Warning: Simplified ONNX model could not be validated")
            print(f"Keeping the original ONNX model at {model_path}")

    except Exception as e:
        print(f"Error during simplification: {e}")
        print(f"Keeping the original ONNX model at {model_path}")

def main():
    download_swin_model(model_name="swin-tiny-patch4-window7-224")

if __name__ == "__main__":
    main()

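Optimize the ONNX model with onnxsim. The script above already writes a simplified model through the onnxsim Python API; the command below runs the same simplification from the command line.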

$ onnxsim swin-tiny-patch4-window7-224.onnx swin-tiny-patch4-window7-224_sim.onnx
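Before handing the file to Pulsar2, it can be worth checking that the simplified model is valid and has a fixed input shape. The following is a rough sketch using the onnx package installed earlier; the helper names are hypothetical. For this 224-pixel model one would expect a 1x3x224x224 input, but treat that as an assumption to confirm rather than a guarantee.

```python
from pathlib import Path


def describe_dims(dims):
    """Format a list of dimension sizes, e.g. [1, 3, 224, 224] -> '1x3x224x224'."""
    return "x".join(str(d) for d in dims)


def inspect_onnx(path):
    """Validate an ONNX file and print the name and shape of each graph input/output."""
    import onnx  # imported lazily so describe_dims works without onnx installed

    model = onnx.load(path)
    onnx.checker.check_model(model)
    for kind, values in (("input", model.graph.input), ("output", model.graph.output)):
        for value in values:
            # A dimension is either a fixed size (dim_value) or a symbolic name (dim_param).
            dims = [
                d.dim_value if d.HasField("dim_value") else d.dim_param
                for d in value.type.tensor_type.shape.dim
            ]
            print(kind, value.name, describe_dims(dims))


model_path = Path("swin-tiny-patch4-window7-224_sim.onnx")
if model_path.exists():
    inspect_onnx(str(model_path))
else:
    print(f"{model_path} not found; run the export and onnxsim steps first")
```

If any dimension prints as a name instead of a number, the model still has a dynamic axis, which is worth fixing before compilation.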

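Installing Pulsar2

Refer to the linked guide to install Pulsar2.

Downloading quick_start_example

The original model, data, images, and simulation tools needed to compile the model and run the simulation are provided in the quick_start_example folder of the file that can be downloaded from the following link.
Click "Download sample files", extract the downloaded file, and copy it to the /data path of the Docker container.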

quick_start_example.zip

root@xxx:~/data# ls
config  dataset  model  output  pulsar2-run-helper

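# model: holds the original ONNX model (already optimized with onnxsim)
# dataset: holds the compressed dataset files needed for offline quantization calibration (PTQ Calibration); common archive formats such as tar, tar.gz, and gz are supported
# config: holds the config.json configuration file needed for the run
# output: holds the output results
# pulsar2-run-helper: a tool that supports simulated execution of the axmodel in an x86 environment

Copy swin-tiny-patch4-window7-224_sim.onnx into the model folder.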
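The copy step and a quick sanity check of the folder layout can be scripted. This is a small sketch with hypothetical helper names; the folder list is taken from the `ls` output above, and the script only reads the current directory unless `copy_model` is called.

```python
import shutil
from pathlib import Path

# Folder names as shown in the `ls` output of the extracted quick_start_example.
EXPECTED_DIRS = ["config", "dataset", "model", "output", "pulsar2-run-helper"]


def missing_dirs(root):
    """Return the expected quick_start_example folders that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


def copy_model(onnx_path, root):
    """Copy the simplified ONNX file into <root>/model and return the new path."""
    target_dir = Path(root) / "model"
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(onnx_path, target_dir))


# Report which folders are missing in the current working directory.
print("missing folders:", missing_dirs("."))
```

Calling `copy_model("swin-tiny-patch4-window7-224_sim.onnx", ".")` from the extracted folder would then place the model where the build command expects it.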

Converting the Swin Transformer model to an AX model

Start the Docker container in which Pulsar2 is installed.

$ sudo docker run -it --net host --rm -v $PWD:/data pulsar2:temp-58aa62e4

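The pulsar2 build command converts an ONNX model into an axmodel that runs on the NPU of the Module-LLM (AX630C). It completes graph optimization, offline quantization, compilation, and performance comparison in a single step.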

$ pulsar2 build --input model/swin-tiny-patch4-window7-224_sim.onnx --output_dir output/ --config config/mobilenet_v2_build_config.json

The configuration file used for the Pulsar2 conversion is the same one used for MobileNet.

Confirm that the AX model has been generated.
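When converting several models, it may be convenient to assemble the build command in a script. The sketch below uses only the three flags that appear in the command above (`--input`, `--output_dir`, `--config`); the function names are hypothetical, and `run_build` is meant to be called inside the Pulsar2 Docker container, where the `pulsar2` executable is available.

```python
import shlex
import subprocess


def pulsar2_build_command(onnx_path, output_dir, config_path):
    """Build the argv list for `pulsar2 build` using only the flags shown in this article."""
    return [
        "pulsar2", "build",
        "--input", onnx_path,
        "--output_dir", output_dir,
        "--config", config_path,
    ]


def run_build(onnx_path, output_dir, config_path):
    """Run the build; raises CalledProcessError if pulsar2 exits with a non-zero status."""
    subprocess.run(pulsar2_build_command(onnx_path, output_dir, config_path), check=True)


cmd = pulsar2_build_command(
    "model/swin-tiny-patch4-window7-224_sim.onnx",
    "output/",
    "config/mobilenet_v2_build_config.json",
)
# Print the command so it can be reviewed before running it in the container.
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems if a model path ever contains spaces.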

$ ls output
compiled.axmodel

Running on the Module-LLM

The Swin Transformer model has the same format as the mobilenetv2 AX model included in the examples, so it is run on the Module-LLM by following the mobilenetv2 example.

root@m5stack-LLM:# python3 classification.py
[INFO] Chip type: ChipType.MC20E
[INFO] Engine version: 2.6.3sp
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Model type: 0 (half core)
[INFO] Compiler version: 3.3 3cdead5e
Top 5 Predictions:
Class Index: 452, Score: 7.351101398468018
Class Index: 601, Score: 4.944103717803955
Class Index: 689, Score: 4.748941898345947
Class Index: 735, Score: 4.0333476066589355
Class Index: 501, Score: 3.9682936668395996
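The scores in this log are larger than 1, so they appear to be raw output values rather than probabilities. The classification.py script itself is not shown here, so the following is only a sketch of how a top-k listing and an optional softmax could be computed in plain Python. The logits are toy values and the function names are hypothetical.

```python
import math


def top_k(scores, k=5):
    """Return (index, score) pairs for the k largest scores, highest first."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def softmax(scores):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy logits standing in for the model output (a real run would have one value per class).
logits = [0.1, 2.5, -1.0, 0.7, 3.2, 1.9, 0.0]
probabilities = softmax(logits)

print("Top 3 Predictions:")
for index, score in top_k(logits, k=3):
    print(f"Class Index: {index}, Score: {score}, Probability: {probabilities[index]:.3f}")
```

Softmax does not change the ranking, so the top-k indices are the same whether they are taken from the raw scores or from the probabilities.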

Reference links

nnn112358/M5_LLM_Module_Report
https://github.com/nnn112358/M5_LLM_Module_Report

基于 AX650N 部署 Swin Transformer
https://zhuanlan.zhihu.com/p/621582671

pulsar2-docs
https://pulsar2-docs.readthedocs.io/en/latest/index.html
