M5Stack Module LLM Advent Calendar 2024

Day 24

Converting a Model for the Module-LLM NPU (YOLO World Edition)

Posted at 2024-12-23

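Purpose

This article explains how to run a YOLO World model on the Module-LLM.
To run a model at high speed on the Module-LLM's NPU, you need to quantize it to INT8 with a tool called Pulsar2, which also reduces the model size.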
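
To make the size reduction concrete, here is a minimal sketch of symmetric min-max INT8 quantization in plain Python. It is a generic illustration, not Pulsar2's actual algorithm; the helper names `quantize_minmax_int8` and `dequantize` are hypothetical and exist only for this example.

```python
import array


def quantize_minmax_int8(values):
    """Map floats to INT8 codes using one scale derived from the largest magnitude."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric INT8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from INT8 codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_minmax_int8(weights)
restored = dequantize(codes, scale)

# Storage cost: 4 bytes per float32 value versus 1 byte per INT8 code.
fp32_bytes = len(array.array("f", weights).tobytes())
int8_bytes = len(array.array("b", codes).tobytes())
print("codes:", codes)
print("bytes fp32/int8:", fp32_bytes, int8_bytes)
```

Each value costs one byte instead of four, at the price of a rounding error of at most half a quantization step. Choosing the scale well is what the calibration data prepared later in this article is for.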

Exporting YOLO World to ONNX

1. Clone AXERA-TECH's YOLO-World project.

git clone https://github.com/AXERA-TECH/ONNX-YOLO-World-Open-Vocabulary-Object-Detection.git

2. Export YOLO World to ONNX

Run ./export_ax.sh.
This script overwrites yoloworld/ModelExporter.py with yoloworld/ModelExporter_ax.py, the version modified by AXERA-TECH, then exports the YOLO World ONNX model and saves the onnxsim-simplified result as models/yolov8s-worldv2-ax.onnx.

$ ./export_ax.sh
export_ax.sh
if [ ! -f "yolov8s-worldv2.pt" ]; then
    wget https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s-worldv2.pt
fi

if [ ! -d "third_party" ]; then
    mkdir third_party
fi
cd third_party
if [ ! -d "ultralytics" ]; then
    git clone https://github.com/ZHEQIUSHUI/ultralytics.git
    cd ultralytics
    git checkout no_einsum
    cd ..
fi
cd ../
if [ ! -d "ultralytics" ]; then
    ln -s third_party/ultralytics/ultralytics .
fi
cp yoloworld/ModelExporter_ax.py yoloworld/ModelExporter.py
python export_ultralytics_model.py --img_height 640 --img_width 640 --num_classes 4 --model_name yolov8s-worldv2.pt 
onnxsim models/yolov8s-worldv2.onnx models/yolov8s-worldv2-ax.onnx

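3. Export the original YOLO World ONNX model

The export_original.sh script restores yoloworld/ModelExporter.py from yoloworld/ModelExporter_original.py, exports the original YOLO World ONNX model, and saves it as models/yolov8s-worldv2-original.onnx.
This ONNX model is suited to running inference by calling onnxruntime directly from Python.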

$ ./export_original.sh 
./export_original.sh
if [ ! -f "yolov8s-worldv2.pt" ]; then
    wget https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s-worldv2.pt
fi

if [ ! -d "third_party" ]; then
    mkdir third_party
fi
cd third_party
if [ ! -d "ultralytics" ]; then
    git clone https://github.com/ZHEQIUSHUI/ultralytics.git
    cd ultralytics
    git checkout no_einsum
    cd ..
fi
cd ../

if [ ! -d "ultralytics" ]; then
    ln -s third_party/ultralytics/ultralytics .
fi

cp yoloworld/ModelExporter_original.py yoloworld/ModelExporter.py

python export_ultralytics_model.py --img_height 640 --img_width 640 --num_classes 4 --model_name yolov8s-worldv2.pt 
onnxsim models/yolov8s-worldv2.onnx models/yolov8s-worldv2-original.onnx

4. Generate calibration data

Generate the calibration data that Pulsar2 needs to convert the yoloworld.vitb.txt.onnx model.
This produces yolo_world_calib_token_data.tar.

python export_clip_text_model.py
export_clip_text_model.py
from argparse import ArgumentParser
import numpy as np
import torch,os
from yoloworld import TextEmbedder
# Initialize text embedder
text_embedder = TextEmbedder(device="cpu")
text_token = text_embedder.tokenize(["person", "bicycle", "car", "motorcycle"])
torch.onnx.export(text_embedder, text_token, "models/yoloworld.vitb.txt.onnx")
os.system("onnxsim models/yoloworld.vitb.txt.onnx models/yoloworld.vitb.txt.onnx")
coco_names = ["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
    "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
    "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
    "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
    "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
    "hair drier", "toothbrush"]
os.makedirs("tokens", exist_ok=True)
coco_names_group4 = [coco_names[i:i+4] for i in range(0, len(coco_names), 4)]
for class_name_ in coco_names_group4:
    print(f"Saving {class_name_}")
    class_name = class_name_[0]
    # Get text embeddings
    text_token = text_embedder.tokenize(class_name_).cpu().numpy()
    np.save(f"tokens/{class_name.replace(' ', '_')}.npy", text_token)
os.system("tar -cvf yolo_world_calib_token_data.tar tokens/*.npy")
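
The grouping logic in the script above is easy to miss: the 80 COCO names are split into groups of four, and each group is saved under the name of its first class with spaces replaced by underscores. The sketch below isolates that logic on a short sample; `group_classes` and `file_stem` are hypothetical helper names, not functions from the repository.

```python
def group_classes(names, group_size=4):
    """Split a list of class names into consecutive groups of group_size."""
    return [names[i:i + group_size] for i in range(0, len(names), group_size)]


def file_stem(group):
    """File name (without extension) used for a group: its first class name."""
    return group[0].replace(" ", "_")


# Eight consecutive names taken from the COCO list used in the script.
sample = ["parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow"]
groups = group_classes(sample)
stems = [file_stem(g) for g in groups]
print(stems)

# With the full 80-name list the same slicing yields 20 groups.
print(len(group_classes(list(range(80)))))
```

With all 80 names this produces 20 files, which matches the calibration_size of 20 used in the Pulsar2 configuration later in the article.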

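5. Export text feature data

Export the text feature data that is fed into the YOLO World detection model.
Then pack it into yolo_world_calib_txt_data.tar, the text quantization calibration dataset that Pulsar2 depends on when compiling the YOLO World detection model.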

$ python save_coco_npy.py
$ tar -cvf yolo_world_calib_txt_data.tar tmp/*.npy
save_coco_npy.py
from argparse import ArgumentParser
import numpy as np
import os
from yoloworld import TextEmbedder
coco_names = ["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
    "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
    "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
    "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
    "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
    "hair drier", "toothbrush"]
# Initialize text embedder
text_embedder = TextEmbedder(device="cpu")
os.makedirs("tmp", exist_ok=True)
coco_names_group4 = [coco_names[i:i+4] for i in range(0, len(coco_names), 4)]
print(coco_names_group4)
for class_name_ in coco_names_group4:
    print(f"Saving {class_name_}")
    class_name = class_name_[0]
    # Get text embeddings
    class_embeddings = text_embedder.embed_text(class_name_)
    # Convert to numpy array
    class_embeddings = class_embeddings.cpu().numpy().astype(np.float32)
    np.savez(f"tmp/{class_name.replace(' ', '_')}.npz", class_embeddings=class_embeddings, class_list=np.array(class_name_))
    np.save(f"tmp/{class_name.replace(' ', '_')}.npy", class_embeddings)
    with open(f"tmp/{class_name.replace(' ', '_')}.bin", "wb") as f:
        f.write(class_embeddings.tobytes())
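
The .bin files written by the script above are raw float32 bytes with no header, so their size alone tells you whether they fit the model input. The following sketch reads such a file back using only the standard library. It assumes 4 classes of 512 values each (the txt_feats shape reported by the runtime later in this article) and native byte order; `read_txt_feats` is a hypothetical helper, and the demo file is dummy data rather than real embeddings.

```python
import array
import os
import tempfile

NUM_CLASSES = 4   # assumption: matches --num_classes 4 used at export time
EMBED_DIM = 512   # assumption: taken from the 1 x 4 x 512 txt_feats shape


def read_txt_feats(path, num_classes=NUM_CLASSES, embed_dim=EMBED_DIM):
    """Read a raw float32 feature file and return one list of floats per class."""
    feats = array.array("f")
    expected = num_classes * embed_dim * feats.itemsize
    actual = os.path.getsize(path)
    if actual != expected:
        raise ValueError(f"{path}: expected {expected} bytes, found {actual}")
    with open(path, "rb") as f:
        feats.fromfile(f, num_classes * embed_dim)
    return [feats[i * embed_dim:(i + 1) * embed_dim].tolist()
            for i in range(num_classes)]


# Demo with dummy values so the sketch runs without the real tmp/*.bin files.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.bin")
    with open(demo_path, "wb") as f:
        array.array("f", [float(i) for i in range(NUM_CLASSES * EMBED_DIM)]).tofile(f)
    rows = read_txt_feats(demo_path)

print(len(rows), len(rows[0]))
```

A size mismatch usually means the file was generated with a different number of classes than the compiled model expects.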

Converting to an axmodel

Start a Docker container that has Pulsar2 installed.

$ sudo docker run -it --net host --rm -v $PWD:/data pulsar2:3.3

Use the pulsar2 build command to convert the ONNX model into an axmodel for the NPU of the Module-LLM (ax630c).

$ pulsar2 build --input model/yolov8s-worldv2-ax.onnx --config config/yoloworld.json --output_dir output --output_name yoloworldv2_4cls_50_npu1.axmodel --npu_mode NPU1 --target_hardware AX620E
yoloworld.json
{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "images",
        "calibration_dataset": "./dataset/coco_1000.tar",
        "calibration_size": 20,
        "calibration_mean": [0, 0, 0],
        "calibration_std": [255.0, 255.0, 255.0]
      },
      {
        "tensor_name": "txt_feats",
        "calibration_dataset": "./dataset/yolo_world_calib_txt_data.tar",
        "calibration_size": 20,
        "calibration_format": "Numpy"
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": true,
    "precision_analysis_method":"EndToEnd",
    "transformer_opt_level": 1,
    "enable_smooth_quant": true
  },
  "input_processors": [
    {
      "tensor_name": "images",
      "tensor_format": "RGB",
      "src_format": "RGB",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    },
    {
      "tensor_name": "txt_feats",
      "src_dtype": "FP32"
    }
  ],
  "compiler": {
    "check": 0
  }
}
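
Before starting a long build it can help to sanity-check the configuration file. The sketch below parses a trimmed copy of the config with Python's strict JSON parser (which rejects, for instance, lines starting with #) and checks that the tensor names under quant.input_configs and input_processors agree. It also shows the value range implied by calibration_mean and calibration_std, assuming the usual (x - mean) / std normalization. Whether Pulsar2 applies exactly these rules is an assumption; `check_config` and `normalized_range` are hypothetical helpers.

```python
import json

# Trimmed copy of yoloworld.json containing only the fields checked here.
CONFIG_TEXT = """
{
  "quant": {
    "input_configs": [
      {"tensor_name": "images",
       "calibration_mean": [0, 0, 0],
       "calibration_std": [255.0, 255.0, 255.0]},
      {"tensor_name": "txt_feats", "calibration_format": "Numpy"}
    ]
  },
  "input_processors": [
    {"tensor_name": "images", "src_dtype": "U8"},
    {"tensor_name": "txt_feats", "src_dtype": "FP32"}
  ]
}
"""


def check_config(text):
    """Parse the config and list tensor names present in only one section."""
    cfg = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    quant_names = [c["tensor_name"] for c in cfg["quant"]["input_configs"]]
    proc_names = [p["tensor_name"] for p in cfg["input_processors"]]
    mismatched = sorted(set(quant_names) ^ set(proc_names))
    return cfg, mismatched


def normalized_range(mean, std, lo=0, hi=255):
    """Per-channel (min, max) after (x - mean) / std for U8 pixels (assumed formula)."""
    return [((lo - m) / s, (hi - m) / s) for m, s in zip(mean, std)]


cfg, mismatched = check_config(CONFIG_TEXT)
images_cfg = cfg["quant"]["input_configs"][0]
value_range = normalized_range(images_cfg["calibration_mean"],
                               images_cfg["calibration_std"])
print("mismatched tensor names:", mismatched)
print("normalized pixel range:", value_range)
```

Under that assumption, a mean of 0 and a std of 255 scale the 8-bit pixels into the range 0 to 1.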

Running on the Module-LLM

Copy the NPU model, the CV sample executable, and the .bin files generated under the tmp folder to the Module-LLM, then run the sample.
For the YOLO World input text feature data dog.bin, class_ids = ['dog' 'horse' 'sheep' 'cow'].

root@m5stack-LLM# ./ax_yolo_world_open_vocabulary -m yoloworldv2_4cls_50_npu1.axmodel -i ssd_horse.jpg -t dog.bin
--------------------------------------
model file : yoloworldv2_4cls_50_npu1.axmodel
image file : ssd_horse.jpg
img_h, img_w : 640 640
--------------------------------------
Engine creating handle is done.
Engine creating context is done.
Engine get io info is done.

input size: 2
    name:   images [UINT8] [RGB]
        1 x 640 x 640 x 3

    name: txt_feats [FLOAT32] [FEATUREMAP]
        1 x 4 x 512


output size: 3
    name:  stride8 [FLOAT32]
        1 x 80 x 80 x 68

    name: stride16 [FLOAT32]
        1 x 40 x 40 x 68

    name: stride32 [FLOAT32]
        1 x 20 x 20 x 68

Engine alloc io is done.
Engine push input is done.
--------------------------------------
post process cost time:2.47 ms
--------------------------------------
Repeat 1 times, avg time 29.07 ms, max_time 29.07 ms, min_time 29.07 ms
--------------------------------------
detection num: 2
 1:  91%, [ 215,   70,  420,  373], class2
 0:  67%, [ 144,  203,  197,  345], class1
--------------------------------------
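
The three output shapes in the log above follow from simple arithmetic. The sketch below reproduces them, assuming the standard YOLOv8 head layout in which each grid cell carries 4 x 16 box-distribution values plus one score per class. The article itself does not state this split, so the value of REG_MAX is an assumption.

```python
IMG_SIZE = 640    # from --img_height 640 --img_width 640
NUM_CLASSES = 4   # from --num_classes 4
REG_MAX = 16      # assumption: YOLOv8 default number of bins per box side


def head_shape(stride, img_size=IMG_SIZE, num_classes=NUM_CLASSES, reg_max=REG_MAX):
    """Expected NHWC output shape of one detection head."""
    grid = img_size // stride
    channels = 4 * reg_max + num_classes
    return (1, grid, grid, channels)


shapes = {f"stride{s}": head_shape(s) for s in (8, 16, 32)}
# Number of candidate boxes the post-processing has to scan.
total_candidates = sum(h * w for _, h, w, _ in shapes.values())

for name, shape in shapes.items():
    print(name, shape)
print("candidates:", total_candidates)
```

If this reading is right, exporting with a different number of classes changes only the last dimension of each output.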

yolo_world_out.jpg
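
The sample prints generic labels (class1, class2) rather than the prompt words. The sketch below parses the detection lines from the log and maps them back to the class list of dog.bin. It assumes the number at the start of each line is a zero-based index into that list, which the article does not confirm, and it leaves the meaning of the four box values uninterpreted. `parse_detections` is a hypothetical helper.

```python
import re

# Class list for dog.bin as reported in the article.
CLASS_LIST = ["dog", "horse", "sheep", "cow"]

# Matches lines such as " 1:  91%, [ 215,   70,  420,  373], class2".
LINE_RE = re.compile(
    r"^\s*(\d+):\s*(\d+)%,\s*\[\s*(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]"
)


def parse_detections(log_text, class_list=CLASS_LIST):
    """Extract detections; the leading index is assumed to be zero-based."""
    results = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip separators and other log lines
        label, score, a, b, c, d = (int(g) for g in match.groups())
        results.append({
            "name": class_list[label],
            "score": score / 100,
            "box": (a, b, c, d),
        })
    return results


LOG = """detection num: 2
 1:  91%, [ 215,   70,  420,  373], class2
 0:  67%, [ 144,  203,  197,  345], class1
"""

detections = parse_detections(LOG)
for det in detections:
    print(det)
```

Under that assumption the two detections are a horse at 91% and a dog at 67%.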

References

再谈 YOLO World 部署
https://zhuanlan.zhihu.com/p/721856217
