Purpose
This page explains how to run a YOLO World model on Module-LLM.
To run the model at high speed on the Module-LLM NPU, it must be converted with a tool called Pulsar2, which quantizes it to INT8 and reduces the model size.
Exporting YOLO World to ONNX
1. Clone the AXERA-TECH YOLO-World project.
git clone https://github.com/AXERA-TECH/ONNX-YOLO-World-Open-Vocabulary-Object-Detection.git
2. Export the YOLO World ONNX model
Run ./export_ax.sh.
This script replaces yoloworld/ModelExporter.py with the AXERA-TECH-modified yoloworld/ModelExporter_ax.py, exports the YOLO World ONNX model, and saves it as models/yolov8s-worldv2-ax.onnx. A quick check of the exported model follows the script.
$ ./export_ax.sh
if [ ! -f "yolov8s-worldv2.pt" ]; then
    wget https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s-worldv2.pt
fi
if [ ! -d "third_party" ]; then
    mkdir third_party
fi
cd third_party
if [ ! -d "ultralytics" ]; then
    git clone https://github.com/ZHEQIUSHUI/ultralytics.git
    cd ultralytics
    git checkout no_einsum
    cd ..
fi
cd ../
if [ ! -d "ultralytics" ]; then
    ln -s third_party/ultralytics/ultralytics .
fi
cp yoloworld/ModelExporter_ax.py yoloworld/ModelExporter.py
python export_ultralytics_model.py --img_height 640 --img_width 640 --num_classes 4 --model_name yolov8s-worldv2.pt
onnxsim models/yolov8s-worldv2.onnx models/yolov8s-worldv2-ax.onnx
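Before handing the model to Pulsar2, it can help to confirm what the export actually produced. The following is a minimal sketch (the use of onnxruntime for this check is illustrative and not part of the repository) that lists the input and output tensors of models/yolov8s-worldv2-ax.onnx:

# check the exported model's I/O signature with onnxruntime
import onnxruntime as ort

session = ort.InferenceSession("models/yolov8s-worldv2-ax.onnx",
                               providers=["CPUExecutionProvider"])
print("inputs:")
for i in session.get_inputs():
    print(f"  {i.name}: {i.shape} ({i.type})")
print("outputs:")
for o in session.get_outputs():
    print(f"  {o.name}: {o.shape} ({o.type})")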
3. Export the original YOLO World ONNX model
Run ./export_original.sh.
This script restores yoloworld/ModelExporter.py to the original version, exports the original YOLO World ONNX model, and saves it as models/yolov8s-worldv2-original.onnx.
This ONNX model is suitable for running inference directly with Python onnxruntime (a minimal example follows the script).
$ ./export_original.sh
if [ ! -f "yolov8s-worldv2.pt" ]; then
    wget https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s-worldv2.pt
fi
if [ ! -d "third_party" ]; then
    mkdir third_party
fi
cd third_party
if [ ! -d "ultralytics" ]; then
    git clone https://github.com/ZHEQIUSHUI/ultralytics.git
    cd ultralytics
    git checkout no_einsum
    cd ..
fi
cd ../
if [ ! -d "ultralytics" ]; then
    ln -s third_party/ultralytics/ultralytics .
fi
cp yoloworld/ModelExporter_original.py yoloworld/ModelExporter.py
python export_ultralytics_model.py --img_height 640 --img_width 640 --num_classes 4 --model_name yolov8s-worldv2.pt
onnxsim models/yolov8s-worldv2.onnx models/yolov8s-worldv2-original.onnx
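Since the original export is intended for direct use with Python onnxruntime, a smoke test like the sketch below can confirm that it loads and runs. The random dummy inputs and the assumption that every input tensor is FP32 are illustrative only; real use would feed a preprocessed 640x640 image and the text features produced in step 5.

# smoke-test the original ONNX model with onnxruntime and dummy inputs
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("models/yolov8s-worldv2-original.onnx",
                               providers=["CPUExecutionProvider"])
feeds = {}
for inp in session.get_inputs():
    # replace any symbolic/dynamic dimension with 1 to build a dummy tensor
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    feeds[inp.name] = np.random.rand(*shape).astype(np.float32)
    print(f"feeding {inp.name}: {shape}")
outputs = session.run(None, feeds)
for meta, out in zip(session.get_outputs(), outputs):
    print(f"{meta.name}: {out.shape}")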
4. Generate the calibration data
Generate the calibration data that Pulsar2 needs in order to convert the yoloworld.vitb.txt.onnx model.
Running the script below produces yolo_world_calib_token_data.tar; a quick check of the generated token files follows it.
$ python export_clip_text_model.py
import numpy as np
import torch, os
from yoloworld import TextEmbedder

# Initialize text embedder
text_embedder = TextEmbedder(device="cpu")
text_token = text_embedder.tokenize(["person", "bicycle", "car", "motorcycle"])

# Export the CLIP text encoder to ONNX and simplify it
torch.onnx.export(text_embedder, text_token, "models/yoloworld.vitb.txt.onnx")
os.system("onnxsim models/yoloworld.vitb.txt.onnx models/yoloworld.vitb.txt.onnx")

coco_names = ["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
              "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
              "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
              "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
              "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
              "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
              "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
              "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
              "hair drier", "toothbrush"]

os.makedirs("tokens", exist_ok=True)

# Group the COCO class names four at a time (the model takes 4 classes per run)
coco_names_group4 = [coco_names[i:i+4] for i in range(0, len(coco_names), 4)]
for class_name_ in coco_names_group4:
    print(f"Saving {class_name_}")
    class_name = class_name_[0]
    # Tokenize the 4-class group and save the tokens for calibration
    text_token = text_embedder.tokenize(class_name_).cpu().numpy()
    np.save(f"tokens/{class_name.replace(' ', '_')}.npy", text_token)

os.system("tar -cvf yolo_world_calib_token_data.tar tokens/*.npy")
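To sanity-check the token files that went into yolo_world_calib_token_data.tar, a short sketch like the following (not part of the repository) lists each saved .npy file with its shape and dtype:

# list the saved calibration token files
import glob
import numpy as np

for path in sorted(glob.glob("tokens/*.npy")):
    tokens = np.load(path)
    print(path, tokens.shape, tokens.dtype)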
5. Export the text feature data
Export the text feature data that is fed to the YOLO World detection model.
This step also exports yolo_world_calib_txt_data.tar, the text calibration dataset that Pulsar2 depends on when compiling the YOLO World detection model (a sketch for reading the generated .bin files back follows the script).
$ python save_coco_npy.py
$ tar -cvf yolo_world_calib_txt_data.tar tmp/*.npy
import numpy as np
import os
from yoloworld import TextEmbedder

coco_names = ["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
              "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
              "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
              "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
              "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
              "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
              "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
              "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
              "hair drier", "toothbrush"]

# Initialize text embedder
text_embedder = TextEmbedder(device="cpu")

os.makedirs("tmp", exist_ok=True)

# Group the COCO class names four at a time and export the text features for each group
coco_names_group4 = [coco_names[i:i+4] for i in range(0, len(coco_names), 4)]
print(coco_names_group4)
for class_name_ in coco_names_group4:
    print(f"Saving {class_name_}")
    class_name = class_name_[0]
    # Get text embeddings for the 4-class group
    class_embeddings = text_embedder.embed_text(class_name_)
    # Convert to numpy array
    class_embeddings = class_embeddings.cpu().numpy().astype(np.float32)
    # Save as .npz (with the class list), .npy (for the calibration tar) and raw .bin (for the NPU sample)
    np.savez(f"tmp/{class_name.replace(' ', '_')}.npz", class_embeddings=class_embeddings, class_list=np.array(class_name_))
    np.save(f"tmp/{class_name.replace(' ', '_')}.npy", class_embeddings)
    with open(f"tmp/{class_name.replace(' ', '_')}.bin", "wb") as f:
        f.write(class_embeddings.tobytes())
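The .bin files written above are raw float32 buffers that the NPU sample consumes later with the -t option. As a sketch (the reshape assumes the 1 x 4 x 512 txt_feats layout reported by the axmodel in the final step), one of them can be read back like this:

# read a raw text-feature .bin back into numpy
import numpy as np

feats = np.fromfile("tmp/dog.bin", dtype=np.float32)
print("total float32 values:", feats.size)
feats = feats.reshape(1, -1, 512)   # (1, num_classes, 512), assuming a 512-dim embedding
print("txt_feats shape:", feats.shape)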
Converting to the ax model
Start the Docker container in which Pulsar2 is installed.
$ sudo docker run -it --net host --rm -v $PWD:/data pulsar2:3.3
Use the pulsar2 build command to convert the ONNX model into an ax model for the Module-LLM (ax630c) NPU, using the config/yoloworld.json shown below (the image preprocessing implied by this config is sketched after it).
$ pulsar2 build --input model/yolov8s-worldv2-ax.onnx --config config/yoloworld.json --output_dir output --output_name yoloworldv2_4cls_50_npu1.axmodel --npu_mode NPU1 --target_hardware AX620E
{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "images",
        "calibration_dataset": "./dataset/coco_1000.tar",
        "calibration_size": 20,
        "calibration_mean": [0, 0, 0],
        "calibration_std": [255.0, 255.0, 255.0]
      },
      {
        "tensor_name": "txt_feats",
        "calibration_dataset": "./dataset/yolo_world_calib_txt_data.tar",
        "calibration_size": 20,
        "calibration_format": "Numpy"
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": true,
    "precision_analysis_method": "EndToEnd",
    "transformer_opt_level": 1,
    "enable_smooth_quant": true
  },
  "input_processors": [
    {
      "tensor_name": "images",
      "tensor_format": "RGB",
      "src_format": "RGB",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    },
    {
      "tensor_name": "txt_feats",
      "src_dtype": "FP32"
    }
  ],
  "compiler": {
    "check": 0
  }
}
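For reference, the quantization settings above mean that the "images" input is U8 RGB in NHWC layout, normalized as (x - mean) / std with mean [0, 0, 0] and std [255, 255, 255], i.e. simply scaled to the [0, 1] range. The sketch below reproduces that in Python; the plain resize (rather than letterboxing) and the use of PIL are illustrative assumptions, not taken from the toolchain.

# reproduce the preprocessing implied by config/yoloworld.json
import numpy as np
from PIL import Image

img = Image.open("ssd_horse.jpg").convert("RGB").resize((640, 640))
x = np.expand_dims(np.asarray(img, dtype=np.uint8), 0)   # 1 x 640 x 640 x 3, NHWC, U8

mean = np.array([0.0, 0.0, 0.0], dtype=np.float32)
std = np.array([255.0, 255.0, 255.0], dtype=np.float32)
x_norm = (x.astype(np.float32) - mean) / std             # values now in [0, 1]
print(x_norm.shape, x_norm.min(), x_norm.max())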
Run on Module-LLM
Copy the NPU model, the CV sample executable, and the .bin files generated under the tmp folder to Module-LLM, then run the sample.
The input text feature file dog.bin corresponds to class_ids = ['dog' 'horse' 'sheep' 'cow']. The mapping from the printed classN indices back to these class names is sketched after the log.
root@m5stack-LLM# ./ax_yolo_world_open_vocabulary -m yoloworldv2_4cls_50_npu1.axmodel -i ssd_horse.jpg -t dog.bin
--------------------------------------
model file : yoloworldv2_4cls_50_npu1.axmodel
image file : ssd_horse.jpg
img_h, img_w : 640 640
--------------------------------------
Engine creating handle is done.
Engine creating context is done.
Engine get io info is done.
input size: 2
name: images [UINT8] [RGB]
1 x 640 x 640 x 3
name: txt_feats [FLOAT32] [FEATUREMAP]
1 x 4 x 512
output size: 3
name: stride8 [FLOAT32]
1 x 80 x 80 x 68
name: stride16 [FLOAT32]
1 x 40 x 40 x 68
name: stride32 [FLOAT32]
1 x 20 x 20 x 68
Engine alloc io is done.
Engine push input is done.
--------------------------------------
post process cost time:2.47 ms
--------------------------------------
Repeat 1 times, avg time 29.07 ms, max_time 29.07 ms, min_time 29.07 ms
--------------------------------------
detection num: 2
1: 91%, [ 215, 70, 420, 373], class2
0: 67%, [ 144, 203, 197, 345], class1
--------------------------------------
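The sample reports detections as class indices such as class1 and class2. Because save_coco_npy.py stored the class list alongside the embeddings in each .npz, those indices can be mapped back to words with a short sketch like this (the index values used here are just the ones from the log above):

# map the sample's classN indices back to class names
import numpy as np

data = np.load("tmp/dog.npz")
class_list = data["class_list"]      # e.g. ['dog' 'horse' 'sheep' 'cow']
for idx in (1, 2):                   # "class1" and "class2" from the log
    print(f"class{idx} -> {class_list[idx]}")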
References
再谈 YOLO World 部署 (Revisiting YOLO World deployment)
https://zhuanlan.zhihu.com/p/721856217