Running the DETR Backbone on an Edge Device and the Transformer on a PC

Posted at 2025-09-08 / Last updated at 2025-09-08


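Overview of this article

We are Matsuoka and Oda from AITRIOS Customer Support, Sony Semiconductor Solutions.

In the previous article, we split the DETR network into a Backbone and a Transformer, ran the two in sequence on a PC, and confirmed that object detection works:

We replaced the Backbone with MobileNetV2, trained the model, and then split the network into two parts, the Backbone and the Transformer.
We then added a Classifier to the Backbone and trained a classification model by transfer learning.
Finally, using the two models (the Backbone with the Classifier and the separated Transformer), we checked object detection in Python.

In this article, we use an actual edge device and verify the system with the AITRIOS Console Developer Edition:

We quantize the Backbone with the Classifier on a PC and deploy the quantized model from the Console to the edge device.
We configure the image capture settings of the edge device.
We implement Python code that periodically fetches the Output tensors received by the Console and runs object detection when the probability of any class is high.
We capture images with the edge device and verify the operation of the whole system.

If you are starting with this article, please read the previous article as well.
To try the steps in this article, you must have completed at least the section "Adding a Classifier Layer to the Backbone and Training Classification" of the previous article.

You also need a contract for the AITRIOS Console Developer Edition.
If you are not familiar with AITRIOS, please also see the sites below.
Note that the AITRIOS Console Developer Edition is a service for corporate customers.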

AITRIOS site: https://www.aitrios.sony-semicon.com/
AITRIOS Developer Site: https://developer.aitrios.sony-semicon.com/
"Trying Object Detection on an AITRIOS device and retrieving the metadata"

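If you find errors or omissions in this article, or have any questions, please leave a comment on the article.
Please note that replies may take some time and that we may not be able to reply to every comment.
This article only introduces an application example; it does not guarantee the performance or quality of the system when it is actually run.
We have not investigated third-party patents.
For known issues with AITRIOS, please check the AITRIOS support page.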

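Before you start

About Console Developer Edition

This article uses Console V2 to operate the edge device.

If you use your edge device with Console V1, update its edge firmware for Console V2 by following "Migrating to Edge Firmware for Console V2". If you are using a newly purchased edge device, update the edge firmware for Console V2 by following the "Device Setup Guide".

About the edge firmware of the edge device

Update the edge firmware of the edge device to the latest version.

About the folder used

As in the previous article, we use the folder for the feature-input implementation, which was created by copying the cloned DETR folder.
From here on, we call this folder the Separated DETR verification folder.
The folder cloned for training is not used.

About additional libraries

Install the following additional libraries.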
```
pip install requests
pip install Pillow
```

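Changing the Output tensor shape of the Backbone with the Classifier

So that the edge device outputs the Output tensor as a one-dimensional array, we add a Flatten layer after the feature output of the Backbone (the classifier output is already one-dimensional per image).

The only change from mobilenet.py created in the previous article is two added lines. Place the following mobilenet_for_device.py in the Separated DETR verification folder; because quantization.py below imports it as for_separation.mobilenet_for_device, put it in the for_separation subfolder.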

mobilenet_for_device.py

```python
import numpy as np
import torch
from torch import nn
from torchvision.models.mobilenet import mobilenet_v2


class mobilenet_with_feature_output(nn.Module):
    """MobileNetV2 classifier that also returns a flattened 256-channel feature map."""

    def __init__(self, num_of_classes: int):
        super().__init__()

        self.backbone = mobilenet_v2(weights='IMAGENET1K_V1')

        # Replace the ImageNet head with a multi-label head: Dropout -> Linear -> Sigmoid.
        self.backbone.classifier[1] = nn.Linear(in_features=1280, out_features=num_of_classes)
        self.backbone.classifier = nn.Sequential(
            self.backbone.classifier[0],
            self.backbone.classifier[1],
            nn.Sigmoid()
        )

        # Capture the output of the last feature block (features.18) with a forward hook.
        self.intermediate_output = None
        layer = dict([*self.backbone.named_modules()])['features.18']
        layer.register_forward_hook(self.hook_fn)

        # Train only the classifier; freeze the rest of MobileNetV2.
        for name, parameter in self.backbone.named_parameters():
            parameter.requires_grad_(name.startswith('classifier'))

        num_channels_mobilenet = 1280
        self.num_channels = 256

        # Fixed 1x1 convolution that reduces 1280 channels to 256.
        self.resize = nn.Conv2d(in_channels=num_channels_mobilenet, out_channels=self.num_channels,
                                kernel_size=(1, 1), bias=False)

        # Added for the edge device: flatten (N, 256, 7, 7) into (N, 12544).
        self.flatten = nn.Flatten()

        # Output channel k is the sum of input channels k*step ... k*step + step - 1.
        step = num_channels_mobilenet // self.num_channels
        weight = np.array([[0 if i < j or (j + step - 1) < i else 1 for i in range(num_channels_mobilenet)]
                           for j in range(0, num_channels_mobilenet, step)], dtype='float32')
        weight = weight.reshape(self.num_channels, num_channels_mobilenet, 1, 1)
        # The reduction weights are fixed and never trained.
        self.resize.weight = nn.Parameter(torch.from_numpy(weight), requires_grad=False)

    def hook_fn(self, module, input, output):
        # Store the features.18 output: (N, 1280, 7, 7) for a 224x224 input.
        self.intermediate_output = output

    def forward(self, tensors):
        y = self.backbone(tensors)                       # class probabilities (N, num_of_classes)
        feature = self.resize(self.intermediate_output)  # (N, 256, 7, 7)
        feature = self.flatten(feature)                  # (N, 12544)
        return y, feature
```
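The two fixed operations in mobilenet_for_device.py can be sketched without PyTorch. This is a minimal illustration assuming a 224x224 input (so features.18 yields a 7x7 map); the helper names are hypothetical and not part of the article's code. It shows that each of the 256 output channels of self.resize is the sum of 5 consecutive MobileNetV2 channels, and where element (c, h, w) lands after nn.Flatten.

```python
# Pure-Python sketch (no PyTorch) of the fixed 1x1 "resize" convolution and Flatten
# used in mobilenet_with_feature_output. Helper names are hypothetical.

def block_sum_weight(in_channels=1280, out_channels=256):
    """Build the 0/1 weight matrix of self.resize (out_channels rows x in_channels columns)."""
    step = in_channels // out_channels
    return [[1 if j <= i <= j + step - 1 else 0 for i in range(in_channels)]
            for j in range(0, in_channels, step)]


def apply_1x1_conv(weight, pixel):
    """A bias-free 1x1 convolution is a matrix-vector product at every pixel."""
    return [sum(w * x for w, x in zip(row, pixel)) for row in weight]


def flat_index(c, h, w, height=7, width=7):
    """Position of element (c, h, w) after nn.Flatten on an (N, C, H, W) tensor (row-major)."""
    return (c * height + h) * width + w


if __name__ == "__main__":
    weight = block_sum_weight()
    print(len(weight), len(weight[0]))     # 256 1280
    pixel = list(range(1280))              # dummy 1280-channel vector at one pixel
    out = apply_1x1_conv(weight, pixel)
    print(out[0], out[1])                  # 10 35  (0+1+2+3+4 and 5+6+7+8+9)
    print(flat_index(255, 6, 6) + 1)       # 12544 = 256 * 7 * 7 elements in total
```

Because the flattened feature map has 256 x 7 x 7 = 12544 elements, this is the size that appears for the Flatten output when the converted model is inspected later.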

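Quantizing the Backbone with the Classifier and deploying it to the edge device

The basic procedure follows the "PyTorch Model Deployment Guide".

However, for quantization we use Gradient-Based Post-Training Quantization (GPTQ) instead of Post-Training Quantization.

In addition, to retrieve the Output tensors from the edge device, we deploy the Passthrough edge application.

About quantization

To deploy an AI model created with PyTorch from the Console to an edge device, you must convert the floating-point model into an 8-bit integer model with Model Compression Toolkit (MCT).
MCT is open-source software under the Apache-2.0 license that runs on Python and provides quantization based on Post-Training Quantization (PTQ), including the gradient-based variant used here.

PTQ determines the range of values that each tensor of the trained model can take (the clipping range) from real dataset inputs, and converts the model into an integer model by representing this clipping range with 8-bit integers.
This quantization computation is called calibration.

For accurate quantization, the calibration dataset needs to cover the actual inputs reasonably well, because values outside the calibrated clipping range are clipped.
We therefore use the dataset that was used for model training for calibration.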
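The calibration idea can be illustrated with a deliberately simplified scheme. This sketch assumes symmetric signed 8-bit quantization with a max-abs clipping range; MCT's actual threshold selection is different (quantization.py below uses QuantizationErrorMethod.MSE), so treat it only as an illustration of "clipping range from data, then map to 8 bits".

```python
# Simplified max-abs calibration and symmetric int8 quantization (illustration only,
# not MCT's algorithm).

def calibrate_max_abs(samples):
    """Clipping range from calibration data: the largest absolute value observed."""
    return max(abs(x) for x in samples)


def quantize_int8(values, threshold):
    """Map floats in [-threshold, threshold] to signed 8-bit integers; clip anything outside."""
    scale = threshold / 127
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    return [x * scale for x in q]


calibration = [0.1, -0.5, 2.0, 1.2, -1.8]
threshold = calibrate_max_abs(calibration)       # 2.0
q, scale = quantize_int8([0.5, 2.0, 3.0], threshold)
print(q)                                         # [32, 127, 127]: 3.0 is outside the range and clipped
print(dequantize(q, scale)[0])                   # about 0.504, close to the original 0.5
```

The clipped value 3.0 shows why calibration data that does not cover the real inputs hurts accuracy.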

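For details, see the MCT GitHub repository.
After quantization, the model is saved in ONNX format.

Running quantization

Here, we quantize the Backbone with the Classifier created in the previous article with Model Compression Toolkit (MCT).

Place quantization.py, shown at the end of this section, directly under the Separated DETR verification folder and run it.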

```
python quantization.py
```

Running it requires the Python library Model Compression Toolkit (MCT).
The Dockerfile from the previous article already installs MCT; if you use your own environment, install model-compression-toolkit==2.0.0.
For the requirements of MCT version 2.0.0, see Readme.md in the Version 2.0.0 release.

quantization.py

The code is based on the sample code in "Quantizing the model" of the "PyTorch Model Deployment Guide".
Please also see the explanation of that code there.

```python
import argparse

import torch
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

import model_compression_toolkit as mct
from model_compression_toolkit.core import QuantizationErrorMethod

from for_separation.mobilenet_for_device import mobilenet_with_feature_output
from for_separation.my_coco import CocoClassificationDataset


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_path', type=str, default='./backbone_with_classifier_weight.pth', help='The path to the trained PyTorch weights (state_dict)')
    parser.add_argument('--annotation_file', type=str, default='/data/image/coco/annotations/instances_train2017.json', help='The path to the annotation file')
    parser.add_argument('--image_folder', type=str, default='/data/image/coco/images/train2017', help='The path to the image folder')
    parser.add_argument('--quantized_model_path', type=str, default='separated_moblienet_quantized.onnx', help='The path to the quantized model')
    parser.add_argument('--num_of_classes', default=91, type=int, help='the number of classes')
    args = parser.parse_args()

    batch_size = 32

    #<1> Load a floating-point PyTorch model.
    model = mobilenet_with_feature_output(num_of_classes=args.num_of_classes)
    model.load_state_dict(torch.load(args.model_path, map_location=torch.device('cpu')))
    model.eval()

    #<2> Load a calibration dataset for quantization.
    #    The preprocessing should match the one used during training: resize to 224x224,
    #    convert to a [0, 1] tensor, and repeat grayscale images to 3 channels.
    train_dataset = CocoClassificationDataset(
        annotation_file=args.annotation_file,
        image_folder=args.image_folder,
        num_of_classes=args.num_of_classes,
        transform=transforms.Compose([
            transforms.Resize(size=(224, 224)),
            transforms.ToTensor(),
            transforms.Lambda(lambda x: x.repeat(3, 1, 1) if x.shape[0] == 1 else x)
        ])
    )
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=False)

    #<3> Create a representative dataset generator.
    #    Each call creates a new iterator; each yield is a list of model inputs (one image batch).
    n_iter = len(train_loader)
    def representative_data_gen() -> list:
        ds_iter = iter(train_loader)
        for _ in range(n_iter):
            yield [next(ds_iter)[0]]

    #<4> Set a configuration.
    q_config = mct.core.QuantizationConfig(activation_error_method=QuantizationErrorMethod.MSE,
                                           weights_error_method=QuantizationErrorMethod.MSE,
                                           weights_bias_correction=True,
                                           shift_negative_activation_correction=True,
                                           z_threshold=16)
    tpc = mct.get_target_platform_capabilities("pytorch", 'imx500', target_platform_version='v1')
    ptq_config = mct.core.CoreConfig(quantization_config=q_config)

    #<5> Quantize the floating-point PyTorch model to the 8-bit integer PyTorch model (GPTQ).
    quantized_model, quantization_info = mct.gptq.pytorch_gradient_post_training_quantization(
        model=model,
        representative_data_gen=representative_data_gen,
        core_config=ptq_config,
        target_platform_capabilities=tpc)

    #<6> Save the integer model as an ONNX model.
    mct.exporter.pytorch_export_model(model=quantized_model,
                                      save_model_path=args.quantized_model_path,
                                      repr_dataset=representative_data_gen)
```
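In quantization.py, n_iter = len(train_loader) makes calibration run over every batch of the annotation file, which can take a long time. A hypothetical variant, not part of the article's code, is a factory that returns a replayable generator and optionally caps the number of batches with itertools.islice. The sketch below uses a plain list as a stand-in for the DataLoader; the n_batches parameter is our assumption, and how many batches are enough for good accuracy has not been verified here.

```python
from itertools import islice


def make_representative_data_gen(loader, n_batches=None):
    """Return a function yielding [input_batch] lists, optionally capped at n_batches.

    `loader` is any iterable of (inputs, labels) pairs, e.g. a torch DataLoader.
    n_batches=None uses every batch, like the original script.
    """
    def representative_data_gen():
        # islice calls iter(loader) on every call, so the generator can be replayed.
        for inputs, _labels in islice(loader, n_batches):
            yield [inputs]
    return representative_data_gen


fake_loader = [(f"batch{i}", f"labels{i}") for i in range(5)]
gen = make_representative_data_gen(fake_loader, n_batches=3)
print(list(gen()))        # [['batch0'], ['batch1'], ['batch2']]
print(len(list(gen())))   # 3 (a second call starts again from the first batch)
```

With the real script, this would be passed as representative_data_gen=make_representative_data_gen(train_loader, n_batches=...).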

Deploying to the edge device

Deploy the AI model and the edge application from Console V2 to the edge device.
The basic procedure follows the "PyTorch Model Deployment Guide".

Import the quantized model into Console V2 and convert it.

For the procedure, see "4.1.1 Import" in the "Console V2 User Manual" and "Importing and converting an AI model" in the "PyTorch Model Deployment Guide".
Download the Passthrough edge application from the Edge Application SDK for AITRIOS GitHub repository.

As of July 2025, sample_edge_app_passthrough_wasm_v2_1.1.6.zip at https://github.com/SonySemiconductorSolutions/aitrios-sdk-edge-app/releases/tag/1.1.6 is the latest version.
Import the edge application into Console V2.

For the procedure, see "4.1.1 Import" in the "Console V2 User Manual" and "Importing an edge application" in the "PyTorch Model Deployment Guide".
Deploy the AI model and the edge application to the edge device.

For the procedure, see "SW Provisioning" in the "Console V2 User Manual" and "Deployment operations" in the "PyTorch Model Deployment Guide".

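Implementing the verification code

The code works as follows.

It repeatedly fetches the Output tensor and the corresponding image until the specified number of result images has been saved.
If the probability of any class is high, it runs object detection with the Transformer and saves an image with the detection results drawn on it.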
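The trigger condition ("the probability of any class is high") can be written as a small pure-Python function. This mirrors np.amax(probabilities[1:91]) > threshold in validate_with_edge_device.py, with the threshold 0.8 used there; the function name is hypothetical. Index 0 is skipped because COCO category IDs start at 1.

```python
def should_run_detection(probabilities, threshold=0.8):
    """True if any class probability other than index 0 is strictly above the threshold."""
    return max(probabilities[1:91]) > threshold


probs = [0.0] * 91
probs[0] = 0.95                     # index 0 is not a COCO category, so it is ignored
print(should_run_detection(probs))  # False
probs[18] = 0.85                    # one real class above 0.8
print(should_run_detection(probs))  # True
```

Only frames that pass this check are sent through the Transformer, which keeps PC-side processing limited to frames that probably contain an object.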

The code uses the Console REST API to control the edge device and to fetch the Output tensors and images.
For the Console REST API, see also here.
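The Console REST API endpoints that validate_with_edge_device.py calls can be collected into small URL builders. The paths and query parameters below are copied from the script; the helper names and the example endpoint are hypothetical. Unlike the script's plain string concatenation, urllib.parse.urlencode percent-encodes special characters such as spaces or "+", and we have not confirmed how Console treats such values, so this is only a sketch.

```python
from urllib.parse import quote, urlencode


def inference_results_url(console_endpoint, device_id, from_datetime):
    # Latest inference result (limit=1) newer than from_datetime; ':' kept unescaped as in the script.
    query = urlencode({'devices': device_id, 'limit': 1, 'from_datetime': from_datetime}, safe=':')
    return f"{console_endpoint}/inferenceresults?{query}"


def image_directories_url(console_endpoint, device_id):
    return f"{console_endpoint}/images/devices/directories?{urlencode({'device_id': device_id})}"


def image_url(console_endpoint, device_id, sub_directory, name_prefix):
    query = urlencode({'limit': 1, 'name_starts_with': name_prefix})
    return f"{console_endpoint}/images/devices/{quote(device_id)}/directories/{quote(sub_directory)}?{query}"


def module_property_url(console_endpoint, device_id, module_id):
    # PATCH target used to change process_state of the edge application.
    return f"{console_endpoint}/devices/{quote(device_id)}/modules/{quote(module_id)}/property"


print(inference_results_url("https://console.example.com", "dev01", "2025-07-01T12:34:56.000000"))
# https://console.example.com/inferenceresults?devices=dev01&limit=1&from_datetime=2025-07-01T12:34:56.000000
```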

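Processing flow of the verification code

Build the Transformer model and load its weights.
Obtain the access token required to call the Console REST API.
Instruct the edge device to start capturing via the Console REST API.
Using the time on the local PC, fetch from the Console via the REST API the latest inference result (Output Tensor) sent by the edge device.
Then, using the timestamp of that inference result, fetch the corresponding image from the Console.
Base64-decode the Output tensor, unpack it, and restore the original features and probabilities arrays.
Check the classification probabilities and, if any of them is high, run object detection with the Transformer.
If there is at least one detection, save a result image with the bounding boxes drawn.
When the specified count is reached, exit the inference loop.
Instruct the edge device to stop capturing via the Console REST API.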

Code

validate_with_edge_device.py

Create this as a new file directly under the Separated DETR verification folder.

```python
import argparse
import base64
import datetime
import json
import struct
import sys
import time
from pathlib import Path

import cv2
import numpy as np
import requests
import torch
import yaml
from PIL import Image

from models import build_model
from main import get_args_parser
from detect import detect, draw_boxes

RETRY_INTERVAL_SEC = 0.5  # wait time before retrying a Console REST API request


def load_settings_file(settings_file_path):
    with open(settings_file_path, "r", encoding="utf-8") as file:
        settings = yaml.safe_load(file)['console_access_settings']
    return (settings['portal_authorization_endpoint'], settings['client_secret'],
            settings['client_id'], settings['console_endpoint'], settings['device_id'])


def get_access_token(portal_authorization_endpoint, client_secret, client_id):
    # Client credentials grant: "client_id:client_secret" is sent as HTTP Basic authentication.
    authorization = base64.b64encode((client_id + ':' + client_secret).encode()).decode()
    headers = {'accept': 'application/json',
               'authorization': 'Basic {}'.format(authorization),
               'cache-control': 'no-cache',
               'content-type': 'application/x-www-form-urlencoded'}
    data = 'grant_type=client_credentials&scope=system'
    response = requests.post(portal_authorization_endpoint, headers=headers, data=data)
    return str(response.json()['access_token'])


def get_device_module(console_endpoint, access_token, device_id):
    # Return the first module other than '$system' that has a state (the deployed edge application).
    headers = {'Authorization': 'Bearer {}'.format(access_token)}
    response = requests.get(console_endpoint + '/devices/' + device_id, headers=headers)
    for module in response.json()['modules']:
        if module['module_id'] != '$system' and module['property']['state'] and len(module['module_id']) > 0:
            return module['module_id']
    print("module_id null")
    sys.exit()


def update_configuration(console_endpoint, access_token, device_id, module_id, configuration_param):
    headers = {'Authorization': 'Bearer {}'.format(access_token)}
    update_configuration_url = console_endpoint + '/devices/' + device_id + '/modules/' + module_id + '/property'
    response = requests.patch(update_configuration_url, headers=headers, json=configuration_param)
    print(response.json())


def get_latest_data(console_endpoint, headers, device_id, start_time):

    target_time = datetime.datetime.now(tz=datetime.timezone.utc)
    target_time_str = target_time.strftime('%Y-%m-%dT%H:%M:%S.%f')
    get_inference_results_url = (console_endpoint + '/inferenceresults?devices=' + device_id
                                 + '&limit=1&from_datetime=' + target_time_str)

    while True:
        try:
            # Obtain the latest inference result (JSON) whose timestamp is later than target_time.
            response = requests.get(get_inference_results_url, headers=headers)
            inference = response.json()['inferences'][0]['inferences'][0]
            # Convert the timestamp to the image-name format (yyyymmddHHMMSS + milliseconds).
            timestamp = datetime.datetime.strptime(inference['T'], "%Y-%m-%dT%H:%M:%S.%f")
            timestamp = timestamp.strftime('%Y%m%d%H%M%S%f')[:-3]
            # Base64-encoded output tensor.
            encoded_tensor = inference['O']
            print('\nGet inference : Success (timestamp : ' + str(timestamp) + ')')
            break
        except Exception as e:
            print('\rWaiting for inference : ' + str(e), end='')
            time.sleep(RETRY_INTERVAL_SEC)

    while True:
        try:
            # Obtain the image folder names and check that the latest one is not older than start_time.
            get_dir_name = console_endpoint + '/images/devices/directories?device_id=' + device_id
            response = requests.get(get_dir_name, headers=headers)
            sub_directory_name = response.json()[0]['devices'][0]['Image'][-1]
            sub_directory_time = datetime.datetime.strptime(sub_directory_name, "%Y%m%d%H%M%S")
            sub_directory_time = sub_directory_time.replace(tzinfo=datetime.timezone.utc)
            if start_time > sub_directory_time:
                time.sleep(RETRY_INTERVAL_SEC)
                continue
            print('\nGet sub_directory_name : Success (sub_directory_name : ' + str(sub_directory_name) + ')')
            break
        except Exception as e:
            print('\rWaiting for get sub_directory_name : ' + str(e), end='')
            time.sleep(RETRY_INTERVAL_SEC)

    while True:
        try:
            # Obtain the image whose name starts with the inference timestamp.
            get_image = (console_endpoint + '/images/devices/' + device_id + '/directories/'
                         + sub_directory_name + '?limit=1&name_starts_with=' + timestamp)
            response = requests.get(get_image, headers=headers)
            image_info = response.json()['data'][0]
            # Download the JPEG from the SAS URL and decode it with OpenCV (BGR channel order).
            im_name = image_info['name']
            im_binary = requests.get(image_info['sas_url']).content
            im = cv2.imdecode(np.frombuffer(im_binary, dtype=np.uint8), cv2.IMREAD_COLOR)
            im = Image.fromarray(im)
            print('\nGet image : Success (image_name : ' + str(im_name) + ')')
            break
        except Exception as e:
            print('\rWaiting for image : ' + str(e), end='')
            time.sleep(RETRY_INTERVAL_SEC)

    return encoded_tensor, im, im_name


def get_output_tensors(output_tensor_size, encoded_tensor):
    # Base64 text -> raw bytes -> tuple of float32 values (format string such as '12635f').
    raw_bytes = base64.b64decode(encoded_tensor)
    decoded_tensor = struct.unpack(output_tensor_size, raw_bytes)

    # Restore the arrays: probabilities (91) first, then features (256 x 7 x 7).
    probabilities = np.array(decoded_tensor[:91])
    features = np.array(decoded_tensor[91:])
    features = features.reshape(1, 256, 7, 7)
    return features, probabilities


def main(args):

    #<1> Create the transformer model and load weights for the model.
    device = torch.device(args.device)
    model, criterion, postprocessors = build_model(args)
    model.to(device)
    checkpoint = torch.load(args.transformer_path, map_location='cpu')
    model.load_state_dict(checkpoint)
    model.eval()

    # Edge application configuration (edge_app_passthrough_configuration.json).
    with open(args.configuration_file_path, 'r') as configuration_file:
        configuration_param = {"configuration": json.load(configuration_file)}
    common_settings = configuration_param["configuration"]["edge_app"]["common_settings"]

    #<2> Get an access token required for calling console APIs.
    portal_authorization_endpoint, client_secret, client_id, console_endpoint, device_id = load_settings_file(args.settings_file_path)
    access_token = get_access_token(portal_authorization_endpoint, client_secret, client_id)
    headers = {'Authorization': 'Bearer {}'.format(access_token)}

    #<3> Start inference on the Edge Device: set process_state to 1 (stop), then to 2 (start).
    start_time = datetime.datetime.now(tz=datetime.timezone.utc)
    module_id = get_device_module(console_endpoint, access_token, device_id)
    common_settings["process_state"] = 1
    update_configuration(console_endpoint, access_token, device_id, module_id, configuration_param)
    common_settings["process_state"] = 2
    update_configuration(console_endpoint, access_token, device_id, module_id, configuration_param)

    # Get output tensors and execute the transformer.
    execution_count = 0
    threshold = 0.8

    try:
        while True:
            #<4> Obtain the encoded output tensor and the image from the Console.
            encoded_tensor, im, im_name = get_latest_data(console_endpoint, headers, device_id, start_time)
            #<5> Obtain classification probabilities and features from the output tensor.
            features, probabilities = get_output_tensors(args.output_tensor_size, encoded_tensor)

            # Index 0 is skipped because COCO category IDs start at 1.
            max_probability = np.amax(probabilities[1:91])
            if max_probability > threshold:
                #<6> Execute transformer to detect objects.
                print(max_probability)
                features = torch.from_numpy(features.astype(np.float32)).to(device)
                scores, boxes = detect(features, model, device, im.size)

                if scores.shape[0] > 0:
                    #<7> Draw bounding boxes into the image and save it.
                    mat_img = draw_boxes(np.array(im, dtype=np.uint8), scores, boxes, args.result_image_path)
                    cv2.imwrite(args.result_image_path + '/draw_' + im_name, mat_img)
                    execution_count += 1
                    print(execution_count)

            #<8> Exit the loop after the specified number of saved results.
            if execution_count >= args.num_executions:
                break
    finally:
        #<9> Stop inference on the Edge Device, also when the loop is interrupted (e.g. Ctrl+C).
        common_settings["process_state"] = 1
        update_configuration(console_endpoint, access_token, device_id, module_id, configuration_param)


if __name__ == '__main__':

    parser = argparse.ArgumentParser('Separated DETR validation with an edge device', parents=[get_args_parser()])

    parser.add_argument('--result_image_path', type=str, default='./images', help='The folder to save result images')
    parser.add_argument('--transformer_path', type=str, default='transformer.pth', help='The path to the transformer weights')
    parser.add_argument('--settings_file_path', type=str, default='./console_access_settings.yaml', help='The path to the settings file for the REST API')
    parser.add_argument('--configuration_file_path', type=str, default='./edge_app_passthrough_configuration.json', help='The path to the edge application configuration JSON')
    parser.add_argument('--num_executions', default=10, type=int, help='Number of result images to save before stopping')
    parser.add_argument('--output_tensor_size', default='12635f', type=str, help='struct format of the output tensor to unpack (example: 12635f)')

    args = parser.parse_args()
    if args.result_image_path:
        Path(args.result_image_path).mkdir(parents=True, exist_ok=True)
    main(args)
```
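get_latest_data matches an image to an inference result purely through string formats: the inference timestamp is turned into an image-name prefix, and the image directory name is compared with the start time. The sketch below isolates those two conversions with hypothetical helper names. The formats are taken from the script's strptime/strftime calls, not from a Console specification we have checked.

```python
from datetime import datetime, timezone


def image_name_prefix(inference_timestamp):
    """'YYYY-mm-ddTHH:MM:SS.ffffff' -> 'YYYYmmddHHMMSSfff' (millisecond precision)."""
    t = datetime.strptime(inference_timestamp, "%Y-%m-%dT%H:%M:%S.%f")
    return t.strftime('%Y%m%d%H%M%S%f')[:-3]   # drop the last three microsecond digits


def directory_is_new(sub_directory_name, start_time):
    """True if an image directory named 'YYYYmmddHHMMSS' (UTC) is not older than start_time."""
    t = datetime.strptime(sub_directory_name, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return t >= start_time


print(image_name_prefix("2025-07-01T12:34:56.789000"))   # 20250701123456789
start = datetime(2025, 7, 1, 12, 0, 0, tzinfo=timezone.utc)
print(directory_is_new("20250701123456", start))         # True
print(directory_is_new("20250701115959", start))         # False
```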

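[!TIP]
The edge device sends the Output tensor as 32-bit floats, not as 8-bit integers.
Therefore, the size of the Output tensor data to unpack is (number of feature elements + number of probabilities elements) x 4 bytes. In the struct format string passed to unpack, this is written as the element count followed by f, for example 12635f for 12544 + 91 = 12635 elements (50540 bytes).
```python
decoded_tensor = struct.unpack(output_tensor_size, base64.b64decode(encoded_tensor))
```
For the size constraints on the Output tensor sent from an edge device, see "Edge Application Implementation Requirements".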
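The float32 point above can be checked with a round trip on made-up values: pack 91 probabilities followed by 12544 features as float32, Base64-encode them as in the "O" field, then decode and unpack. The explicit "<" (little-endian) prefix is our assumption; the article's script uses the native byte order of "12635f", which is little-endian on typical x86 and ARM PCs.

```python
import base64
import struct

NUM_PROBABILITIES = 91
NUM_FEATURES = 256 * 7 * 7                           # 12544
FMT = f"<{NUM_PROBABILITIES + NUM_FEATURES}f"        # 12635 little-endian float32 values

# Simulated payload (dummy values): probabilities first, then features.
values = [0.5] * NUM_PROBABILITIES + [0.25] * NUM_FEATURES
encoded = base64.b64encode(struct.pack(FMT, *values)).decode()

raw = base64.b64decode(encoded)
print(len(raw), struct.calcsize(FMT))                # 50540 50540 (12635 x 4 bytes)
decoded = struct.unpack(FMT, raw)
probabilities, features = decoded[:NUM_PROBABILITIES], decoded[NUM_PROBABILITIES:]
print(len(probabilities), len(features))             # 91 12544
```

If the format string does not match the payload length, struct.unpack raises struct.error, which is a quick way to notice a wrong --output_tensor_size.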

Access settings for the Console REST API

Copy the following to create console_access_settings.yaml and place it in the Separated DETR verification folder.

```yaml
console_access_settings:
    console_endpoint: {Console endpoint}
    portal_authorization_endpoint: {Portal endpoint}
    client_secret: {client secret}
    client_id: {client ID}
    device_id: {device ID}
```

For how to obtain the value of each key, refer to the following manuals.

Check the Console endpoint and the Portal endpoint in "Portal and Console Endpoint Information".

Check the client secret and client ID by following "Issuing a Client Secret for a client application" in the "Portal User Manual".

For the device ID, see "3.1.4. Checking edge device information" in the "Console V2 User Manual".

Also see "Obtaining and using an access token for Console REST API V2" in the "Console REST API V2 User Guide".
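The token request made by get_access_token() in validate_with_edge_device.py can be separated from the network call, which makes it easy to inspect. This sketch only builds the headers and form body that the script sends (client credentials as HTTP Basic authentication, grant_type=client_credentials, scope=system); the helper names are hypothetical, and the example credentials are dummies.

```python
import base64
from urllib.parse import urlencode


def build_token_request(client_id, client_secret):
    """Headers and form body for the client-credentials token request (no network access)."""
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        'accept': 'application/json',
        'authorization': f'Basic {credentials}',
        'cache-control': 'no-cache',
        'content-type': 'application/x-www-form-urlencoded',
    }
    body = urlencode({'grant_type': 'client_credentials', 'scope': 'system'})
    return headers, body


def bearer_header(access_token):
    """Header attached to every Console REST API call once the token has been obtained."""
    return {'Authorization': f'Bearer {access_token}'}


headers, body = build_token_request("my-client", "my-secret")
print(body)   # grant_type=client_credentials&scope=system
```

The request itself would then be requests.post(portal_authorization_endpoint, headers=headers, data=body), and the access_token field of the JSON response goes into bearer_header().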

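Checking the Output tensor array layout

The edge device sends the Output Tensors as a single one-dimensional array.
To see how it relates to the original Output tensor arrays defined in the code, obtain the tensor information with GetDnnParams of the Console REST API after converting the model on the Console.

validate_with_edge_device.py above is implemented according to this tensor information, so it should basically work as is.
However, if you have time, we recommend checking the Output Tensor layout just in case.
For checking the Output Tensor layout with GetDnnParams, see "Checking the Output tensor array in an Edge Application implementation" in the "Edge Application Implementation Guide".

When you obtain the tensor information with GetDnnParams, it is returned in dnnParams.xml.
In this case, it should look like the following.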

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<dnnParams>
    <networks>
        <network name="7d495351-9159-49a9-acf8-c5f0d71980f1-0" ordinal="0" type="">
            <inputTensors>
                <inputTensor persistency="1" ordinal="0" name="Placeholder.input.uid1:0" l2Offset="3900160" numOfDimensions="3" bitsPerElement="8" shift="0" scale="0.00390625" format="unsigned">
                    <dimensions>
                        <dimension size="3" serializationOrder="2" ordinal="0" padding="0"/>
                        <dimension size="224" serializationOrder="1" ordinal="1" padding="0"/>
                        <dimension size="224" serializationOrder="0" ordinal="2" padding="0"/>
                    </dimensions>
                </inputTensor>
            </inputTensors>
            <outputTensors>
                <outputTensor ordinal="1" name="transform-13-/flatten_1/Flatten:0" l2Offset="4055808" numOfDimensions="1" bitsPerElement="8" shift="0" scale="0.0625" format="signed">
                    <dimensions>
                        <dimension size="12544" serializationOrder="0" ordinal="0" padding="0"/>
                    </dimensions>
                </outputTensor>
                <outputTensor ordinal="0" name="transform-2-/backbone_classifier_1/layer/Gemm:0" l2Offset="4069120" numOfDimensions="1" bitsPerElement="8" shift="0" scale="0.00390625" format="unsigned">
                    <dimensions>
                        <dimension size="91" serializationOrder="0" ordinal="0" padding="0"/>
                    </dimensions>
                </outputTensor>
            </outputTensors>
        </network>
    </networks>
    <l2memory totalSize="8388480" coefficientsSize="2814080" reservedMemorySize="1024" networksRuntimeSize="1674240"/>
</dnnParams>
```

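From the following lines of dnnParams.xml, we can tell that the Classifier output (probabilities) comes first in the one-dimensional array and the convolution output (feature) of the Backbone with the Classifier follows it: the Gemm output (91 elements) has ordinal="0", and the Flatten output (12544 elements) has ordinal="1".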

```xml
<outputTensor ordinal="1" name="transform-13-/flatten_1/Flatten:0" l2Offset="4055808" numOfDimensions="1" bitsPerElement="8" shift="0" scale="0.0625" format="signed">
...
</outputTensor>
<outputTensor ordinal="0" name="transform-2-/backbone_classifier_1/layer/Gemm:0" l2Offset="4069120" numOfDimensions="1" bitsPerElement="8" shift="0" scale="0.00390625" format="unsigned">
...
</outputTensor>
```
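Reading the order by eye works for two tensors; it can also be read programmatically. This sketch parses dnnParams.xml with the standard xml.etree.ElementTree module and lists the output tensors sorted by their ordinal attribute with their element counts. The assumption that the device concatenates outputs in ordinal order is ours; it is consistent with the layout described above but should be confirmed with the GetDnnParams guide. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def output_layout(dnn_params_xml):
    """Return [(name, element_count), ...] for outputTensor elements sorted by ordinal.

    dnn_params_xml: the contents of dnnParams.xml as str or bytes.
    """
    root = ET.fromstring(dnn_params_xml)
    layout = []
    for tensor in root.iter('outputTensor'):
        count = 1
        for dim in tensor.iter('dimension'):
            count *= int(dim.get('size'))
        layout.append((int(tensor.get('ordinal')), tensor.get('name'), count))
    return [(name, count) for _, name, count in sorted(layout)]


# Usage with a downloaded file:
# with open('dnnParams.xml', 'rb') as f:
#     print(output_layout(f.read()))
# For the file above this would list the Gemm output (91) before the Flatten output (12544).
```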

If the dnnParams.xml you obtained differs from the above, modify the following code in validate_with_edge_device.py.
This code restores the Output tensor arrays from the one-dimensional array.

```python
probabilities = np.array( decoded_tensor[:91] )
features = np.array( decoded_tensor[91:] )
features = features.reshape(1,256,7,7)
```
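If the layout differs, one way to avoid hard-coded offsets is a helper that splits the flat sequence into consecutive chunks of given sizes, listed in the order the tensors appear in the payload. This is a hypothetical helper, not part of the article's code; it checks the total length so that a wrong size list fails loudly instead of silently shifting the data.

```python
def split_flat_output(values, sizes):
    """Split a flat sequence into consecutive chunks of the given sizes (payload order)."""
    if len(values) != sum(sizes):
        raise ValueError(f"expected {sum(sizes)} values, got {len(values)}")
    chunks, offset = [], 0
    for size in sizes:
        chunks.append(values[offset:offset + size])
        offset += size
    return chunks


print(split_flat_output(list(range(10)), [3, 7]))   # [[0, 1, 2], [3, 4, 5, 6, 7, 8, 9]]
```

In validate_with_edge_device.py this would become probabilities, features = split_flat_output(decoded_tensor, [91, 12544]), followed by np.array(features).reshape(1, 256, 7, 7) as before.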

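Running inference

Place edge_app_passthrough_configuration.json from sample_edge_app_passthrough_wasm_v2_1.1.6.zip in the Separated DETR verification folder.

When you run validate_with_edge_device.py, result images with bounding boxes, class IDs, and probabilities drawn on them are saved in the folder specified with the --result_image_path option.

If you changed Transformer parameters such as dim_feedforward or hidden_dim in your experiments, set the same values with the corresponding options.
If you changed the number of classes, set --output_tensor_size to the struct format string for (number of feature elements + number of probabilities elements) float32 values, for example 12635f; the data being unpacked is that element count x 4 bytes. Also update the slicing in get_output_tensors accordingly.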
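The value for --output_tensor_size can be computed rather than worked out by hand. This hypothetical helper assumes the feature map keeps the shape used in this article (256 channels, 7x7 for a 224x224 input) and that every element is sent as a float32.

```python
import struct


def output_tensor_format(num_classes=91, feature_channels=256, feature_height=7, feature_width=7):
    """struct format string for --output_tensor_size: one float32 ('f') per output element."""
    num_elements = num_classes + feature_channels * feature_height * feature_width
    return f"{num_elements}f"


fmt = output_tensor_format()
print(fmt, struct.calcsize(fmt))                 # 12635f 50540
print(output_tensor_format(num_classes=21))      # 12565f
```

For a 21-class model, the command would then be python validate_with_edge_device.py --device cpu --output_tensor_size 12565f, together with the matching change to the 91 in get_output_tensors.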

```
python validate_with_edge_device.py --device cpu
```

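The detection accuracy appears to have dropped slightly, but overall the system works as expected.
The images mostly contain large objects, but we were impressed by how robust the detection is against disturbances such as blur and noise.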

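| | Good detection 1 | Good detection 2 | Multiple detections | False detection |
|---|---|---|---|---|
| File name | 000000000081.jpg | 000000000394.jpg | 000000001282.jpg | 000000025316.jpg |
| Detection result | (image) | (image) | (image) | (image) |

To respect copyright, the COCO images are shown in masked form.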

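If you run into problems

If something does not work partway through the article, feel free to leave a comment on this article or check the support site below.
Please note that replies to comments may take some time.

Support Site

For other issues with AITRIOS beyond the content of this article, please contact us below.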
