Running DETR's Backbone on Edge Devices and Transformer on PC

Posted at 2025-09-08


Overview of this Article

This is Matsuoka/Orita from the customer support team for AITRIOS at Sony Semiconductor Solutions.

In the previous article, we separated the Backbone and the Transformer of the DETR network and verified end-to-end object detection on a PC.

Specifically, we replaced the Backbone with MobileNetV2, trained it, and then split the network into two parts: the Backbone and the Transformer.
Furthermore, we added a Classifier to the Backbone and performed transfer learning for a classification model.
Finally, we used Python to verify object detection with two models: the Backbone with the added Classifier and the separated Transformer.

In this article, we will verify the operation using an actual Edge Device with the Console Developer Edition of AITRIOS.

  1. We will quantize the backbone with class classification on a PC and deploy the quantized model to the Edge Device from the Console.

  2. We will configure the image capture settings on the Edge Device.

  3. We will implement Python code to periodically retrieve the output tensor from the Console and perform object detection when the probability of any class is high.

  4. We will capture images on the Edge Device and verify the system's operation from end to end.

If you're reading this article first, please also read the previous article.
To try the content of this article, you need to have completed at least up to the "Adding a Classifier Layer to the Backbone and Training for Class Classification" section in the previous article.

Additionally, a subscription to the Console Developer Edition of AITRIOS is required.
If you're not familiar with AITRIOS, please also check this website.
Note that the Console Developer Edition of AITRIOS is a service for corporate customers.

  • If you notice any errors or omissions in this article, or if you have any questions, please leave a comment on the article.
    Please be aware that it may take some time to respond to comments, and in some cases, we may not be able to provide a response.
  • This article is intended to introduce an application example and does not guarantee the performance or quality when implemented in practice.
  • We have not conducted any third-party patent searches.
  • For any issues related to AITRIOS, please refer to the AITRIOS support page.

Before proceeding with this article

  • About Console Developer Edition

    In this article, we will use the Console V2 to operate the Edge Device.

    If you are using the Edge Device with Console V1, please refer to the Edge Firmware Migration Guide for Console V2.
    If you are using a newly purchased Edge Device, please refer to the Device Setup Guide to update the Edge Firmware to the firmware for Console V2.

  • About the Edge Firmware for the Edge Device

    Please update the Edge Firmware of your Edge Device to the latest version.

  • About the Folder to Use

    For the implementation in this article, we continue to use the folder for the feature-input implementation (a copy of the cloned DETR folder) that we used in the previous article.
    From here on, we will refer to this folder as the 'Separated DETR Verification Folder'.
    We will not use the folder that was cloned for training.

  • About Adding Libraries

    Please install the following libraries:

    pip install requests
    pip install Pillow
    

Modifying the Output Tensor Shape of the Backbone with Classification

To convert the Output Tensor generated by the Edge Device into a one-dimensional array, we add a Flatten layer at the end of the backbone.

Only two lines are added to the mobilenet.py created in the previous article; place the following mobilenet_for_device.py in the Separated DETR Verification Folder.
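
For reference, the two added lines are the Flatten layer created in __init__ and its application to the feature map in forward:

self.flatten  = torch.nn.Flatten()      # added in __init__
feature  = self.flatten(feature)        # added in forward, after self.resize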

mobilenet_for_device.py

import torch
import torch.nn.functional as F
import torchvision
from torch import nn
from torchvision.models._utils import IntermediateLayerGetter

import numpy as np
from torchvision.models.mobilenet import mobilenet_v2

class mobilenet_with_feature_output(nn.Module):

    def __init__(self, num_of_classes : int):
        super().__init__()

        self.backbone = mobilenet_v2(weights='IMAGENET1K_V1')

        self.backbone.classifier[1] = nn.Linear(in_features=1280, out_features=num_of_classes)
        self.backbone.classifier = nn.Sequential(
                        self.backbone.classifier[0],
                        self.backbone.classifier[1],
                        nn.Sigmoid()
                    )

        # Capture the output of the final feature block (features.18) with a forward hook.
        layer = dict([*self.backbone.named_modules()])['features.18']
        layer.register_forward_hook(self.hook_fn)

        # Train only the classifier; keep the pretrained backbone weights frozen.
        for name, parameter in self.backbone.named_parameters():
            if name.startswith('classifier'):
                parameter.requires_grad_(True)
            else:
                parameter.requires_grad_(False)

        num_channels_moboinet=1280
        self.num_channels = 256

        # 1x1 convolution that reduces the 1280 feature channels to the 256 channels expected by the Transformer.
        self.resize = torch.nn.Conv2d(in_channels=num_channels_moboinet,out_channels=self.num_channels,kernel_size=(1,1),bias=False)

        # Flatten so that the feature map is output from the Edge Device as a one-dimensional array.
        self.flatten  = torch.nn.Flatten()

        # Fixed weights that sum each group of consecutive channels; the reduction convolution is not trained.
        step = int(num_channels_moboinet/self.num_channels)
        weight = np.array( [[0 if i<j or (j+step-1)<i else 1 for i in range(num_channels_moboinet) ] for j in range(0,num_channels_moboinet, step) ] , dtype = 'float32' )
        weight = weight.reshape(self.num_channels,num_channels_moboinet,1,1)
        self.resize.weight = nn.Parameter(torch.from_numpy(weight))
        self.resize.weight.requires_grad = False

    def hook_fn(self, module, input, output):
        global intermediate_output
        intermediate_output = output

    def forward(self, tensors):

        y = self.backbone(tensors)
        feature  = self.resize(intermediate_output)
        feature  = self.flatten(feature)

        return y, feature
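
As a quick sanity check on a PC, the two output shapes can be confirmed as follows (a minimal sketch; it assumes the file is importable in the same way that quantization.py below imports it):

import torch
from for_separation.mobilenet_for_device import mobilenet_with_feature_output

model = mobilenet_with_feature_output(num_of_classes=91)
model.eval()
with torch.no_grad():
    y, feature = model(torch.randn(1, 3, 224, 224))

print(y.shape)        # torch.Size([1, 91])    : class probabilities
print(feature.shape)  # torch.Size([1, 12544]) : flattened 256 x 7 x 7 feature map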

Quantization of Backbone with Class Classification and Deployment to Edge Device

We will follow the basic steps outlined in the PyTorch Model Deployment Guide.

Instead of plain Post-Training Quantization, we use Gradient-Based Post-Training Quantization (GPTQ).

Additionally, to obtain the output tensor from the Edge Device, we deploy a Passthrough Edge Application.

To deploy PyTorch AI models to the Edge Device, you need to convert the floating-point model to an 8-bit integer model using the Model Compression Toolkit (MCT).
MCT is an Apache-2.0 licensed open-source software that runs on Python and provides quantization based on Post-training quantization (PTQ).

PTQ determines the value range (clipping range) of each tensor in a trained model based on actual dataset inputs.
Then it converts the model to an integer model by representing this clipping range with 8-bit integers.

This quantization calculation is called calibration.
For proper quantization, the calibration dataset needs to have a certain level of coverage for actual inputs.
Therefore, we perform calibration using the same dataset used for model training.
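
As a rough illustration of this idea (a simplified sketch, not MCT's actual algorithm), a uniform 8-bit quantization of a clipping range looks like this:

import numpy as np

def quantize_uint8(x, clip_min, clip_max):
    # Represent the clipping range [clip_min, clip_max] with 256 integer levels.
    scale = (clip_max - clip_min) / 255.0
    q = np.clip(np.round((x - clip_min) / scale), 0, 255).astype(np.uint8)
    return q, scale

def dequantize(q, scale, clip_min):
    # Approximate reconstruction of the original floating-point values.
    return q.astype(np.float32) * scale + clip_min

x = np.array([-0.3, 0.0, 0.7, 1.2], dtype=np.float32)
q, scale = quantize_uint8(x, clip_min=-0.5, clip_max=1.5)
print(q, dequantize(q, scale, clip_min=-0.5))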

For more details, please refer to the MCT GitHub repository.
Note that after model quantization, MCT saves the model in ONNX format.

Executing Quantization

First, we will quantize the backbone with class classification created in the previous article using the Model Compression Toolkit (MCT).

Place the quantization.py file, which is at the end of this section, directly under the Separated DETR Verification Folder and execute it.

python quantization.py
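
If your dataset or weight file is in a different location, the defaults defined in quantization.py can be overridden on the command line (the values in braces are placeholders for your environment):

python quantization.py --model_path {path to backbone_with_classifier_weight.pth} --annotation_file {path to instances_train2017.json} --image_folder {path to the train2017 image folder}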

This execution requires the Python library Model Compression Toolkit (MCT).
While the Dockerfile in the previous article includes the MCT installation, if you're using your own environment, please install model-compression-toolkit==2.0.0.
For the operating conditions of MCT version 2.0.0, refer to the Readme.md in the Version 2.0.0 release.

quantization.py

The code is based on the sample code in Model quantization in the PyTorch Model Deployment Guide.
Please also refer to the explanation of the code provided there.

import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader

import model_compression_toolkit as mct
from model_compression_toolkit.core import QuantizationErrorMethod

from for_separation.mobilenet_for_device import mobilenet_with_feature_output
from for_separation.my_coco import CocoClassificationDataset

from torchvision.models import mobilenet_v2
import argparse


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_path', type=str, default='./backbone_with_classifier_weight.pth', help='The path to the PyTorch model weights')
    parser.add_argument('--annotation_file', type=str, default='/data/image/coco/annotations/instances_train2017.json', help='The path to the annotation file')
    parser.add_argument('--image_folder', type=str, default='/data/image/coco/images/train2017', help='The path to the image folder')
    parser.add_argument('--quantized_model_path', type=str, default='separated_moblienet_quantized.onnx', help='The path to the quantized model')
    parser.add_argument('--num_of_classes', default=91, type=int,  help='the number of classes')
    args = parser.parse_args()

    batch_size = 32

    #<1>  Load a floating-point PyTorch model.
    model = mobilenet_with_feature_output(num_of_classes = args.num_of_classes)
    model.load_state_dict(torch.load(args.model_path, map_location=torch.device('cpu')))

    #<2> Load a calibration dataset for quantization.
    #    The calibration dataset is normalized to match the normalization used during training.
    train_dataset = CocoClassificationDataset(
        annotation_file = args.annotation_file,
        image_folder = args.image_folder,
        num_of_classes = args.num_of_classes,
        transform = transforms.Compose([
            transforms.Resize(size=(224,224)),
            transforms.ToTensor(),
            transforms.Lambda(lambda x: x.repeat(3, 1, 1) if x.shape[0] == 1 else x)
        ])
    )

    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=False)
    image_data_loader = iter(train_loader)

    #<3> Create a representative dataset generator
    n_iter=len(train_loader)
    def representative_data_gen() -> list:
        ds_iter = iter(train_loader)
        for _ in range(n_iter):
            yield [next(ds_iter)[0]]

    #<4> Set a configuration.
    q_config = mct.core.QuantizationConfig(activation_error_method=QuantizationErrorMethod.MSE,
                                       weights_error_method=QuantizationErrorMethod.MSE,
                                       weights_bias_correction=True,
                                       shift_negative_activation_correction=True,
                                       z_threshold=16)
    tpc = mct.get_target_platform_capabilities("pytorch", 'imx500', target_platform_version='v1')
    ptq_config = mct.core.CoreConfig(quantization_config=q_config)

    #<5> Quantize the floating-point PyTorch model to the 8-bit integer PyTorch model.
    quantized_model, quantization_info = mct.gptq.pytorch_gradient_post_training_quantization(model=model,
        representative_data_gen=representative_data_gen,
        core_config=ptq_config,
        target_platform_capabilities=tpc)

    #<6> Save the integer model as an ONNX model.
    mct.exporter.pytorch_export_model(model=quantized_model,
                                      save_model_path=args.quantized_model_path,
                                      repr_dataset=representative_data_gen)
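
Before importing the exported file into the Console, you can optionally inspect its outputs (a sketch assuming the onnx package is available in your environment):

import onnx

onnx_model = onnx.load('separated_moblienet_quantized.onnx')
onnx.checker.check_model(onnx_model)
for output in onnx_model.graph.output:
    dims = [d.dim_value for d in output.type.tensor_type.shape.dim]
    print(output.name, dims)  # expect the 91-class output and the 12544-element feature output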

Deployment to Edge Devices

Deploy the AI model and the Edge Application to the Edge Device using the Console V2.
We will follow the basic steps outlined in the PyTorch Model Deployment Guide.

  1. Import the quantized model into the Console V2 and convert it.

    For detailed steps, refer to 4.1.1 Import in the Console User Manual and Importing and converting the AI Model in the PyTorch Deployment Guide.

  2. Download the Passthrough Edge Application from the GitHub repository of the Edge Application SDK for AITRIOS.

    As of July 2025, the latest version is sample_edge_app_passthrough_wasm_v2_1.1.6.zip, available at: https://github.com/SonySemiconductorSolutions/aitrios-sdk-edge-app/releases/tag/1.1.6.

  3. Import the Edge Application into the Console V2.
    For detailed steps, refer to 4.1.1 Import in the Console V2 User Manual and Importing the Edge Application in the PyTorch Model Deployment Guide.

  4. Deploy the AI Model and Edge Application to the Edge Device.

    For detailed steps, refer to SW Provisioning in the Console V2 User Manual and Deployment Operations in the PyTorch Model Deployment Guide.

Implementation of Verification Code

The overview of the code is as follows:

  • Retrieve Output tensors and images for a specified number of times.
  • If the classification probability is high for any class, execute object detection using the Transformer and save the image with the detection results drawn on it.

The code uses the Console REST API for Edge Device control and retrieval of Output tensors and images.
For more information on the Console REST API, please refer to the Console REST API V2 User Guide.

Processing Flow of the Implementation Code

  1. Build the Transformer model and load the weights.
  2. Obtain the access token required to call the Console REST API.
  3. Instruct the Edge Device to start capturing using the Console REST API.
  4. Based on the local PC time, retrieve the latest inference results (Output Tensor) sent by the Edge Device from the Console via REST API.
    Additionally, retrieve the images from the Console based on the timestamp of the inference results.
  5. Base64 decode the output tensor, then unpack it. After that, restore the original arrays of features and probabilities.
  6. Evaluate the classification probabilities, and if any are high, perform object detection with the Transformer.
  7. If there is at least one detection result, save the resulting image with the bounding box drawn.
  8. Exit the inference loop when the specified number of iterations is reached.
  9. Instruct the Edge Device to stop capturing using the Console REST API.

Code

validate_with_edge_device.py

Place this new file directly under the Separated DETR Verification Folder.

import argparse
import json
import sys
import time
from pathlib import Path
import numpy as np
import torch
import base64
import struct
from PIL import Image
import cv2
import yaml
import requests
from models import build_model
from main import get_args_parser
from detect import detect, draw_boxes
import datetime

def load_settings_file(settings_file_path):
    with open(settings_file_path, "r", encoding="utf-8") as file:
        yaml_data = yaml.safe_load(file)
    portal_authorization_endpoint = yaml_data['console_access_settings']['portal_authorization_endpoint']
    client_secret = yaml_data['console_access_settings']['client_secret']
    client_id = yaml_data['console_access_settings']['client_id']
    console_endpoint = yaml_data['console_access_settings']['console_endpoint']
    device_id = yaml_data['console_access_settings']['device_id']
    return portal_authorization_endpoint,client_secret,client_id,console_endpoint,device_id

def get_access_token(portal_authorization_endpoint,client_secret,client_id):

    authorization = base64.b64encode((client_id + ':' + client_secret).encode()).decode()

    headers  = {'accept': 'application/json',
                'authorization': 'Basic {}'.format(str(authorization)),
                'cache-control': 'no-cache',
                'content-type': 'application/x-www-form-urlencoded'
                }
    data = 'grant_type=client_credentials&scope=system'
    response = requests.post(portal_authorization_endpoint, headers=headers, data=data)
    json_data = response.json()
    access_token = str(json_data['access_token'])
    return access_token

def get_device_module(console_endpoint,access_token,device_id):
    headers  = {'Authorization': 'Bearer {}'.format(access_token)}
    get_device_url = console_endpoint + '/devices/' + device_id
    response = requests.get(get_device_url, headers=headers)
    json_data = response.json()
    for module in json_data['modules']:
        if module['module_id'] != '$system' and module['property']['state'] and len(module['module_id']) > 0:
            return module['module_id']
    print("module_id null")
    sys.exit()

def update_configuration(console_endpoint,access_token,device_id,module_id,configuration_param):
    headers  = {'Authorization': 'Bearer {}'.format(access_token)}
    update_configuration_url = console_endpoint + '/devices/' + device_id + '/modules/' + module_id + '/property'
    response = requests.patch(update_configuration_url, headers=headers, json=configuration_param)
    print(response.json())

def get_latest_data(console_endpoint,headers,device_id,start_time):

    target_time = datetime.datetime.now(tz=datetime.timezone.utc)
    target_time_str = target_time.strftime('%Y-%m-%dT%H:%M:%S.%f')
    get_inference_results_url = console_endpoint + '/inferenceresults?devices=' + device_id + '&limit=1&from_datetime=' + target_time_str

    while True:
        try :
            # Obtain the latest inference result (JSON) from the Console, filtering by target_time so that the timestamp in the JSON is later than target_time.
            response = requests.get(get_inference_results_url, headers=headers)
            inferenceresults = response.json()
            # Get the time stamp from the obtained JSON.
            timestamp = inferenceresults['inferences'][0]['inferences'][0]['T']
            timestamp = datetime.datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%f")
            timestamp = timestamp.strftime('%Y%m%d%H%M%S%f')[:-3]
            # Get the encoded output tensor from the obtained JSON.
            encoded_tensor = inferenceresults['inferences'][0]['inferences'][0]['O']
            print('\nGet inference : Success (timestamp : ' + str(timestamp) + ')')
            break

        except Exception as e:
            print('\rWaiting for inference : ' + str(e), end='')
            for i in range(10):
                time.sleep(0.05)

    while True:
        try :
            # Obtain the image folder names from the Console and verify that the latest timestamp is later than the start time.
            get_dir_name = console_endpoint + '/images/devices/directories?device_id=' + device_id
            response = requests.get(get_dir_name, headers=headers)
            json_data = response.json()
            sub_directory_name = json_data[0]['devices'][0]['Image'][-1]
            sub_directory_time = datetime.datetime.strptime(sub_directory_name, "%Y%m%d%H%M%S")
            sub_directory_time = sub_directory_time.replace(tzinfo=datetime.timezone.utc)
            if start_time > sub_directory_time:
                for i in range(10):
                    time.sleep(0.05)
                continue
            print('\nGet sub_directory_name : Success (sub_directory_name : ' + str(sub_directory_name) + ')')
            break

        except Exception as e:
            print('\rWaiting for get sub_directory_name : ' + str(e), end='')
            for i in range(10):
                time.sleep(0.05)

    while True:
        try :
            # Obtain the image to be sent to the Console.
            get_image = console_endpoint + '/images/devices/' + device_id + '/directories/' + sub_directory_name + '?limit=1&name_starts_with=' + timestamp
            response = requests.get(get_image, headers=headers)
            json_data = response.json()
            # Obtain the image from SAS URL.
            im_name = json_data['data'][0]['name']
            im_binary = requests.get(json_data['data'][0]['sas_url']).content
            jpg=np.frombuffer(im_binary,dtype=np.uint8)
            im = cv2.imdecode(jpg, cv2.IMREAD_COLOR)
            im = Image.fromarray(im)
            print('\nGet image : Success (image_name : ' + str(timestamp) + ')')
            break

        except Exception as e:
            print('\rWaiting for image : ' + str(e), end='')
            for i in range(10):
                time.sleep(0.05)

    return encoded_tensor,im,im_name

def get_output_tensors(output_tensor_size,encoded_tensor):

    encoded_tensor = base64.b64decode(encoded_tensor)
    decoded_tensor = struct.unpack(output_tensor_size,encoded_tensor)

    probabilities = np.array( decoded_tensor[:91] )
    features = np.array( decoded_tensor[91:] )
    features = features.reshape(1,256,7,7)
    return features,probabilities

def main(args):

    #<1> Create the transformer model and load weights for the model.
    device = torch.device(args.device)
    model, criterion, postprocessors = build_model(args)
    model.to(device)
    checkpoint = torch.load(args.transformer_path, map_location='cpu')
    model.load_state_dict(checkpoint)
    model.eval()

    configuration_json_open = open(args.configuration_file_path, 'r')
    configuration_json_load = json.load(configuration_json_open)
    configuration_param = {}
    configuration_param["configuration"] = (configuration_json_load)

    #<2> Get an access token required for calling console APIs.
    portal_authorization_endpoint,client_secret,client_id,console_endpoint,device_id = load_settings_file(args.settings_file_path)
    access_token = get_access_token(portal_authorization_endpoint,client_secret,client_id)
    headers  = {'Authorization': 'Bearer {}'.format(access_token)}

    #<3> Start inference on the Edge Device.
    start_time = datetime.datetime.now(tz=datetime.timezone.utc)
    module_id = get_device_module(console_endpoint,access_token,device_id)
    configuration_param["configuration"]["edge_app"]["common_settings"]["process_state"] = 1
    update_configuration(console_endpoint,access_token,device_id,module_id,configuration_param)

    configuration_param["configuration"]["edge_app"]["common_settings"]["process_state"] = 2
    update_configuration(console_endpoint,access_token,device_id,module_id,configuration_param)

    #get output_tensor and execute transformer
    execution_count = 0
    threshold = 0.8

    while True:

        #<4> Obtain encoded output tensor from the Edge Device and decode it.
        encoded_tensor,im,im_name = get_latest_data(console_endpoint,headers,device_id,start_time)
        #<5> Obtain classification probabilities and features from the output tensor.
        features,probabilities = get_output_tensors(args.output_tensor_size, encoded_tensor)

        if np.amax(probabilities[1:91]) > threshold :
            #<6> Execute transformer to detect objects.
            print(np.amax(probabilities[1:91]))

            features = torch.from_numpy(features.astype(np.float32)).clone()
            features = features.to(device)
            scores, boxes = detect(features , model, device, im.size)

            if scores.shape[0] > 0:
                #<7> Draw boundary box into the image (input tensor) and save it.
                im = np.array(im, dtype=np.uint8)
                mat_img = draw_boxes(im, scores, boxes, args.result_image_path)
                cv2.imwrite(args.result_image_path + ('/draw_') + im_name, mat_img)
                execution_count += 1

                print( execution_count )

        #<8> Exit the loop after the specified iterations.
        if execution_count >= args.num_executions :
            break

    #<9> Stop inference on the Edge Device.
    configuration_param["configuration"]["edge_app"]["common_settings"]["process_state"] = 1
    update_configuration(console_endpoint,access_token,device_id,module_id,configuration_param)

if __name__ == '__main__':

    parser = argparse.ArgumentParser('DETR training and evaluation script', parents=[get_args_parser()])

    parser.add_argument('--result_image_path', type=str, default='./images', help='The path to the folder for result images')
    parser.add_argument('--transformer_path', type=str, default='transformer.pth', help='The path to the transformer model')
    parser.add_argument('--settings_file_path', type=str, default='./console_access_settings.yaml', help='The path to the setting file for rest api')
    parser.add_argument('--configuration_file_path', type=str, default='./edge_app_passthrough_configuration.json', help='The path to the configuration json')
    parser.add_argument('--num_executions', default='10', type=int, help='Number of executions')
    parser.add_argument('--output_tensor_size', default='12635f', type=str, help='Output Tensor size to unpack (example : 12635f )')

    args = parser.parse_args()
    if args.result_image_path:
        Path(args.result_image_path).mkdir(parents=True, exist_ok=True)
    main(args)

[!TIP]

  • The Edge Device sends Output tensors in 32-bit float format, not 8-bit.
    Therefore, the byte size of the Output tensor to unpack is (number of feature elements + number of probability elements) x 4; see the size-check example after this tip.
    decoded_tensor = struct.unpack(output_tensor_size, encoded_tensor)
    
  • For constraints on the Output tensor size sent from the Edge Device, please refer to the "Edge Application Implementation Requirements".
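
As a quick check of the unpack format used in this article (12544 feature elements plus 91 class probabilities):

import struct

num_features = 256 * 7 * 7        # 12544 flattened feature elements
num_classes = 91                  # classification probabilities
fmt = '{}f'.format(num_features + num_classes)   # '12635f', the default --output_tensor_size value
print(fmt, struct.calcsize(fmt))  # 12635f 50540  (12635 floats x 4 bytes each)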

Console REST API Access Settings

Copy the following and create a file named console_access_settings.yaml in the Separated DETR Verification Folder:

console_access_settings:
    console_endpoint: {Console endpoint}
    portal_authorization_endpoint: {Portal endpoint}
    client_secret: {Secret}
    client_id: {Client ID}
    device_id: {Device ID}

Refer to the respective manuals to obtain the values for each key.

For Console endpoint and Portal endpoint, check the Portal / Console Endpoint Information.

For Secret and Client ID, follow the Issuing a Client Secret for Client Applications section in the Portal User Manual.

For Device ID, refer to the 3.1.4. View Device Information section in the Console V2 User Manual.

Additionally, please refer to the Obtaining and using an access token of Console REST API V2 section in the Console REST API V2 User Guide.
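
If you want to confirm the settings file and credentials before running the full verification, the helper functions in validate_with_edge_device.py can be reused as a minimal standalone check (a sketch, run from the Separated DETR Verification Folder):

from validate_with_edge_device import load_settings_file, get_access_token

# Read the endpoints and credentials from console_access_settings.yaml.
portal_endpoint, client_secret, client_id, console_endpoint, device_id = load_settings_file('./console_access_settings.yaml')
# Request an access token from the Portal authorization endpoint.
access_token = get_access_token(portal_endpoint, client_secret, client_id)
print('Access token obtained:', bool(access_token))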

About Verifying Output Tensor Arrays

Edge Devices transmit output tensors as one-dimensional arrays.
To understand how this one-dimensional array relates to the output tensor arrays originally defined in the code, check the tensor information using the Console REST API GetDnnParams after converting the model in the Console.

validate_with_edge_device.py was implemented after verifying the output tensor array against this tensor information, so it should generally work without issues. However, if time permits, we recommend double-checking the output tensor array yourself.

For information about output tensor array verification using GetDnnParams, refer to the "Checking the Output tensor array in the Edge Application implementation" section in the "Edge Application Implementation Guide".

When you retrieve tensor information using GetDnnParams, the information is returned within a dnnParams.xml file. In this case, the content should be as follows:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<dnnParams>
    <networks>
        <network name="7d495351-9159-49a9-acf8-c5f0d71980f1-0" ordinal="0" type="">
            <inputTensors>
                <inputTensor persistency="1" ordinal="0" name="Placeholder.input.uid1:0" l2Offset="3900160" numOfDimensions="3" bitsPerElement="8" shift="0" scale="0.00390625" format="unsigned">
                    <dimensions>
                        <dimension size="3" serializationOrder="2" ordinal="0" padding="0"/>
                        <dimension size="224" serializationOrder="1" ordinal="1" padding="0"/>
                        <dimension size="224" serializationOrder="0" ordinal="2" padding="0"/>
                    </dimensions>
                </inputTensor>
            </inputTensors>
            <outputTensors>
                <outputTensor ordinal="1" name="transform-13-/flatten_1/Flatten:0" l2Offset="4055808" numOfDimensions="1" bitsPerElement="8" shift="0" scale="0.0625" format="signed">
                    <dimensions>
                        <dimension size="12544" serializationOrder="0" ordinal="0" padding="0"/>
                    </dimensions>
                </outputTensor>
                <outputTensor ordinal="0" name="transform-2-/backbone_classifier_1/layer/Gemm:0" l2Offset="4069120" numOfDimensions="1" bitsPerElement="8" shift="0" scale="0.00390625" format="unsigned">
                    <dimensions>
                        <dimension size="91" serializationOrder="0" ordinal="0" padding="0"/>
                    </dimensions>
                </outputTensor>
            </outputTensors>
        </network>
    </networks>
    <l2memory totalSize="8388480" coefficientsSize="2814080" reservedMemorySize="1024" networksRuntimeSize="1674240"/>
</dnnParams>

From the lines below in dnnParams.xml, we can determine that the Classifier output (probabilities) is described at the beginning of the one-dimensional array, followed by the Convolution output (feature) of the backbone with Classifier.

<outputTensor ordinal="1" name="transform-13-/flatten_1/Flatten:0" l2Offset="4055808" numOfDimensions="1" bitsPerElement="8" shift="0" scale="0.0625" format="signed">
...
</outputTensor>
<outputTensor ordinal="0" name="transform-2-/backbone_classifier_1/layer/Gemm:0" l2Offset="4069120" numOfDimensions="1" bitsPerElement="8" shift="0" scale="0.00390625" format="unsigned">
...
</outputTensor>

If the retrieved dnnParams.xml differs from the content shown above, please modify the following code in validate_with_edge_device.py.
This code reconstructs the Output tensor array from the one-dimensional array:

    probabilities = np.array( decoded_tensor[:91] )
    features = np.array( decoded_tensor[91:] )
    features = features.reshape(1,256,7,7)
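
For example, if your dnnParams.xml showed the Flatten output serialized before the Gemm output (a hypothetical case; adjust the slicing to match your own file), the code would become:

    features = np.array( decoded_tensor[:12544] )
    features = features.reshape(1,256,7,7)
    probabilities = np.array( decoded_tensor[12544:] )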

Executing Inference

Place the edge_app_passthrough_configuration.json from sample_edge_app_passthrough_wasm_v2_1.1.6.zip into the Separated DETR Verification Folder.

When you run validate_with_edge_device.py, the resulting image with bounding boxes, class IDs, and probabilities will be saved in the folder specified by the --result_image_path option.

If you've modified Transformer parameters such as dim_feedforward or hidden_dim for experimental purposes, please set these values using the appropriate options.
Additionally, if you've changed the number of classes, use the --output_tensor_size option to set the unpack format for the Output tensor: the element count is (number of feature elements + number of probability elements) followed by 'f', for example '12635f'; the corresponding byte size is that element count x 4.

python validate_with_edge_device.py --device cpu
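
For example, to stop after five saved detection images while keeping the default output folder (a hypothetical invocation using the options defined above):

python validate_with_edge_device.py --device cpu --num_executions 5 --result_image_path ./images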

The detection accuracy appears to have slightly decreased, but overall, it is functioning as expected.
While the images mostly consist of large objects, I was impressed by its resistance to interference such as blur and noise.

                    Correct Detection 1   Correct Detection 2   Multiple Detection    False Detection
File name           000000000081.jpg      000000000394.jpg      000000001282.jpg      000000025316.jpg
Detection results   (result images with bounding boxes)

In consideration of copyright, the COCO images are presented in a concealed manner.

When you need help

If you run into any issues while following this article, please don't hesitate to leave a comment on this article.
You can also visit our support website below for additional guidance.
Please note that it may take us a little time to respond to comments.
Thank you for your understanding.

If you have any questions or concerns about AITRIOS that are not covered in this article, please contact us via the link below:
