
Run NanoDet on WebGL

Posted at 2023-12-04

This article is for the 8th day of the UMITRON Advent Calendar 2022.

Japanese page

Introduction

At UMITRON, we utilize object detection in multiple products. Depending on the application requirements, we may need to run it in a browser using WebGL. In this article, I'll share insights gained from running the relatively recent object detection algorithm NanoDet with TensorFlow.js in WebGL.

About NanoDet

NanoDet is a fast object detection algorithm that offers several models. For CPU inference, NanoDet-Plus, which uses ShuffleNetV2 as its backbone, looks promising in terms of both accuracy and speed. However, since we want to use the GPU in this case, we selected NanoDet-g, which uses Custom CSP Net as its backbone.

I compared processing times using an NVIDIA Quadro P2000, running a 40-frame video and measuring the average time taken to process one frame. The measurement includes not only the network inference time but also preprocessing and post-processing. I also compared against MobileNet SSD, one of the fast object detection methods. The results are as follows:

| Method | Processing Time [msec.] |
| --- | --- |
| MobileNet SSD | 106 |
| NanoDet-Plus | 203 |
| NanoDet-g | 79 |

From these results, when inferring on the GPU, NanoDet-g is more than twice as fast as NanoDet-Plus, and about 1.3 times faster than MobileNet SSD.

Model Training

I followed the instructions in the How to Train section of the README. Annotations for the training data are supported in VOC XML format and MS COCO format. The default configuration file for NanoDet-g can be found here. After editing the configuration file, you can train the model by running train.py as follows:

python tools/train.py [config file path]

Model Conversion

There are several ways to run a CNN in WebGL, but considering compatibility with the application we wanted to integrate NanoDet into, we chose TensorFlow.js this time. Since NanoDet is implemented in PyTorch, the pretrained model needs to be converted for TensorFlow.js. The steps are as follows:

1. Convert from PyTorch to ONNX

As mentioned in the README, you can use export_onnx.py for conversion. However, you need to lower the ONNX operator set version to 10 to avoid errors during the subsequent conversion to TensorFlow. Change the opset_version in export_onnx.py from 11 to 10, and then perform the conversion as follows:

python tools/export_onnx.py --cfg_path config/legacy_v0.x_configs/nanodet-g.yml --model_path [network ckpt path]

2. Convert from ONNX to TensorFlow

Use onnx-tf for conversion:

onnx-tf convert -i [onnx file path] -o [output TensorFlow model directory]

3. Convert from TensorFlow to TensorFlow.js

Use the tensorflowjs_converter included in tensorflowjs. At this step, since TensorFlow.js does not support int64, you need to cast to int32 to prevent overflow (reference).

tensorflowjs_converter --input_format tf_saved_model --output_format tfjs_graph_model --strip_debug_ops=* --weight_shard_size_bytes 18388608 [input TensorFlow model directory] [output TensorFlow.js model directory]
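To check the result of the conversion, the model can be loaded in the browser as in the following minimal sketch. Note that model/model.json is a hypothetical path to the converted files; inference is then run with model.execute() or model.executeAsync().

```typescript
import * as tf from '@tensorflow/tfjs';

async function loadModel(): Promise<tf.GraphModel> {
  // Make sure inference runs on the WebGL backend.
  await tf.setBackend('webgl');
  await tf.ready();
  // 'model/model.json' is a hypothetical path to the converted model files.
  return tf.loadGraphModel('model/model.json');
}
```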

Inference with TensorFlow.js

The model converted with the above steps takes resized and normalized images as input and outputs a tensor in which class likelihoods and rectangle positions are mixed together. Therefore, preprocessing and post-processing must be implemented separately. For reference, I'll describe the authors' implementation of the preprocessing and post-processing below.

Preprocessing: Image Normalization

The image is normalized per channel by mean and standard deviation as a BGR image (relevant part). The input size is 416x416 for NanoDet-g.
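As a concrete example, a minimal preprocessing sketch in TensorFlow.js might look like the following. The mean/std values are the defaults from the NanoDet config and should be checked against the config you trained with; the converted model is also assumed here to take NHWC input.

```typescript
import * as tf from '@tensorflow/tfjs';

// Default normalization constants from the NanoDet config (BGR order);
// verify them against the config actually used for training.
const MEAN_BGR = [103.53, 116.28, 123.675];
const STD_BGR = [57.375, 57.12, 58.395];
const INPUT_SIZE = 416; // NanoDet-g input resolution

function preprocess(image: HTMLImageElement | HTMLVideoElement): tf.Tensor4D {
  return tf.tidy(() => {
    // Read pixels as RGB and resize to the network input resolution.
    const rgb = tf.browser.fromPixels(image)
      .resizeBilinear([INPUT_SIZE, INPUT_SIZE])
      .toFloat();
    // Swap RGB -> BGR to match the training-time preprocessing.
    const bgr = tf.reverse(rgb, 2);
    // Normalize each channel by mean and standard deviation.
    const normalized = bgr.sub(MEAN_BGR).div(STD_BGR);
    // Add a batch dimension: [1, 416, 416, 3] (NHWC assumed).
    return normalized.expandDims(0) as tf.Tensor4D;
  });
}
```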

Post-processing: Calculation of Bounding Box Positions and Class Likelihood

NanoDet divides the image into grid cells in a manner similar to YOLO and makes its predictions per cell. The output is therefore the class likelihoods and bounding box positions for each grid cell, and the shape of the tensor is [number of grid cells, number of classes + number of output rectangles * 4]. The function used here converts this tensor into rectangles for each class. To help make sense of the tensor's contents, I'll explain them briefly here.

NanoDet divides the image into grid cells at multiple scales, and the number of grid cells depends on the input resolution specified during training and the grid cell stride at each scale. For NanoDet-g, the input resolution is set with the config's input_size and the strides with the config's strides. With the default settings, the number of grid cells at each scale is (416 / 8)^2 = 2704, (416 / 16)^2 = 676, and (416 / 32)^2 = 169, for a total of 3549. The number of classes is set with the config's num_classes, and the number of output rectangles with the config's reg_max; note that reg_max + 1 rectangles are output.

The bounding box position is stored as likelihoods over the discrete distances from the center of each grid cell to the left, top, right, and bottom edges of the box. Note that in NanoDet-Plus the distances are measured not from the center but from the upper left of the grid cell, so the get_single_level_center_point() function used in post-processing should be based on NanoDet, not NanoDet-Plus.
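Putting this together, a post-processing sketch in TensorFlow.js could look like the following. This is not a verbatim port of the authors' code: the number of classes, the reg_max value, the thresholds, the cell ordering, and the assumption that the graph outputs raw logits (with sigmoid/softmax applied here, as in the PyTorch post-processing) all need to be checked against your converted model.

```typescript
import * as tf from '@tensorflow/tfjs';

// Defaults matching the NanoDet-g config; adjust to your training settings.
const INPUT_SIZE = 416;
const STRIDES = [8, 16, 32];
const NUM_CLASSES = 80; // assumption: COCO classes (config's num_classes)
const REG_MAX = 7;      // assumption: config's reg_max (REG_MAX + 1 bins per side)

// Center point and stride for each of the 3549 cells, in the (assumed)
// scale-major, row-major order of the model output. NanoDet measures
// distances from the cell center; NanoDet-Plus uses the upper-left corner.
function buildCenters() {
  const centers: number[][] = [];
  const strides: number[] = [];
  for (const s of STRIDES) {
    const n = INPUT_SIZE / s;
    for (let y = 0; y < n; y++) {
      for (let x = 0; x < n; x++) {
        centers.push([(x + 0.5) * s, (y + 0.5) * s]);
        strides.push(s);
      }
    }
  }
  return { centers, strides };
}

// Decode a [3549, NUM_CLASSES + (REG_MAX + 1) * 4] output tensor and run NMS.
// scoreThresh and iouThresh are hypothetical values; tune them for your use case.
async function postprocess(output: tf.Tensor2D, scoreThresh = 0.4, iouThresh = 0.6) {
  const { centers, strides } = buildCenters();
  const res = tf.tidy(() => {
    // Class likelihoods: sigmoid over the first NUM_CLASSES columns.
    const cls = output.slice([0, 0], [-1, NUM_CLASSES]).sigmoid();
    const scores = cls.max(1) as tf.Tensor1D;     // best score per cell
    const classes = cls.argMax(1) as tf.Tensor1D; // best class per cell

    // Each side's distance is a likelihood over REG_MAX + 1 bins: take the
    // softmax expectation, then scale by the cell's stride to get pixels.
    const reg = output
      .slice([0, NUM_CLASSES], [-1, (REG_MAX + 1) * 4])
      .reshape([-1, 4, REG_MAX + 1]);
    const bins = tf.range(0, REG_MAX + 1);
    const dist = tf.softmax(reg, -1).mul(bins).sum(-1);          // [cells, 4] = l, t, r, b
    const distPx = dist.mul(tf.tensor1d(strides).expandDims(1)); // in pixels

    // Convert center + (l, t, r, b) into [y1, x1, y2, x2] boxes for NMS.
    const c = tf.tensor2d(centers); // [cells, 2] = cx, cy
    const cx = c.slice([0, 0], [-1, 1]).squeeze([1]);
    const cy = c.slice([0, 1], [-1, 1]).squeeze([1]);
    const [l, t, r, b] = tf.split(distPx, 4, 1).map(d => d.squeeze([1]));
    const boxes = tf.stack([cy.sub(t), cx.sub(l), cy.add(b), cx.add(r)], 1) as tf.Tensor2D;
    return { boxes, scores, classes };
  });

  const keep = await tf.image.nonMaxSuppressionAsync(
    res.boxes, res.scores, 100, iouThresh, scoreThresh);
  const result = {
    boxes: await res.boxes.gather(keep).array(),   // [y1, x1, y2, x2] in input pixels
    scores: await res.scores.gather(keep).array(),
    classes: await res.classes.gather(keep).array(),
  };
  tf.dispose([res.boxes, res.scores, res.classes, keep]);
  return result;
}
```

In use, the output of model.executeAsync() (with the batch dimension squeezed away) would be passed to postprocess(); whether the converted graph returns a single tensor or a list depends on the conversion, so check its output signature. The returned boxes are in 416x416 input coordinates and still need rescaling to the original image size.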

Side Note

Another fast object detection algorithm similar to NanoDet is PicoDet. When exporting it to ONNX, there is an option to include the post-processing in the model, which would reduce the implementation cost, but I gave up on it because some operators are not supported when running on ONNX Runtime Web (converting the model with post-processing included from ONNX to TensorFlow was not possible either). The support status of operators in ONNX Runtime Web can be checked here. At the time of writing, at least HardSigmoid is not supported, but it may become available in the future.

Also, since the backbone of PicoDet is an improved version of ShuffleNet called Enhanced ShuffleNet (ESNet), a CPU-oriented architecture, inference on the GPU may not be much faster than on the CPU.


UMITRON is currently recruiting colleagues to work with us. Under the mission of implementing sustainable aquaculture on Earth, why not take on the challenge of aquaculture x technology together?

UMITRON Careers
