
Easy Object Detection from a Smartphone with ESP32-CAM and TensorFlow.js

Last updated at 2021-05-28, Posted at 2021-05-23

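Introduction

I came across an interesting article about doing object recognition with the ESP32-CAM, so I tried it myself.
- There was also an article on running Fashion-MNIST with TensorFlow Lite on the ESP32-CAM alone. I tried that too, but the board on its own is quite constrained.
- This approach uses the ESP32-CAM as a web server that serves a video-streaming page, and the detection runs on the connected smartphone or PC, so it can recognize a wide variety of objects.
- The model used is the COCO-SSD model.
- It can detect 80 classes of objects, including people, vehicles, and animals.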

Creating the project with VSCode & PlatformIO

Select AI Thinker ESP32-CAM as the board.
Create platformio.ini as follows:

[env:esp32cam]
platform = espressif32
board = esp32cam
framework = arduino

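Connecting the ESP32-CAM

For how to connect it with a USB-to-TTL serial converter and so on, see the following article:
- 約800円のESP32-CAMで乾電池駆動のWebカメラサーバーを立てる (Building a battery-powered web camera server with an ESP32-CAM that costs about 800 yen)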

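HTML & JavaScript

For some reason the JavaScript did not work as-is in either Safari or Chrome, so I rewrote a few parts myself.
Here is what it does:
- Load the TensorFlow.js library
- Load the COCO-SSD model
- Apply the model to the ESP32-CAM video stream and display the results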

ts.html


<!DOCTYPE html>
<html lang="en">
  <head>
    <title>
      Multiple object detection using pre trained model in TensorFlow.js
    </title>
    <meta charset="utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style>
        img {
            display: block;
        }

        .camView {
            position: relative;
            float: left;
            width: calc(100% - 20px);
            margin: 10px;
            cursor: pointer;
        }

        .camView p {
            position: absolute;
            padding: 5px;
            background-color: rgba(255, 111, 0, 0.85);
            color: #FFF;
            border: 1px dashed rgba(255, 255, 255, 0.7);
            z-index: 2;
            font-size: 12px;
        }

        .highlighter {
            background: rgba(0, 255, 0, 0.25);
            border: 1px dashed #fff;
            z-index: 1;
            position: absolute;
        }
    </style>
  </head>
  <body>
    <h1>ESP32 TensorFlow.js - Object Detection</h1>

    <p>
      Wait for the model to load before clicking the button to enable the webcam
      - at which point it will become visible to use.
    </p>

    <section id="camSection">
      <div id="liveView" class="camView">
        <button id="esp32camButton">Start ESP32 Webcam</button>
        <img id="esp32cam_video" width="640" height="480" crossorigin="anonymous" />
      </div>
    </section>

    <!-- Import TensorFlow.js library -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/dist/tf.min.js"></script>
    <!-- Load the coco-ssd model to use to recognize things in images -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/coco-ssd"></script>

    <script>
        // Look up the page elements explicitly instead of relying on implicit id globals.
        const video = document.getElementById('esp32cam_video');
        const liveView = document.getElementById('liveView');
        const esp32camButton = document.getElementById('esp32camButton');
        // Detector, set once COCO-SSD has finished loading.
        let model = null;
        // Overlay elements drawn for the previous frame.
        const children = [];

        esp32camButton.addEventListener('click', enableCam);
        esp32camButton.disabled = true;

        window.onload = function() {
            cocoSsd.load().then(
                (data) => {
                    model = data;
                    esp32camButton.disabled = false;
                    alert('MODEL LOADED.');
                },
                (e) => { alert(e); }
            );
        };

        function enableCam(event) {
            if (!model) {
                alert('No model..');
                return;
            }
            // The stream is served from port 81, a different origin than this page, so
            // request it with CORS; otherwise TensorFlow.js may not read its pixels.
            video.crossOrigin = 'anonymous';
            video.src = 'http://' + window.location.hostname + ":81/";
            predictWebcam();
        }

        function predictWebcam() {
            // Classify the current frame of the stream.
            model.detect(video).then(
                predictions => {
                    // Remove the highlighting drawn for the previous frame.
                    for (let i = 0; i < children.length; i++) {
                        liveView.removeChild(children[i]);
                    }
                    children.splice(0);

                    // Draw predictions with a high enough confidence score onto the live view.
                    for (let n = 0; n < predictions.length; n++) {
                        console.log(predictions[n].class + ' ' + predictions[n].score);
                        if (predictions[n].score > 0.55) {
                            const p = document.createElement('p');
                            p.innerText = predictions[n].class + ' - with '
                                + Math.round(parseFloat(predictions[n].score) * 100)
                                + '% confidence.';
                            p.style = 'margin-left: ' + predictions[n].bbox[0] + 'px; margin-top: '
                                + (predictions[n].bbox[1] - 10) + 'px; width: '
                                + (predictions[n].bbox[2] - 10) + 'px; top: 0; left: 0;';

                            const highlighter = document.createElement('div');
                            highlighter.setAttribute('class', 'highlighter');
                            highlighter.style = 'left: ' + predictions[n].bbox[0] + 'px; top: '
                                + predictions[n].bbox[1] + 'px; width: '
                                + predictions[n].bbox[2] + 'px; height: '
                                + predictions[n].bbox[3] + 'px;';

                            liveView.appendChild(highlighter);
                            liveView.appendChild(p);
                            children.push(highlighter);
                            children.push(p);
                        }
                    }

                    // Schedule the next detection only after this one has finished,
                    // so detect() calls do not pile up.
                    window.requestAnimationFrame(predictWebcam);
                },
                (e) => {
                    // Early frames may not be decodable yet: log and keep trying
                    // instead of showing an alert for every frame.
                    console.error(e);
                    window.requestAnimationFrame(predictWebcam);
                }
            );
        }
    </script>
  </body>
</html>
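The per-frame drawing step in `predictWebcam()` is plain arithmetic on each prediction: skip anything whose `score` is not above 0.55, show the score as a rounded percentage, and turn `bbox` (`[x, y, width, height]` in pixels) into absolute CSS for the highlighter box. The C++ sketch below restates only that step on plain data so it can be checked off-device. `Prediction`, `Overlay`, and `buildOverlays` are hypothetical names, not part of TensorFlow.js, and unlike the page this sketch rounds coordinates to whole pixels.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for one COCO-SSD prediction.
// bbox is [x, y, width, height] in pixels of the 640x480 <img>.
struct Prediction {
    std::string cls;
    double score;  // confidence in [0, 1]
    std::array<double, 4> bbox;
};

// What the page draws for one kept prediction: the label text and the
// inline CSS of the green highlighter box.
struct Overlay {
    std::string label;
    std::string boxStyle;
};

// Same threshold as the page: keep predictions with score > 0.55.
constexpr double kMinScore = 0.55;

std::vector<Overlay> buildOverlays(const std::vector<Prediction>& predictions) {
    std::vector<Overlay> overlays;
    for (const Prediction& p : predictions) {
        assert(p.score >= 0.0 && p.score <= 1.0);
        if (!(p.score > kMinScore)) {
            continue;  // low-confidence predictions are not drawn
        }
        // Unlike the page, this sketch rounds coordinates to whole pixels.
        const long x = std::lround(p.bbox[0]);
        const long y = std::lround(p.bbox[1]);
        const long w = std::lround(p.bbox[2]);
        const long h = std::lround(p.bbox[3]);

        Overlay o;
        o.label = p.cls + " - with " + std::to_string(std::lround(p.score * 100)) +
                  "% confidence.";
        o.boxStyle = "left: " + std::to_string(x) + "px; top: " + std::to_string(y) +
                     "px; width: " + std::to_string(w) + "px; height: " +
                     std::to_string(h) + "px;";
        overlays.push_back(o);
    }
    return overlays;
}
```

For example, a made-up frame with `person` at 0.87 and `chair` at 0.41 yields a single overlay labeled `person - with 87% confidence.`; the chair falls below the threshold, and a score of exactly 0.55 is also dropped because the comparison is strict.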

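Converting the HTML to a single-line string for the source code

To paste the code above into the firmware as a C string variable, the following script removes extra whitespace, comment lines, and line breaks.
Paste the string it prints into main.cpp (shown below).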

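con.py

# Trim indentation, blank lines, and whole-line // comments from the HTML file,
# then print it as one line to paste into index_html[] in main.cpp.
# Note: a // comment at the END of a code line is not removed and would comment out
# the rest of the joined script, so keep comments on their own lines in ts.html.
fname = "ts.html"
parts = []
with open(fname, "r", encoding="utf-8") as file:
    for line in file:
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue
        parts.append(stripped)
# Join with single spaces so tokens from adjacent lines are never glued together.
print(" ".join(parts))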
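Because `index_html[]` is an ordinary `char` array holding a raw string literal, two C++ details matter when pasting the converted HTML. The custom `rawliteral` delimiter lets the markup contain quotes and even the sequence `)"` without escaping. And `sizeof` on the array counts the terminating `'\0'`, whereas `strlen` gives the length of the text alone, which is what an HTTP response body length should be. A small host-side sketch of both points follows; `kPage` and `bodyLength` are illustrative names, and `PROGMEM` is left out because it is an Arduino macro (a no-op on the ESP32).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// A stand-in for index_html[] in main.cpp (PROGMEM omitted so this builds on a PC).
// The custom "rawliteral" delimiter lets the markup contain quotes and the
// sequence )" as in alert(1)" without escaping; a literal with the default
// empty delimiter would end early at that point.
const char kPage[] = R"rawliteral(<button onclick="alert(1)">OK</button>)rawliteral";

// Bytes that belong in the HTTP body: the text without the trailing '\0'.
std::size_t bodyLength(const char* html) {
    assert(html != nullptr);
    return std::strlen(html);
}
```

Here `bodyLength(kPage)` is 38 while `sizeof(kPage)` is 39; passing the `sizeof` value as a response length would send one extra NUL byte after the markup.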

The `main.cpp` code

Paste the HTML string generated above into `const char index_html[] PROGMEM`.
Change the following to match your environment:
- const char* ssid = "your_wifi_ssid";
- const char* password = "your_wifi_password";

main.cpp


#include <Arduino.h>
#include <WiFi.h>
#include "esp_http_server.h"
#include "esp_timer.h"
#include "esp_camera.h"
#include "img_converters.h"

// Select the AI Thinker pin map before including camera_pins.h.
#define CAMERA_MODEL_AI_THINKER
#include "camera_pins.h"

// HTML Page
const char index_html[] PROGMEM = R"rawliteral(
<!DOCTYPE html> <html lang="en"> <head> <title> Multiple object detection using pre trained model in TensorFlow.js </title> <meta charset="utf-8" /> <meta http-equiv="X-UA-Compatible" content="IE=edge" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <style> img { display: block; } .camView { position: relative; float: left; width: calc(100% - 20px); margin: 10px; cursor: pointer; } .camView p { position: absolute; padding: 5px; background-color: rgba(255, 111, 0, 0.85); color: #FFF; border: 1px dashed rgba(255, 255, 255, 0.7); z-index: 2; font-size: 12px; } .highlighter { background: rgba(0, 255, 0, 0.25); border: 1px dashed #fff; z-index: 1; position: absolute; } </style> </head> <body> <h1>ESP32 TensorFlow.js - Object Detection</h1> <p> Wait for the model to load before clicking the button to enable the webcam - at which point it will become visible to use. </p> <section id="camSection"> <div id="liveView" class="camView"> <button id="esp32camButton">Start ESP32 Webcam</button> <img id="esp32cam_video" width="640" height="480" crossorigin="anonymous" /> </div> </section> <!-- Import TensorFlow.js library --> <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/dist/tf.min.js"></script> <!-- Load the coco-ssd model to use to recognize things in images --> <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/coco-ssd"></script> <script> const video = document.getElementById('esp32cam_video'); const liveView = document.getElementById('liveView'); const esp32camButton = document.getElementById('esp32camButton'); let model = null; const children = []; esp32camButton.addEventListener('click', enableCam); esp32camButton.disabled = true; window.onload = function() { cocoSsd.load().then( (data) => { model = data; esp32camButton.disabled = false; alert('MODEL LOADED.'); }, (e) => { alert(e); } ); }; function enableCam(event) { if (!model) { alert('No model..'); return; } video.crossOrigin = 'anonymous'; video.src = 'http://' + window.location.hostname + ":81/"; predictWebcam(); } function predictWebcam() { model.detect(video).then( predictions => { for (let i = 0; i < children.length; i++) { liveView.removeChild(children[i]); } children.splice(0); for (let n = 0; n < predictions.length; n++) { console.log(predictions[n].class + ' ' + predictions[n].score); if (predictions[n].score > 0.55) { const p = document.createElement('p'); p.innerText = predictions[n].class + ' - with ' + Math.round(parseFloat(predictions[n].score) * 100) + '% confidence.'; p.style = 'margin-left: ' + predictions[n].bbox[0] + 'px; margin-top: ' + (predictions[n].bbox[1] - 10) + 'px; width: ' + (predictions[n].bbox[2] - 10) + 'px; top: 0; left: 0;'; const highlighter = document.createElement('div'); highlighter.setAttribute('class', 'highlighter'); highlighter.style = 'left: ' + predictions[n].bbox[0] + 'px; top: ' + predictions[n].bbox[1] + 'px; width: ' + predictions[n].bbox[2] + 'px; height: ' + predictions[n].bbox[3] + 'px;'; liveView.appendChild(highlighter); liveView.appendChild(p); children.push(highlighter); children.push(p); } } window.requestAnimationFrame(predictWebcam); }, (e) => { console.error(e); window.requestAnimationFrame(predictWebcam); } ); } </script> </body> </html>
)rawliteral";

#define PART_BOUNDARY "123456789000000000000987654321"


static const char* _STREAM_CONTENT_TYPE = "multipart/x-mixed-replace;boundary=" PART_BOUNDARY;
static const char* _STREAM_BOUNDARY = "\r\n--" PART_BOUNDARY "\r\n";
static const char* _STREAM_PART = "Content-Type: image/jpeg\r\nContent-Length: %u\r\n\r\n";

httpd_handle_t camera_httpd = NULL;
httpd_handle_t stream_httpd = NULL;

const char* ssid = "your_wifi_ssid";
const char* password = "your_wifi_password";

static esp_err_t page_handler(httpd_req_t *req) {
    httpd_resp_set_type(req, "text/html");
    httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");
    // strlen, not sizeof: sizeof would also send the string's trailing '\0'.
    httpd_resp_send(req, index_html, strlen(index_html));
    return ESP_OK;
}

static esp_err_t stream_handler(httpd_req_t *req){
    camera_fb_t * fb = NULL;
    esp_err_t res = ESP_OK;
    size_t _jpg_buf_len = 0;
    uint8_t * _jpg_buf = NULL;
    char part_buf[64];  // buffer for each frame's part headers


    res = httpd_resp_set_type(req, _STREAM_CONTENT_TYPE);
    if(res != ESP_OK){
        return res;
    }

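    // The page on port 80 reads this port-81 stream cross-origin, so allow CORS;
    // TensorFlow.js needs this to access the frame pixels.
    httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");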

    while(true){
        fb = esp_camera_fb_get();
        if (!fb) {
            Serial.println("Camera capture failed");
            res = ESP_FAIL;
        } else {
            if(fb->format != PIXFORMAT_JPEG){
                bool jpeg_converted = frame2jpg(fb, 80, &_jpg_buf, &_jpg_buf_len);
                esp_camera_fb_return(fb);
                fb = NULL;
                if(!jpeg_converted){
                    Serial.println("JPEG compression failed");
                    res = ESP_FAIL;
                }
            } else {
                _jpg_buf_len = fb->len;
                _jpg_buf = fb->buf;
            }
        }
        if(res == ESP_OK){
            res = httpd_resp_send_chunk(req, _STREAM_BOUNDARY, strlen(_STREAM_BOUNDARY));
        }
        if(res == ESP_OK){
            size_t hlen = snprintf((char *)part_buf, 64, _STREAM_PART, _jpg_buf_len);
            res = httpd_resp_send_chunk(req, (const char *)part_buf, hlen);
        }
        if(res == ESP_OK){
            res = httpd_resp_send_chunk(req, (const char *)_jpg_buf, _jpg_buf_len);
        }
        if(fb){
            esp_camera_fb_return(fb);
            fb = NULL;
            _jpg_buf = NULL;
        } else if(_jpg_buf){
            free(_jpg_buf);
            _jpg_buf = NULL;
        }
        if(res != ESP_OK){
            break;
        }
    }
    return res;
}

void startCameraServer(){
    httpd_config_t config = HTTPD_DEFAULT_CONFIG();

    httpd_uri_t index_uri = {
        .uri       = "/",
        .method    = HTTP_GET,
        .handler   = stream_handler,
        .user_ctx  = NULL
    };

    httpd_uri_t page_uri = {
        .uri       = "/ts",
        .method    = HTTP_GET,
        .handler   = page_handler,
        .user_ctx  = NULL
    };

    Serial.printf("Starting web server on port: '%d'\n", config.server_port);
    if (httpd_start(&camera_httpd, &config) == ESP_OK) {
      httpd_register_uri_handler(camera_httpd, &page_uri);
    }

    // start stream using another webserver
    config.server_port += 1;
    config.ctrl_port += 1;
    Serial.printf("Starting stream server on port: '%d'\n", config.server_port);
    if (httpd_start(&stream_httpd, &config) == ESP_OK) {
        httpd_register_uri_handler(stream_httpd, &index_uri);
    }

}

void setup() {
    Serial.begin(9600);
    Serial.setDebugOutput(true);
    Serial.println();

    camera_config_t config;
    config.ledc_channel = LEDC_CHANNEL_0;
    config.ledc_timer = LEDC_TIMER_0;
    config.pin_d0 = Y2_GPIO_NUM;
    config.pin_d1 = Y3_GPIO_NUM;
    config.pin_d2 = Y4_GPIO_NUM;
    config.pin_d3 = Y5_GPIO_NUM;
    config.pin_d4 = Y6_GPIO_NUM;
    config.pin_d5 = Y7_GPIO_NUM;
    config.pin_d6 = Y8_GPIO_NUM;
    config.pin_d7 = Y9_GPIO_NUM;
    config.pin_xclk = XCLK_GPIO_NUM;
    config.pin_pclk = PCLK_GPIO_NUM;
    config.pin_vsync = VSYNC_GPIO_NUM;
    config.pin_href = HREF_GPIO_NUM;
    config.pin_sscb_sda = SIOD_GPIO_NUM;
    config.pin_sscb_scl = SIOC_GPIO_NUM;
    config.pin_pwdn = PWDN_GPIO_NUM;
    config.pin_reset = RESET_GPIO_NUM;
    config.xclk_freq_hz = 20000000;
    config.pixel_format = PIXFORMAT_JPEG;
    
    // if PSRAM IC present, init with UXGA resolution and higher JPEG quality
    //                      for larger pre-allocated frame buffer.
    if(psramFound()){
        config.frame_size = FRAMESIZE_UXGA;
        config.jpeg_quality = 10;
        config.fb_count = 2;
    } else {
        config.frame_size = FRAMESIZE_SVGA;  // smaller frame buffer that fits without PSRAM
        config.jpeg_quality = 12;
        config.fb_count = 1;
    }

#if defined(CAMERA_MODEL_ESP_EYE)
    pinMode(13, INPUT_PULLUP);
    pinMode(14, INPUT_PULLUP);
#endif

    // camera init
    esp_err_t err = esp_camera_init(&config);
    if (err != ESP_OK) {
        Serial.printf("Camera init failed with error 0x%x\n", err);
        return;
    }

    sensor_t * s = esp_camera_sensor_get();
    // initial sensors are flipped vertically and colors are a bit saturated
    if (s->id.PID == OV3660_PID) {
        s->set_vflip(s, 1); // flip it back
        s->set_brightness(s, 1); // up the brightness just a bit
        s->set_saturation(s, 0); // lower the saturation
    }
    // drop down frame size for higher initial frame rate
    s->set_framesize(s, FRAMESIZE_VGA);

#if defined(CAMERA_MODEL_M5STACK_WIDE)
    s->set_vflip(s, 1);
    s->set_hmirror(s, 1);
#endif

    WiFi.begin(ssid, password);

    while (WiFi.status() != WL_CONNECTED) {
        delay(500);
        Serial.print(".");
    }
    Serial.println("");
    Serial.println("WiFi connected");

    startCameraServer();

    Serial.print("Camera Ready! Use 'http://");
    Serial.print(WiFi.localIP());
    Serial.println("/ts' to connect");
}

void loop() {
    // put your main code here, to run repeatedly:
    delay(10);
}
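The port-81 stream that the page loads into `<img>` is MJPEG over HTTP. `stream_handler()` declares the response as `multipart/x-mixed-replace` with `PART_BOUNDARY`, then for every camera frame sends a boundary line, a small header block with the JPEG's `Content-Length`, and the JPEG bytes; the browser replaces the displayed image with each new part. `httpd_resp_send_chunk()` transmits these pieces with HTTP chunked transfer encoding, and the sketch below reproduces only the decoded body, on a PC, using the same constants. `appendFrame` is a hypothetical helper, and short strings stand in for real JPEG data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Same framing constants as main.cpp.
#define PART_BOUNDARY "123456789000000000000987654321"
static const char* STREAM_CONTENT_TYPE = "multipart/x-mixed-replace;boundary=" PART_BOUNDARY;
static const char* STREAM_BOUNDARY = "\r\n--" PART_BOUNDARY "\r\n";
static const char* STREAM_PART = "Content-Type: image/jpeg\r\nContent-Length: %u\r\n\r\n";

// Appends one frame in the order stream_handler() sends its three chunks:
// the boundary line, the part headers, then the raw JPEG bytes.
void appendFrame(std::string& body, const std::string& jpeg) {
    char header[64];
    // Cast because %u expects unsigned int; size_t is wider on a 64-bit PC.
    const int n = std::snprintf(header, sizeof(header), STREAM_PART,
                                static_cast<unsigned>(jpeg.size()));
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof(header));
    body += STREAM_BOUNDARY;
    body.append(header, static_cast<std::size_t>(n));
    body += jpeg;
}
```

Two calls with 5-byte and 2-byte payloads produce two boundary-delimited parts carrying `Content-Length: 5` and `Content-Length: 2`. Because the handler's loop only exits when a send fails (for example, when the client disconnects), `main.cpp` runs it on a second `httpd` instance on port 81 so the page on port 80 stays reachable.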

Adding `camera_pins.h`

Create include/camera_pins.h with the following contents.

camera_pins.h

#if defined(CAMERA_MODEL_WROVER_KIT)
#define PWDN_GPIO_NUM    -1
#define RESET_GPIO_NUM   -1
#define XCLK_GPIO_NUM    21
#define SIOD_GPIO_NUM    26
#define SIOC_GPIO_NUM    27

#define Y9_GPIO_NUM      35
#define Y8_GPIO_NUM      34
#define Y7_GPIO_NUM      39
#define Y6_GPIO_NUM      36
#define Y5_GPIO_NUM      19
#define Y4_GPIO_NUM      18
#define Y3_GPIO_NUM       5
#define Y2_GPIO_NUM       4
#define VSYNC_GPIO_NUM   25
#define HREF_GPIO_NUM    23
#define PCLK_GPIO_NUM    22

#elif defined(CAMERA_MODEL_ESP_EYE)
#define PWDN_GPIO_NUM    -1
#define RESET_GPIO_NUM   -1
#define XCLK_GPIO_NUM    4
#define SIOD_GPIO_NUM    18
#define SIOC_GPIO_NUM    23

#define Y9_GPIO_NUM      36
#define Y8_GPIO_NUM      37
#define Y7_GPIO_NUM      38
#define Y6_GPIO_NUM      39
#define Y5_GPIO_NUM      35
#define Y4_GPIO_NUM      14
#define Y3_GPIO_NUM      13
#define Y2_GPIO_NUM      34
#define VSYNC_GPIO_NUM   5
#define HREF_GPIO_NUM    27
#define PCLK_GPIO_NUM    25

#elif defined(CAMERA_MODEL_M5STACK_PSRAM)
#define PWDN_GPIO_NUM     -1
#define RESET_GPIO_NUM    15
#define XCLK_GPIO_NUM     27
#define SIOD_GPIO_NUM     25
#define SIOC_GPIO_NUM     23

#define Y9_GPIO_NUM       19
#define Y8_GPIO_NUM       36
#define Y7_GPIO_NUM       18
#define Y6_GPIO_NUM       39
#define Y5_GPIO_NUM        5
#define Y4_GPIO_NUM       34
#define Y3_GPIO_NUM       35
#define Y2_GPIO_NUM       32
#define VSYNC_GPIO_NUM    22
#define HREF_GPIO_NUM     26
#define PCLK_GPIO_NUM     21

#elif defined(CAMERA_MODEL_M5STACK_WIDE)
#define PWDN_GPIO_NUM     -1
#define RESET_GPIO_NUM    15
#define XCLK_GPIO_NUM     27
#define SIOD_GPIO_NUM     22
#define SIOC_GPIO_NUM     23

#define Y9_GPIO_NUM       19
#define Y8_GPIO_NUM       36
#define Y7_GPIO_NUM       18
#define Y6_GPIO_NUM       39
#define Y5_GPIO_NUM        5
#define Y4_GPIO_NUM       34
#define Y3_GPIO_NUM       35
#define Y2_GPIO_NUM       32
#define VSYNC_GPIO_NUM    25
#define HREF_GPIO_NUM     26
#define PCLK_GPIO_NUM     21

#elif defined(CAMERA_MODEL_AI_THINKER)
#define PWDN_GPIO_NUM     32
#define RESET_GPIO_NUM    -1
#define XCLK_GPIO_NUM      0
#define SIOD_GPIO_NUM     26
#define SIOC_GPIO_NUM     27

#define Y9_GPIO_NUM       35
#define Y8_GPIO_NUM       34
#define Y7_GPIO_NUM       39
#define Y6_GPIO_NUM       36
#define Y5_GPIO_NUM       21
#define Y4_GPIO_NUM       19
#define Y3_GPIO_NUM       18
#define Y2_GPIO_NUM        5
#define VSYNC_GPIO_NUM    25
#define HREF_GPIO_NUM     23
#define PCLK_GPIO_NUM     22

#elif defined(CAMERA_MODEL_TTGO_T_JOURNAL)
#define PWDN_GPIO_NUM      0
#define RESET_GPIO_NUM    15
#define XCLK_GPIO_NUM     27
#define SIOD_GPIO_NUM     25
#define SIOC_GPIO_NUM     23

#define Y9_GPIO_NUM       19
#define Y8_GPIO_NUM       36
#define Y7_GPIO_NUM       18
#define Y6_GPIO_NUM       39
#define Y5_GPIO_NUM        5
#define Y4_GPIO_NUM       34
#define Y3_GPIO_NUM       35
#define Y2_GPIO_NUM       17
#define VSYNC_GPIO_NUM    22
#define HREF_GPIO_NUM     26
#define PCLK_GPIO_NUM     21

#else
#error "Camera model not selected"
#endif

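Build and upload

Build from PlatformIO, then run Upload and Monitor.
While uploading, keep IO0 shorted to GND on the ESP32-CAM; holding IO0 low at reset puts the chip into its serial download mode.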

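Running it

Remove the IO0-GND short and press the ESP32-CAM's reset button; the web server starts.
- In the serial monitor, look for the line Camera Ready! Use 'http://192.168.3.23/ts' to connect.
- In my environment, the IP address was 192.168.3.23.
From a browser, open the URL above, http://<esp32-cam-ip>/ts.
- Once the page appears, wait: downloading the model takes a while.
- When the download finishes, the Start ESP32 Webcam button becomes active; press it.
- The page then starts receiving the camera video and running object detection on it.
- It works from a smartphone too, of course (be aware that the model data is large).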

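Conclusion

Using the ESP32-CAM as a camera server and running detection in the browser of a smartphone or PC turned out to be a simple, lightweight setup.
Adding JavaScript that reacts to what is detected should make all sorts of things possible.
Rewriting video.src = 'http://' + window.location.hostname + ":81/"; in the HTML lets you use other cameras as well, provided their stream is served with CORS (an Access-Control-Allow-Origin header); otherwise the browser will not let TensorFlow.js read the pixels of the cross-origin image.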
