Easy Object Detection from a Smartphone with ESP32-CAM and TensorFlow.js

Posted at 2021-05-23

Introduction

Screenshot 2021-05-23 22.07.41.png

  • I found an interesting-looking article about doing object recognition with an ESP32-CAM, so I tried it myself.

Creating the project with VSCode & PlatformIO

  • The board name is AI Thinker ESP32-CAM
  • Create platformio.ini with the following contents
[env:esp32cam]
platform = espressif32
board = esp32cam
framework = arduino

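Connecting to the ESP32-CAM

HTML & JavaScript

  • For some reason the JavaScript did not run as-is in either Safari or Chrome, so I rewrote it in a few places.
  • Here is roughly what it does:
    • Loads the TensorFlow.js library
    • Loads the COCO-SSD model
    • Applies the model to the ESP32-CAM video stream and displays the results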
ts.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>
      Multiple object detection using pre trained model in TensorFlow.js
    </title>
    <meta charset="utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style>
        img {
            display: block;
        }

        .camView {
            position: relative;
            float: left;
            width: calc(100% - 20px);
            margin: 10px;
            cursor: pointer;
        }

        .camView p {
            position: absolute;
            padding: 5px;
            background-color: rgba(255, 111, 0, 0.85);
            color: #FFF;
            border: 1px dashed rgba(255, 255, 255, 0.7);
            z-index: 2;
            font-size: 12px;
        }

        .highlighter {
            background: rgba(0, 255, 0, 0.25);
            border: 1px dashed #fff;
            z-index: 1;
            position: absolute;
        }
    </style>
  </head>
  <body>
    <h1>ESP32 TensorFlow.js - Object Detection</h1>

    <p>
      Wait for the model to load before clicking the button to enable the webcam
      - at which point it will become visible to use.
    </p>

    <section id="camSection">
      <div id="liveView" class="camView">
        <button id="esp32camButton">Start ESP32 Webcam</button>
        <img id="esp32cam_video" width="640" height="480" crossorigin="anonymous" />
      </div>
    </section>

    <!-- Import TensorFlow.js library -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/dist/tf.min.js"></script>
    <!-- Load the coco-ssd model to use to recognize things in images -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/coco-ssd"></script>

    <script>
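        // Look up the elements explicitly instead of relying on implicit id globals.
        const video = document.getElementById('esp32cam_video');
        const liveView = document.getElementById('liveView');
        const esp32camButton = document.getElementById('esp32camButton');
        // Holds the COCO-SSD model once it has loaded.
        let model;
        esp32camButton.addEventListener('click', enableCam);
        esp32camButton.disabled = true;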

        window.onload = function() {
            // alert('load model');
            cocoSsd.load().then( 
                (data) => { 
                    model = data;
                    esp32camButton.disabled = false;
                    alert('MODEL LOADED.');
                },
                (e) => { alert(e); }
            );
        };

        function enableCam(event) {
            video.src = 'http://' + window.location.hostname + ":81/";
            if (!model) {
                alert('No model..');
                return;
            } 
            predictWebcam();
        }

        var children = [];

        function predictWebcam() {
            // Now let's start classifying a frame in the stream.
            // The CORS mode comes from the crossorigin attribute on the <img> element.

            model.detect(video).then(
                predictions => {
                    // Remove any highlighting we did previous frame.
                    for (let i = 0; i < children.length; i++) {
                        liveView.removeChild(children[i]);
                    }
                    children.splice(0);

                    // Now lets loop through predictions and draw them to the live view if
                    // they have a high confidence score.
                    for (let n = 0; n < predictions.length; n++) {

                        console.log(predictions[n].class + ' ' + predictions[n].score);
                        if (predictions[n].score > 0.55) {
                            const p = document.createElement('p');
                            p.innerText = predictions[n].class  + ' - with '
                                + Math.round(parseFloat(predictions[n].score) * 100)
                                + '% confidence.';
                            p.style = 'margin-left: ' + predictions[n].bbox[0] + 'px; margin-top: '
                                + (predictions[n].bbox[1] - 10) + 'px; width: '
                                + (predictions[n].bbox[2] - 10) + 'px; top: 0; left: 0;';

                            const highlighter = document.createElement('div');
                            highlighter.setAttribute('class', 'highlighter');
                            highlighter.style = 'left: ' + predictions[n].bbox[0] + 'px; top: '
                                + predictions[n].bbox[1] + 'px; width: '
                                + predictions[n].bbox[2] + 'px; height: '
                                + predictions[n].bbox[3] + 'px;';

                            liveView.appendChild(highlighter);
                            liveView.appendChild(p);
                            children.push(highlighter);
                            children.push(p);
                        }
                    }
                },
                (e) => {
                    alert(e);
                }
            );

            // Call this function again to keep predicting when the browser is ready.
            window.requestAnimationFrame(predictWebcam);
        }
    </script>
  </body>
</html>
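
The loop in predictWebcam asks for a new animation frame right away, without waiting for model.detect to finish, so detections may pile up on slower devices. One possible way to run them one at a time is sketched below. This is only an illustration under stated assumptions, not what the page above does: createDetectionLoop and its detect, onResult, and schedule parameters are hypothetical names introduced here.

```javascript
// Sketch: run detections strictly one after another.
// `detect`, `onResult`, and `schedule` are hypothetical parameters, not part of
// the original page. In a browser they could be () => model.detect(video),
// the drawing code, and (fn) => window.requestAnimationFrame(fn).
function createDetectionLoop(detect, onResult, schedule) {
  let running = false;

  async function tick() {
    if (!running) return;
    try {
      const predictions = await detect();
      // Skip drawing if the loop was stopped while detect() was pending.
      if (running) onResult(predictions);
    } catch (e) {
      console.error('detect failed:', e);
    }
    // Queue the next pass only after this one has settled.
    if (running) schedule(tick);
  }

  return {
    start() {
      if (running) return;
      running = true;
      schedule(tick);
    },
    stop() {
      running = false;
    },
  };
}
```

In the page, the existing drawing code could be passed as onResult, and start() could be called from enableCam in place of predictWebcam(). Passing the scheduler in as a parameter also lets the loop be exercised outside a browser.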

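Converting to a single-line string for pasting into the source code

  • To paste the code above into the source as a C string variable, the following script strips extra whitespace, comment lines, and line breaks.
  • Paste the string it prints into main.cpp, shown later.
con.py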
# trim comment, space, EOL from html file.
# paste to index_html[] in main.cpp
fname = "ts.html"
buf = ""
with open(fname,"r") as file:
    for i in file:
        strbuf = i.replace("  ", "")
        strbuf2 = strbuf.replace("\n", "")
        if not strbuf2.startswith("//"):
            buf = buf + strbuf2
print(buf)

main.cpp code

  • Paste the HTML string above into const char index_html[] PROGMEM
  • Change the following to match your own environment
    • const char* ssid = "your_wifi_ssid";
    • const char* password = "your_wifi_password";
main.cpp

#include <Arduino.h>
#include <WiFi.h>
#include "esp_http_server.h"
#include "esp_timer.h"
#include "esp_camera.h"
#include "img_converters.h"

#define CAMERA_MODEL_AI_THINKER
#include "camera_pins.h"

// HTML Page
const char index_html[] PROGMEM = R"rawliteral(
<!DOCTYPE html><html lang="en"><head><title>Multiple object detection using pre trained model in TensorFlow.js</title><meta charset="utf-8" /><meta http-equiv="X-UA-Compatible" content="IE=edge" /><meta name="viewport" content="width=device-width, initial-scale=1" /><style>img {display: block;}.camView {position: relative;float: left;width: calc(100% - 20px);margin: 10px;cursor: pointer;}.camView p {position: absolute;padding: 5px;background-color: rgba(255, 111, 0, 0.85);color: #FFF;border: 1px dashed rgba(255, 255, 255, 0.7);z-index: 2;font-size: 12px;}.highlighter {background: rgba(0, 255, 0, 0.25);border: 1px dashed #fff;z-index: 1;position: absolute;}</style></head><body><h1>ESP32 TensorFlow.js - Object Detection</h1><p>Wait for the model to load before clicking the button to enable the webcam- at which point it will become visible to use.</p><section id="camSection"><div id="liveView" class="camView"><button id="esp32camButton">Start ESP32 Webcam</button><img id="esp32cam_video" width="640" height="480" crossorigin=" " /></div></section><!-- Import TensorFlow.js library --><script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/dist/tf.min.js"></script><!-- Load the coco-ssd model to use to recognize things in images --><script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/coco-ssd"></script><script>const video = document.getElementById('esp32cam_video');esp32camButton.addEventListener('click', enableCam);esp32camButton.disabled = true;window.onload = function() {cocoSsd.load().then( (data) => { model = data;esp32camButton.disabled = false;alert('MODEL LOADED.');},(e) => { alert(e); });};function enableCam(event) {video.src = 'http://' + window.location.hostname + ":81/";if (!model) {alert('No model..');return;} predictWebcam();}var children = [];function predictWebcam() {video.crossorigin = ' ';model.detect(video).then(predictions => {for (let i = 0; i < children.length; i++) {liveView.removeChild(children[i]);}children.splice(0);for (let n = 0; n 
< predictions.length; n++) {console.log(predictions[n].class + ' ' + predictions[n].score);if (predictions[n].score > 0.55) {const p = document.createElement('p');p.innerText = predictions[n].class+ ' - with '+ Math.round(parseFloat(predictions[n].score) * 100)+ '% confidence.';p.style = 'margin-left: ' + predictions[n].bbox[0] + 'px; margin-top: '+ (predictions[n].bbox[1] - 10) + 'px; width: '+ (predictions[n].bbox[2] - 10) + 'px; top: 0; left: 0;';const highlighter = document.createElement('div');highlighter.setAttribute('class', 'highlighter');highlighter.style = 'left: ' + predictions[n].bbox[0] + 'px; top: '+ predictions[n].bbox[1] + 'px; width: '+ predictions[n].bbox[2] + 'px; height: '+ predictions[n].bbox[3] + 'px;';liveView.appendChild(highlighter);liveView.appendChild(p);children.push(highlighter);children.push(p);}}},(e) => {alert(e);});window.requestAnimationFrame(predictWebcam);}</script></body></html>
 )rawliteral";

#define PART_BOUNDARY "123456789000000000000987654321"


static const char* _STREAM_CONTENT_TYPE = "multipart/x-mixed-replace;boundary=" PART_BOUNDARY;
static const char* _STREAM_BOUNDARY = "\r\n--" PART_BOUNDARY "\r\n";
static const char* _STREAM_PART = "Content-Type: image/jpeg\r\nContent-Length: %u\r\n\r\n";

httpd_handle_t camera_httpd = NULL;
httpd_handle_t stream_httpd = NULL;

const char* ssid = "your_wifi_ssid";
const char* password = "your_wifi_password";

static esp_err_t page_handler(httpd_req_t *req) {
    httpd_resp_set_type(req, "text/html");
    httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");
    // Send only the characters of the page, not the terminating NUL that sizeof() would include.
    return httpd_resp_send(req, index_html, strlen(index_html));
}

static esp_err_t stream_handler(httpd_req_t *req){
    camera_fb_t * fb = NULL;
    esp_err_t res = ESP_OK;
    size_t _jpg_buf_len = 0;
    uint8_t * _jpg_buf = NULL;
    char part_buf[64]; // buffer for the header of each multipart part


    res = httpd_resp_set_type(req, _STREAM_CONTENT_TYPE);
    if(res != ESP_OK){
        return res;
    }

    httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");

    while(true){
        fb = esp_camera_fb_get();
        if (!fb) {
            Serial.println("Camera capture failed");
            res = ESP_FAIL;
        } else {
            if(fb->format != PIXFORMAT_JPEG){
                bool jpeg_converted = frame2jpg(fb, 80, &_jpg_buf, &_jpg_buf_len);
                esp_camera_fb_return(fb);
                fb = NULL;
                if(!jpeg_converted){
                    Serial.println("JPEG compression failed");
                    res = ESP_FAIL;
                }
            } else {
                _jpg_buf_len = fb->len;
                _jpg_buf = fb->buf;
            }
        }
        if(res == ESP_OK){
            res = httpd_resp_send_chunk(req, _STREAM_BOUNDARY, strlen(_STREAM_BOUNDARY));
        }
        if(res == ESP_OK){
            size_t hlen = snprintf((char *)part_buf, 64, _STREAM_PART, _jpg_buf_len);
            res = httpd_resp_send_chunk(req, (const char *)part_buf, hlen);
        }
        if(res == ESP_OK){
            res = httpd_resp_send_chunk(req, (const char *)_jpg_buf, _jpg_buf_len);
        }
        if(fb){
            esp_camera_fb_return(fb);
            fb = NULL;
            _jpg_buf = NULL;
        } else if(_jpg_buf){
            free(_jpg_buf);
            _jpg_buf = NULL;
        }
        if(res != ESP_OK){
            break;
        }
    }
    return res;
}

void startCameraServer(){
    httpd_config_t config = HTTPD_DEFAULT_CONFIG();

    httpd_uri_t index_uri = {
        .uri       = "/",
        .method    = HTTP_GET,
        .handler   = stream_handler,
        .user_ctx  = NULL
    };

    httpd_uri_t page_uri = {
        .uri       = "/ts",
        .method    = HTTP_GET,
        .handler   = page_handler,
        .user_ctx  = NULL
    };

    Serial.printf("Starting web server on port: '%d'\n", config.server_port);
    if (httpd_start(&camera_httpd, &config) == ESP_OK) {
      httpd_register_uri_handler(camera_httpd, &page_uri);
    }

    // start stream using another webserver
    config.server_port += 1;
    config.ctrl_port += 1;
    Serial.printf("Starting stream server on port: '%d'\n", config.server_port);
    if (httpd_start(&stream_httpd, &config) == ESP_OK) {
        httpd_register_uri_handler(stream_httpd, &index_uri);
    }

}

void setup() {
    Serial.begin(9600);
    Serial.setDebugOutput(true);
    Serial.println();

    camera_config_t config;
    config.ledc_channel = LEDC_CHANNEL_0;
    config.ledc_timer = LEDC_TIMER_0;
    config.pin_d0 = Y2_GPIO_NUM;
    config.pin_d1 = Y3_GPIO_NUM;
    config.pin_d2 = Y4_GPIO_NUM;
    config.pin_d3 = Y5_GPIO_NUM;
    config.pin_d4 = Y6_GPIO_NUM;
    config.pin_d5 = Y7_GPIO_NUM;
    config.pin_d6 = Y8_GPIO_NUM;
    config.pin_d7 = Y9_GPIO_NUM;
    config.pin_xclk = XCLK_GPIO_NUM;
    config.pin_pclk = PCLK_GPIO_NUM;
    config.pin_vsync = VSYNC_GPIO_NUM;
    config.pin_href = HREF_GPIO_NUM;
    config.pin_sscb_sda = SIOD_GPIO_NUM;
    config.pin_sscb_scl = SIOC_GPIO_NUM;
    config.pin_pwdn = PWDN_GPIO_NUM;
    config.pin_reset = RESET_GPIO_NUM;
    config.xclk_freq_hz = 20000000;
    config.pixel_format = PIXFORMAT_JPEG;
    
    // if PSRAM IC present, init with UXGA resolution and higher JPEG quality
    //                      for larger pre-allocated frame buffer.
    if(psramFound()){
        config.frame_size = FRAMESIZE_UXGA;
        config.jpeg_quality = 10;
        config.fb_count = 2;
    } else {
        config.frame_size = FRAMESIZE_SVGA; // smaller frame buffer when no PSRAM is available
        config.jpeg_quality = 12;
        config.fb_count = 1;
    }

#if defined(CAMERA_MODEL_ESP_EYE)
    pinMode(13, INPUT_PULLUP);
    pinMode(14, INPUT_PULLUP);
#endif

    // camera init
    esp_err_t err = esp_camera_init(&config);
    if (err != ESP_OK) {
        Serial.printf("Camera init failed with error 0x%x\n", err);
        return;
    }

    sensor_t * s = esp_camera_sensor_get();
    // initial sensors are flipped vertically and colors are a bit saturated
    if (s->id.PID == OV3660_PID) {
        s->set_vflip(s, 1); // flip it back
        s->set_brightness(s, 1); // up the brightness just a bit
        s->set_saturation(s, 0); // lower the saturation
    }
    // drop down frame size for higher initial frame rate
    s->set_framesize(s, FRAMESIZE_VGA);

#if defined(CAMERA_MODEL_M5STACK_WIDE)
    s->set_vflip(s, 1);
    s->set_hmirror(s, 1);
#endif

    WiFi.begin(ssid, password);

    while (WiFi.status() != WL_CONNECTED) {
        delay(500);
        Serial.print(".");
    }
    Serial.println("");
    Serial.println("WiFi connected");

    startCameraServer();

    Serial.print("Camera Ready! Use 'http://");
    Serial.print(WiFi.localIP());
    Serial.println("/ts' to connect");
}

void loop() {
    // put your main code here, to run repeatedly:
    delay(10);
}

Adding camera_pins.h

  • Create the file include/camera_pins.h with the following contents.
camera_pins.h
#if defined(CAMERA_MODEL_WROVER_KIT)
#define PWDN_GPIO_NUM    -1
#define RESET_GPIO_NUM   -1
#define XCLK_GPIO_NUM    21
#define SIOD_GPIO_NUM    26
#define SIOC_GPIO_NUM    27

#define Y9_GPIO_NUM      35
#define Y8_GPIO_NUM      34
#define Y7_GPIO_NUM      39
#define Y6_GPIO_NUM      36
#define Y5_GPIO_NUM      19
#define Y4_GPIO_NUM      18
#define Y3_GPIO_NUM       5
#define Y2_GPIO_NUM       4
#define VSYNC_GPIO_NUM   25
#define HREF_GPIO_NUM    23
#define PCLK_GPIO_NUM    22

#elif defined(CAMERA_MODEL_ESP_EYE)
#define PWDN_GPIO_NUM    -1
#define RESET_GPIO_NUM   -1
#define XCLK_GPIO_NUM    4
#define SIOD_GPIO_NUM    18
#define SIOC_GPIO_NUM    23

#define Y9_GPIO_NUM      36
#define Y8_GPIO_NUM      37
#define Y7_GPIO_NUM      38
#define Y6_GPIO_NUM      39
#define Y5_GPIO_NUM      35
#define Y4_GPIO_NUM      14
#define Y3_GPIO_NUM      13
#define Y2_GPIO_NUM      34
#define VSYNC_GPIO_NUM   5
#define HREF_GPIO_NUM    27
#define PCLK_GPIO_NUM    25

#elif defined(CAMERA_MODEL_M5STACK_PSRAM)
#define PWDN_GPIO_NUM     -1
#define RESET_GPIO_NUM    15
#define XCLK_GPIO_NUM     27
#define SIOD_GPIO_NUM     25
#define SIOC_GPIO_NUM     23

#define Y9_GPIO_NUM       19
#define Y8_GPIO_NUM       36
#define Y7_GPIO_NUM       18
#define Y6_GPIO_NUM       39
#define Y5_GPIO_NUM        5
#define Y4_GPIO_NUM       34
#define Y3_GPIO_NUM       35
#define Y2_GPIO_NUM       32
#define VSYNC_GPIO_NUM    22
#define HREF_GPIO_NUM     26
#define PCLK_GPIO_NUM     21

#elif defined(CAMERA_MODEL_M5STACK_WIDE)
#define PWDN_GPIO_NUM     -1
#define RESET_GPIO_NUM    15
#define XCLK_GPIO_NUM     27
#define SIOD_GPIO_NUM     22
#define SIOC_GPIO_NUM     23

#define Y9_GPIO_NUM       19
#define Y8_GPIO_NUM       36
#define Y7_GPIO_NUM       18
#define Y6_GPIO_NUM       39
#define Y5_GPIO_NUM        5
#define Y4_GPIO_NUM       34
#define Y3_GPIO_NUM       35
#define Y2_GPIO_NUM       32
#define VSYNC_GPIO_NUM    25
#define HREF_GPIO_NUM     26
#define PCLK_GPIO_NUM     21

#elif defined(CAMERA_MODEL_AI_THINKER)
#define PWDN_GPIO_NUM     32
#define RESET_GPIO_NUM    -1
#define XCLK_GPIO_NUM      0
#define SIOD_GPIO_NUM     26
#define SIOC_GPIO_NUM     27

#define Y9_GPIO_NUM       35
#define Y8_GPIO_NUM       34
#define Y7_GPIO_NUM       39
#define Y6_GPIO_NUM       36
#define Y5_GPIO_NUM       21
#define Y4_GPIO_NUM       19
#define Y3_GPIO_NUM       18
#define Y2_GPIO_NUM        5
#define VSYNC_GPIO_NUM    25
#define HREF_GPIO_NUM     23
#define PCLK_GPIO_NUM     22

#elif defined(CAMERA_MODEL_TTGO_T_JOURNAL)
#define PWDN_GPIO_NUM      0
#define RESET_GPIO_NUM    15
#define XCLK_GPIO_NUM     27
#define SIOD_GPIO_NUM     25
#define SIOC_GPIO_NUM     23

#define Y9_GPIO_NUM       19
#define Y8_GPIO_NUM       36
#define Y7_GPIO_NUM       18
#define Y6_GPIO_NUM       39
#define Y5_GPIO_NUM        5
#define Y4_GPIO_NUM       34
#define Y3_GPIO_NUM       35
#define Y2_GPIO_NUM       17
#define VSYNC_GPIO_NUM    22
#define HREF_GPIO_NUM     26
#define PCLK_GPIO_NUM     21

#else
#error "Camera model not selected"
#endif

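Build and upload

  • Build from PlatformIO, then run Upload And Monitor.
  • When uploading, keep the ESP32-CAM's IO0 shorted to GND.

Running

  • Remove the short between IO0 and GND and press the ESP32-CAM's reset button; the web server starts.
    • Check for the line Camera Ready! Use 'http://192.168.3.23/ts' to connect in the serial monitor.
    • In my environment the IP address was 192.168.3.23.
  • Open the URL above, http://<esp32-cam-ip>/ts, in a browser.
    • After the page appears, wait a while; downloading the model takes some time.
    • When the download finishes, the Start ESP32 Webcam button becomes active, so press it.
    • The browser then starts receiving the camera video, and object detection begins.
    • Of course, you can also access it from a smartphone. (Note that the model data is large.)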

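Screenshot 2021-05-22 23.36.19.png

Conclusion

  • I got an easy setup working that uses the ESP32-CAM as a camera server and runs detection in a smartphone or PC browser.
  • Writing actions on the JavaScript side that respond to detections looks like it would open up all sorts of uses.
  • Rewriting video.src = 'http://' + window.location.hostname + ":81/"; in the HTML lets you use this with other cameras too.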
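
The last two points could be sketched roughly as follows. This is a minimal illustration, not code from the article: buildStreamUrl and dispatchDetections are hypothetical helper names. The defaults (port 81, path "/", and the 0.55 score threshold) mirror the values used in ts.html, and the prediction shape is the one that page reads from model.detect.

```javascript
// Hypothetical helpers for illustration; names and defaults are assumptions.

// Build the camera stream URL. The defaults mirror the page above:
// port 81 and path "/" on the given host.
function buildStreamUrl(hostname, port = 81, path = '/') {
  return 'http://' + hostname + ':' + port + path;
}

// Call the handler registered for each detected class whose score clears the
// threshold. `handlers` is a Map from class name (e.g. 'person') to a function.
// Each prediction has the shape { class, score, bbox: [x, y, width, height] }.
// Returns the class names whose handlers were called, in order.
function dispatchDetections(predictions, handlers, minScore = 0.55) {
  const fired = [];
  for (const p of predictions) {
    const handler = handlers.get(p.class);
    if (p.score > minScore && typeof handler === 'function') {
      handler(p);
      fired.push(p.class);
    }
  }
  return fired;
}
```

For example, video.src could be set to buildStreamUrl(window.location.hostname) for the ESP32-CAM, or to another camera's host and port, and dispatchDetections could be called with the predictions inside the model.detect callback. Keeping the handlers in a Map means a class name can never collide with built-in object properties.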