Introduction
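- I came across an interesting-looking article on doing object recognition with the ESP32-CAM, so I tried it myself.
- There was also one on running Fashion MNIST with TensorFlow Lite on the ESP32-CAM alone. I tried that too, but on its own the board really is tightly constrained.
- This project instead uses the ESP32-CAM as a web server that serves a video streaming page, and the detection runs on the connected smartphone or PC. That makes it possible to recognize a wide variety of objects.
- The model used is the COCO-SSD model.
- Apparently it can classify 80 classes, including people, vehicles and animals.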
Creating the project with VSCode & PlatformIO
- Board: AI Thinker ESP32-CAM
- Create platformio.ini as follows:
[env:esp32cam]
platform = espressif32
board = esp32cam
framework = arduino
Connecting to the ESP32-CAM
- For how to wire it up with a USB-to-TTL serial converter, see the following article.
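HTML & JavaScript
- For some reason the JavaScript would not run as-is in either Safari or Chrome, so I rewrote a few parts myself.
- What the page does:
  - Loads the TensorFlow.js library
  - Loads the COCO-SSD model
  - Runs the model on the ESP32-CAM video stream and displays the results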
ts.html
<!DOCTYPE html>
<html lang="en">
<head>
<title>
Multiple object detection using pre trained model in TensorFlow.js
</title>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<style>
img {
display: block;
}
.camView {
position: relative;
float: left;
width: calc(100% - 20px);
margin: 10px;
cursor: pointer;
}
.camView p {
position: absolute;
padding: 5px;
background-color: rgba(255, 111, 0, 0.85);
color: #FFF;
border: 1px dashed rgba(255, 255, 255, 0.7);
z-index: 2;
font-size: 12px;
}
.highlighter {
background: rgba(0, 255, 0, 0.25);
border: 1px dashed #fff;
z-index: 1;
position: absolute;
}
</style>
</head>
<body>
<h1>ESP32 TensorFlow.js - Object Detection</h1>
<p>
Wait for the model to load before clicking the button to enable the webcam
- at which point it will become visible to use.
</p>
<section id="camSection">
<div id="liveView" class="camView">
<button id="esp32camButton">Start ESP32 Webcam</button>
<img id="esp32cam_video" width="640" height="480" crossorigin=" " />
</div>
</section>
<!-- Import TensorFlow.js library -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/dist/tf.min.js"></script>
<!-- Load the coco-ssd model to use to recognize things in images -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/coco-ssd"></script>
<script>
const video = document.getElementById('esp32cam_video');
// Filled in once the COCO-SSD model has finished loading.
let model;
esp32camButton.addEventListener('click', enableCam);
esp32camButton.disabled = true;
window.onload = function() {
// alert('load model');
cocoSsd.load().then(
(data) => {
model = data;
esp32camButton.disabled = false;
alert('MODEL LOADED.');
},
(e) => { alert(e); }
);
};
function enableCam(event) {
video.src = 'http://' + window.location.hostname + ":81/";
if (!model) {
alert('No model..');
return;
}
predictWebcam();
}
var children = [];
function predictWebcam() {
// Now let's start classifying a frame in the stream.
video.crossorigin = ' ';
model.detect(video).then(
predictions => {
// Remove any highlighting we did previous frame.
for (let i = 0; i < children.length; i++) {
liveView.removeChild(children[i]);
}
children.splice(0);
// Now lets loop through predictions and draw them to the live view if
// they have a high confidence score.
for (let n = 0; n < predictions.length; n++) {
console.log(predictions[n].class + ' ' + predictions[n].score);
if (predictions[n].score > 0.55) {
const p = document.createElement('p');
p.innerText = predictions[n].class + ' - with '
+ Math.round(parseFloat(predictions[n].score) * 100)
+ '% confidence.';
p.style = 'margin-left: ' + predictions[n].bbox[0] + 'px; margin-top: '
+ (predictions[n].bbox[1] - 10) + 'px; width: '
+ (predictions[n].bbox[2] - 10) + 'px; top: 0; left: 0;';
const highlighter = document.createElement('div');
highlighter.setAttribute('class', 'highlighter');
highlighter.style = 'left: ' + predictions[n].bbox[0] + 'px; top: '
+ predictions[n].bbox[1] + 'px; width: '
+ predictions[n].bbox[2] + 'px; height: '
+ predictions[n].bbox[3] + 'px;';
liveView.appendChild(highlighter);
liveView.appendChild(p);
children.push(highlighter);
children.push(p);
}
}
// Schedule the next frame only after this detection has finished,
// so that detect() calls do not pile up.
window.requestAnimationFrame(predictWebcam);
},
(e) => {
alert(e);
}
);
}
</script>
</body>
</html>
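In predictWebcam(), each element of predictions is a COCO-SSD detection with a class name, a score between 0 and 1, and a bbox of [x, y, width, height] in pixels. The page draws only the detections whose score is above 0.55. The minimal Python sketch below mirrors that post-processing so the threshold and the box values can be checked offline. It is not part of the project. The function name overlay_items and the dict-shaped input are assumptions chosen to resemble the JavaScript objects.

```python
# Hypothetical Python mirror of the post-processing in predictWebcam() (not part of the project).
# Each prediction is assumed to look like a COCO-SSD result:
#   {"class": str, "score": float in [0, 1], "bbox": [x, y, width, height]}  (pixels)
SCORE_THRESHOLD = 0.55  # same cut-off as the HTML page


def overlay_items(predictions, threshold=SCORE_THRESHOLD):
    """Return (label, box) pairs for the detections the page would draw.

    box holds the left/top/width/height pixel values used for the
    green .highlighter <div>.
    """
    items = []
    for pred in predictions:
        # The page keeps only detections with a score strictly greater than 0.55.
        if pred["score"] <= threshold:
            continue
        x, y, width, height = pred["bbox"]
        # Python's round() rounds halves to even, unlike Math.round(),
        # so the percentage can differ by 1 at exact .5 values.
        label = f'{pred["class"]} - with {round(pred["score"] * 100)}% confidence.'
        box = {"left": x, "top": y, "width": width, "height": height}
        items.append((label, box))
    return items


if __name__ == "__main__":
    sample = [
        {"class": "person", "score": 0.91, "bbox": [12.0, 30.0, 200.0, 400.0]},
        {"class": "cup", "score": 0.40, "bbox": [300.0, 250.0, 50.0, 60.0]},
    ]
    for label, box in overlay_items(sample):
        print(label, box)
```

With this sample only the person entry is printed, because the cup's score of 0.40 is below the threshold.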
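Converting to a single-line string for pasting into the source
- To paste the HTML above into the source as a C string, the following Python script removes indentation, comment lines and line breaks.
- Paste the string it prints into index_html[] in main.cpp, described below.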
con.rb
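# trim comment, space, EOL from html file.
# paste to index_html[] in main.cpp
fname = "ts.html"
buf = ""
with open(fname, "r", encoding="utf-8") as file:
    for line in file:
        # Remove only the indentation and the trailing newline; spaces inside
        # a line (e.g. in "<!DOCTYPE html>" or "const video") must survive.
        stripped = line.strip()
        # Drop whole-line JavaScript comments.
        if not stripped.startswith("//"):
            buf = buf + stripped
print(buf)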
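You can wrap the same clean-up in functions so that the result is testable and the complete index_html[] declaration comes out in one step. This is a minimal sketch. It assumes the script is saved as con.py and that main.cpp keeps the rawliteral delimiter. The names minify_html and to_cpp_declaration are hypothetical. Like the script above, it drops only whole-line // comments. A trailing comment such as foo(); // note would comment out the rest of that <script> block once the lines are joined, so ts.html should not contain any. Strictly speaking, a C++ raw string literal keeps line breaks, so the unmodified ts.html would also compile. The clean-up mainly saves flash.

```python
# Sketch (hypothetical names): the con.rb clean-up as functions, plus generation
# of the complete index_html[] declaration used in main.cpp.
import sys
from pathlib import Path

DELIMITER = "rawliteral"  # raw-string delimiter used in main.cpp


def minify_html(text):
    """Strip indentation and line breaks, and drop whole-line // comments."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("//"):
            kept.append(stripped)
    return "".join(kept)


def to_cpp_declaration(html):
    """Wrap the HTML in the PROGMEM raw string literal used by main.cpp."""
    terminator = ")" + DELIMITER + '"'
    # The C++ literal would end early if the HTML itself contained the terminator.
    if terminator in html:
        raise ValueError("HTML contains the raw-string terminator " + terminator)
    return ("const char index_html[] PROGMEM = R\"" + DELIMITER + "(\n"
            + html + "\n" + terminator + ";")


if __name__ == "__main__":
    # Usage: python con.py ts.html
    for path in sys.argv[1:]:
        print(to_cpp_declaration(minify_html(Path(path).read_text(encoding="utf-8"))))
```

Run it as python con.py ts.html and paste the printed declaration over the existing one in main.cpp.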
main.cpp code
- Paste the HTML string above into const char index_html[] PROGMEM.
- Change the following to match your environment:
const char* ssid = "your_wifi_ssid";
const char* password = "your_wifi_password";
main.cpp
# include <Arduino.h>
# include <WiFi.h>
# include "esp_http_server.h"
# include "esp_timer.h"
# include "esp_camera.h"
# include "img_converters.h"
# define CAMERA_MODEL_AI_THINKER
# include "camera_pins.h"
// HTML Page
const char index_html[] PROGMEM = R"rawliteral(
<!DOCTYPE html><html lang="en"><head><title>Multiple object detection using pre trained model in TensorFlow.js</title><meta charset="utf-8" /><meta http-equiv="X-UA-Compatible" content="IE=edge" /><meta name="viewport" content="width=device-width, initial-scale=1" /><style>img {display: block;}.camView {position: relative;float: left;width: calc(100% - 20px);margin: 10px;cursor: pointer;}.camView p {position: absolute;padding: 5px;background-color: rgba(255, 111, 0, 0.85);color: #FFF;border: 1px dashed rgba(255, 255, 255, 0.7);z-index: 2;font-size: 12px;}.highlighter {background: rgba(0, 255, 0, 0.25);border: 1px dashed #fff;z-index: 1;position: absolute;}</style></head><body><h1>ESP32 TensorFlow.js - Object Detection</h1><p>Wait for the model to load before clicking the button to enable the webcam- at which point it will become visible to use.</p><section id="camSection"><div id="liveView" class="camView"><button id="esp32camButton">Start ESP32 Webcam</button><img id="esp32cam_video" width="640" height="480" crossorigin=" " /></div></section><!-- Import TensorFlow.js library --><script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/dist/tf.min.js"></script><!-- Load the coco-ssd model to use to recognize things in images --><script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/coco-ssd"></script><script>const video = document.getElementById('esp32cam_video');esp32camButton.addEventListener('click', enableCam);esp32camButton.disabled = true;window.onload = function() {cocoSsd.load().then( (data) => { model = data;esp32camButton.disabled = false;alert('MODEL LOADED.');},(e) => { alert(e); });};function enableCam(event) {video.src = 'http://' + window.location.hostname + ":81/";if (!model) {alert('No model..');return;} predictWebcam();}var children = [];function predictWebcam() {video.crossorigin = ' ';model.detect(video).then(predictions => {for (let i = 0; i < children.length; i++) {liveView.removeChild(children[i]);}children.splice(0);for (let n = 0; n < predictions.length; n++) {console.log(predictions[n].class + ' ' + predictions[n].score);if (predictions[n].score > 0.55) {const p = document.createElement('p');p.innerText = predictions[n].class+ ' - with '+ Math.round(parseFloat(predictions[n].score) * 100)+ '% confidence.';p.style = 'margin-left: ' + predictions[n].bbox[0] + 'px; margin-top: '+ (predictions[n].bbox[1] - 10) + 'px; width: '+ (predictions[n].bbox[2] - 10) + 'px; top: 0; left: 0;';const highlighter = document.createElement('div');highlighter.setAttribute('class', 'highlighter');highlighter.style = 'left: ' + predictions[n].bbox[0] + 'px; top: '+ predictions[n].bbox[1] + 'px; width: '+ predictions[n].bbox[2] + 'px; height: '+ predictions[n].bbox[3] + 'px;';liveView.appendChild(highlighter);liveView.appendChild(p);children.push(highlighter);children.push(p);}}},(e) => {alert(e);});window.requestAnimationFrame(predictWebcam);}</script></body></html>
)rawliteral";
# define PART_BOUNDARY "123456789000000000000987654321"
static const char* _STREAM_CONTENT_TYPE = "multipart/x-mixed-replace;boundary=" PART_BOUNDARY;
static const char* _STREAM_BOUNDARY = "\r\n--" PART_BOUNDARY "\r\n";
static const char* _STREAM_PART = "Content-Type: image/jpeg\r\nContent-Length: %u\r\n\r\n";
httpd_handle_t camera_httpd = NULL;
httpd_handle_t stream_httpd = NULL;
const char* ssid = "your_wifi_ssid";
const char* password = "your_wifi_password";
static esp_err_t page_handler(httpd_req_t *req) {
httpd_resp_set_type(req, "text/html");
httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");
httpd_resp_send(req, index_html, strlen(index_html)); // strlen: sizeof would also send the terminating '\0'
return ESP_OK;
}
static esp_err_t stream_handler(httpd_req_t *req){
camera_fb_t * fb = NULL;
esp_err_t res = ESP_OK;
size_t _jpg_buf_len = 0;
uint8_t * _jpg_buf = NULL;
char part_buf[64];
res = httpd_resp_set_type(req, _STREAM_CONTENT_TYPE);
if(res != ESP_OK){
return res;
}
httpd_resp_set_hdr(req, "Access-Control-Allow-Origin", "*");
while(true){
fb = esp_camera_fb_get();
if (!fb) {
Serial.println("Camera capture failed");
res = ESP_FAIL;
} else {
if(fb->format != PIXFORMAT_JPEG){
bool jpeg_converted = frame2jpg(fb, 80, &_jpg_buf, &_jpg_buf_len);
esp_camera_fb_return(fb);
fb = NULL;
if(!jpeg_converted){
Serial.println("JPEG compression failed");
res = ESP_FAIL;
}
} else {
_jpg_buf_len = fb->len;
_jpg_buf = fb->buf;
}
}
if(res == ESP_OK){
res = httpd_resp_send_chunk(req, _STREAM_BOUNDARY, strlen(_STREAM_BOUNDARY));
}
if(res == ESP_OK){
size_t hlen = snprintf((char *)part_buf, 64, _STREAM_PART, _jpg_buf_len);
res = httpd_resp_send_chunk(req, (const char *)part_buf, hlen);
}
if(res == ESP_OK){
res = httpd_resp_send_chunk(req, (const char *)_jpg_buf, _jpg_buf_len);
}
if(fb){
esp_camera_fb_return(fb);
fb = NULL;
_jpg_buf = NULL;
} else if(_jpg_buf){
free(_jpg_buf);
_jpg_buf = NULL;
}
if(res != ESP_OK){
break;
}
}
return res;
}
void startCameraServer(){
httpd_config_t config = HTTPD_DEFAULT_CONFIG();
httpd_uri_t index_uri = {
.uri = "/",
.method = HTTP_GET,
.handler = stream_handler,
.user_ctx = NULL
};
httpd_uri_t page_uri = {
.uri = "/ts",
.method = HTTP_GET,
.handler = page_handler,
.user_ctx = NULL
};
Serial.printf("Starting web server on port: '%d'\n", config.server_port);
if (httpd_start(&camera_httpd, &config) == ESP_OK) {
httpd_register_uri_handler(camera_httpd, &page_uri);
}
// start stream using another webserver
config.server_port += 1;
config.ctrl_port += 1;
Serial.printf("Starting stream server on port: '%d'\n", config.server_port);
if (httpd_start(&stream_httpd, &config) == ESP_OK) {
httpd_register_uri_handler(stream_httpd, &index_uri);
}
}
void setup() {
Serial.begin(9600);
Serial.setDebugOutput(true);
Serial.println();
camera_config_t config;
config.ledc_channel = LEDC_CHANNEL_0;
config.ledc_timer = LEDC_TIMER_0;
config.pin_d0 = Y2_GPIO_NUM;
config.pin_d1 = Y3_GPIO_NUM;
config.pin_d2 = Y4_GPIO_NUM;
config.pin_d3 = Y5_GPIO_NUM;
config.pin_d4 = Y6_GPIO_NUM;
config.pin_d5 = Y7_GPIO_NUM;
config.pin_d6 = Y8_GPIO_NUM;
config.pin_d7 = Y9_GPIO_NUM;
config.pin_xclk = XCLK_GPIO_NUM;
config.pin_pclk = PCLK_GPIO_NUM;
config.pin_vsync = VSYNC_GPIO_NUM;
config.pin_href = HREF_GPIO_NUM;
config.pin_sscb_sda = SIOD_GPIO_NUM;
config.pin_sscb_scl = SIOC_GPIO_NUM;
config.pin_pwdn = PWDN_GPIO_NUM;
config.pin_reset = RESET_GPIO_NUM;
config.xclk_freq_hz = 20000000;
config.pixel_format = PIXFORMAT_JPEG;
// if PSRAM IC present, init with UXGA resolution and higher JPEG quality
// for larger pre-allocated frame buffer.
if(psramFound()){
config.frame_size = FRAMESIZE_UXGA;
config.jpeg_quality = 10;
config.fb_count = 2;
} else {
config.frame_size = FRAMESIZE_SVGA; // a UXGA frame buffer does not fit in internal RAM without PSRAM
config.jpeg_quality = 12;
config.fb_count = 1;
}
# if defined(CAMERA_MODEL_ESP_EYE)
pinMode(13, INPUT_PULLUP);
pinMode(14, INPUT_PULLUP);
# endif
// camera init
esp_err_t err = esp_camera_init(&config);
if (err != ESP_OK) {
Serial.printf("Camera init failed with error 0x%x", err);
return;
}
sensor_t * s = esp_camera_sensor_get();
// initial sensors are flipped vertically and colors are a bit saturated
if (s->id.PID == OV3660_PID) {
s->set_vflip(s, 1); // flip it back
s->set_brightness(s, 1); // up the brightness just a bit
s->set_saturation(s, 0); // lower the saturation
}
// drop down frame size for higher initial frame rate
s->set_framesize(s, FRAMESIZE_VGA);
# if defined(CAMERA_MODEL_M5STACK_WIDE)
s->set_vflip(s, 1);
s->set_hmirror(s, 1);
# endif
WiFi.begin(ssid, password);
while (WiFi.status() != WL_CONNECTED) {
delay(500);
Serial.print(".");
}
Serial.println("");
Serial.println("WiFi connected");
startCameraServer();
Serial.print("Camera Ready! Use 'http://");
Serial.print(WiFi.localIP());
Serial.println("/ts' to connect");
}
void loop() {
// Nothing to do here: the web and stream servers run in their own tasks.
delay(10);
}
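stream_handler() on port 81 sends an endless multipart/x-mixed-replace response. Before every frame it writes \r\n--PART_BOUNDARY\r\n, then a Content-Type/Content-Length header block, then the JPEG bytes. To check this stream from a PC without the browser page, a client only has to find each boundary line, read Content-Length and take that many bytes. Below is a minimal Python sketch of such a reader built on those assumptions. The names iter_jpeg_frames and frames_from_bytes, the script name mjpeg_dump.py and the output file names are hypothetical. urllib handles the HTTP chunked transfer encoding that httpd_resp_send_chunk() uses.

```python
# Sketch (hypothetical names): read JPEG frames from the ESP32-CAM stream on port 81.
import io
import re
import sys
from urllib.request import urlopen

# "--" + PART_BOUNDARY from main.cpp
BOUNDARY = b"--123456789000000000000987654321"
LENGTH_RE = re.compile(rb"Content-Length:\s*(\d+)", re.IGNORECASE)


def iter_jpeg_frames(stream):
    """Yield each JPEG payload from a multipart/x-mixed-replace byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # connection closed
        if line.strip() != BOUNDARY:
            continue  # e.g. the bare CRLF written before each boundary
        length = None
        # Part headers end with an empty line.
        while True:
            header = stream.readline()
            if not header or header in (b"\r\n", b"\n"):
                break
            match = LENGTH_RE.match(header)
            if match:
                length = int(match.group(1))
        if length is None:
            continue  # no Content-Length: skip this part
        data = stream.read(length)
        if len(data) < length:
            return  # stream ended in the middle of a frame
        yield data


def frames_from_bytes(data):
    """Parse a captured stream that is already in memory."""
    return list(iter_jpeg_frames(io.BytesIO(data)))


if __name__ == "__main__":
    # Usage: python mjpeg_dump.py http://<esp32-cam-ip>:81/
    if len(sys.argv) > 1:
        with urlopen(sys.argv[1], timeout=10) as resp:
            for i, jpg in enumerate(iter_jpeg_frames(resp)):
                with open(f"frame_{i:03d}.jpg", "wb") as f:
                    f.write(jpg)
                if i == 9:
                    break  # keep the first ten frames
```

For example, python mjpeg_dump.py http://192.168.3.23:81/ (substitute your own board's address) should save frame_000.jpg to frame_009.jpg if the stream server is running.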
Adding camera_pins.h
- Create include/camera_pins.h with the following content.
camera_pins.h
# if defined(CAMERA_MODEL_WROVER_KIT)
# define PWDN_GPIO_NUM -1
# define RESET_GPIO_NUM -1
# define XCLK_GPIO_NUM 21
# define SIOD_GPIO_NUM 26
# define SIOC_GPIO_NUM 27
# define Y9_GPIO_NUM 35
# define Y8_GPIO_NUM 34
# define Y7_GPIO_NUM 39
# define Y6_GPIO_NUM 36
# define Y5_GPIO_NUM 19
# define Y4_GPIO_NUM 18
# define Y3_GPIO_NUM 5
# define Y2_GPIO_NUM 4
# define VSYNC_GPIO_NUM 25
# define HREF_GPIO_NUM 23
# define PCLK_GPIO_NUM 22
# elif defined(CAMERA_MODEL_ESP_EYE)
# define PWDN_GPIO_NUM -1
# define RESET_GPIO_NUM -1
# define XCLK_GPIO_NUM 4
# define SIOD_GPIO_NUM 18
# define SIOC_GPIO_NUM 23
# define Y9_GPIO_NUM 36
# define Y8_GPIO_NUM 37
# define Y7_GPIO_NUM 38
# define Y6_GPIO_NUM 39
# define Y5_GPIO_NUM 35
# define Y4_GPIO_NUM 14
# define Y3_GPIO_NUM 13
# define Y2_GPIO_NUM 34
# define VSYNC_GPIO_NUM 5
# define HREF_GPIO_NUM 27
# define PCLK_GPIO_NUM 25
# elif defined(CAMERA_MODEL_M5STACK_PSRAM)
# define PWDN_GPIO_NUM -1
# define RESET_GPIO_NUM 15
# define XCLK_GPIO_NUM 27
# define SIOD_GPIO_NUM 25
# define SIOC_GPIO_NUM 23
# define Y9_GPIO_NUM 19
# define Y8_GPIO_NUM 36
# define Y7_GPIO_NUM 18
# define Y6_GPIO_NUM 39
# define Y5_GPIO_NUM 5
# define Y4_GPIO_NUM 34
# define Y3_GPIO_NUM 35
# define Y2_GPIO_NUM 32
# define VSYNC_GPIO_NUM 22
# define HREF_GPIO_NUM 26
# define PCLK_GPIO_NUM 21
# elif defined(CAMERA_MODEL_M5STACK_WIDE)
# define PWDN_GPIO_NUM -1
# define RESET_GPIO_NUM 15
# define XCLK_GPIO_NUM 27
# define SIOD_GPIO_NUM 22
# define SIOC_GPIO_NUM 23
# define Y9_GPIO_NUM 19
# define Y8_GPIO_NUM 36
# define Y7_GPIO_NUM 18
# define Y6_GPIO_NUM 39
# define Y5_GPIO_NUM 5
# define Y4_GPIO_NUM 34
# define Y3_GPIO_NUM 35
# define Y2_GPIO_NUM 32
# define VSYNC_GPIO_NUM 25
# define HREF_GPIO_NUM 26
# define PCLK_GPIO_NUM 21
# elif defined(CAMERA_MODEL_AI_THINKER)
# define PWDN_GPIO_NUM 32
# define RESET_GPIO_NUM -1
# define XCLK_GPIO_NUM 0
# define SIOD_GPIO_NUM 26
# define SIOC_GPIO_NUM 27
# define Y9_GPIO_NUM 35
# define Y8_GPIO_NUM 34
# define Y7_GPIO_NUM 39
# define Y6_GPIO_NUM 36
# define Y5_GPIO_NUM 21
# define Y4_GPIO_NUM 19
# define Y3_GPIO_NUM 18
# define Y2_GPIO_NUM 5
# define VSYNC_GPIO_NUM 25
# define HREF_GPIO_NUM 23
# define PCLK_GPIO_NUM 22
# elif defined(CAMERA_MODEL_TTGO_T_JOURNAL)
# define PWDN_GPIO_NUM 0
# define RESET_GPIO_NUM 15
# define XCLK_GPIO_NUM 27
# define SIOD_GPIO_NUM 25
# define SIOC_GPIO_NUM 23
# define Y9_GPIO_NUM 19
# define Y8_GPIO_NUM 36
# define Y7_GPIO_NUM 18
# define Y6_GPIO_NUM 39
# define Y5_GPIO_NUM 5
# define Y4_GPIO_NUM 34
# define Y3_GPIO_NUM 35
# define Y2_GPIO_NUM 17
# define VSYNC_GPIO_NUM 22
# define HREF_GPIO_NUM 26
# define PCLK_GPIO_NUM 21
# else
# error "Camera model not selected"
# endif
Build and upload
- In PlatformIO, run Build, then Upload and Monitor.
- When uploading, short IO0 to GND on the ESP32-CAM so that it boots into download mode.
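Run
- Remove the short between IO0 and GND and press the reset button on the ESP32-CAM. The web server starts.
- In the serial monitor, look for the line
Camera Ready! Use 'http://192.168.3.23/ts' to connect
  - In my environment the IP address was 192.168.3.23.
- Open that URL, http://<esp32-cam-ip>/ts, in a browser.
  - Once the page is displayed, wait a while: downloading the model takes some time.
- When the download has finished (a MODEL LOADED. alert appears), the Start ESP32 Webcam button becomes active. Press it.
  - The page then starts receiving the camera stream and object detection begins.
- Of course, it also works from a smartphone, but mind the data usage, since the model is large.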
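Conclusion
- Using the ESP32-CAM as a camera server and running detection in a smartphone or PC browser gave a simple setup that is easy to put together.
- Adding JavaScript that reacts to the detections should open up all sorts of possibilities.
- Changing
video.src = 'http://' + window.location.hostname + ":81/";
in the HTML lets the page work with other cameras too. The other camera must serve an MJPEG stream that an <img> element can display, and it must send an Access-Control-Allow-Origin header. Without that header the browser treats the image as cross-origin, and TensorFlow.js cannot read its pixels.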