Raspberry Pi AI Camera × Codex CLI: Building an AI Image-Analysis Web App for the Vibe Coding Era

Posted at 2025-05-08 (last updated at 2025-05-13)

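Hello.
I'm Hosoi from Sony Semiconductor Solutions.
(The illustration above was generated by ChatGPT after I asked it to draw an image of building a Raspberry Pi AI Camera web application with Codex CLI. A camera is sprouting out of the GPIO header, but it still looks the part. Impressive!)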

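On April 16, 2025, OpenAI released Codex CLI, a lightweight coding agent that runs in the terminal.
In this article, I use Codex CLI to develop a web application for the Raspberry Pi AI Camera through vibe coding. (Please read all the way to the reveal at the end.)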

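Vibe coding is a programming approach that relies on AI: a person describes the problem they want to solve in natural language, as a prompt to a large language model (LLM) specialized for coding, and the LLM itself writes the code. The term appears to have been coined by Andrej Karpathy in February 2025 (see Wikipedia).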

What you will learn from this article

How to install Codex CLI, and an example of vibe coding with it
An example of developing an AI Camera application through vibe coding

Developing a web application with the Raspberry Pi AI Camera

Overview

Using the Raspberry Pi AI Camera (IMX500), I built an application that streams video to a web browser and lets you switch, in real time, the image processing applied inside the detected bounding boxes.
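The browser streaming is MJPEG over HTTP: the /video_feed route in the script returns a multipart/x-mixed-replace response, and generate_frames yields each JPEG as one part delimited by the boundary string frame. A minimal stdlib-only sketch of just that framing, with placeholder bytes standing in for cv2.imencode output (`mjpeg_parts` is a hypothetical name, not part of Flask):

```python
from typing import Iterable, Iterator

BOUNDARY = b"frame"  # must match "boundary=frame" in the Response mimetype

def mjpeg_parts(jpeg_frames: Iterable[bytes]) -> Iterator[bytes]:
    """Wrap each encoded JPEG as one part of a multipart/x-mixed-replace body."""
    for jpeg in jpeg_frames:
        yield (b"--" + BOUNDARY + b"\r\n"
               + b"Content-Type: image/jpeg\r\n\r\n"
               + jpeg + b"\r\n")

# Placeholder bytes stand in for real cv2.imencode(".jpg", frame) output
parts = list(mjpeg_parts([b"\xff\xd8fake1\xff\xd9", b"\xff\xd8fake2\xff\xd9"]))
```

In Flask, a generator like this is handed to Response(..., mimetype='multipart/x-mixed-replace; boundary=frame'), as video_feed does, and the browser's img element replaces the picture each time a new part arrives.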

Installing Codex CLI

Install Codex CLI on the client PC (macOS/Linux).

```
npm install -g @openai/codex
```

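Next, register an API key.
Go to the OpenAI Platform and type "API Keys" into Search.

Click Create API Key at the top right of the page to create a new API key.

The generated API key is displayed only once, so be sure to copy it.

On the client PC, run the following command to set the OpenAI API key as an environment variable.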

```
export OPENAI_API_KEY="<your API key>"
```
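Codex CLI picks up the key from the OPENAI_API_KEY environment variable, so a missing or blank variable is the first thing to rule out if something goes wrong. A minimal check as a sketch; the helper `api_key_is_set` is hypothetical (not part of Codex CLI) and never prints the key itself:

```python
import os
from typing import Mapping, Optional

def api_key_is_set(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if OPENAI_API_KEY exists and is not blank; the value is never printed."""
    source = os.environ if env is None else env
    return bool(source.get("OPENAI_API_KEY", "").strip())

if __name__ == "__main__":
    print("OPENAI_API_KEY is set:", api_key_is_set())
```

Run it from the same shell session where you exported the variable, since a new terminal will not inherit it unless it is also added to your shell profile.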

With the key set, running

```
codex
```

starts Codex CLI as shown below.

```
ForQiita % codex
╭──────────────────────────────────────────────────────────────╮
│ ● OpenAI Codex (research preview) v0.1.2504301751            │
╰──────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────────────────────╮
│ localhost session: xxx          │
│ ↳ workdir: ~/work/sample/codexcli/ForQiita                   │
│ ↳ model: o4-mini                                             │
│ ↳ provider: openai                                           │
│ ↳ approval: suggest                                          │
╰──────────────────────────────────────────────────────────────╯
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                        │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  try: explain this codebase to me | fix any build errors | are there any bugs in my code?
```

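To use the o4-mini model, you need to set up paid billing for the OpenAI API.
Again on the OpenAI Platform, search for "billing" and open Billing Overview.

Click Add payment details.

Enter your credit card information and click Continue.
(The Postal code field is your ZIP/postal code.)

Enter the amount to purchase in the Initial credit purchase field.
If you want automatic recharge, set "Would you like to set up automatic recharge?" to Yes here.

Click Confirm payment to complete the purchase.

See the official documentation for details.
After billing is set up, it may take up to about 15 minutes before o4-mini becomes available in Codex CLI.

Codex CLI is now ready to use! Let's try it out!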

Environment

MacBook Air 13-inch (2024)
Chip: Apple M3
Memory: 16 GB
OS: macOS Sonoma 14.5

Development flow with Codex CLI

1. Create the project and start Codex

```
mkdir RaspberryPiAICamApp
cd RaspberryPiAICamApp
codex
```

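This starts an interactive Codex CLI session.

2. Enter the specification (AICamSample.md)

I pasted AICamSample.md (the requirements file for this project, prepared in advance) into Codex CLI and pressed Enter to start the analysis.

3. Code generation

Codex CLI generated the following file.

ai_camera_app.py: Python script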

#!/usr/bin/env python3
"""
Raspberry Pi AI Camera Web Streaming Application
"""
import argparse
import sys
import io
from functools import lru_cache
import cv2
import numpy as np
from picamera2 import Picamera2
from picamera2 import MappedArray
from picamera2.devices import IMX500
from picamera2.devices.imx500 import NetworkIntrinsics, postprocess_nanodet_detection
from flask import Flask, Response, render_template_string, request

# Global state variables
last_detections = []
current_mode = 0
last_reduction_text = ""
imx500 = None
intrinsics = None
picam2 = None
args = None

class Detection:
    def __init__(self, coords, category, conf, metadata):
        """Detection object holding category, confidence, and bounding box."""
        self.category = category
        self.conf = conf
        # Convert normalized coords to pixel coords (x, y, w, h)
        self.box = imx500.convert_inference_coords(coords, metadata, picam2)

@lru_cache()
def get_labels():
    labels = intrinsics.labels
    if intrinsics.ignore_dash_labels:
        labels = [l for l in labels if l and l != "-"]
    return labels

def parse_detections(metadata):  # -> List[Detection]
    """Parse raw network outputs into Detection objects."""
    global last_detections
    bbox_normalization = intrinsics.bbox_normalization
    bbox_order = intrinsics.bbox_order
    threshold = args.threshold
    iou = args.iou
    max_detections = args.max_detections

    np_outputs = imx500.get_outputs(metadata, add_batch=True)
    input_w, input_h = imx500.get_input_size()
    if np_outputs is None:
        return last_detections
    # Postprocess
    if intrinsics.postprocess == "nanodet":
        boxes, scores, classes = postprocess_nanodet_detection(
            outputs=np_outputs[0], conf=threshold, iou_thres=iou, max_out_dets=max_detections
        )[0]
        from picamera2.devices.imx500.postprocess import scale_boxes

        boxes = scale_boxes(boxes, 1, 1, input_h, input_w, False, False)
    else:
        boxes, scores, classes = np_outputs[0][0], np_outputs[1][0], np_outputs[2][0]
        if bbox_normalization:
            boxes = boxes / input_h
        if bbox_order == "xy":
            boxes = boxes[:, [1, 0, 3, 2]]
        boxes = np.array_split(boxes, 4, axis=1)
        boxes = zip(*boxes)
    # Build Detection list
    last_detections = [
        Detection(box, cat, score, metadata)
        for box, score, cat in zip(boxes, scores, classes)
        if score > threshold
    ]
    return last_detections

# Image processing filters
def apply_gaussian(frame, detections):
    for det in detections:
        x, y, w, h = map(int, det.box)
        roi = frame[y:y+h, x:x+w]
        frame[y:y+h, x:x+w] = cv2.GaussianBlur(roi, (15, 15), 0)
    return frame

def apply_gamma(frame, detections, gamma=1.5):
    inv = 1.0 / gamma
    table = np.array([((i / 255.0) ** inv) * 255 for i in range(256)]).astype('uint8')
    for det in detections:
        x, y, w, h = map(int, det.box)
        roi = frame[y:y+h, x:x+w]
        frame[y:y+h, x:x+w] = cv2.LUT(roi, table)
    return frame

def apply_canny(frame, detections):
    """Apply Canny edge detection inside each detection bbox."""
    for det in detections:
        x, y, w, h = map(int, det.box)
        roi = frame[y:y+h, x:x+w]
        # Process only the first 3 channels (BGR)
        if roi.ndim != 3 or roi.shape[2] < 3:
            continue
        src = roi[..., :3]
        gray = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 100, 200)
        # Convert edges to BGR and write back to first 3 channels
        colored = cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)
        frame[y:y+h, x:x+w, :3] = colored
    return frame

def apply_mosaic(frame, detections, scale=0.1):
    for det in detections:
        x, y, w, h = map(int, det.box)
        roi = frame[y:y+h, x:x+w]
        small = cv2.resize(roi, (max(1,int(w*scale)), max(1,int(h*scale))),
                           interpolation=cv2.INTER_LINEAR)
        frame[y:y+h, x:x+w] = cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)
    return frame

def apply_crop(frame, detections):
    """Crop to union of all detections and report file size reduction."""
    global last_reduction_text
    if not detections:
        last_reduction_text = "0.00%"
        return frame
    xs = [int(det.box[0]) for det in detections]
    ys = [int(det.box[1]) for det in detections]
    xs2 = [int(det.box[0] + det.box[2]) for det in detections]
    ys2 = [int(det.box[1] + det.box[3]) for det in detections]
    x0, y0 = min(xs), min(ys)
    x1, y1 = max(xs2), max(ys2)
    crop = frame[y0:y1, x0:x1]
    # encode both for size comparison (JPEG)
    ret_full, buf_full = cv2.imencode('.jpg', frame)
    ret_crop, buf_crop = cv2.imencode('.jpg', crop)
    if ret_full and ret_crop:
        size_full = len(buf_full)
        size_crop = len(buf_crop)
        reduction = (1 - size_crop / size_full) * 100
        last_reduction_text = f"{reduction:.2f}%"
    return crop

# Flask application for streaming
app = Flask(__name__)

HTML_PAGE = '''
<html><head><title>AI Camera Streaming</title></head>
<body style="text-align:center;">
  <h1>AI Camera Streaming</h1>
  <img src="{{ url_for('video_feed') }}" width="640" />
  <p>Press keys: 1=Gaussian, 2=Gamma, 3=Canny, 4=Mosaic, 5=Crop, 0=None</p>
  <!-- Show the reduction rate computed in Crop mode -->
  <div id="info" style="margin:15px auto; font-size:20px; color:#d32f2f; font-weight:bold; background:rgba(255,255,255,0.8); padding:5px 15px; border-radius:5px; display:inline-block;">
    Crop Reduction: 0.00%
  </div>
  <script>
    window.addEventListener('keydown', function(e) {
      if (['0','1','2','3','4','5'].includes(e.key)) {
        fetch('/set_mode?mode=' + e.key);
      }
    });
    // Periodically fetch the Crop-mode reduction rate and update the display
    setInterval(function() {
      fetch('/crop_info')
        .then(r => r.text())
        .then(text => {
          document.getElementById('info').innerText = 'Crop Reduction: ' + text;
        });
    }, 500);
  </script>
</body></html>
'''

@app.route('/')
def index():
    return render_template_string(HTML_PAGE)

@app.route('/set_mode')
def set_mode():
    global current_mode
    try:
        current_mode = int(request.args.get('mode', 0))
    except ValueError:
        current_mode = 0
    return ('', 204)

def generate_frames():
    global current_mode
    while True:
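        # Take one request so the frame and its inference metadata come from the same capture
        cam_request = picam2.capture_request()
        try:
            metadata = cam_request.get_metadata()
            raw = cam_request.make_array("main")
        finally:
            cam_request.release()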
        # Frames arrive as 4-channel R, G, B, X: swap R/B and drop the 4th channel to get BGR
        if raw.ndim == 3 and raw.shape[2] == 4:
            frame = cv2.cvtColor(raw, cv2.COLOR_RGB2BGR)
        else:
            frame = raw.copy()
        # Keep the original resolution (used to rebuild the frame in Crop mode)
        orig_h, orig_w = frame.shape[:2]
        detections = parse_detections(metadata)
        # apply selected filter
        if current_mode == 1:
            frame = apply_gaussian(frame, detections)
        elif current_mode == 2:
            frame = apply_gamma(frame, detections)
        elif current_mode == 3:
            frame = apply_canny(frame, detections)
        elif current_mode == 4:
            frame = apply_mosaic(frame, detections)
        elif current_mode == 5:
            # Crop mode: cut out the detected region and pad it back to the original resolution so the display size stays fixed
            cropped = apply_crop(frame, detections)
            padded = np.zeros_like(frame)
            h_crop, w_crop = cropped.shape[:2]
            y_off = max((orig_h - h_crop) // 2, 0)
            x_off = max((orig_w - w_crop) // 2, 0)
            padded[y_off:y_off+h_crop, x_off:x_off+w_crop] = cropped
            frame = padded
        # Draw bounding boxes for modes other than crop
        if current_mode != 5:
            for det in detections:
                x, y, w, h = map(int, det.box)
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        # Ensure 3-channel BGR for encoding
        if frame.ndim == 3 and frame.shape[2] == 4:
            to_encode = cv2.cvtColor(frame, cv2.COLOR_BGRA2BGR)
        else:
            to_encode = frame
        ret, buffer = cv2.imencode('.jpg', to_encode)
        if not ret:
            continue
        frame_bytes = buffer.tobytes()
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + frame_bytes + b'\r\n')

@app.route('/video_feed')
def video_feed():
    return Response(generate_frames(),
                    mimetype='multipart/x-mixed-replace; boundary=frame')

@app.route('/crop_info')
def crop_info():
    """API endpoint that returns the reduction rate computed in Crop mode"""
    global last_reduction_text
    return (last_reduction_text if last_reduction_text else "0.00%"), 200, {'Content-Type': 'text/plain'}

def main():
    global args, imx500, intrinsics, picam2
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str,
                        default="/usr/share/imx500-models/imx500_network_ssd_mobilenetv2_fpnlite_320x320_pp.rpk")
    parser.add_argument("--threshold", type=float, default=0.55)
    parser.add_argument("--iou", type=float, default=0.65)
    parser.add_argument("--max-detections", type=int, default=10)
    parser.add_argument("--fps", type=int, help="Frames per second")
    parser.add_argument("--bbox-normalization", action=argparse.BooleanOptionalAction)
    parser.add_argument("--bbox-order", choices=["yx","xy"], default="yx")
    parser.add_argument("--ignore-dash-labels", action=argparse.BooleanOptionalAction)
    parser.add_argument("--postprocess", choices=["","nanodet"], default=None)
    parser.add_argument("-r", "--preserve-aspect-ratio", action=argparse.BooleanOptionalAction)
    parser.add_argument("--labels", type=str)
    parser.add_argument("--print-intrinsics", action="store_true")
    parser.add_argument("--host", type=str, default="0.0.0.0")
    parser.add_argument("--port", type=int, default=5000)
    args = parser.parse_args()

    # Initialize network and intrinsics
    imx500 = IMX500(args.model)
    intrinsics = imx500.network_intrinsics
    if not intrinsics:
        intrinsics = NetworkIntrinsics()
        intrinsics.task = "object detection"
    elif intrinsics.task != "object detection":
        print("Network is not an object detection task", file=sys.stderr)
        sys.exit(1)
    # Override intrinsics from args
    for key, value in vars(args).items():
        if key == 'labels' and value:
            with open(value, 'r') as f:
                intrinsics.labels = f.read().splitlines()
        elif hasattr(intrinsics, key) and value is not None:
            setattr(intrinsics, key, value)
    if intrinsics.labels is None:
        with open("assets/coco_labels.txt", "r") as f:
            intrinsics.labels = f.read().splitlines()
    intrinsics.update_with_defaults()
    if args.print_intrinsics:
        print(intrinsics)
        sys.exit(0)

    # Start camera
    picam2 = Picamera2(imx500.camera_num)
    config = picam2.create_preview_configuration(
        controls={"FrameRate": intrinsics.inference_rate}, buffer_count=12)
    picam2.start(config)
    if intrinsics.preserve_aspect_ratio:
        imx500.set_auto_aspect_ratio()

    # Run web server
    app.run(host=args.host, port=args.port, threaded=True)

if __name__ == '__main__':
    main()
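apply_gamma above builds a 256-entry lookup table and lets cv2.LUT map every 8-bit value through it: out = int(255 × (in / 255)^(1/gamma)). With gamma = 1.5 the exponent is about 0.67, so darker values are lifted and the region inside each box gets brighter. The same table in plain Python, as a sketch for checking the numbers without OpenCV (`gamma_lut` and `apply_lut` are hypothetical helpers):

```python
from typing import List

def gamma_lut(gamma: float = 1.5) -> List[int]:
    """256-entry table using the same formula as apply_gamma."""
    inv = 1.0 / gamma
    # int() truncates like numpy's astype('uint8') does for values in [0, 255]
    return [int(((i / 255.0) ** inv) * 255) for i in range(256)]

def apply_lut(values: List[int], table: List[int]) -> List[int]:
    """Map 8-bit values through the table, as cv2.LUT does for every channel."""
    return [table[v] for v in values]

table = gamma_lut(1.5)
print(apply_lut([0, 64, 128, 255], table))
```

The lookup-table design means only 256 power evaluations per call, instead of one per pixel.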

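Description of the implemented application

Libraries used
- picamera2 (IMX500 model control and inference)
- OpenCV (Python)
- Flask (web streaming server)
Feature overview
- Captures Pi camera video in real time and detects objects with deep-learning inference
- Applies one of the following five image-processing operations inside the detected bounding boxes
  - Key 1: Gaussian filter
  - Key 2: Gamma correction
  - Key 3: Canny edge detection
  - Key 4: Mosaic
  - Key 5: Crops to the bounding-box region only and shows the file-size reduction compared with the full frame
- Key 0 returns to no processing
- Streams to the web browser in MJPEG format
- Switches modes in real time on key press (0 to 5)
- Displays in the correct RGB color space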
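The Key 5 figure comes from apply_crop: it crops the frame to the union of all detection boxes, JPEG-encodes both the full frame and the crop, and reports (1 − crop size / full size) × 100. The arithmetic without OpenCV, as a stdlib-only sketch (`union_box` and `reduction_text` are hypothetical helpers; the byte counts below are invented to show the calculation, not measured):

```python
from typing import Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h), same layout as det.box

def union_box(boxes: Sequence[Box]) -> Tuple[int, int, int, int]:
    """Smallest (x0, y0, x1, y1) rectangle that contains every box."""
    x0 = min(x for x, _, _, _ in boxes)
    y0 = min(y for _, y, _, _ in boxes)
    x1 = max(x + w for x, _, w, _ in boxes)
    y1 = max(y + h for _, y, _, h in boxes)
    return x0, y0, x1, y1

def reduction_text(full_bytes: int, crop_bytes: int) -> str:
    """Format the size reduction the same way apply_crop does."""
    reduction = (1 - crop_bytes / full_bytes) * 100
    return f"{reduction:.2f}%"

print(union_box([(10, 20, 100, 50), (60, 5, 40, 200)]))  # (10, 5, 110, 205)
# Illustrative numbers only: an 80,000-byte frame whose crop encodes to 20,000 bytes
print(reduction_text(80_000, 20_000))  # 75.00%
```

Because JPEG size depends on image content, the reported percentage is a file-size ratio and will generally differ from the area ratio of the crop.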
Usage
1. Set up the Raspberry Pi and the AI Camera (omitted here; see the official documentation)
2. Install the dependencies on the Raspberry Pi
```
sudo apt update
sudo apt install python3-opencv python3-flask python3-picamera2
```
3. Run the script
```
python3 ai_camera_app.py
```
4. Open the following URL in a browser on the client PC
```
http://<Raspberry Pi IP address>:5000
```
5. Press keys 0 to 5 to select the processing
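Pressing a key makes the page call /set_mode?mode=<key>, and generate_frames then picks the filter through an if/elif chain on current_mode. The same dispatch can be expressed as a lookup table. The sketch below also maps unknown values to 0, whereas the original set_mode only falls back to 0 for non-integers (a value such as 9 matches no branch and merely behaves like mode 0). `MODES` and `parse_mode` are hypothetical names, and the table holds labels instead of the real filter functions:

```python
from typing import Dict, Optional

# Hypothetical table mirroring the if/elif chain in generate_frames
MODES: Dict[int, str] = {
    0: "none",
    1: "gaussian",
    2: "gamma",
    3: "canny",
    4: "mosaic",
    5: "crop",
}

def parse_mode(raw: Optional[str]) -> int:
    """Turn the ?mode= query value into a known mode, defaulting to 0."""
    try:
        mode = int(raw) if raw is not None else 0
    except ValueError:
        return 0
    return mode if mode in MODES else 0

print(parse_mode("3"), MODES[parse_mode("3")])  # 3 canny
```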

That covers development with Codex CLI and an overview of this application.

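The reveal

In fact, everything from the Overview through the Description of the implemented application was written by Codex CLI.

(I made small corrections to the following items.)

The Codex CLI installation steps

The bug in ai_camera_app.py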

```python
frame = cv2.cvtColor(raw, cv2.COLOR_BGRA2BGR)
```

was changed to the following:

```python
frame = cv2.cvtColor(raw, cv2.COLOR_RGB2BGR)
```
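Why this one-line change fixes the colors: as the fix implies, each captured pixel holds four values in R, G, B, X order (the fourth is padding), while OpenCV's imencode expects B, G, R. COLOR_BGRA2BGR only drops the fourth value, so red and blue stay swapped; COLOR_RGB2BGR applied to the 4-channel frame swaps the first and third values and drops the fourth. A pure-Python sketch of that per-pixel reordering, assuming the R, G, B, X layout (the helper names are hypothetical):

```python
from typing import List, Sequence

def rgbx_to_bgr(pixel: Sequence[int]) -> List[int]:
    """Reorder one R, G, B, X pixel into OpenCV's B, G, R order, dropping X."""
    r, g, b = pixel[0], pixel[1], pixel[2]
    return [b, g, r]

def drop_alpha_only(pixel: Sequence[int]) -> List[int]:
    """What COLOR_BGRA2BGR does: keep the first three values in place."""
    return list(pixel[:3])

red = [255, 0, 0, 255]  # pure red in R, G, B, X order
print(rgbx_to_bgr(red))      # [0, 0, 255] -> OpenCV interprets this as red
print(drop_alpha_only(red))  # [255, 0, 0] -> OpenCV interprets this as blue
```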

The vibe coding I actually did

Here are the steps I actually followed.
First, I wrote a requirements document in Markdown.

AICamSample.md

# Application development using the Raspberry Pi AI Camera

Please write the explanations in Japanese.

## Overview

Create an application that uses the Raspberry Pi AI Camera described in [this documentation](https://www.raspberrypi.com/documentation/accessories/ai-camera.html).

## Requirements

1. Implement it based on the following Python sample code.

```python
import argparse
import sys
from functools import lru_cache

import cv2
import numpy as np

from picamera2 import MappedArray, Picamera2
from picamera2.devices import IMX500
from picamera2.devices.imx500 import (NetworkIntrinsics,
                                      postprocess_nanodet_detection)

last_detections = []


class Detection:
    def __init__(self, coords, category, conf, metadata):
        """Create a Detection object, recording the bounding box, category and confidence."""
        self.category = category
        self.conf = conf
        self.box = imx500.convert_inference_coords(coords, metadata, picam2)


def parse_detections(metadata: dict):
    """Parse the output tensor into a number of detected objects, scaled to the ISP output."""
    global last_detections
    bbox_normalization = intrinsics.bbox_normalization
    bbox_order = intrinsics.bbox_order
    threshold = args.threshold
    iou = args.iou
    max_detections = args.max_detections

    np_outputs = imx500.get_outputs(metadata, add_batch=True)
    input_w, input_h = imx500.get_input_size()
    if np_outputs is None:
        return last_detections
    if intrinsics.postprocess == "nanodet":
        boxes, scores, classes = \
            postprocess_nanodet_detection(outputs=np_outputs[0], conf=threshold, iou_thres=iou,
                                          max_out_dets=max_detections)[0]
        from picamera2.devices.imx500.postprocess import scale_boxes
        boxes = scale_boxes(boxes, 1, 1, input_h, input_w, False, False)
    else:
        boxes, scores, classes = np_outputs[0][0], np_outputs[1][0], np_outputs[2][0]
        if bbox_normalization:
            boxes = boxes / input_h

        if bbox_order == "xy":
            boxes = boxes[:, [1, 0, 3, 2]]
        boxes = np.array_split(boxes, 4, axis=1)
        boxes = zip(*boxes)

    last_detections = [
        Detection(box, category, score, metadata)
        for box, score, category in zip(boxes, scores, classes)
        if score > threshold
    ]
    return last_detections


@lru_cache
def get_labels():
    labels = intrinsics.labels

    if intrinsics.ignore_dash_labels:
        labels = [label for label in labels if label and label != "-"]
    return labels


def draw_detections(request, stream="main"):
    """Draw the detections for this request onto the ISP output."""
    detections = last_results
    if detections is None:
        return
    labels = get_labels()
    with MappedArray(request, stream) as m:
        for detection in detections:
            x, y, w, h = detection.box
            label = f"{labels[int(detection.category)]} ({detection.conf:.2f})"

            # Calculate text size and position
            (text_width, text_height), baseline = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
            text_x = x + 5
            text_y = y + 15

            # Create a copy of the array to draw the background with opacity
            overlay = m.array.copy()

            # Draw the background rectangle on the overlay
            cv2.rectangle(overlay,
                          (text_x, text_y - text_height),
                          (text_x + text_width, text_y + baseline),
                          (255, 255, 255),  # Background color (white)
                          cv2.FILLED)

            alpha = 0.30
            cv2.addWeighted(overlay, alpha, m.array, 1 - alpha, 0, m.array)

            # Draw text on top of the background
            cv2.putText(m.array, label, (text_x, text_y),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)

            # Draw detection box
            cv2.rectangle(m.array, (x, y), (x + w, y + h), (0, 255, 0, 0), thickness=2)

        if intrinsics.preserve_aspect_ratio:
            b_x, b_y, b_w, b_h = imx500.get_roi_scaled(request)
            color = (255, 0, 0)  # red
            cv2.putText(m.array, "ROI", (b_x + 5, b_y + 15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
            cv2.rectangle(m.array, (b_x, b_y), (b_x + b_w, b_y + b_h), (255, 0, 0, 0))


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, help="Path of the model",
                        default="/usr/share/imx500-models/imx500_network_ssd_mobilenetv2_fpnlite_320x320_pp.rpk")
    parser.add_argument("--fps", type=int, help="Frames per second")
    parser.add_argument("--bbox-normalization", action=argparse.BooleanOptionalAction, help="Normalize bbox")
    parser.add_argument("--bbox-order", choices=["yx", "xy"], default="yx",
                        help="Set bbox order yx -> (y0, x0, y1, x1) xy -> (x0, y0, x1, y1)")
    parser.add_argument("--threshold", type=float, default=0.55, help="Detection threshold")
    parser.add_argument("--iou", type=float, default=0.65, help="Set iou threshold")
    parser.add_argument("--max-detections", type=int, default=10, help="Set max detections")
    parser.add_argument("--ignore-dash-labels", action=argparse.BooleanOptionalAction, help="Remove '-' labels ")
    parser.add_argument("--postprocess", choices=["", "nanodet"],
                        default=None, help="Run post process of type")
    parser.add_argument("-r", "--preserve-aspect-ratio", action=argparse.BooleanOptionalAction,
                        help="preserve the pixel aspect ratio of the input tensor")
    parser.add_argument("--labels", type=str,
                        help="Path to the labels file")
    parser.add_argument("--print-intrinsics", action="store_true",
                        help="Print JSON network_intrinsics then exit")
    return parser.parse_args()


if __name__ == "__main__":
    args = get_args()

    # This must be called before instantiation of Picamera2
    imx500 = IMX500(args.model)
    intrinsics = imx500.network_intrinsics
    if not intrinsics:
        intrinsics = NetworkIntrinsics()
        intrinsics.task = "object detection"
    elif intrinsics.task != "object detection":
        print("Network is not an object detection task", file=sys.stderr)
        exit()

    # Override intrinsics from args
    for key, value in vars(args).items():
        if key == 'labels' and value is not None:
            with open(value, 'r') as f:
                intrinsics.labels = f.read().splitlines()
        elif hasattr(intrinsics, key) and value is not None:
            setattr(intrinsics, key, value)

    # Defaults
    if intrinsics.labels is None:
        with open("assets/coco_labels.txt", "r") as f:
            intrinsics.labels = f.read().splitlines()
    intrinsics.update_with_defaults()

    if args.print_intrinsics:
        print(intrinsics)
        exit()

    picam2 = Picamera2(imx500.camera_num)
    config = picam2.create_preview_configuration(controls={"FrameRate": intrinsics.inference_rate}, buffer_count=12)

    imx500.show_network_fw_progress_bar()
    picam2.start(config, show_preview=True)

    if intrinsics.preserve_aspect_ratio:
        imx500.set_auto_aspect_ratio()

    last_results = None
    picam2.pre_callback = draw_detections
    while True:
        last_results = parse_detections(picam2.capture_metadata())
```

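2. The application applies specific image processing to the bounding boxes obtained from the object detection task.

3. Apply the following image processing to the region inside each bounding box, and allow switching between them with key presses. Use the OpenCV Python library for the image processing.
  * Key 1: Gaussian filter
  * Key 2: Gamma correction
  * Key 3: Canny Edge Detection
  * Key 4: Mosaic

4. After implementation, generate the Python file and Readme.md.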

Then I ran the following command.

```
codex "$(cat AICamSample.md)"
```
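The shell expands "$(cat AICamSample.md)" into a single quoted argument, so the whole requirements document becomes Codex CLI's initial prompt. If you would rather script the launch from Python, here is a sketch that builds the same argument list (the helpers `codex_argv` and `launch_codex` are hypothetical; it relies only on the positional-prompt usage shown in the command above):

```python
import subprocess
from pathlib import Path
from typing import List

def codex_argv(prompt_text: str) -> List[str]:
    """Same argv as: codex "$(cat AICamSample.md)"."""
    # Shell command substitution strips trailing newlines, so do the same here
    return ["codex", prompt_text.rstrip("\n")]

def launch_codex(spec_path: str = "AICamSample.md") -> int:
    """Start Codex CLI with the spec file's content as the initial prompt."""
    prompt = Path(spec_path).read_text(encoding="utf-8")
    # A list argv goes straight to the process, so the Markdown needs no shell quoting
    return subprocess.call(codex_argv(prompt))

# launch_codex()  # uncomment to open an interactive Codex CLI session
```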

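Next, I ran the generated ai_camera_app.py on an actual Raspberry Pi AI Camera setup; whenever a problem occurred, I entered the details and had Codex CLI fix it.

The only thing it could not get right was the RGB conversion. Everything else was built from scratch without my writing any code, and building the application above cost about $1.11.

Setting up the Raspberry Pi camera took about one hour and developing the application took another hour, so I was able to build a web application running on the edge in about two hours in total!
This time I used the AI Camera to check that the code produced by Codex CLI actually works; next, I'd like to write more rigorous requirements documents and develop solution applications that run on the edge.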

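Troubleshooting

If something does not work while following this article, feel free to leave a comment on this article or use the support page below.
Please note that replies to comments may take some time.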

Raspberry Pi Forums

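Closing

In this article I used Codex CLI to build a Raspberry Pi AI Camera web application without writing code.
At first I tried rough requirements and it did not go well, but with a requirements document at the level of detail attached here, I could build the application I had in mind with almost no code of my own!
It is an amazing era when an AI application, setup included, can be generated to a working level in under two hours.
Please try easy application development with the Raspberry Pi AI Camera and vibe coding yourself!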
