[Implementation] Building a Long-Running AI Agent in Python: A Deep Dive into Anthropic's Official Sample Code

Posted at 2026-01-06

Three-minute summary

What this article covers: Anthropic's official sample implementation of a long-running agent (autonomous-coding)

Architecture:

autonomous-coding/
├── autonomous_agent_demo.py  # Entry point
├── agent.py                  # Core logic of the two-agent pattern
├── client.py                 # Claude SDK + security settings
├── security.py               # Bash command allowlist
└── prompts/                  # Prompts for the initializer and coding agents

Two-agent approach:

Agent	When	Role
Initializer	First run only	Creates feature_list.json (200 tests) and init.sh, initializes Git
Coding	Second run onward	Implements one feature → tests → records → commits

Security (three layers of defense):

OS-level sandbox
File permissions (project directory only)
Bash command allowlist + extra validation

How to run:

export ANTHROPIC_API_KEY='your-key'
python autonomous_agent_demo.py --project-dir ./my_project

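→ See the main text for a detailed walkthrough of each file.

Introduction

In the previous article, I explained why long-running agents break down and the solution Anthropic proposes: a "harness".

This time, I'll use the sample implementation Anthropic has published to show how to actually write the code.

The sample is published at claude-quickstarts/autonomous-coding.

Overall architecture

First, let's look at how the whole system is put together.

File layout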

autonomous-coding/
├── autonomous_agent_demo.py  # Entry point (117 lines)
├── agent.py                  # Agent session management (207 lines)
├── client.py                 # Claude SDK client configuration (123 lines)
├── security.py               # Bash command validation (359 lines)
├── progress.py               # Progress tracking (58 lines)
├── prompts.py                # Prompt loading (38 lines)
└── prompts/
    ├── initializer_prompt.md # Prompt for the initializer
    ├── coding_prompt.md      # Prompt for the coding agent
    └── app_spec.txt          # Application specification (21 KB)

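Here is a quick rundown of each file's role:

File	Role	Importance
autonomous_agent_demo.py	CLI entry point: argument parsing and startup	★★☆
agent.py	Core session-management logic; implements the two-agent pattern	★★★
client.py	Claude SDK configuration: security and MCP integration	★★★
security.py	Allowlist validation of Bash commands	★★★
progress.py	Progress display utilities	★☆☆
prompts.py	Loading the prompt files	★☆☆

1. Entry point: autonomous_agent_demo.py

Overview

The entry point takes input from the user and launches the agent.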

#!/usr/bin/env python3
"""
Autonomous Coding Agent Demo
============================

A minimal harness demonstrating long-running autonomous coding with Claude.
This script implements the two-agent pattern (initializer + coding agent) and
incorporates all the strategies from the long-running agents guide.
"""

import argparse
import asyncio
import os
from pathlib import Path

from agent import run_autonomous_agent

# Default model
# claude-sonnet-4-5 is a balanced model that is both fast and high quality
DEFAULT_MODEL = "claude-sonnet-4-5-20250929"

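Explanation:

#!/usr/bin/env python3 - a shebang that lets the script run directly on Unix-like systems
DEFAULT_MODEL - the Claude model to use. Sonnet strikes a good balance between cost efficiency and performance
from agent import run_autonomous_agent - imports the core logic from a separate module

Parsing command-line arguments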

def parse_args() -> argparse.Namespace:
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description="Autonomous Coding Agent Demo - Long-running agent harness",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Start fresh project
  python autonomous_agent_demo.py --project-dir ./claude_clone

  # Use a specific model
  python autonomous_agent_demo.py --project-dir ./claude_clone --model claude-sonnet-4-5-20250929

  # Limit iterations for testing
  python autonomous_agent_demo.py --project-dir ./claude_clone --max-iterations 5

  # Continue existing project
  python autonomous_agent_demo.py --project-dir ./claude_clone

Environment Variables:
  ANTHROPIC_API_KEY    Your Anthropic API key (required)
        """,
    )

    # Project directory
    parser.add_argument(
        "--project-dir",
        type=Path,
        default=Path("./autonomous_demo_project"),
        help="Directory for the project (default: ./autonomous_demo_project)",
    )

    # Upper limit on iterations (for debugging/testing)
    parser.add_argument(
        "--max-iterations",
        type=int,
        default=None,  # None means unlimited
        help="Maximum number of agent iterations (default: unlimited)",
    )

    # Claude model to use
    parser.add_argument(
        "--model",
        type=str,
        default=DEFAULT_MODEL,
        help=f"Claude model to use (default: {DEFAULT_MODEL})",
    )

    return parser.parse_args()

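Explanation:

Argument	Type	Default	Purpose
`--project-dir`	Path	./autonomous_demo_project	Where the generated project is saved (relative paths are placed under generations/)
`--max-iterations`	int	None (unlimited)	Upper limit on session iterations; handy when testing
`--model`	str	claude-sonnet-4-5-20250929	Claude model to use

Why is there a --max-iterations option?

In production, run without a limit until every feature is complete
When testing or debugging, cap it at about 3 to 5 iterations to check the behavior
To keep API costs under control, it's recommended to start with a limit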
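One subtlety is worth knowing before you rely on the limit. The main loop in agent.py (covered later) checks it with `if max_iterations and iteration > max_iterations`, which relies on truthiness. As a result, `--max-iterations 0` also behaves as "unlimited". The sketch below isolates that check. Both helper names are hypothetical and are not part of the sample: `should_stop_truthy()` mirrors the sample's logic, and `should_stop_explicit()` is an alternative that treats only `None` as "no limit".

```python
from typing import Optional


def should_stop_truthy(iteration: int, max_iterations: Optional[int]) -> bool:
    """Mirrors the sample's check, where both None and 0 mean 'no limit'."""
    return bool(max_iterations and iteration > max_iterations)


def should_stop_explicit(iteration: int, max_iterations: Optional[int]) -> bool:
    """Hypothetical variant: only None means 'no limit', so 0 stops before the first session."""
    return max_iterations is not None and iteration > max_iterations


if __name__ == "__main__":
    # iteration is incremented before the check, so the first session is iteration 1
    for limit in (None, 0, 3):
        print(limit, should_stop_truthy(1, limit), should_stop_explicit(1, limit))
```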

Main function

def main() -> None:
    """Main entry point."""
    args = parse_args()

    # Check that the API key exists
    # If the environment variable isn't set, show a helpful error message
    if not os.environ.get("ANTHROPIC_API_KEY"):
        print("Error: ANTHROPIC_API_KEY environment variable not set")
        print("\nGet your API key from: https://console.anthropic.com/")
        print("\nThen set it:")
        print("  export ANTHROPIC_API_KEY='your-api-key-here'")
        return

    # Resolve the project directory path
    # Relative paths are automatically placed under the generations/ directory
    project_dir = args.project_dir
    if not str(project_dir).startswith("generations/"):
        if project_dir.is_absolute():
            pass  # Absolute paths are used as-is
        else:
            project_dir = Path("generations") / project_dir

    # Run the agent
    try:
        # asyncio.run() calls the async function from synchronous code
        asyncio.run(
            run_autonomous_agent(
                project_dir=project_dir,
                model=args.model,
                max_iterations=args.max_iterations,
            )
        )
    except KeyboardInterrupt:
        # Interrupted with Ctrl+C
        # Important: tell the user that re-running the command resumes the work
        print("\n\nInterrupted by user")
        print("To resume, run the same command again")
    except Exception as e:
        print(f"\nFatal error: {e}")
        raise


if __name__ == "__main__":
    main()

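Explanation:

API key check:

if not os.environ.get("ANTHROPIC_API_KEY"):

An API key is required to use the Claude API
Reading it from an environment variable keeps the key out of the source code
If it's missing, a friendly error message explains how to get one

Asynchronous execution:

asyncio.run(run_autonomous_agent(...))

asyncio.run(), added in Python 3.7, is the entry point for running async code from synchronous code
The Claude SDK provides an async API, so the code uses the async/await pattern
Running asynchronously lets the program handle network I/O waits efficiently

KeyboardInterrupt handling:

except KeyboardInterrupt:
    print("To resume, run the same command again")

This is the heart of a long-running agent
Even after an interruption, re-running the same command picks up where it left off
Why is that possible? → Because all of the state lives in files (feature_list.json, git, etc.)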
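main() also quietly rewrites the project path before starting the agent, and that determines where the resumable state is stored. The snippet below pulls the branching out into a standalone function so it is easier to see where output lands. `resolve_project_dir` is a hypothetical name; in the sample, this logic sits inline in main(). The example paths assume a POSIX system.

```python
from pathlib import Path


def resolve_project_dir(project_dir: Path) -> Path:
    """Same branching as main(): relative paths are moved under generations/."""
    if str(project_dir).startswith("generations/"):
        return project_dir  # already under generations/
    if project_dir.is_absolute():
        return project_dir  # absolute paths are used as-is
    return Path("generations") / project_dir


if __name__ == "__main__":
    for raw in ("./my_project", "generations/my_project", "/tmp/my_project"):
        print(raw, "->", resolve_project_dir(Path(raw)))
```

Because of this rewrite, `--project-dir ./my_project` ends up in `generations/my_project`, and re-running the same command finds the same directory again.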

2. Core logic: agent.py

This is the most important file. Let's take a detailed look at how the two-agent approach is implemented.

Constants and imports

"""
Agent Session Logic
===================

Core agent interaction functions for running autonomous coding sessions.
"""

import asyncio
from pathlib import Path
from typing import Optional

from claude_code_sdk import ClaudeSDKClient

from client import create_client
from progress import print_session_header, print_progress_summary
from prompts import get_initializer_prompt, get_coding_prompt, copy_spec_to_project

# Delay between sessions
# Too short and you may hit API rate limits
# Too long and the overall run time grows
AUTO_CONTINUE_DELAY_SECONDS = 3

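Explanation:

AUTO_CONTINUE_DELAY_SECONDS = 3

The wait between one session finishing and the next one starting
Why it's needed:
1. To avoid API rate limits
2. As a short safety margin between sessions (files written by the previous session are normally visible right away, so this is a buffer rather than a strict requirement)
3. To avoid rapid-fire retries when errors occur

Running a session: run_agent_session()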

async def run_agent_session(
    client: ClaudeSDKClient,
    message: str,
    project_dir: Path,
) -> tuple[str, str]:
    """
    Run a single agent session using Claude Agent SDK.

    Args:
        client: Claude SDK client
        message: The prompt to send
        project_dir: Project directory path

    Returns:
        (status, response_text) where status is:
        - "continue" if agent should continue working
        - "error" if an error occurred
    """
    print("Sending prompt to Claude Agent SDK...\n")

    try:
        # Send the prompt to Claude
        await client.query(message)

        # Collect and display the response
        response_text = ""

        # Handle the streaming response with an async generator
        async for msg in client.receive_response():
            msg_type = type(msg).__name__

            # AssistantMessage: text or tool use from Claude
            if msg_type == "AssistantMessage" and hasattr(msg, "content"):
                for block in msg.content:
                    block_type = type(block).__name__

                    # Text block: what Claude says
                    if block_type == "TextBlock" and hasattr(block, "text"):
                        response_text += block.text
                        # Print in real time (flush=True shows output immediately)
                        print(block.text, end="", flush=True)

                    # Tool-use block: running Bash, Read, Write, etc.
                    elif block_type == "ToolUseBlock" and hasattr(block, "name"):
                        print(f"\n[Tool: {block.name}]", flush=True)
                        # Show the tool's input parameters (truncated if long)
                        if hasattr(block, "input"):
                            input_str = str(block.input)
                            if len(input_str) > 200:
                                print(f"   Input: {input_str[:200]}...", flush=True)
                            else:
                                print(f"   Input: {input_str}", flush=True)

            # UserMessage: tool execution results
            elif msg_type == "UserMessage" and hasattr(msg, "content"):
                for block in msg.content:
                    block_type = type(block).__name__

                    if block_type == "ToolResultBlock":
                        result_content = getattr(block, "content", "")
                        is_error = getattr(block, "is_error", False)

                        # Blocked by the security hook (heuristic: any result containing "blocked")
                        if "blocked" in str(result_content).lower():
                            print(f"   [BLOCKED] {result_content}", flush=True)
                        elif is_error:
                            # An error occurred (truncated to 500 characters)
                            error_str = str(result_content)[:500]
                            print(f"   [Error] {error_str}", flush=True)
                        else:
                            # On success, show a short marker
                            print("   [Done]", flush=True)

        print("\n" + "-" * 70 + "\n")
        return "continue", response_text

    except Exception as e:
        print(f"Error during agent session: {e}")
        return "error", str(e)

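Explanation:

Handling the streaming response:

async for msg in client.receive_response():

The Claude SDK streams its response through an async generator
This lets you follow progress in real time, even during long-running work
The user can always see what the agent is doing

Branching on message type:

Why are there two kinds of messages?

AssistantMessage: what Claude "said" or "is about to run"
UserMessage: the "result" of running a tool (sent back by the system, not typed by a human)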
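To see the async-for dispatch in isolation, without an API key, here is a minimal sketch. The dataclasses are hypothetical stand-ins that are only named after the SDK's message and block types. The real SDK classes carry more fields, and `fake_receive_response()` merely imitates the shape of what `client.receive_response()` yields.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator


# Hypothetical stand-ins named after the SDK's message and block classes.
# The real SDK types carry more fields; only what the dispatch below reads is modeled.
@dataclass
class TextBlock:
    text: str


@dataclass
class ToolUseBlock:
    name: str
    input: dict


@dataclass
class ToolResultBlock:
    content: str
    is_error: bool = False


@dataclass
class AssistantMessage:
    content: list


@dataclass
class UserMessage:
    content: list


async def fake_receive_response() -> AsyncIterator[Any]:
    """Simulates the stream that client.receive_response() produces."""
    yield AssistantMessage([TextBlock("Checking files. "), ToolUseBlock("Bash", {"command": "ls -la"})])
    yield UserMessage([ToolResultBlock("feature_list.json")])
    yield AssistantMessage([TextBlock("Done.")])


async def collect(stream: AsyncIterator[Any]) -> tuple[str, list[str], int]:
    """Dispatches on the class name, as run_agent_session() does."""
    text, tools, results = "", [], 0
    async for msg in stream:
        kind = type(msg).__name__
        for block in msg.content:
            block_kind = type(block).__name__
            if kind == "AssistantMessage" and block_kind == "TextBlock":
                text += block.text
            elif kind == "AssistantMessage" and block_kind == "ToolUseBlock":
                tools.append(block.name)
            elif kind == "UserMessage" and block_kind == "ToolResultBlock":
                results += 1  # tool results come back as UserMessage
    return text, tools, results


if __name__ == "__main__":
    print(asyncio.run(collect(fake_receive_response())))
```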

Example output:

Sending prompt to Claude Agent SDK...

I'll start by checking the current directory structure.

[Tool: Bash]
   Input: {'command': 'ls -la'}
   [Done]

[Tool: Read]
   Input: {'file_path': './feature_list.json'}
   [Done]

Based on the feature list, I'll implement the login form...

Main loop: run_autonomous_agent()

async def run_autonomous_agent(
    project_dir: Path,
    model: str,
    max_iterations: Optional[int] = None,
) -> None:
    """
    Run the autonomous agent loop.

    Args:
        project_dir: Directory for the project
        model: Claude model to use
        max_iterations: Maximum number of iterations (None for unlimited)
    """
    # Show the startup banner
    print("\n" + "=" * 70)
    print("  AUTONOMOUS CODING AGENT DEMO")
    print("=" * 70)
    print(f"\nProject directory: {project_dir}")
    print(f"Model: {model}")
    if max_iterations:
        print(f"Max iterations: {max_iterations}")
    else:
        print("Max iterations: Unlimited (will run until completion)")
    print()

    # Create the project directory
    # parents=True: also create any missing parent directories
    # exist_ok=True: don't raise if it already exists
    project_dir.mkdir(parents=True, exist_ok=True)

    # ★★★ The key part ★★★
    # Decide between a first run and a continuation based on whether feature_list.json exists
    tests_file = project_dir / "feature_list.json"
    is_first_run = not tests_file.exists()

    if is_first_run:
        # First run: use the initializer agent
        print("Fresh start - will use initializer agent")
        print()
        print("=" * 70)
        print("  NOTE: First session takes 10-20+ minutes!")
        print("  The agent is generating 200 detailed test cases.")
        print("  This may appear to hang - it's working. Watch for [Tool: ...] output.")
        print("=" * 70)
        print()
        # Copy the spec into the project directory
        copy_spec_to_project(project_dir)
    else:
        # Continuation: show existing progress
        print("Continuing existing project")
        print_progress_summary(project_dir)

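Explanation:

Deciding between a first run and a continuation:

is_first_run = not tests_file.exists()

This one line is what lets the agent pick up where it left off.

Why decide based on feature_list.json?

This file is the "source of truth"
It defines the 200 test cases
If it exists, the project has already been initialized
It's a simple, reliable check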
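Because everything hinges on one file, the check is easy to reproduce and test. The sketch below shows that check together with `count_passing()`. That helper is hypothetical and written in the spirit of the progress display. progress.py itself is not shown in this article, so its real implementation may differ. The sketch assumes each entry has a boolean `"passes"` field, as the initializer prompt specifies.

```python
import json
import tempfile
from pathlib import Path


def is_first_run(project_dir: Path) -> bool:
    # Same rule as run_autonomous_agent(): no feature_list.json means "not initialized yet"
    return not (project_dir / "feature_list.json").exists()


def count_passing(project_dir: Path) -> tuple[int, int]:
    """Hypothetical helper: returns (passing, total) from feature_list.json."""
    features = json.loads((project_dir / "feature_list.json").read_text(encoding="utf-8"))
    passing = sum(1 for feature in features if feature.get("passes") is True)
    return passing, len(features)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        print(is_first_run(project))  # True: nothing has been written yet
        features = [{"description": "login", "passes": True}, {"description": "logout", "passes": False}]
        (project / "feature_list.json").write_text(json.dumps(features), encoding="utf-8")
        print(is_first_run(project), count_passing(project))  # False (1, 2)
```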

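Warning message on the first run:

NOTE: First session takes 10-20+ minutes!
The agent is generating 200 detailed test cases.

The first session takes especially long
Explaining this up front keeps users from mistaking the wait for a freeze

Main loop (continued)

    # Main loop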
    iteration = 0

    while True:
        iteration += 1

        # Check the iteration limit
        if max_iterations and iteration > max_iterations:
            print(f"\nReached max iterations ({max_iterations})")
            print("To continue, run the script again without --max-iterations")
            break

        # Show the session header
        print_session_header(iteration, is_first_run)

        # ★★★ Important ★★★
        # Create a new client every time
        # so no context carries over from the previous session
        client = create_client(project_dir, model)

        # Pick the prompt that matches the session type
        if is_first_run:
            prompt = get_initializer_prompt()
            is_first_run = False  # Clear the first-run flag (coding sessions from now on)
        else:
            prompt = get_coding_prompt()

        # Run the session
        # async with releases resources automatically
        async with client:
            status, response = await run_agent_session(client, prompt, project_dir)

        # Handle the status
        if status == "continue":
            # Finished normally: move on to the next session
            print(f"\nAgent will auto-continue in {AUTO_CONTINUE_DELAY_SECONDS}s...")
            print_progress_summary(project_dir)
            await asyncio.sleep(AUTO_CONTINUE_DELAY_SECONDS)

        elif status == "error":
            # Error: retry
            print("\nSession encountered an error")
            print("Will retry with a fresh session...")
            await asyncio.sleep(AUTO_CONTINUE_DELAY_SECONDS)

        # Short pause between sessions
        if max_iterations is None or iteration < max_iterations:
            print("\nPreparing next session...\n")
            await asyncio.sleep(1)

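Explanation:

Why a new client is created every time:

client = create_client(project_dir, model)

This is a deliberate design choice.

❌ Wrong design: reuse a single client
   → context keeps piling up → the context window fills up → performance degrades

✅ Right design: a new client every time
   → a fresh context
   → stable behavior
   → can keep going for as many sessions as needed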
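A toy model makes the difference visible. `FakeClient` below is a hypothetical stand-in that only records the messages it receives. It is not the SDK's API, and it does not model token limits. It simply shows that a reused client carries an ever-growing history, while a fresh client per session starts empty. In the real harness, what carries over between sessions is the files on disk, not the conversation.

```python
from dataclasses import dataclass, field


@dataclass
class FakeClient:
    """Hypothetical stand-in for ClaudeSDKClient: it just remembers every message it sees."""
    history: list[str] = field(default_factory=list)

    def query(self, message: str) -> int:
        self.history.append(message)
        return len(self.history)  # size of the context this client is carrying


def run(sessions: int, reuse: bool) -> list[int]:
    shared = FakeClient()
    sizes = []
    for _ in range(sessions):
        # reuse=True mimics the "wrong" design; reuse=False mimics create_client() per session
        client = shared if reuse else FakeClient()
        sizes.append(client.query("read feature_list.json and continue"))
    return sizes


if __name__ == "__main__":
    print(run(4, reuse=True))   # [1, 2, 3, 4]: context grows every session
    print(run(4, reuse=False))  # [1, 1, 1, 1]: every session starts fresh
```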

Switching prompts:

if is_first_run:
    prompt = get_initializer_prompt()
    is_first_run = False
else:
    prompt = get_coding_prompt()

Session	Prompt	Role
1st	initializer_prompt	Set up the environment, create test cases
2nd onward	coding_prompt	Implement features, test
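The table can be checked mechanically by replaying the loop's control flow without calling the API. `session_prompts()` is a hypothetical generator that reproduces only the prompt selection and the iteration-limit check from run_autonomous_agent(). It yields the prompt filename each session would use.

```python
from itertools import islice
from typing import Iterator, Optional


def session_prompts(first_run: bool, max_iterations: Optional[int] = None) -> Iterator[str]:
    """Yields which prompt file each session would use (hypothetical helper)."""
    iteration = 0
    while True:
        iteration += 1
        if max_iterations and iteration > max_iterations:
            return  # same limit check as the main loop
        if first_run:
            first_run = False  # only the very first session gets the initializer prompt
            yield "initializer_prompt.md"
        else:
            yield "coding_prompt.md"


if __name__ == "__main__":
    print(list(session_prompts(first_run=True, max_iterations=3)))
    # Resuming an initialized project never goes back to the initializer
    print(list(islice(session_prompts(first_run=False), 2)))
```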

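Auto-continue flow:

1. A session ends and returns "continue" or "error"
2. On "continue", the progress summary is printed; in either case the loop waits AUTO_CONTINUE_DELAY_SECONDS (3 seconds)
3. Unless the iteration limit has been reached, the loop pauses another second, creates a new client, and starts the next session with the coding prompt
4. This repeats until --max-iterations is reached or the user stops it with Ctrl+C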

Shutdown

    # Show the final summary
    print("\n" + "=" * 70)
    print("  SESSION COMPLETE")
    print("=" * 70)
    print(f"\nProject directory: {project_dir}")
    print_progress_summary(project_dir)

    # Explain how to run the generated application
    print("\n" + "-" * 70)
    print("  TO RUN THE GENERATED APPLICATION:")
    print("-" * 70)
    print(f"\n  cd {project_dir.resolve()}")
    print("  ./init.sh           # Run the setup script")
    print("  # Or manually:")
    print("  npm install && npm run dev")
    print("\n  Then open http://localhost:3000 (or check init.sh for the URL)")
    print("-" * 70)

    print("\nDone!")

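Explanation:

Clearly telling the user what to do next when the run finishes makes for a better user experience.

3. Security: security.py

A long-running agent runs Bash commands autonomously. That's powerful, but dangerous without proper controls.

Design philosophy: an allowlist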

"""
Security Hooks for Autonomous Coding Agent
==========================================

Pre-tool-use hooks that validate bash commands for security.
Uses an allowlist approach - only explicitly permitted commands can run.
"""

import os
import shlex

# Allowlist of permitted commands
# Only the minimum set of commands needed is allowed
ALLOWED_COMMANDS = {
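    # File inspection (read-only)
    "ls",      # List directory contents
    "cat",     # Show file contents
    "head",    # Show the start of a file
    "tail",    # Show the end of a file
    "wc",      # Count lines/words
    "grep",    # Search text

    # File operations (restricted)
    "cp",      # Copy files
    "mkdir",   # Create directories
    "chmod",   # Change permissions (extra validation)

    # Directory operations
    "pwd",     # Show the current directory

    # Node.js development
    "npm",     # Package manager
    "node",    # Node.js runtime

    # Version control
    "git",     # Git operations

    # Process management
    "ps",      # List processes
    "lsof",    # Show open files
    "sleep",   # Wait
    "pkill",   # Kill processes (extra validation)

    # Script execution
    "init.sh", # Setup script (extra validation)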
}

# Commands that need extra validation
# Even though they are on the allowlist, these get a more detailed check
COMMANDS_NEEDING_EXTRA_VALIDATION = {"pkill", "chmod", "init.sh"}

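Explanation:

Allowlist vs. blocklist:

Approach	Description	Security
Blocklist	Forbid known-dangerous commands	❌ Risk of gaps
Allowlist	Run only what is explicitly permitted	✅ Safer: anything unlisted is denied

This implementation uses the allowlist approach.

Only explicitly permitted commands can run
New commands are blocked by default
The principle: "deny what you don't know"

Why these commands are needed:

The minimum set of commands for development work:

Inspect files     → ls, cat, head, tail, grep
Manipulate files  → cp, mkdir, chmod
Build             → npm, node
Version control   → git
Debugging         → ps, lsof
Server management → pkill (for restarting the dev server)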
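The allowlist-versus-blocklist difference fits in a few lines. This is a deliberately simplified sketch: it checks only the first word of a command, unlike extract_commands(), and `BLOCKED` is a hypothetical blocklist used only for comparison. A command that nobody thought to block, such as `shred`, slips past the blocklist but is denied by the allowlist.

```python
import shlex

ALLOWED = {"ls", "cat", "git", "npm"}   # small subset of ALLOWED_COMMANDS
BLOCKED = {"rm", "sudo", "dd"}          # hypothetical blocklist for comparison


def allowed_by_allowlist(command: str) -> bool:
    # Default deny: only listed names pass
    return shlex.split(command)[0] in ALLOWED


def allowed_by_blocklist(command: str) -> bool:
    # Default allow: anything not listed passes
    return shlex.split(command)[0] not in BLOCKED


if __name__ == "__main__":
    for cmd in ("ls -la", "rm -rf build", "shred secrets.txt"):
        print(f"{cmd:20} allowlist={allowed_by_allowlist(cmd)} blocklist={allowed_by_blocklist(cmd)}")
```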

Command extraction logic

def split_command_segments(command_string: str) -> list[str]:
    """
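    Split a compound command into individual segments.

    Handles command chaining (&&, ||, ;) but treats
    pipes (|) as part of a single command.

    Args:
        command_string: The full shell command

    Returns:
        A list of individual command segments
    """
    import re

    # Split on && and || (simple regex: quoting is not taken into account)
    segments = re.split(r"\s*(?:&&|\|\|)\s*", command_string)

    # Then split each piece on semicolons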
    result = []
    for segment in segments:
        sub_segments = re.split(r'(?<!["\'])\s*;\s*(?!["\'])', segment)
        for sub in sub_segments:
            sub = sub.strip()
            if sub:
                result.append(sub)

    return result


def extract_commands(command_string: str) -> list[str]:
    """
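    Extract command names from a shell command string.

    Handles pipes, command chaining (&&, ||, ;), and subshells.
    Extracts the base command name from paths (e.g., /usr/bin/python).

    Args:
        command_string: The full shell command

    Returns:
        A list of command names
    """
    commands = []
    import re

    # Split on semicolons (heuristic: only a semicolon directly next to a quote is left unsplit)
    segments = re.split(r'(?<!["\'])\s*;\s*(?!["\'])', command_string)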

    for segment in segments:
        segment = segment.strip()
        if not segment:
            continue

        try:
            # Parse shell syntax correctly with shlex.split
            tokens = shlex.split(segment)
        except ValueError:
            # Malformed command (e.g., an unclosed quote)
            # Fail safe: return an empty list so the caller blocks it
            return []

        if not tokens:
            continue

        # Track whether the next token should be a command
        expect_command = True

        for token in tokens:
            # A shell operator means a new command follows
            if token in ("|", "||", "&&", "&"):
                expect_command = True
                continue

            # Skip shell keywords
            if token in (
                "if", "then", "else", "elif", "fi",
                "for", "while", "until", "do", "done",
                "case", "esac", "in", "!", "{", "}",
            ):
                continue

            # Skip flags/options
            if token.startswith("-"):
                continue

            # Skip variable assignments (VAR=value)
            if "=" in token and not token.startswith("="):
                continue

            if expect_command:
                # Extract the base command name from a path
                # /usr/bin/python → python
                cmd = os.path.basename(token)
                commands.append(cmd)
                expect_command = False

    return commands

Explanation:

Why use shlex.split?

# A plain split() isn't enough
"echo 'hello world'" → ['echo', "'hello", "world'"]  # ❌ wrong

# shlex.split handles it correctly
"echo 'hello world'" → ['echo', 'hello world']  # ✅ correct

Examples of command extraction:

extract_commands("ls -la && cat file.txt | grep error")
# → ['ls', 'cat', 'grep']

extract_commands("npm install; npm run dev")
# → ['npm', 'npm']

extract_commands("/usr/bin/python script.py")
# → ['python']
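The shlex.split comparison above can be run directly with the standard library. The snippet also shows the second reason shlex matters here: malformed input raises ValueError. extract_commands() handles that by returning an empty list, which makes the hook block the command.

```python
import shlex

command = "echo 'hello world'"
naive = command.split()
parsed = shlex.split(command)
print(naive)   # ['echo', "'hello", "world'"]
print(parsed)  # ['echo', 'hello world']

# Malformed input raises ValueError; extract_commands() returns [] in that case, so the hook blocks it
try:
    shlex.split("echo 'unterminated")
except ValueError as exc:
    print("ValueError:", exc)
```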

Implementing the security hook

async def bash_security_hook(input_data, tool_use_id=None, context=None):
    """
    Hook that validates Bash commands against the allowlist.

    Only commands in ALLOWED_COMMANDS are allowed.

    Args:
        input_data: Dict containing tool_name and tool_input
        tool_use_id: Optional tool-use ID
        context: Optional context

    Returns:
        An empty dict (allow) or {"decision": "block", "reason": "..."} (deny)
    """
    # Allow every tool other than Bash unconditionally
    if input_data.get("tool_name") != "Bash":
        return {}

    command = input_data.get("tool_input", {}).get("command", "")
    if not command:
        return {}

    # Extract the commands
    commands = extract_commands(command)

    if not commands:
        # If it can't be parsed, fail safe and block
        return {
            "decision": "block",
            "reason": f"Could not parse command for security validation: {command}",
        }

    # Split into segments (for extra validation)
    segments = split_command_segments(command)

    # Check each command against the allowlist
    for cmd in commands:
        # Allowlist check
        if cmd not in ALLOWED_COMMANDS:
            return {
                "decision": "block",
                "reason": f"Command '{cmd}' is not in the allowed commands list",
            }

        # Commands that need extra validation
        if cmd in COMMANDS_NEEDING_EXTRA_VALIDATION:
            # Find the matching segment
            cmd_segment = get_command_for_validation(cmd, segments)
            if not cmd_segment:
                cmd_segment = command  # Fallback

            # Per-command validation
            if cmd == "pkill":
                allowed, reason = validate_pkill_command(cmd_segment)
                if not allowed:
                    return {"decision": "block", "reason": reason}
            elif cmd == "chmod":
                allowed, reason = validate_chmod_command(cmd_segment)
                if not allowed:
                    return {"decision": "block", "reason": reason}
            elif cmd == "init.sh":
                allowed, reason = validate_init_script(cmd_segment)
                if not allowed:
                    return {"decision": "block", "reason": reason}

    # Passed every check → allow
    return {}

Explanation:

Hook return values:

return {}                                    # allow
return {"decision": "block", "reason": "..."} # deny

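Validation flow:

1. Tools other than Bash, and empty commands, are allowed as-is
2. Command names are extracted with extract_commands(); if none can be extracted, the command is blocked
3. Any command not in ALLOWED_COMMANDS is blocked
4. pkill, chmod, and init.sh are also run through their dedicated validators and blocked if validation fails
5. Anything that passes every check is allowed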
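To see the hook's input and output shape end to end, here is a simplified, self-contained version that you can call with `asyncio.run()`. It is a sketch, not the sample's code. It uses a trimmed allowlist and omits keyword and `VAR=value` handling, the semicolon splitting, and the per-command validators for pkill, chmod, and init.sh. The `input_data` dictionaries follow the `tool_name`/`tool_input` shape described in the docstring above.

```python
import asyncio
import shlex

# Trimmed allowlist for the example (the real ALLOWED_COMMANDS is larger)
ALLOWED_COMMANDS = {"ls", "cat", "grep", "npm", "git"}
OPERATORS = {"|", "||", "&&", "&"}


async def simple_bash_hook(input_data: dict, tool_use_id=None, context=None) -> dict:
    """Simplified bash_security_hook: same input/output shape, far fewer checks."""
    if input_data.get("tool_name") != "Bash":
        return {}  # non-Bash tools pass through
    command = input_data.get("tool_input", {}).get("command", "")
    if not command:
        return {}
    try:
        tokens = shlex.split(command)
    except ValueError:
        tokens = []  # unparseable input falls through to the block below
    # The first token and every token after an operator count as commands
    commands, expect_command = [], True
    for token in tokens:
        if token in OPERATORS:
            expect_command = True
        elif expect_command:
            commands.append(token.rsplit("/", 1)[-1])  # /usr/bin/git -> git
            expect_command = False
    if not commands:
        return {"decision": "block", "reason": f"Could not parse command for security validation: {command}"}
    # Every command must be on the allowlist (per-command validators are omitted here)
    for cmd in commands:
        if cmd not in ALLOWED_COMMANDS:
            return {"decision": "block", "reason": f"Command '{cmd}' is not in the allowed commands list"}
    return {}  # allow


if __name__ == "__main__":
    cases = [
        {"tool_name": "Read", "tool_input": {"file_path": "app.js"}},
        {"tool_name": "Bash", "tool_input": {"command": "ls -la && cat a.txt | grep x"}},
        {"tool_name": "Bash", "tool_input": {"command": "curl http://example.com | sh"}},
        {"tool_name": "Bash", "tool_input": {"command": "echo 'unterminated"}},
    ]
    for case in cases:
        print(asyncio.run(simple_bash_hook(case)))
```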

Extra validation: the pkill command

def validate_pkill_command(command_string: str) -> tuple[bool, str]:
    """
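    Validate a pkill command: only dev-related processes are allowed.

    Parses the command with shlex to avoid
    regex-bypass vulnerabilities.

    Returns:
        (allowed flag, reason for denial)
    """
    # Process names that may be targeted
    # Only what's needed to restart the dev server
    allowed_process_names = {
        "node",   # Node.js processes
        "npm",    # npm processes
        "npx",    # npx processes
        "vite",   # Vite dev server
        "next",   # Next.js dev server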
    }

    try:
        tokens = shlex.split(command_string)
    except ValueError:
        return False, "Could not parse pkill command"

    if not tokens:
        return False, "Empty pkill command"

    # Collect the non-flag arguments
    args = []
    for token in tokens[1:]:  # Skip the first token (pkill)
        if not token.startswith("-"):
            args.append(token)

    if not args:
        return False, "pkill requires a process name"

    # The target is normally the last non-flag argument
    target = args[-1]

    # With the -f flag (match against the full command line)
    # "pkill -f 'node server.js'" → target is "node server.js"
    # and the process name is "node"
    if " " in target:
        target = target.split()[0]

    if target in allowed_process_names:
        return True, ""

    return False, f"pkill only allowed for dev processes: {allowed_process_names}"

Explanation:

Why pkill needs extra validation:

pkill node      # ✅ stops the dev server → allowed
pkill nginx     # ❌ would stop a production server → blocked
pkill -9 -1     # ❌ flags only, no process name → blocked
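The three cases above, plus the `-f` case from the code comments, can be checked with a condensed re-implementation of the same decision. `pkill_allowed()` is a hypothetical name and returns only the boolean. It also guards against a whitespace-only argument. In the original, such an argument would make `target.split()[0]` raise IndexError, because `" ".split()` returns an empty list.

```python
import shlex

ALLOWED_PROCESS_NAMES = {"node", "npm", "npx", "vite", "next"}


def pkill_allowed(command: str) -> bool:
    """Condensed version of validate_pkill_command(): returns only the decision."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False
    args = [t for t in tokens[1:] if not t.startswith("-")]  # non-flag arguments
    if not args:
        return False  # e.g. "pkill -9 -1": no process name at all
    # "-f 'node server.js'" -> first word "node"; guard against a whitespace-only argument
    words = args[-1].split()
    target = words[0] if words else ""
    return target in ALLOWED_PROCESS_NAMES


if __name__ == "__main__":
    for cmd in ("pkill node", "pkill nginx", "pkill -9 -1", "pkill -f 'node server.js'"):
        print(f"{cmd:28} -> {'allowed' if pkill_allowed(cmd) else 'blocked'}")
```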

Extra validation: the chmod command

def validate_chmod_command(command_string: str) -> tuple[bool, str]:
    """
    Validate a chmod command: only +x (grant execute permission) is allowed.

    Returns:
        (allowed flag, reason for denial)
    """
    try:
        tokens = shlex.split(command_string)
    except ValueError:
        return False, "Could not parse chmod command"

    if not tokens or tokens[0] != "chmod":
        return False, "Not a chmod command"

    mode = None
    files = []

    for token in tokens[1:]:
        if token.startswith("-"):
            # Flags such as -R are not allowed
            return False, "chmod flags are not allowed"
        elif mode is None:
            mode = token
        else:
            files.append(token)

    if mode is None:
        return False, "chmod requires a mode"

    if not files:
        return False, "chmod requires at least one file"

    # Only +x variants are allowed
    # +x, u+x, g+x, o+x, a+x, ug+x, etc.
    import re
    if not re.match(r"^[ugoa]*\+x$", mode):
        return False, f"chmod only allowed with +x mode, got: {mode}"

    return True, ""

Explanation:

Why only +x is allowed:

chmod +x init.sh     # ✅ make a script executable → allowed
chmod 777 file.txt   # ❌ dangerous permission change → blocked
chmod -R 755 /       # ❌ recursive permission change → blocked
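The mode check boils down to a single regular expression. The snippet reuses the exact pattern from validate_chmod_command(). `mode_allowed()` is a hypothetical wrapper added for testing. It shows which mode strings pass. The flag case (`-R`) is rejected earlier, by the loop over tokens, so the regex never sees it.

```python
import re

# The same pattern validate_chmod_command() uses
MODE_PATTERN = re.compile(r"^[ugoa]*\+x$")


def mode_allowed(mode: str) -> bool:
    return MODE_PATTERN.match(mode) is not None


if __name__ == "__main__":
    for mode in ("+x", "u+x", "ug+x", "a+x", "777", "u+w", "+rx", "o-x"):
        print(f"{mode:5} -> {'allowed' if mode_allowed(mode) else 'blocked'}")
```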

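Three layers of defense

1. Sandbox: isolates Bash commands at the OS level
2. Permissions: restricts file operations to project_dir
3. Security hook: validates Bash commands against the allowlist

Why defense in depth matters:

If one layer is breached, the next one still protects you
Each layer works independently
An attacker would have to get through every layer

4. Client configuration: client.py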

This file configures the Claude SDK client, including the security settings and the MCP integration.

Tool definitions

"""
Claude SDK Client Configuration
===============================

Functions for creating and configuring the Claude Agent SDK client.
"""

import json
import os
from pathlib import Path

from claude_code_sdk import ClaudeCodeOptions, ClaudeSDKClient
from claude_code_sdk.types import HookMatcher

from security import bash_security_hook


# Puppeteer MCP tools for browser automation
PUPPETEER_TOOLS = [
    "mcp__puppeteer__puppeteer_navigate",   # Navigate to a URL
    "mcp__puppeteer__puppeteer_screenshot", # Take a screenshot
    "mcp__puppeteer__puppeteer_click",      # Click an element
    "mcp__puppeteer__puppeteer_fill",       # Fill in a form field
    "mcp__puppeteer__puppeteer_select",     # Choose an option in a select box
    "mcp__puppeteer__puppeteer_hover",      # Hover over an element
    "mcp__puppeteer__puppeteer_evaluate",   # Run JavaScript (for debugging)
]

# Built-in tools
BUILTIN_TOOLS = [
    "Read",   # Read files
    "Write",  # Write files
    "Edit",   # Edit files
    "Glob",   # Find files by pattern
    "Grep",   # Search text
    "Bash",   # Run shell commands
]

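Explanation:

What the Puppeteer MCP tools do:

Tool	Role	Example use
puppeteer_navigate	Navigate to a page	Open the login page
puppeteer_screenshot	Capture the screen	Check the UI
puppeteer_click	Click	Press a button
puppeteer_fill	Enter text	Fill in a form
puppeteer_select	Select an option	Pick from a dropdown
puppeteer_hover	Hover	Show a tooltip
puppeteer_evaluate	Run JS	Inspect the DOM (for debugging)

Client creation function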

def create_client(project_dir: Path, model: str) -> ClaudeSDKClient:
    """
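    Create a Claude Agent SDK client with layered security.

    Args:
        project_dir: Project directory
        model: Claude model to use

    Returns:
        A configured ClaudeSDKClient

    Security layers (defense in depth):
    1. Sandbox - isolates Bash commands at the OS level
    2. Permissions - restricts file operations to project_dir
    3. Security hook - validates Bash commands against the allowlist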
    """
    # Check the API key
    api_key = os.environ.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise ValueError(
            "ANTHROPIC_API_KEY environment variable not set.\n"
            "Get your API key from: https://console.anthropic.com/"
        )

    # Build the security settings
    security_settings = {
        "sandbox": {
            "enabled": True,                    # Enable the sandbox
            "autoAllowBashIfSandboxed": True    # Auto-approve Bash inside the sandbox
        },
        "permissions": {
            "defaultMode": "acceptEdits",       # Auto-accept edits
            "allow": [
                # Only allow file operations inside the project directory
                # ./** is a relative path meaning everything under project_dir
                "Read(./**)",
                "Write(./**)",
                "Edit(./**)",
                "Glob(./**)",
                "Grep(./**)",
                # Bash is allowed, but checked by the security hook
                "Bash(*)",
                # Puppeteer tools (browser automation)
                *PUPPETEER_TOOLS,
            ],
        },
    }

    # Make sure the project directory exists
    project_dir.mkdir(parents=True, exist_ok=True)

    # Save the settings file inside the project
    # This persists the settings and makes them easy to inspect when debugging
    settings_file = project_dir / ".claude_settings.json"
    with open(settings_file, "w") as f:
        json.dump(security_settings, f, indent=2)

    # Log what was configured
    print(f"Created security settings at {settings_file}")
    print("   - Sandbox enabled (OS-level bash isolation)")
    print(f"   - Filesystem restricted to: {project_dir.resolve()}")
    print("   - Bash commands restricted to allowlist (see security.py)")
    print("   - MCP servers: puppeteer (browser automation)")
    print()

    # Create and return the client
    return ClaudeSDKClient(
        options=ClaudeCodeOptions(
            model=model,
            system_prompt="You are an expert full-stack developer building a production-quality web application.",
            allowed_tools=[
                *BUILTIN_TOOLS,
                *PUPPETEER_TOOLS,
            ],
            # MCP server configuration: browser automation via Puppeteer
            mcp_servers={
                "puppeteer": {
                    "command": "npx",
                    "args": ["puppeteer-mcp-server"]
                }
            },
            # Security hook: validate Bash commands before they run
            hooks={
                "PreToolUse": [
                    HookMatcher(
                        matcher="Bash",           # Bash tool only
                        hooks=[bash_security_hook] # Apply the security hook
                    ),
                ],
            },
            max_turns=1000,  # Maximum number of turns (generously large)
            cwd=str(project_dir.resolve()),      # Working directory
            settings=str(settings_file.resolve()), # Path to the settings file
        )
    )

Explanation:

Security settings in detail:

"sandbox": {
    "enabled": True,
    "autoAllowBashIfSandboxed": True
}

enabled: True - enables the OS-level sandbox
autoAllowBashIfSandboxed: True - Bash is approved automatically inside the sandbox (and still checked by the security hook)

"permissions": {
    "defaultMode": "acceptEdits",
    "allow": [...]
}

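defaultMode: "acceptEdits" - file edits are approved automatically (no human confirmation)
allow - the tools that are permitted and their scope

What the relative paths mean:

"Read(./**)"  # ./** = every file under project_dir

Works together with cwd=str(project_dir.resolve())
Prevents access outside the project directory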
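The SDK enforces these permission rules itself. The sketch below is only a mental model of what "everything under `./**` relative to cwd" means, not how the SDK implements it. `inside_project()` is a hypothetical helper that resolves a requested path against the project directory and checks whether the result stays inside it. The example paths assume a POSIX system.

```python
import tempfile
from pathlib import Path


def inside_project(project_dir: Path, requested: str) -> bool:
    """Hypothetical check: does `requested`, resolved from project_dir, stay inside it?"""
    root = project_dir.resolve()
    # An absolute `requested` replaces root entirely when joined, so /etc/passwd stays /etc/passwd
    target = (root / requested).resolve()
    return target == root or root in target.parents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in ("src/app.js", "./README.md", "../secrets.txt", "/etc/passwd"):
            print(f"{path:16} -> {inside_project(Path(tmp), path)}")
```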

MCP (Model Context Protocol) integration:

mcp_servers={
    "puppeteer": {
        "command": "npx",
        "args": ["puppeteer-mcp-server"]
    }
}

MCP is a protocol for giving Claude access to external tools
npx puppeteer-mcp-server starts the Puppeteer server
This lets Claude drive a browser

Security hook configuration:

hooks={
    "PreToolUse": [
        HookMatcher(
            matcher="Bash",
            hooks=[bash_security_hook]
        ),
    ],
}

PreToolUse - called before a tool is used
matcher="Bash" - applies only to the Bash tool
hooks=[bash_security_hook] - uses the hook function from security.py

5. Prompt design: prompts/

The success of a long-running agent hinges on its prompt design.

Loading prompts: prompts.py

"""
Prompt Loading Utilities
========================

Functions for loading prompt templates for the autonomous agent.
"""

import shutil
from pathlib import Path

# Directory containing the prompt files
PROMPTS_DIR = Path(__file__).parent / "prompts"


def load_prompt(filename: str) -> str:
    """Read the given prompt file"""
    prompt_path = PROMPTS_DIR / filename
    with open(prompt_path, "r") as f:
        return f.read()


def get_initializer_prompt() -> str:
    """Get the prompt for the initializer agent"""
    return load_prompt("initializer_prompt.md")


def get_coding_prompt() -> str:
    """Get the prompt for the coding agent"""
    return load_prompt("coding_prompt.md")


def copy_spec_to_project(project_dir: Path) -> None:
    """Copy the application spec into the project"""
    spec_src = PROMPTS_DIR / "app_spec.txt"
    spec_dst = project_dir / "app_spec.txt"
    shutil.copy(spec_src, spec_dst)
    print(f"Copied app spec to {spec_dst}")

Initializer prompt (initializer_prompt.md)

## YOUR ROLE - INITIALIZER AGENT (Session 1 of Many)

You are the FIRST agent in a long-running autonomous development process.
Your job is to set up the foundation for all future coding agents.

### FIRST: Read the Project Specification

Start by reading `app_spec.txt` in your working directory. This file contains
the complete specification for what you need to build. Read it carefully
before proceeding.

### CRITICAL FIRST TASK: Create feature_list.json

Based on `app_spec.txt`, create a file called `feature_list.json` with 200 detailed
end-to-end test cases. This file is the single source of truth for what
needs to be built.

**Format:**
```json
[
  {
    "category": "functional",
    "description": "Brief description of the feature and what this test verifies",
    "steps": [
      "Step 1: Navigate to relevant page",
      "Step 2: Perform action",
      "Step 3: Verify expected result"
    ],
    "passes": false
  },
  {
    "category": "style",
    "description": "Brief description of UI/UX requirement",
    "steps": [
      "Step 1: Navigate to page",
      "Step 2: Take screenshot",
      "Step 3: Verify visual requirements"
    ],
    "passes": false
  }
]
```

**Requirements for feature_list.json:**
- Minimum 200 features total with testing steps for each
- Both "functional" and "style" categories
- Mix of narrow tests (2-5 steps) and comprehensive tests (10+ steps)
- At least 25 tests MUST have 10+ steps each
- Order features by priority: fundamental features first
- ALL tests start with "passes": false
- Cover every feature in the spec exhaustively

**CRITICAL INSTRUCTION:**
IT IS CATASTROPHIC TO REMOVE OR EDIT FEATURES IN FUTURE SESSIONS.
Features can ONLY be marked as passing (change "passes": false to "passes": true).
Never remove features, never edit descriptions, never modify testing steps.
This ensures no functionality is missed.

### SECOND TASK: Create init.sh

Create a script called init.sh that future agents can use to quickly
set up and run the development environment. The script should:

- Install any required dependencies
- Start any necessary servers or services
- Print helpful information about how to access the running application

### THIRD TASK: Initialize Git

Create a git repository and make your first commit with:

- feature_list.json (complete with all 200+ features)
- init.sh (environment setup script)
- README.md (project overview and setup instructions)

### FOURTH TASK: Create Project Structure

Set up the basic project structure based on what's specified in app_spec.txt.

### OPTIONAL: Start Implementation

If you have time remaining in this session, you may begin implementing
the highest-priority features from feature_list.json.

### ENDING THIS SESSION

Before your context fills up:

- Commit all work with descriptive messages
- Create claude-progress.txt with a summary of what you accomplished
- Ensure feature_list.json is complete and saved
- Leave the environment in a clean, working state

The next agent will continue from here with a fresh context window.


Explanation:

Responsibilities of the initializer:

```mermaid
flowchart TD
    A[Read the spec] --> B[Create feature_list.json<br/>200 test cases]
    B --> C[Create init.sh<br/>environment setup script]
    C --> D[Initialize Git<br/>first commit]
    D --> E[Create project structure]
    E --> F[Create claude-progress.txt]
```

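Key rule:

IT IS CATASTROPHIC TO REMOVE OR EDIT FEATURES IN FUTURE SESSIONS.

This forceful wording is deliberate. If an agent deletes or edits test cases on its own, then:

Features get missed
Progress can no longer be tracked
The agent risks wrongly concluding that the app is finished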

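Design of feature_list.json (the // comments are annotations for this article; JSON itself does not allow comments):

{
  "category": "functional",  // or "style"
  "description": "Description of the test",
  "steps": [
    "Step 1: ...",
    "Step 2: ...",
    "Step 3: ..."
  ],
  "passes": false  // the only field that may be changed
}

category: classifies the test as functional or UI (style)
description: what the test checks
steps: the test procedure (the more detailed, the better)
passes: the pass flag (the only mutable field)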
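In the sample, this rule is enforced only through the prompt. If you also wanted the harness to check it after each session, one option is to compare the file against its previous version and flag any change outside passes. The following is a minimal sketch under that assumption. find_violations and FROZEN_FIELDS are hypothetical names and are not part of autonomous-coding. The sketch lets passes change in either direction, because the coding prompt also asks agents to set a regressed feature back to false.

```python
import json
from typing import Any

# Hypothetical helper (not in autonomous-coding): fields that must never change
# after the initializer writes feature_list.json.
FROZEN_FIELDS = ("category", "description", "steps")


def find_violations(old: list[dict[str, Any]], new: list[dict[str, Any]]) -> list[str]:
    """Compare two versions of feature_list.json and describe every rule violation."""
    if len(new) != len(old):
        # Removing, adding, or consolidating tests changes the count.
        return [f"test count changed: {len(old)} -> {len(new)}"]
    violations: list[str] = []
    for i, (before, after) in enumerate(zip(old, new)):
        # Reordering also shows up here, because index i now holds a different test.
        for field in FROZEN_FIELDS:
            if before.get(field) != after.get(field):
                violations.append(f"test #{i}: '{field}' was modified")
        extra = sorted(set(after) - set(before))
        if extra:
            violations.append(f"test #{i}: unexpected fields {extra}")
        if not isinstance(after.get("passes"), bool):
            violations.append(f"test #{i}: 'passes' must be true or false")
    return violations


old = json.loads(
    '[{"category": "functional", "description": "User can log in",'
    ' "steps": ["Step 1: Open the login page"], "passes": false}]'
)
marked = [dict(old[0], passes=True)]                     # allowed: only passes changed
edited = [dict(old[0], description="Login page loads")]  # forbidden edit
print(find_violations(old, marked))  # []
print(find_violations(old, edited))  # ["test #0: 'description' was modified"]
```

A harness could snapshot the file before each session and refuse to count progress (or restore the snapshot) whenever the returned list is non-empty.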

Coding prompt (coding_prompt.md)

## YOUR ROLE - CODING AGENT

You are continuing work on a long-running autonomous development task.
This is a FRESH context window - you have no memory of previous sessions.

### STEP 1: GET YOUR BEARINGS (MANDATORY)

Start by orienting yourself:

```bash
# 1. See your working directory
pwd

# 2. List files to understand project structure
ls -la

# 3. Read the project specification to understand what you're building
cat app_spec.txt

# 4. Read the feature list to see all work
cat feature_list.json | head -50

# 5. Read progress notes from previous sessions
cat claude-progress.txt

# 6. Check recent git history
git log --oneline -20

# 7. Count remaining tests
cat feature_list.json | grep '"passes": false' | wc -l
```

Understanding the app_spec.txt is critical - it contains the full requirements
for the application you're building.

### STEP 2: START SERVERS (IF NOT RUNNING)

If init.sh exists, run it:

chmod +x init.sh
./init.sh

### STEP 3: VERIFICATION TEST (CRITICAL!)

**MANDATORY BEFORE NEW WORK:**

The previous session may have introduced bugs. Before implementing anything
new, you MUST run verification tests.

Run 1-2 of the feature tests marked as "passes": true that are most core
to the app's functionality to verify they still work.

If you find ANY issues (functional or visual):

- Mark that feature as "passes": false immediately
- Add issues to a list
- Fix all issues BEFORE moving to new features
- This includes UI bugs like:
  - White-on-white text or poor contrast
  - Random characters displayed
  - Incorrect timestamps
  - Layout issues or overflow
  - Buttons too close together
  - Missing hover states
  - Console errors

### STEP 4: CHOOSE ONE FEATURE TO IMPLEMENT

Look at feature_list.json and find the highest-priority feature with "passes": false.

Focus on completing one feature perfectly in this session before moving on.
It's ok if you only complete one feature in this session.

### STEP 5: IMPLEMENT THE FEATURE

Implement the chosen feature thoroughly:

- Write the code (frontend and/or backend as needed)
- Test manually using browser automation (see Step 6)
- Fix any issues discovered
- Verify the feature works end-to-end

### STEP 6: VERIFY WITH BROWSER AUTOMATION

**CRITICAL:** You MUST verify features through the actual UI.

Use browser automation tools:

- Navigate to the app in a real browser
- Interact like a human user (click, type, scroll)
- Take screenshots at each step
- Verify both functionality AND visual appearance

**DO:**

- Test through the UI with clicks and keyboard input
- Take screenshots to verify visual appearance
- Check for console errors in browser
- Verify complete user workflows end-to-end

**DON'T:**

- Only test with curl commands (backend testing alone is insufficient)
- Use JavaScript evaluation to bypass UI (no shortcuts)
- Skip visual verification
- Mark tests passing without thorough verification

### STEP 7: UPDATE feature_list.json (CAREFULLY!)

**YOU CAN ONLY MODIFY ONE FIELD: "passes"**

After thorough verification, change:

"passes": false

to:

"passes": true

**NEVER:**

- Remove tests
- Edit test descriptions
- Modify test steps
- Combine or consolidate tests
- Reorder tests

### STEP 8: COMMIT YOUR PROGRESS

Make a descriptive git commit:

git add .
git commit -m "Implement [feature name] - verified end-to-end

- Added [specific changes]
- Tested with browser automation
- Updated feature_list.json: marked test #X as passing
"

### STEP 9: UPDATE PROGRESS NOTES

Update claude-progress.txt with:

- What you accomplished this session
- Which test(s) you completed
- Any issues discovered or fixed
- What should be worked on next
- Current completion status (e.g., "45/200 tests passing")

### STEP 10: END SESSION CLEANLY

Before context fills up:

- Commit all working code
- Update claude-progress.txt
- Update feature_list.json if tests verified
- Ensure no uncommitted changes
- Leave app in working state (no broken features)


Explanation:

Structure of the 10 steps:

Step	Purpose	Importance
1. Get your bearings	Understand the current state	★★★
2. Start servers	Prepare the development environment	★★☆
3. Regression test	Confirm existing features still work	★★★
4. Choose a feature	Decide the next task	★★☆
5. Implement	Write the code	★★★
6. Browser verification	Confirm behavior through the UI	★★★
7. Update feature_list	Record progress	★★★
8. Git commit	Save the changes	★★★
9. Update progress notes	Hand off to the next session	★★☆
10. Clean exit	End in a safe state	★★★

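STEP 1 in detail: getting your bearings

```bash
pwd                                           # Where am I?
ls -la                                        # What's here?
cat app_spec.txt                              # What am I building?
cat feature_list.json | head -50              # What needs to be tested?
cat claude-progress.txt                       # What happened last session?
git log --oneline -20                         # What's the recent history?
cat feature_list.json | grep '"passes": false' | wc -l  # How many tests remain?
```

Together, these commands are what make it possible to resume the work from a fresh context.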
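The grep on the last line counts matching lines. It therefore gives the right answer only while feature_list.json is pretty-printed with exactly "passes": false on its own line. If you drive this orientation step from your own tooling, parsing the JSON is more robust. The sketch below is an illustration, not code from the sample, and summarize is a hypothetical name. It relies on the initializer's instruction to order features by priority, so the first failing entry is the one STEP 4 would pick.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


def summarize(project_dir: Path) -> tuple[int, Optional[dict[str, Any]]]:
    """Return (number of failing tests, highest-priority failing feature or None)."""
    path = project_dir / "feature_list.json"
    features = json.loads(path.read_text(encoding="utf-8"))
    failing = [f for f in features if not f.get("passes", False)]
    # Assumes the initializer ordered features by priority, so the first failing one is next.
    return len(failing), (failing[0] if failing else None)


# Usage: a single line of JSON with inconsistent spacing, which a line-based grep miscounts.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "feature_list.json").write_text(
        '[{"description": "A", "passes": true},'
        ' {"description": "B", "passes": false},'
        ' {"description": "C", "passes":false}]',
        encoding="utf-8",
    )
    remaining, next_feature = summarize(project)
    print(remaining, next_feature["description"])  # 2 B
```

On the same file, grep '"passes": false' | wc -l would report 1: it sees only one line, and the compact "passes":false does not match at all.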

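Why the STEP 3 regression test matters

The previous session may have introduced bugs.

In other words, the previous session might have:

Introduced bugs
Broken existing features
Ended in an incomplete state

To catch these problems, the agent re-tests existing features before implementing anything new.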
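The prompt leaves it to the agent to decide which passing tests are "most core". A harness-side helper could approximate that choice. The sketch below assumes that list order is a reasonable proxy for how core a feature is, since the initializer sorts features by priority. pick_regression_candidates and demote are hypothetical helpers, not part of the sample. demote changes only passes, in line with the rule in STEP 7.

```python
from typing import Any

Feature = dict[str, Any]


def pick_regression_candidates(features: list[Feature], n: int = 2) -> list[int]:
    """Indices of the first n passing features (list order used as a proxy for 'most core')."""
    return [i for i, f in enumerate(features) if f.get("passes")][:n]


def demote(features: list[Feature], index: int) -> list[Feature]:
    """Return a copy with one feature set back to passes=false; other fields are untouched."""
    updated = [dict(f) for f in features]
    updated[index]["passes"] = False
    return updated


features = [
    {"description": "App loads", "passes": True},
    {"description": "User can log in", "passes": True},
    {"description": "User can post", "passes": True},
    {"description": "Dark mode", "passes": False},
]
candidates = pick_regression_candidates(features)
print(candidates)  # [0, 1]
# Suppose re-running test #1 in the browser revealed a regression:
features = demote(features, candidates[1])
print([f["passes"] for f in features])  # [True, False, True, False]
```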

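STEP 6: the emphasis on browser automation

DO:
✅ Test through the UI
✅ Verify with screenshots
✅ Check for console errors

DON'T:
❌ Test with curl alone
❌ Bypass the UI with JavaScript
❌ Skip visual verification

Why browser testing is mandatory:

The UI can be broken even when the backend works
CSS problems cannot be caught by API tests
The user experience can only be verified through the UI

Prompt design principles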

6. Progress tracking: progress.py

It is a simple module, but it plays an important role.

"""
Progress Tracking Utilities
===========================

Functions for tracking and displaying progress of the autonomous coding agent.
"""

import json
from pathlib import Path


def count_passing_tests(project_dir: Path) -> tuple[int, int]:
    """
    feature_list.jsonから合格/総数をカウント。

    Args:
        project_dir: feature_list.jsonを含むディレクトリ

    Returns:
        (合格数, 総数) のタプル
    """
    tests_file = project_dir / "feature_list.json"

    # ファイルが存在しない場合（初回実行前）
    if not tests_file.exists():
        return 0, 0

    try:
        with open(tests_file, "r") as f:
            tests = json.load(f)

        total = len(tests)
        # "passes": true のテストをカウント
        passing = sum(1 for test in tests if test.get("passes", False))

        return passing, total
    except (json.JSONDecodeError, IOError):
        # JSONパースエラーやI/Oエラー時は0を返す
        return 0, 0


def print_session_header(session_num: int, is_initializer: bool) -> None:
    """
    セッション開始時のヘッダーを表示。

    Args:
        session_num: セッション番号
        is_initializer: イニシャライザーセッションかどうか
    """
    session_type = "INITIALIZER" if is_initializer else "CODING AGENT"

    print("\n" + "=" * 70)
    print(f"  SESSION {session_num}: {session_type}")
    print("=" * 70)
    print()


def print_progress_summary(project_dir: Path) -> None:
    """
    現在の進捗サマリーを表示。

    Args:
        project_dir: プロジェクトディレクトリ
    """
    passing, total = count_passing_tests(project_dir)

    if total > 0:
        percentage = (passing / total) * 100
        print(f"\nProgress: {passing}/{total} tests passing ({percentage:.1f}%)")
    else:
        print("\nProgress: feature_list.json not yet created")

Explanation:

Sample output:

======================================================================
  SESSION 5: CODING AGENT
======================================================================

Progress: 45/200 tests passing (22.5%)

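Why the progress display matters:

The user can see how far along the work is
The agent itself can also recognize where things stand
It makes it possible to estimate how long completion will take
It helps detect problems (for example, when the passing count goes down)

7. Execution flow

Here is a summary of the overall execution flow.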
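The real loop lives in agent.py. As a compact recap, the flow can be sketched in plain Python:

- A session is an initializer session when feature_list.json does not exist yet.
- Every later session uses the coding prompt with a fresh context.
- Progress is re-read from feature_list.json after each run.

This is a simplified sketch, not the sample's code. run_harness, count_passing, and fake_session are hypothetical names, and the real harness calls the Claude Code SDK instead of a stub. The early exit once every test passes and the warning when the passing count drops are assumptions added for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


def count_passing(project_dir: Path) -> tuple[int, int]:
    """(passing, total) from feature_list.json, or (0, 0) before it exists."""
    path = project_dir / "feature_list.json"
    if not path.exists():
        return 0, 0
    tests = json.loads(path.read_text(encoding="utf-8"))
    return sum(1 for t in tests if t.get("passes", False)), len(tests)


def run_harness(
    project_dir: Path,
    run_session: Callable[[Path, str], None],
    max_iterations: Optional[int] = None,
) -> int:
    """Hypothetical simplified loop; returns the number of sessions run."""
    iteration = 0
    previous_passing = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        # No feature_list.json yet means this is the initializer session.
        first_run = not (project_dir / "feature_list.json").exists()
        prompt = "initializer_prompt.md" if first_run else "coding_prompt.md"
        run_session(project_dir, prompt)  # each call stands for one fresh-context session
        passing, total = count_passing(project_dir)
        if passing < previous_passing:
            print(f"warning: passing tests dropped {previous_passing} -> {passing}")
        previous_passing = passing
        print(f"session {iteration}: {passing}/{total} tests passing")
        if total > 0 and passing == total:
            break  # assumption: stop once every test passes
    return iteration


def fake_session(project_dir: Path, prompt: str) -> None:
    """Stub in place of a real Claude session: creates 3 tests, then passes one per run."""
    path = project_dir / "feature_list.json"
    if prompt == "initializer_prompt.md":
        tests = [{"description": f"test {i}", "passes": False} for i in range(3)]
    else:
        tests = json.loads(path.read_text(encoding="utf-8"))
        next(t for t in tests if not t["passes"])["passes"] = True
    path.write_text(json.dumps(tests), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    sessions = run_harness(Path(tmp), fake_session, max_iterations=10)
print(sessions)  # 4 (1 initializer + 3 coding sessions)
```

Because all state lives in files, rerunning run_harness on the same directory skips the initializer and continues where the last session stopped. This mirrors how the real demo resumes when you rerun the same command.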

8. How to run

Prerequisites

# Install Claude Code
npm install -g @anthropic-ai/claude-code

# Install dependencies
pip install -r requirements.txt
# requirements.txt contains: claude-code-sdk>=0.0.25

# Set your API key
export ANTHROPIC_API_KEY='your-api-key-here'

Commands

# Start a new project
python autonomous_agent_demo.py --project-dir ./my_project

# Limit the number of iterations (for testing)
python autonomous_agent_demo.py --project-dir ./my_project --max-iterations 3

# Use a specific model
python autonomous_agent_demo.py --project-dir ./my_project --model claude-sonnet-4-5-20250929

# Continue an existing project (just rerun the same command)
python autonomous_agent_demo.py --project-dir ./my_project

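Expected run time

Phase	Time	Work done
First session	10-20+ min	Generating 200 test cases, setting up the environment
Later sessions	5-15 min each	Implementing 1-2 features
Full implementation	Several hours or more	All 200 features completed

Cost estimate

Model	Per session	Total (estimated)
Claude Sonnet	$0.50-2.00	$50-200
Claude Opus	$2.00-8.00	$200-800

Note: actual costs vary widely depending on the project's complexity.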
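The totals in the table match about 100 coding sessions at the per-session prices: 200 features at roughly 2 features per session, ignoring the initializer session. A simple linear estimate makes that arithmetic explicit. It also shows how the range moves if the agent completes only one feature per session, which the coding prompt explicitly allows. estimate_total_cost is a hypothetical helper and the linear model is an assumption; real sessions vary in length and cost.

```python
import math


def estimate_total_cost(
    num_features: int,
    features_per_session: float,
    cost_per_session: tuple[float, float],
) -> tuple[int, tuple[float, float]]:
    """Linear estimate: sessions needed, and (low, high) total cost in dollars."""
    # Round up: leftover features still need one more session.
    sessions = math.ceil(num_features / features_per_session)
    low, high = cost_per_session
    return sessions, (sessions * low, sessions * high)


print(estimate_total_cost(200, 2, (0.50, 2.00)))  # (100, (50.0, 200.0))  Sonnet, 2 features/session
print(estimate_total_cost(200, 2, (2.00, 8.00)))  # (100, (200.0, 800.0)) Opus, 2 features/session
print(estimate_total_cost(200, 1, (0.50, 2.00)))  # (200, (100.0, 400.0)) Sonnet, 1 feature/session
```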

9. Generated project structure

An example of the project structure the agent generates:

my_project/
├── feature_list.json       # 200 test cases (source of truth)
├── app_spec.txt            # Application specification
├── init.sh                 # Environment setup script
├── claude-progress.txt     # Progress notes (for handoff)
├── .claude_settings.json   # Security settings
├── .git/                   # Git repository
├── package.json            # Node.js configuration
├── README.md               # Project overview
├── src/                    # Source code
│   ├── frontend/           # Frontend
│   │   ├── components/
│   │   ├── pages/
│   │   └── styles/
│   └── backend/            # Backend
│       ├── routes/
│       ├── models/
│       └── services/
├── public/                 # Static files
└── tests/                  # Test files

What each file does

File	Role	Update frequency
feature_list.json	Test case definitions	Every session (passes field only)
claude-progress.txt	Progress notes	Every session
init.sh	Environment setup script	First session only
app_spec.txt	Specification	Never
.claude_settings.json	Security settings	Never

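10. Design takeaways

Key lessons from this implementation:

1. Externalize state

❌ Relying on the agent's memory
   → Information is lost between sessions
   → The "I forgot what I did" problem

✅ Store state in feature_list.json + git + claude-progress.txt
   → A persistent record
   → The current state can be checked at any time
   → Multiple agents can work together

2. Keep sessions independent

❌ Carrying over the previous session's context
   → Context explosion
   → Running out of memory
   → Degraded performance

✅ Start every session with a fresh context
   → Stable behavior
   → Work can continue for as many sessions as needed
   → Predictable behavior

3. One feature per session

❌ Trying to implement many features at once
   → Sessions end in a half-finished state
   → Debugging becomes difficult
   → Progress becomes unclear

✅ Finish one feature completely in each session
   → Clear progress
   → The body of working code keeps growing
   → Problems are easy to isolate

4. Verify thoroughly

❌ Considering the work done once the code is written
   → Broken code piles up
   → Bugs go unnoticed
   → Quality declines

✅ Confirm real behavior with browser automation
   → Verification from the user's point of view
   → UI problems are caught too
   → Only code that actually works gets marked as passing

5. Defense-in-depth security

Layer 1: OS-level sandbox
         → Process isolation, resource limits

Layer 2: File permissions
         → Access limited to the project directory

Layer 3: Command whitelist + additional validation
         → Only allowed commands can run
         → Dangerous commands get detailed validation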

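Conclusion

Anthropic's official sample code is an excellent reference for learning the design patterns behind long-running agents.

The most important points:

Design feature_list.json as the "source of truth"
- 200 detailed test cases
- Only the passes field may change
- The foundation for progress tracking
Control agent behavior strictly through prompts
- A clear 10-step procedure
- Emphasis with words like MANDATORY and CRITICAL
- Explicit DO/DON'T lists
Design security in layers
- Sandbox + permissions + whitelist
- Extra validation for dangerous commands
- Fail-safe design: when in doubt, block

Once you understand these principles, adapting the design to your own use case becomes much easier.

Try running the code yourself and see what long-running agents can do.

References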

Effective harnesses for long-running agents - Anthropic Engineering Blog
claude-quickstarts/autonomous-coding - GitHub
Claude Code SDK - Anthropic Docs
