OpenAI's Own Coding Agent Codex + o4-mini Turned Out to Be More of a Shell-Fu Engineer Than I Expected

Last updated at 2025-04-20 Posted at 2025-04-20

Introduction

OpenAI has released its coding agent, Codex, under an open-source license.

In this article, I analyze how Codex + o4-mini goes about coding.

What is Codex?

How to use Codex and related details are covered in the following article.

Analyzing how Codex behaves

System prompt

The system prompt is written at the end of agent-loop.ts.

Role definition

1. Act as an entity whose "body" is the Codex CLI

You are operating as and within the Codex CLI, a terminal-based agentic coding assistant...

This means acting as the "persona" of the CLI tool called Codex CLI.
The choice of the word "within" suggests a quasi-existential structure in which the model is fused with the tool.

2. Separate yourself from your past self (the old Codex language model)

Don't confuse yourself with the old Codex language model...

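This draws a boundary around the model's self-identity so that it does not confuse itself with the former Codex language model.
It is a guard sentence meant to keep the model from misattributing memories to itself or making metacognitive errors.

OpenAI's lack of naming sense shows up here too...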

3. Be an "agent" (not a passive chatbot)

You are an agent - please keep going until the user's query is completely resolved...

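It is told to act as an autonomous problem solver that takes the initiative.
The emphasis is on being an execution agent that runs the task to completion, not a passive assistant that just waits for user input.

4. Responsibility as a "deployed" entity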

You are a deployed coding agent. Your session allows for you to modify and run code.

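It behaves on the premise that it is not a model in training but one running in a production environment.
Because it is actually allowed to modify and run code, it has to act responsibly.

5. Persona tone: like a dependable remote teammate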

Respond in a friendly tune as a remote teammate, who is knowledgeable, capable and eager to help with coding.

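Rather than acting like a sterile AI, it engages like a friendly, capable teammate.
The "persona" is designed right down to its communication style.

Available tools

🗂 1. Receiving prompts and files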

Receive user prompts, project context, and files.

It can directly receive user input, codebase context, files, and so on.
Unlike a typical chat AI, this puts it inside a real development environment.

💬 2. Streaming responses and function calls

Stream responses and emit function calls (e.g., shell commands, code edits).

It streams responses in real time while issuing code edits and shell commands.
In effect, this builds a "natural-language interface for development operations."

🧵 3. Applying patches and running commands

Apply patches, run commands, and manage user approvals based on policy.

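It can generate and apply patches using a specific syntax (described below).
Where the approval policy requires it, actions go through a user-approval flow.

🧪 4. Version control and rollback with Git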

Work inside a sandboxed, git-backed workspace with rollback support.

It works inside a Git-backed workspace.
A key feature is rollback support, so failed changes can be undone.

📈 5. Saving session logs

Log telemetry so sessions can be replayed or inspected later.

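All session activity is logged as telemetry (operation logs), so it can be replayed or inspected later.
This makes the agent's development activity traceable.

🪛 Concrete tool syntax: apply_patch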

Use apply_patch to edit files:
{"cmd":["apply_patch","*** Begin Patch\n*** Update File: path/to/file.py\n@@ def example():\n- pass\n+ return 123\n*** End Patch"]}

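Codex CLI describes and applies file diffs with its own patch syntax.
apply_patch is the basic editing operation: a Git-style diff syntax states exactly where and how the file changes.
The patch is applied to the actual files in the workspace rather than to a virtual copy, which is why the rules forbid telling the user to save files.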
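
To make the format concrete, here is a minimal TypeScript sketch that assembles the `{"cmd":["apply_patch", ...]}` payload from the prompt example. `buildPatch` and `buildApplyPatchCall` are hypothetical helpers written for this article, not Codex CLI functions; they only emit the `*** Begin Patch` / `*** Update File:` / `@@` / `-` / `+` / `*** End Patch` markers visible in the excerpt, and the real format accepts more than this sketch covers.

```typescript
// One hunk line; this sketch only needs removals and additions.
type HunkLine = { op: "-" | "+"; text: string };

// Assembles a single-file update patch using the markers from the prompt example.
// `${op} ${text}` mirrors the "- pass" / "+ return 123" spacing shown there.
function buildPatch(path: string, context: string, lines: HunkLine[]): string {
  return [
    "*** Begin Patch",
    `*** Update File: ${path}`,
    `@@ ${context}`,
    ...lines.map((l) => `${l.op} ${l.text}`),
    "*** End Patch",
  ].join("\n");
}

// Wraps the patch in the {"cmd":["apply_patch", <patch>]} shape the prompt asks for.
function buildApplyPatchCall(path: string, context: string, lines: HunkLine[]): string {
  return JSON.stringify({ cmd: ["apply_patch", buildPatch(path, context, lines)] });
}

const call = buildApplyPatchCall("path/to/file.py", "def example():", [
  { op: "-", text: "pass" },
  { op: "+", text: "return 123" },
]);
console.log(call);
```

Running it prints the same JSON string as the example in the system prompt.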

Rules

✅ Best practices (what to do)

Keep working until the task is fully resolved

"Please keep going until the user's query is completely resolved, before ending your turn..."

Keep changes minimal and consistent with the existing style

"Keep changes consistent with the style of the existing codebase. Changes should be minimal and focused on the task."

I like how it respects the existing code.

Send patches in apply_patch format

"Use apply_patch to edit files: {"cmd":["apply_patch", ...]}"

If there is a .pre-commit-config.yaml, try pre-commit run --files

"If there is a .pre-commit-config.yaml, use pre-commit run --files ... to check that your changes pass..."

It seems worth setting up pre-commit hooks in your repository.

After making changes, check with git status and git diff

"Check git status to sanity check your changes... Check using git diff."

Ignore pre-commit errors on lines you didn't touch

"Do not fix pre-existing errors on lines you didn't touch."

Another classic engineer move.

Keep output to concise bullet points, with explanation as needed

"For smaller tasks, describe in brief bullet points. For more complex tasks, include high-level description, use bullet points..."

❌ Prohibited (what not to do)

Ending the turn while the problem is still unresolved

"Only terminate your turn when you are sure that the problem is solved."

Making up things it doesn't know (hallucinating)

"Do NOT guess or make up an answer."

Fixing code or bugs unrelated to the task

"Ignore unrelated bugs or broken tests; it is not your responsibility to fix them."

Leaving inline comments (remove them as much as possible)

"Remove all inline comments you added as much as possible..."

Adding license headers without being asked

"NEVER add copyright or license headers unless specifically requested."

Telling the user to "save the file" (it is already saved)

"Do NOT tell the user to 'save the file' or 'copy the code into a file'..."

Printing the full contents of a large file without being asked

"Do NOT show the full contents of large files you have already written, unless the user explicitly asks for them."

Function Calling

Next, let's look at the tools the agent can call.

stream = await this.oai.responses.create({
  model: this.model,
  instructions: mergedInstructions,
  previous_response_id: lastResponseId || undefined,
  input: turnInput,
  stream: true,
  parallel_tool_calls: false,
  reasoning,
  ...(this.config.flexMode ? { service_tier: "flex" } : {}),
  tools: [
    {
      type: "function",
      name: "shell",
      description: "Runs a shell command, and returns its output.",
      strict: false,
      parameters: {
        type: "object",
        properties: {
          command: { type: "array", items: { type: "string" } },
          workdir: {
            type: "string",
            description: "The working directory for the command.",
          },
          timeout: {
            type: "number",
            description:
              "The maximum time to wait for the command to complete in milliseconds.",
          },
        },
        required: ["command"],
        additionalProperties: false,
      },
    },
  ],
});

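Surprisingly, the only tool the agent can call is shell.
(File editing does have a dedicated tool, apply_patch, but even that is invoked through the shell tool's interface.)

That is extremely few compared with other coding agents.
It's like going camping with nothing but a single survival knife.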
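
Because everything goes through a single `shell` tool, the client side can be one small dispatcher. The sketch below is a hypothetical illustration, not the Codex CLI implementation: it validates a `function_call`'s `arguments` against the schema in the excerpt (`command` required, optional `workdir` and `timeout`, no other keys) and treats a `command` whose first element is `apply_patch` as a file edit. The names `parseShellArgs`, `dispatchShellCall`, and `Action` are made up, and nothing is actually executed.

```typescript
// Arguments of the shell tool, mirroring the JSON schema in the excerpt.
interface ShellArgs {
  command: string[];
  workdir?: string;
  timeout?: number; // milliseconds
}

// Hypothetical dispatch result; nothing is executed in this sketch.
type Action =
  | { kind: "apply_patch"; patch: string }
  | { kind: "exec"; argv: string[]; workdir?: string; timeoutMs?: number }
  | { kind: "error"; message: string };

const ALLOWED_KEYS = new Set(["command", "workdir", "timeout"]);

// Validates the raw `arguments` string of a function_call against the schema.
function parseShellArgs(raw: string): ShellArgs {
  const obj: unknown = JSON.parse(raw);
  if (typeof obj !== "object" || obj === null || Array.isArray(obj)) {
    throw new Error("arguments must be a JSON object");
  }
  const rec = obj as Record<string, unknown>;
  for (const key of Object.keys(rec)) {
    // additionalProperties: false
    if (!ALLOWED_KEYS.has(key)) throw new Error(`unexpected key: ${key}`);
  }
  const { command, workdir, timeout } = rec;
  if (!Array.isArray(command) || !command.every((c) => typeof c === "string")) {
    throw new Error("command is required and must be an array of strings");
  }
  const args: ShellArgs = { command: command as string[] };
  if (typeof workdir === "string") args.workdir = workdir;
  else if (workdir !== undefined) throw new Error("workdir must be a string");
  if (typeof timeout === "number") args.timeout = timeout;
  else if (timeout !== undefined) throw new Error("timeout must be a number");
  return args;
}

// Routes a function_call: only "shell" exists, and file edits arrive as an apply_patch argv.
function dispatchShellCall(name: string, rawArgs: string): Action {
  if (name !== "shell") return { kind: "error", message: `unknown tool: ${name}` };
  let args: ShellArgs;
  try {
    args = parseShellArgs(rawArgs);
  } catch (e) {
    return { kind: "error", message: e instanceof Error ? e.message : String(e) };
  }
  if (args.command[0] === "apply_patch" && args.command.length === 2) {
    return { kind: "apply_patch", patch: args.command[1] };
  }
  return { kind: "exec", argv: args.command, workdir: args.workdir, timeoutMs: args.timeout };
}

console.log(dispatchShellCall("shell", '{"command": ["bash", "-lc", "ls -R ."]}'));
```

Feeding it the arguments string that appears in the session log (`{"command": ["bash", "-lc", "ls -R ."]}`) yields an `exec` action, while an unknown key such as `cmd` is rejected, matching `additionalProperties: false`.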

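Dynamic analysis

Next, I observed how Codex actually behaves, using o4-mini as the model.

Codex saves its action logs as JSON files under ~/.codex/sessions.

Here I analyze its behavior when run with the following command.

codex --full-auto "Analyze the control algorithm of the codex-cli agent"

I also configured the instructions so that the output is in Japanese, as follows.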

~/.codex/instructions.md

## Rules

Think in English, but write output to the user and comments in the code in Japanese.

Action sequence

[
  {
    "role": "user",
    "content": [
      {
        "type": "input_text",
        "text": "Analyze the control algorithm of the codex-cli agent"
      }
    ],
    "type": "message"
  },
  {
    "id": "rs_68056ab24c848191baaea906d544c9e80522f1734f88b153",
    "type": "reasoning",
    "summary": [],
    "duration_ms": 3358
  },
  {
    "id": "fc_68056ab4372081918c7cf12bd4e46bb40522f1734f88b153",
    "type": "function_call",
    "status": "completed",
    "arguments": "{\"command\": [\"bash\", \"-lc\", \"ls -R .\"]}",
    "call_id": "call_8kcRaln5kc0vRNQsg8qXu25X",
    "name": "shell"
  },
  {
    "type": "function_call_output",
    "call_id": "call_8kcRaln5kc0vRNQsg8qXu25X",
    "output": "{\"output\":\"Dockerfile\\nHUSKY.md\\nPULL_REQUEST.md\\nbin\\nbuild.mjs\\ndist\\nexamples\\nignore-react-devtools-plugin.js\\nnode_modules\\npackage-lock.json\\npackage.json\\nrequire-shim.js\\nscripts\\nsrc\\ntests\\ntsconfig.json\\nvite.config.ts\\n\\n./bin:\\ncodex.js\\n\\n./dist:..."
  }
]

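Although a system prompt is specified, it is surprising how little information Codex passes to the model at the start. Cline, for example, passes information such as the open editors and the current time, but Codex passes nothing of the sort.

With nothing to go on, o4-mini's first move was ls -R . to recursively list every file. Because the project is large, this consumed a fair number of tokens.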
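
Following a trace like this in raw JSON gets tedious, so a small script that lists only the shell commands the model ran can help. This sketch assumes a session file boils down to an array of items shaped like the excerpts in this article (`type: "function_call"`, `name: "shell"`, and a JSON-encoded `arguments` string); the actual layout of files under ~/.codex/sessions may differ, and `listShellCommands` is a hypothetical helper.

```typescript
// Minimal item shapes, based only on fields visible in the log excerpts.
interface FunctionCallItem {
  type: "function_call";
  name: string;
  arguments: string; // JSON string, e.g. {"command": ["bash", "-lc", "ls -R ."]}
  call_id: string;
}
type SessionItem = FunctionCallItem | { type: string; [key: string]: unknown };

// For each shell call, returns the script given to `bash -lc`, or the joined argv otherwise.
function listShellCommands(items: SessionItem[]): string[] {
  const out: string[] = [];
  for (const item of items) {
    if (item.type !== "function_call") continue;
    const call = item as FunctionCallItem;
    if (call.name !== "shell") continue;
    const { command } = JSON.parse(call.arguments) as { command: string[] };
    const isBashLc = command.length === 3 && command[0] === "bash" && command[1] === "-lc";
    out.push(isBashLc ? command[2] : command.join(" "));
  }
  return out;
}

// Items modeled on the first two calls of this session (ids shortened).
const items: SessionItem[] = [
  { type: "message", role: "user", content: [] },
  { type: "reasoning", summary: [], duration_ms: 3358 },
  { type: "function_call", name: "shell", arguments: '{"command": ["bash", "-lc", "ls -R ."]}', call_id: "call_1" },
  { type: "function_call", name: "shell", arguments: '{"command": ["bash", "-lc", "grep -R \\"agent\\" -n src"]}', call_id: "call_2" },
];
console.log(listShellCommands(items));
```

Unwrapping the `bash -lc` wrapper keeps the output close to what you would type in a terminal, which makes the model's shell-fu easy to read at a glance.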

[
  {
    "id": "rs_68056ab70eac8191b03e3e66c9d9b8b60522f1734f88b153",
    "type": "reasoning",
    "summary": [
      {
        "type": "summary_text",
        "text": "**Exploring the file structure**\n\nThe file tree is quite large, especially with all the node_modules. I can see the codex-cli here, but the code I'm searching for is probably located in the src folder, likely written in TypeScript. Okay, I'll head over to inspect the src directory closely. That should help me find the specific code I need. I'm curious to see what I can discover there!"
      }
    ],
    "duration_ms": 8559
  },
  {
    "id": "rs_68056ab97a4c8191aadb1d42dff2dff50522f1734f88b153",
    "type": "reasoning",
    "summary": [],
    "duration_ms": 9222
  },
  {
    "id": "rs_68056aba0a9081919272d9a416a1be390522f1734f88b153",
    "type": "reasoning",
    "summary": [],
    "duration_ms": 9836
  },
  {
    "id": "rs_68056abaaaa48191bfec7d95364de3900522f1734f88b153",
    "type": "reasoning",
    "summary": [],
    "duration_ms": 10300
  },
  {
    "id": "rs_68056abb364c8191a675745df47916680522f1734f88b153",
    "type": "reasoning",
    "summary": [],
    "duration_ms": 10967
  },
  {
    "id": "fc_68056abbd760819197a625ac487fcb3c0522f1734f88b153",
    "type": "function_call",
    "status": "completed",
    "arguments": "{\"command\": [\"bash\", \"-lc\", \"grep -R \\\"agent\\\" -n src\"]}",
    "call_id": "call_tZtjDLlWAZv0OOdB29Kg6Fon",
    "name": "shell"
  }
]

In short, the reasoning says: the file tree is large mostly because of node_modules, and the code it wants is probably TypeScript under src, so it will inspect src next.

Reasoning summaries like this appear from time to time.
Note, however, that reasoning summaries are not output by default; you first need to complete identity verification on the OpenAI console.

Reasoning is also sometimes missing altogether (more often than not, in fact).

Next it ran grep -R "agent" -n src to look for relevant source files.
The -n option prints the line number of each match. o4-mini's shell-fu shines here.
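
With -n, each hit comes back as `path:line:text` (for example `src/utils/agent/agent-loop.ts:59:export class AgentLoop {` in the log that follows), a format that is easy for both the model and a program to act on. The sketch below parses that format into records; `parseGrepLines` is a hypothetical helper, and it assumes file paths contain no colons.

```typescript
interface GrepHit {
  path: string;
  line: number;
  text: string;
}

// Parses `grep -n` output that includes file names (path:line:text), one hit per line.
// Assumes paths contain no ':'; the text part may contain colons.
function parseGrepLines(output: string): GrepHit[] {
  const hits: GrepHit[] = [];
  for (const raw of output.split("\n")) {
    const m = /^([^:]+):(\d+):(.*)$/.exec(raw);
    if (m) hits.push({ path: m[1], line: Number(m[2]), text: m[3] });
  }
  return hits;
}

const sample =
  "src/utils/agent/agent-loop.ts:59:export class AgentLoop {\n" +
  "src/utils/agent/agent-loop.ts:293:      // The chat endpoint nests function details under a `function` key.\n";
console.log(parseGrepLines(sample));
```

A line number is exactly what the next step needs: it tells the reader (or the model) which `sed -n` window to open.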

[
  {
    "id": "rs_68056ac1ff6881918edbb397e46dbc780522f1734f88b153",
    "type": "reasoning",
    "summary": [],
    "duration_ms": 17724
  },
  {
    "id": "fc_68056ac2803c81919d67d1a5272c03e10522f1734f88b153",
    "type": "function_call",
    "status": "completed",
    "arguments": "{\"command\": [\"bash\", \"-lc\", \"grep -R \\\"class AgentLoop\\\" -n src/utils/agent/agent-loop.ts\"]}",
    "call_id": "call_tEcTOxsT5HbkPtU7hMaFLDGO",
    "name": "shell"
  },
  {
    "type": "function_call_output",
    "call_id": "call_tEcTOxsT5HbkPtU7hMaFLDGO",
    "output": "{\"output\":\"src/utils/agent/agent-loop.ts:59:export class AgentLoop {\\n\",\"metadata\":{\"exit_code\":0,\"duration_seconds\":0}}"
  },
  {
    "id": "rs_68056ac55b5081919b14d70502fd11e20522f1734f88b153",
    "type": "reasoning",
    "summary": [],
    "duration_ms": 21100
  }
]

After thinking for about 5 seconds, it seems to have zeroed in on agent-loop.ts.

grep -R "class AgentLoop" -n src/utils/agent/agent-loop.ts

This pinpoints the line where the class is defined.

{
  "id": "fc_68056ac798cc8191a477d9fa856789eb0522f1734f88b153",
  "type": "function_call",
  "status": "completed",
  "arguments": "{\"command\": [\"bash\", \"-lc\", \"sed -n '1,200p' src/utils/agent/agent-loop.ts\"]}",
  "call_id": "call_TH2Gn6pU10Wl9hIqdQftEelj",
  "name": "shell"
}

Next, sed -n '1,200p' src/utils/agent/agent-loop.ts extracted lines 1-200. Wary of the file's size, it limits itself to the first 200 lines.
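
`sed -n 'START,ENDp'` suppresses sed's default output (`-n`) and prints only lines START through END, 1-indexed and inclusive. The TypeScript function below mimics that on an in-memory string to show exactly what each window returns; `sedRange` is a hypothetical name, and it only covers numeric ranges.

```typescript
// Mimics `sed -n 'START,ENDp'`: returns lines START..END (1-indexed, inclusive).
// As with sed, if END < START only line START is printed; lines past EOF are simply absent.
function sedRange(text: string, start: number, end: number): string {
  const lines = text.split("\n");
  // A trailing newline produces an empty last element that is not a real line.
  if (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();
  const last = Math.max(start, end);
  return lines
    .slice(start - 1, last)
    .map((l) => l + "\n")
    .join("");
}

const file = Array.from({ length: 10 }, (_, i) => `line ${i + 1}`).join("\n") + "\n";
console.log(sedRange(file, 1, 3));
```

Asking for a range that runs past the end of the file is harmless, which is why the model can request fixed-size windows without knowing the exact line count.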

[
  {
    "id": "rs_68056acb58508191895485615618f6590522f1734f88b153",
    "type": "reasoning",
    "summary": [],
    "duration_ms": 27120
  },
  {
    "id": "fc_68056acbecc48191a15cb7496467a2cc0522f1734f88b153",
    "type": "function_call",
    "status": "completed",
    "arguments": "{\"command\": [\"bash\", \"-lc\", \"sed -n '200,350p' src/utils/agent/agent-loop.ts\"]}",
    "call_id": "call_uvvGbgJLMN8WXzNPETriKlOJ",
    "name": "shell"
  },
  ...
  {
    "id": "rs_68056acf8704819197b4bdc2a36c32a00522f1734f88b153",
    "type": "reasoning",
    "summary": [],
    "duration_ms": 31491
  },
  {
    "id": "fc_68056ad05aa481918486f603a4f632a60522f1734f88b153",
    "type": "function_call",
    "status": "completed",
    "arguments": "{\"command\": [\"bash\", \"-lc\", \"sed -n '350,500p' src/utils/agent/agent-loop.ts\"]}",
    "call_id": "call_cPSiA48qdnzn3vaxwUamRkgU",
    "name": "shell"
  },
  ...
  {
    "id": "rs_68056ad49e3881919db2db7908dba72e0522f1734f88b153",
    "type": "reasoning",
    "summary": [],
    "duration_ms": 35993
  },
  {
    "id": "fc_68056ad4e5048191a9752276c7cabc820522f1734f88b153",
    "type": "function_call",
    "status": "completed",
    "arguments": "{\"command\": [\"bash\", \"-lc\", \"grep -R \\\"createChatCompletion\\\" -n src/utils/agent/agent-loop.ts\"]}",
    "call_id": "call_jwTriGdINHfwWzQlUMZM9v7X",
    "name": "shell"
  }

]

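After thinking for about 5 seconds, it read the rest of the file in order with
sed -n '200,350p' src/utils/agent/agent-loop.ts
and then sed -n '350,500p'.

It feels like turning pages, which is very engineer-like.

Next, createChatCompletion caught its attention, so it ran grep -R "createChatCompletion".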

[
  {
    "type": "function_call_output",
    "call_id": "call_jwTriGdINHfwWzQlUMZM9v7X",
    "output": "{\"output\":\"\",\"metadata\":{\"exit_code\":1,\"duration_seconds\":0}}"
  },
  ...
  {
    "id": "rs_68056ad8fbc88191a4341f3b1ce4e0b10522f1734f88b153",
    "type": "reasoning",
    "summary": [],
    "duration_ms": 41015
  },
  {
    "id": "fc_68056ad9c65c8191873a73f305a07c110522f1734f88b153",
    "type": "function_call",
    "status": "completed",
    "arguments": "{\"command\": [\"bash\", \"-lc\", \"grep -R \\\"chat\\\" -n src/utils/agent/agent-loop.ts\"]}",
    "call_id": "call_ZrmaTj7pgcKZRuKEvL4e2t7t",
    "name": "shell"
  },
  {
    "type": "function_call_output",
    "call_id": "call_ZrmaTj7pgcKZRuKEvL4e2t7t",
    "output": "{\"output\":\"src/utils/agent/agent-loop.ts:288:    // whether it originated from the `/responses` or the `/chat/completions`\\nsrc/utils/agent/agent-loop.ts:293:      // The chat endpoint nests function details under a `function` key.\\nsrc/utils/agent/agent-loop.ts:295:      // we are dealing with the chat format.\\nsrc/utils/agent/agent-loop.ts:312:    // endpoint) or `id` (chat endpoint).  Prefer `call_id` if present but fall\\nsrc/utils/agent/agent-loop.ts:752:                // either `id` (chat) or `call_id` (responses) – we normalise\\n\",\"metadata\":{\"exit_code\":0,\"duration_seconds\":0}}"
  }
]

That grep came up empty (exit code 1), so after some thought it searched for just chat.
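
The empty result is not an error: grep exits with status 0 when it selects at least one line, 1 when it selects none, and greater than 1 on an actual error such as an unreadable file. A harness or script reading `function_call_output` payloads can make that distinction explicit. This sketch assumes the payload shape seen in the log (`output` plus `metadata.exit_code`); `classifyGrepResult` is a hypothetical helper.

```typescript
// Payload shape observed in the function_call_output items of the session log.
interface ShellOutput {
  output: string;
  metadata: { exit_code: number; duration_seconds: number };
}

type GrepOutcome = "match" | "no-match" | "error";

// grep exit status: 0 = at least one line selected, 1 = none selected, >1 = error.
function classifyGrepResult(payload: string): GrepOutcome {
  const { metadata } = JSON.parse(payload) as ShellOutput;
  if (metadata.exit_code === 0) return "match";
  if (metadata.exit_code === 1) return "no-match";
  return "error";
}

// The createChatCompletion search from the log: empty output, exit code 1.
console.log(classifyGrepResult('{"output":"","metadata":{"exit_code":1,"duration_seconds":0}}'));
```

Treating exit code 1 as "no match" rather than failure is exactly how o4-mini reacted here: it simply broadened the search term.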

[
  {
    "id": "fc_68056adeaad08191974336b6e1a5a3150522f1734f88b153",
    "type": "function_call",
    "status": "completed",
    "arguments": "{\"command\": [\"bash\", \"-lc\", \"grep -R \\\"completions\\\" -n src/utils/agent/agent-loop.ts\"]}",
    "call_id": "call_Vb05HTMatthJ2QDSH2XxYtIo",
    "name": "shell"
  },
  ...
  {
    "id": "fc_68056ae24f2081918dd056ee8466efbd0522f1734f88b153",
    "type": "function_call",
    "status": "completed",
    "arguments": "{\"command\": [\"bash\", \"-lc\", \"sed -n '500,650p' src/utils/agent/agent-loop.ts\"]}",
    "call_id": "call_LCyTBXHrZrAC8Y6j9mLvbpnU",
    "name": "shell"
  },
  ...
  {
    "id": "fc_68056ae6a9888191b389ed5d38452d2f0522f1734f88b153",
    "type": "function_call",
    "status": "completed",
    "arguments": "{\"command\": [\"bash\", \"-lc\", \"sed -n '650,800p' src/utils/agent/agent-loop.ts\"]}",
    "call_id": "call_nRfw8iDzKLQUkxyA0NIpirra",
    "name": "shell"
  },
  ...
  {
    "id": "fc_68056aead5e08191b78fd6726deddbf50522f1734f88b153",
    "type": "function_call",
    "status": "completed",
    "arguments": "{\"command\": [\"bash\", \"-lc\", \"sed -n '800,900p' src/utils/agent/agent-loop.ts\"]}",
    "call_id": "call_fi09B5sBJceqnpeGayXo8lh9",
    "name": "shell"
  }
]

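Apparently unable to find the code it was after, it switched to reading the file sequentially.

Existing coding agents often hand over an entire file at once, but o4-mini tends to read it page by page. Perhaps it wants to conserve its context window?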
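
The page-by-page strategy is easy to reproduce. This sketch generates `shell` tool `command` arrays of the form `["bash", "-lc", "sed -n 'A,Bp' FILE"]` for fixed-size windows. `planPages` and its default page size are assumptions for illustration, and the total line count would have to come from something like `wc -l`. Unlike the log, where consecutive ranges such as `200,350` and `350,500` share their boundary line, these windows do not overlap.

```typescript
// Builds shell tool `command` arrays that read a file in fixed-size, non-overlapping windows.
function planPages(file: string, totalLines: number, pageSize = 150): string[][] {
  const commands: string[][] = [];
  for (let start = 1; start <= totalLines; start += pageSize) {
    const end = Math.min(start + pageSize - 1, totalLines);
    commands.push(["bash", "-lc", `sed -n '${start},${end}p' ${file}`]);
  }
  return commands;
}

const pages = planPages("src/utils/agent/agent-loop.ts", 400, 150);
for (const cmd of pages) {
  console.log(JSON.stringify({ command: cmd }));
}
```

Each window costs tokens only for the lines it returns, so the agent can stop as soon as it has read enough, which is roughly what o4-mini did by stopping around line 1000.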

[
  {
    "id": "fc_68056aef45488191ab3bee0c1686084a0522f1734f88b153",
    "type": "function_call",
    "status": "completed",
    "arguments": "{\"command\": [\"bash\", \"-lc\", \"sed -n '900,1000p' src/utils/agent/agent-loop.ts\"]}",
    "call_id": "call_HeNlkGtMOUsmmJMYwXaTKYzm",
    "name": "shell"
  },
  ...
  {
    "id": "rs_68056af5608c8191a0b10be6eaa83d3b0522f1734f88b153",
    "type": "reasoning",
    "summary": [],
    "duration_ms": 75772
  },
  {
    "id": "msg_68056afc993881918685752ec0ca6f170522f1734f88b153",
    "type": "message",
    "status": "completed",
    "content": [
      {
        "type": "output_text",
        "annotations": [],
        "text": "Below is an overview-level analysis of the Codex CLI agent's control algorithm, centered on the AgentLoop class implemented in `src/utils/agent/agent-loop.ts`.\n\n1. Initialization (constructor)\n   - Receives the model name, user-defined instructions, approval policy, and so on, and ..."
      }
    ]
  }
]

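In the end, it read roughly 80% of agent-loop.ts and delivered its final answer after 75 seconds.

Answer

Below is an overview-level analysis of the Codex CLI agent's control algorithm, centered on the AgentLoop class implemented in src/utils/agent/agent-loop.ts.

1. Initialization (constructor)
   - Receives the model name, user-defined instructions, approval policy, and so on, and configures the OpenAI client.
   - Generates and records a sessionId and prepares AbortControllers (hardAbort for shutting everything down, execAbortController for tool calls).
   - Holds callbacks for notifying external code of events (onItem, onLoading, onLastResponseId, getCommandConfirmation).
2. Main loop (run method)
   - Increments generation so that stale stream events are ignored, and resets the canceled flag.
   - For function call IDs left unresolved by the previous interruption, inserts synthetic responses (function_call_output) at the front so the API does not expect the same calls again.
   - Repeats the following in a while (turnInput.length > 0) loop:
     1. Stages turnInput (user input + previous tool output) to the UI (stageItem)
     2. Sends a streaming request to the OpenAI /responses.create endpoint
        - Combines the system prompt with the user instructions
        - Passes the function definitions (currently only shell) with JSON schemas
        - Retries on timeouts, server errors, rate limits, and similar failures (up to 5 times)
     3. Processes streaming events one by one with for await
        - response.output_item.done: regular messages are staged to the UI; function_call items are held for later processing and their IDs are registered in pendingAborts.
        - response.completed: after staging the final messages, delegates to processEventsWithoutStreaming → handleFunctionCall to execute the tool calls, and sets the returned additional input as the next iteration's turnInput.
     4. Exits immediately if a cancellation request (cancel() / hardAbort) has arrived.
   - After the loop ends, flushes staged but undelivered items to the UI in one batch (after a short delay).
3. Function call handling (handleFunctionCall)
   - Parses the argument string of a function_call coming from the LLM (parseToolCallArguments).
   - If the name is shell (or the internal container.exec), handleExecCommand runs the shell command, applies patches, and so on, and serializes stdout and metadata to JSON.
   - On success, returns a function_call_output item plus additional input items as needed.
4. Cancellation / forced stop
   - cancel(): aborts the current stream and tool calls and increments generation. Manages pendingAborts in preparation for the next request.
   - terminate(): fires hardAbort and prohibits any further run() calls.
5. Error / retry control
   - On timeouts, 500 errors, or rate limits (429) from the OpenAI API, retries up to 5 times with exponential backoff.
   - Client errors such as an over-long request (max_tokens) immediately show a warning message in the UI.
   - Network errors and streams that close prematurely are also caught and reported to the user, keeping the CLI process running normally.
6. UI integration
   - onLoading(true|false) signals that a tool is running or has finished.
   - Through onItem(item), the UI renders messages as a stream following the stage → flush flow.
   - onLastResponseId(id) manages the response ID to pass to the API as previous_response_id on the next call.

In summary, AgentLoop has a two-layer loop structure that repeats "streaming request to the LLM ↔ function call execution ↔ generation of additional input" until the tool calls converge, implementing a control flow that incorporates mid-way cancellation, retries, and incremental UI notifications.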
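
The two-layer structure described in the answer (get a model response, execute any function calls, feed their outputs back as the next `turnInput`, and stop when no calls remain) can be reduced to a few dozen lines. This is a simplified sketch, not the actual AgentLoop: the model is a hypothetical `callModel` stub instead of the OpenAI Responses API, there is no streaming, cancellation, or UI staging, and the backoff schedule (a base delay that doubles, at most 5 retries) only mirrors the "up to 5 retries with exponential backoff" described in the answer. `runAgent`, `withRetry`, and `RetryableError` are made-up names.

```typescript
// Items exchanged with the hypothetical model, loosely modeled on the session log.
type InputItem =
  | { type: "message"; role: "user"; text: string }
  | { type: "function_call_output"; call_id: string; output: string };
type OutputItem =
  | { type: "message"; text: string }
  | { type: "function_call"; call_id: string; name: string; arguments: string };
type ModelFn = (input: InputItem[]) => Promise<OutputItem[]>;

// Marks failures worth retrying (e.g. timeouts, 5xx, 429 in the real client).
class RetryableError extends Error {}

const MAX_RETRIES = 5;
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries retryable failures up to MAX_RETRIES times, waiting base, 2*base, 4*base, ...
async function withRetry<T>(fn: () => Promise<T>, baseDelayMs: number): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!(err instanceof RetryableError) || attempt >= MAX_RETRIES) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}

// Outer loop: keep calling the model while there is input (user text or tool outputs).
async function runAgent(
  userText: string,
  callModel: ModelFn,
  runTool: (name: string, args: string) => string,
  baseDelayMs = 500,
): Promise<string[]> {
  const messages: string[] = [];
  let turnInput: InputItem[] = [{ type: "message", role: "user", text: userText }];
  while (turnInput.length > 0) {
    const output = await withRetry(() => callModel(turnInput), baseDelayMs);
    const next: InputItem[] = [];
    // Inner step: collect messages, execute function calls, gather their outputs.
    for (const item of output) {
      if (item.type === "message") {
        messages.push(item.text);
      } else {
        next.push({ type: "function_call_output", call_id: item.call_id, output: runTool(item.name, item.arguments) });
      }
    }
    turnInput = next; // empty when the model made no calls, which ends the loop
  }
  return messages;
}

// Hypothetical stub: fails once with a retryable error, requests one shell call, then answers.
let calls = 0;
const fakeModel: ModelFn = async (input) => {
  calls++;
  if (calls === 1) throw new RetryableError("429 rate limited");
  if (input[0].type === "message") {
    return [{ type: "function_call", call_id: "call_1", name: "shell", arguments: '{"command":["bash","-lc","ls"]}' }];
  }
  return [{ type: "message", text: `saw ${input.length} tool output(s)` }];
};

const result = runAgent("analyze the agent loop", fakeModel, () => "src\ntests", 10);
result.then((msgs) => console.log(msgs, "model calls:", calls));
```

The loop condition is the whole termination rule: as long as the model keeps emitting function calls there is new `turnInput`, and a turn with only messages ends the run.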

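The answer itself is thorough and well organized. Impressive.

In the end, o4-mini analyzed the code in the following flow:

- Grasp the directory structure
- Search the code by keyword to narrow down where to look
- Read files page by page to understand them (with occasional keyword searches)

This is nearly the same process an ordinary engineer follows, which I find interesting.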

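Summary

In this article I examined how Codex + o4-mini behaves. The analysis showed the following:

- Codex analyzes and edits code using only the shell tool
- Coding with Codex requires the model to have strong shell-fu
- o4-mini reads files incrementally while analyzing them, presumably to conserve its context window

Personally, I was shocked by how simple Codex's behavior is.
We may be entering an era in which tuning the model around a simple architecture works better than building an elaborate agent harness.
That said, watching its behavior, even o4-mini seems to struggle quite a bit with only the shell to fight with, so a somewhat more elaborate harness would probably cut down on elementary mistakes.

Incidentally, when I switched the model to gpt-4.1, there was no reasoning at all, so I couldn't tell what it was thinking, and controlling it with custom instructions felt difficult.
It's probably safest to assume that Codex can only be used to its full potential with o4-mini.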
