Is Spec Kit stupid? Of course not. It isn't stupid, but...

Posted at 2026-04-09 (last updated at 2026-04-09)

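This is the story of what happened when I fed /speckit.specify a lump of pure uncertainty: the Claude instance running it spontaneously left behind 11 items of criticism aimed at its own output.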

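Conclusions first

Spec Kit only helps in the situation "what to build is already decided, and we want to nail down how to build it."
The phases that are truly hard in real-world software development (PoCs, technology selection, algorithm exploration, experimental validation, simulation in the production environment) lie outside Spec Kit's range.
When I actually fed Spec Kit the brief "language, model, algorithm, and production traffic will all be decided only after a PoC," the Claude running Spec Kit raised a warning along the way and, after generation finished, spontaneously output one 5-item and one 6-item indictment.
That said, getting a warning was pure luck, and there is no guarantee it happens every time. On the next attempt, a broken spec that merely looks well-formed could easily come out without a word.
This article exposes those logs along with the actual generated spec.md and plan.md.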

Background: the worldview Spec Kit assumes

GitHub's own Spec Kit advocates "Spec-Driven Development" and recommends building software with the following workflow.

/speckit.constitution  →  /speckit.specify  →  /speckit.plan  →  /speckit.tasks  →  /speckit.implement

The pitch is that if you fill in the spec, the implementation plan, and the task list according to the templates (spec-template.md, plan-template.md, tasks-template.md), the agent will implement without drifting.

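However, these templates implicitly assume the following.

Functional requirements are settled (they can be written at a testable granularity)
Success criteria are fixed in a measurable form
Implementation language, primary libraries, storage, testing approach, and performance goals can each be written as a single value
The architecture can be described as a single "Structure Decision"
At most 3 [NEEDS CLARIFICATION] markers (a constraint on the template side)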
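
To make the last assumption concrete, here is a minimal sketch of a marker-count check. It is not part of Spec Kit: the function names and the sample text are hypothetical, and only the cap of 3 comes from the template constraint listed above.

```python
import re

# Matches the opening of a marker, with or without a trailing explanation,
# e.g. "[NEEDS CLARIFICATION]" or "[NEEDS CLARIFICATION: who signs off?]".
MARKER = re.compile(r"\[NEEDS CLARIFICATION\b")

# Cap taken from the template constraint described in the article.
MAX_MARKERS = 3


def count_clarifications(spec_text: str) -> int:
    """Count [NEEDS CLARIFICATION ...] markers in a spec document."""
    return len(MARKER.findall(spec_text))


def within_cap(spec_text: str, cap: int = MAX_MARKERS) -> bool:
    """Return True when the spec stays at or under the marker cap."""
    return count_clarifications(spec_text) <= cap


# Hypothetical spec fragment for a brief where everything is undecided.
undecided_spec = """
- Language: [NEEDS CLARIFICATION: Rust, Go, or C++]
- Model: [NEEDS CLARIFICATION: YOLOv8, RT-DETR, or in-house]
- Algorithm: [NEEDS CLARIFICATION]
- Traffic: [NEEDS CLARIFICATION: pending on-site survey]
"""

print(count_clarifications(undecided_spec), within_cap(undecided_spec))
```

A brief with four open questions already exceeds the cap, which is the pressure that pushes an agent toward either reframing the feature or writing something it cannot know.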

Sharp readers will have noticed already: what is painful in real work is precisely the phase in which these are not yet settled.

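PoC for a new platform
Technology selection for languages, frameworks, and middleware
Algorithm exploration and performance comparison
Accuracy validation of ML models
Load testing and capacity planning under production-equivalent traffic

In all of these, "experiment first, then decide" is itself the goal; none of them is a phase of "implementing what has been decided." Spec Kit not only contributes nothing here, it also forces undecided items to be written in a decided form and ends up generating a fake spec that merely looks well-formed.

What follows is a record of actually trying it.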

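Experiment: feeding Spec Kit a lump of uncertainty

The brief

I passed the following to /speckit-specify.

I want to build a new real-time video analytics platform. It should perform object detection and anomalous-behavior detection in real time on 1080p/60fps video from multiple cameras. However, for the implementation language we plan to build PoCs in three languages (Rust / Go / C++) and decide after benchmarking throughput, latency, and GPU utilization efficiency. For the object-detection model there are also several candidates (YOLOv8 / RT-DETR / our own in-house model), and we will choose after comparing the trade-off between accuracy and inference speed on real data. The anomalous-behavior detection algorithm is at the research stage, and it is currently unknown whether rule-based, classical ML, or deep learning is best. The expected production traffic (number of cameras, FPS, concurrent connections) will also be fixed only after an on-site survey. We want to start with the technology-selection PoC and the algorithm PoC.

Everything is undecided: the language, the model, the algorithm, and the production traffic. This is exactly what the early days of a real greenfield project look like.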

Execution environment

uv tool install specify-cli --from git+https://github.com/github/spec-kit.git
specify init --here --ai claude
Ran /speckit-specify → /speckit-plan in order in Claude Code (Opus 4.6)

Result 1: Claude warned the moment `/speckit-specify` started running

Interestingly, before it started writing the spec, Claude prefaced its work with the following, without being asked.

"Before I generate files, I have to flag a fundamental mismatch between your request and what /speckit.specify is designed to produce. I'll proceed with execution as instructed, but please read the note at the end."


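The very agent running Spec Kit declared, before running it, that this brief is not suited to Spec Kit. That alone is already quite eloquent, but let's dig a little deeper.

Result 2: the generated spec.md was "a different thing, quietly reframed"

Here is an excerpt from the generated specs/001-video-analytics-poc/spec.md.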

# Feature Specification: Real-time Video Analytics Platform — Technology & Algorithm PoC

> **Scope note**: The user has explicitly scoped this initiative to a **PoC phase only**.
> The production platform is not in scope here; core technology choices (implementation
> language, object-detection model, anomaly-detection algorithm) and production traffic
> targets are deliberately undecided and will be determined **by the outcomes of this PoC**.
> Therefore this specification describes the PoC itself — the evaluation harness, the
> benchmark runs, and the decision artifacts — not a production product.

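I said I wanted to build a video analytics platform, yet the spec was swapped, on Claude's own initiative, for a specification of the PoC phase itself.

Why? The spec template demands functional requirements and success criteria, but no honest functional requirements can be written for a "video analytics platform" whose language, model, and algorithm are all undecided. So Claude silently reframed the deliverable as "benchmark reports and decision records" and just barely squeezed it into the template.

Functional Requirements: process requirements, not product requirements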

- **FR-001**: The PoC MUST produce a reproducible benchmark harness ...
- **FR-002**: The benchmark harness MUST record throughput, end-to-end latency ...
- **FR-006**: The PoC MUST produce a written decision record for each evaluation track ...
- **FR-007**: The PoC MUST produce a field-survey sizing document ...
- **FR-008**: The PoC MUST explicitly declare its pass/fail thresholds **before** benchmark
  runs begin (or declare them as "to be set after the sizing survey completes") ...
- **FR-009**: The PoC MUST retain labeled evaluation datasets and benchmark outputs ...
- **FR-010**: The PoC MUST [NEEDS CLARIFICATION: who is the decision authority that signs off ...]

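There are nine entries of the form "The PoC MUST produce ...". These are not functional requirements of a product; they are process requirements such as "produce a document as a deliverable" and "retain the data." A Spec Kit FR is supposed to be the place where you write what the user can do.

Success Criteria: where did the business value go?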

- **SC-001**: At least two of the three language candidates are benchmarked ...
- **SC-002**: All three object-detection model candidates are evaluated ...
- **SC-003**: At least one prototype from each of the three anomaly-detection families is evaluated ...
- **SC-004**: A written sizing survey document exists ...
- **SC-005**: Every benchmark comparison controls confounding variables ...
- **SC-006**: 100% of PoC outputs ... are retained and reviewable ...

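SC-001 through SC-006 consist entirely of "a report exists" and "the data has been retained." They say nothing about the real business question: "should we build this platform at all?" You could satisfy every one of them and still conclude "we won't build it after all." Spec Kit's quality gates let this straight through.

And then, in the Out of Scope section...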

## Out of Scope (for this PoC phase)

- Final selection of the anomaly-detection algorithm (the PoC only classifies families as
  "advance / further research / eliminated" — a full algorithm PoC against the advanced
  family is a follow-on phase).

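The final selection of the algorithm is out of scope for the algorithm PoC. It is sad to write, but this is not Claude's fault; it is the result of being dragged along by the template's constraints.

Result 3: right after generating the spec, Claude spontaneously output a 5-item indictment

After generation completed, Claude output a standalone section titled "Honest assessment — why this request is a poor fit for /speckit.specify", again without being asked. The original text is quoted block by block below.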

You asked me to run /speckit.specify on a request whose core content is "we don't yet know the language, the model, the algorithm, or the traffic — we want to find out via PoC." I produced a spec, but you should know what I had to do to make it work, because the working directory name (speckit_is_bad) suggests this is exactly what you're investigating:


It even read my intent off the working directory name; nothing gets past it.

1. I silently reframed the feature. Your actual ask is "build a video analytics platform." The spec template demands testable functional requirements and measurable success criteria. Those are impossible to write honestly for a platform whose language, model, and algorithm are all TBD. So I rewrote the "feature" as the PoC phase itself — the deliverable became "benchmark reports and decision records." That reframe is defensible, but it is a reframe, and a less careful run of /speckit.specify would have either (a) fabricated fake requirements for the unknown platform, or (b) filled the spec with 10+ NEEDS CLARIFICATION markers, violating the template's 3-marker cap.

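The point is that, given a vague brief, Spec Kit leaves only two options: "lie" or "violate the template." This time Claude simply escaped by reframing cleverly. A different agent or a different session could quite plausibly just lie.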

2. The "no implementation details" rule is structurally violated. Rust/Go/C++ and YOLOv8/RT-DETR appear throughout the spec because the subject of the work is comparing those named things. The quality checklist's "no implementation details" item had to be marked with an explanatory caveat rather than a clean pass. A strict linter would reject the spec.

Because the implementations being compared are themselves the subject of the work, the work collides structurally with the template's rule.

3. The success criteria measure process, not outcome. SC-001…SC-006 are all of the form "a report exists", "a comparison was published", "datasets were retained." Every single one can be satisfied while the underlying business question ("should we build this platform at all?") remains unanswered. Speckit's quality gates will happily wave the spec through. This is the central failure mode: speckit enforces document completeness, not decision quality.

To my mind, this third point is the one that hit hardest in this experiment. Spec Kit only looks at document completeness. It guarantees nothing about decision quality. And this is a conclusion reached by the very agent that generated the spec.

4. /speckit.plan next will be painful. The plan template expects a single target architecture. Here the honest "plan" is three parallel research tracks with bail-out criteria and a sequencing dependency on the sizing survey. You will have to either split this into three specs (one per track), or ignore significant portions of the plan template.

This prediction proved accurate and comes true in Result 4 below.

5. The spec that speckit is designed for cannot be written yet. The proper use of speckit for this initiative is: run the PoC phase outside of speckit, using lightweight research-tracking tools (an experiment log, a decision log, a benchmark dashboard). When the PoC concludes and you actually know which language, model, and algorithm you are committing to, then run /speckit.specify against the concrete product. Doing it in the other order — specifying first and researching second — inverts the knowledge-acquisition order that makes a spec meaningful.

The agent running Spec Kit has itself redefined the core of how Spec Kit should be used.

In other words: /speckit.specify is a great fit for "we know what, we need to nail down how". It is a poor fit for "we don't yet know what, we need to learn via experiments". Your request is firmly in the second category. I generated a workable artifact by reframing the scope, but the generated spec's main value is as evidence of this mismatch, not as a planning input.


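The generator itself defines the purpose of the generated spec as "evidence of this mismatch." If this is not an indictment, what is?

Result 4: `/speckit-plan` broke even more spectacularly

Just as the "/speckit.plan next will be painful" warning had predicted, the plan phase broke even more spectacularly.

A spontaneous warning at the top of plan.md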

> **Template fit warning**: The plan-template is designed for a **single** target architecture.
> This initiative has **three orthogonal research tracks running in parallel**. I have bent the
> template rather than breaking it: Technical Context is filled out per-track (as a matrix),
> Project Structure uses one sub-tree per track, and the Constitution Check gates are skipped
> because the constitution file is unfilled placeholders.

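At the very top of the plan body, Claude inserted a "Template fit warning" on its own initiative. It says, in effect: "the template assumes one architecture, but this work is three parallel tracks, so I bent the template rather than breaking it."

Technical Context turned into a 3-column matrix on its own

The template asks for single values such as "Language/Version", "Primary Dependencies", "Storage", and "Testing". Claude expanded that into three columns.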

| Field | Language Track | Model Track | Algorithm Track |
| --- | --- | --- | --- |
| Language/Version | Rust, Go, C++ — this IS the variable under test | Python 3.11 (fixed) | Python 3.11 (fixed) |
| Performance Goals | NEEDS CLARIFICATION — threshold binding only after Sizing Survey | Same provisional threshold | No runtime target (research stage) |
| Scale/Scope | NEEDS CLARIFICATION — camera count / FPS / concurrency pending sizing survey | Same as Language Track | 50–200 labeled clips (provisional) |

Once a document says that the "language" field itself is the variable under test, it is no longer a specification; it is an experiment plan.

Constitution Check was SKIPPED wholesale

## Constitution Check

**Status**: SKIPPED — `.specify/memory/constitution.md` contains only placeholder text
(`[PRINCIPLE_1_NAME]`, etc.) and no ratified principles. There are no gates to evaluate.

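The "quality gates via the Constitution" that Spec Kit advertises pass through automatically when the constitution is still nothing but empty placeholders. No warning is raised either. A spec gets through on nothing more than filled-in template slots: a fatal, silent degradation.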
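
A check along the following lines could make an empty constitution visible instead of silently passing. This is only a sketch under my own assumptions: the helper name is hypothetical, Spec Kit does not ship it, and the regex simply treats any bracketed ALL_CAPS token such as [PRINCIPLE_1_NAME] as an unfilled placeholder.

```python
import re

# Assumption: unfilled template slots look like [ALL_CAPS_WITH_UNDERSCORES].
# Markers containing spaces, such as [NEEDS CLARIFICATION], are not matched.
PLACEHOLDER = re.compile(r"\[[A-Z][A-Z0-9_]*\]")


def unfilled_placeholders(constitution_text: str) -> list[str]:
    """Return the distinct placeholder tokens still present, sorted."""
    return sorted(set(PLACEHOLDER.findall(constitution_text)))


def constitution_is_ratified(constitution_text: str) -> bool:
    """Treat a constitution as usable only when no placeholder remains."""
    return not unfilled_placeholders(constitution_text)


# Hypothetical file contents: one untouched template, one filled in.
fresh = "## Core Principles\n### [PRINCIPLE_1_NAME]\nTBD\n"
authored = "## Core Principles\n### Reproducible benchmarks\nEvery run is scripted.\n"

print(unfilled_placeholders(fresh))
print(constitution_is_ratified(authored))
```

Failing loudly when `constitution_is_ratified` returns False would turn the vacuous pass into an explicit stop.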

Nine prototype directories, self-reported under "Complexity Tracking"

| Violation | Why Needed | Simpler Alternative Rejected Because |
| --- | --- | --- |
| Nine prototype directories instead of one | ... | A single prototype cannot answer any of the three selection questions |
| Shared harness + shared datasets mixed with per-track code | ... | Per-track metric code would introduce attribution confounds |
| Plan template used for 3 parallel tracks instead of 1 | ... | Three separate plans would fragment the shared harness work ... |

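Claude voluntarily reported three "violations." I doubt many PMs would be happy to receive a plan lined with self-reports like these.

After generation finished, this time a 6-item indictment

Then, at the end of plan generation, Claude wrote a 6-item post-mortem titled "What this execution exposed about /speckit.plan on this input". It is more concrete than the first one and names the broken script and the unenforced rules outright.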

You asked me to produce one plan covering 3 language × 3 model × 3 algorithm parallel tracks. I did, but several speckit assumptions tore along the way. Recording them because I think that's the point of this exercise:


1. Technical Context is structurally single-valued, but this work is multi-valued. The template has one row for Language/Version, one for Primary Dependencies, one for Performance Goals. I turned it into a 3-column matrix to stay honest. update-agent-context.sh then failed to parse a single language out of that matrix — it emitted WARNING: No language information found in plan and wrote a CLAUDE.md that does not know what languages the PoC is evaluating. The downstream agent-context update is essentially broken for this input shape.

This is a concrete failure: a Spec Kit helper script (update-agent-context.sh) broke quietly, and the languages under evaluation vanished from the agent's CLAUDE.md.
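
The shape of that failure can be illustrated with a toy parser. I have not reproduced the script's actual parsing logic; the regex below is an assumed stand-in for any extractor that expects one `**Language/Version**: value` line, and the two sample plans are hypothetical.

```python
import re
from typing import Optional

# Assumption: a single-value extractor keyed on a bold field label.
FIELD = re.compile(r"^\*\*Language/Version\*\*:\s*(.+)$", re.MULTILINE)


def extract_language(plan_text: str) -> Optional[str]:
    """Return the single Language/Version value, or None if not found."""
    match = FIELD.search(plan_text)
    return match.group(1).strip() if match else None


# The shape the template expects: one field, one value.
single_valued_plan = "**Language/Version**: Python 3.11\n"

# The shape this PoC produced: the same field spread over three tracks.
matrix_plan = (
    "| Field | Language Track | Model Track | Algorithm Track |\n"
    "| --- | --- | --- | --- |\n"
    "| Language/Version | Rust, Go, C++ | Python 3.11 (fixed) | Python 3.11 (fixed) |\n"
)

print(extract_language(single_valued_plan))
print(extract_language(matrix_plan))
```

The matrix is more honest for a human reader, yet a single-value extractor finds nothing in it, which matches the "No language information found in plan" warning quoted above.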

2. Constitution Check is a no-op because the constitution was never authored. The file is still [PRINCIPLE_1_NAME] placeholders. Every speckit command after /speckit.constitution assumes gates exist; for a fresh repo the gates are vacuous and pass trivially. This is a "silent weakening" of speckit's quality story — the gate that is supposed to block bad plans simply does not run. A linter would catch this; speckit does not.

The point is that the Constitution gate, one of Spec Kit's selling points, does not run at all in the initial state.

3. Phase 0's "ERROR on unresolved clarifications" rule is not actually enforced by the tool. I chose to honor your explicit instruction and proceed under documented risk. The tool itself did not block — it has no mechanism to. Any team that skips the clarification step will get a plan that looks complete but is resting on unanswered questions. This is the inverse of the pattern speckit markets (discipline-via-template).

This is a structural problem: the tool appears to enforce something while in fact enforcing nothing.

4. The shared-harness-first sequencing is load-bearing and the template has no place to record it. If a team hands this plan to /speckit.tasks next, the tool will generate tasks for each of the nine prototype directories in parallel with no awareness that harness/ is a hard serial blocker. I wrote the ordering into the "Structure Decision" prose, but prose is not a gate — the next command will happily schedule concurrent work that shouldn't start yet.

The point is that writing the constraint into the document body is meaningless, because the next command does not read it.
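
For contrast, here is what recording the ordering as data rather than prose might look like. This is a sketch, not something Spec Kit provides: the task names are hypothetical, and only three of the nine prototype directories are shown. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Each key maps to the set of tasks that must finish before it may start.
# Names are illustrative; "harness" stands for the shared harness/ directory.
dependencies = {
    "harness": set(),
    "prototype-rust": {"harness"},
    "prototype-go": {"harness"},
    "prototype-cpp": {"harness"},
    "language-decision-record": {"prototype-rust", "prototype-go", "prototype-cpp"},
}


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks within one wave may run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(dependencies), start=1):
    print(number, wave)
```

With the constraint expressed this way, a scheduler cannot start the prototypes before the harness, whereas a sentence in "Structure Decision" can simply be ignored.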

5. The quickstart.md shape is wrong for research work. The template expects "how to run the app." The actual quickstart is "how to reproduce a single benchmark run" — a completely different abstraction. I wrote it to match what the PoC is, not what the template expects; a strict template check would flag it.

In other words, the shape of the template does not mesh with the granularity of research work.

6. /speckit.tasks will be the next fragile step. Research work is not task-shaped. A good PoC plan is "build the harness, then run nine experiments, then write three decision records, then present". Speckit tasks will want to decompose this into acceptance-criteria-bearing units. Several of them will be non-acceptance-criteria-bearing research loops ("explore the deep-learning candidate until you know if it is viable"). Expect the task list to either (a) contain fake ACs or (b) be mostly watermarked as research spikes.

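The prediction here: an exploratory research loop cannot carry acceptance criteria (ACs), so if /speckit.tasks forces a decomposition, the result is either fake ACs or a list that is all spikes.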

The charitable read: speckit is excellent for "we know what, we need to plan how". The uncharitable read — and I think the one your repo name is pointing at — is that speckit gives the appearance of rigor (gates, checklists, phases) while its actual enforcement depends on templates that silently degrade when the work doesn't fit. For this PoC, nothing in the tool stopped me from producing a document that looks complete and is internally honest, but that also papers over four unresolved business risks, a nonexistent constitution, a broken agent-context update, and an ordering constraint the next command will ignore.


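This paragraph captures the conclusion of the experiment better than anything else. Spec Kit provides the appearance of rigor, but its enforcement quietly degrades when the work does not fit the template. That is exactly the behavior I observed.

Summary: the "phases that really matter" lie outside Spec Kit's range

What this experiment made clear is that Spec Kit is, in principle, of no use for the following phases.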

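| Phase | Why it cannot be written in Spec Kit |
| --- | --- |
| PoC for a new platform | "What to build" is only settled as the result of the experiment. The settled functional requirements that the spec template demands do not exist. |
| Technology selection for languages and frameworks | The implementation details of the candidates are themselves the subject of the work. This structurally violates the "no implementation details" rule. |
| Algorithm exploration | The exploration loop itself has no Acceptance Criteria. "Investigate until it works" does not fit Spec Kit's task shape. |
| ML model accuracy validation | The success criteria are "decided after looking at real data," so no SC can be written at spec-generation time. |
| Simulation in the production environment | Performance Goals / Scale / Scope are decided only after measurement. This conflicts with the single-value constraint of Technical Context. |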

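And the Claude running Spec Kit knows this: it bends the template and confesses every one of its tricks as it goes. It is a rare experience to have a tool's own operating agent point out the template's structural flaws unprompted, so if you are curious, run specify init yourself and hand /speckit-specify a lump of undecided requirements.

Proposal: use Spec Kit as a tool for after things are decided

Spec Kit actually works in a flow like this.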

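Finish the PoC, technology selection, and algorithm validation outside Spec Kit (manage the experiment log and decision records by hand)
Only then run /speckit.specify, against a decided language, a decided model, and a decided architecture
Have the agent implement the settled functional requirements and success criteria

In other words, Spec Kit is handy as long as you keep the order "acquire knowledge, then specify." The moment you reverse that order, Spec Kit turns into a machine for manufacturing lying specs that merely look well-formed.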
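
One lightweight way to keep that order honest is a hand-maintained decision log with an explicit readiness check. The following is a minimal sketch under my own assumptions: the `Decision` record, its fields, and the sample entries are all hypothetical and do not come from Spec Kit or from the generated artifacts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    """One row of a hand-maintained decision log."""
    track: str                    # e.g. "language", "model", "algorithm"
    chosen: Optional[str] = None  # None until the PoC has produced an answer
    evidence: str = ""            # pointer to the benchmark or experiment log


def open_tracks(log: list[Decision]) -> list[str]:
    """Tracks whose PoC has not yet produced a decision."""
    return [d.track for d in log if d.chosen is None]


def ready_to_specify(log: list[Decision]) -> bool:
    """Only hand the project to /speckit.specify once nothing is open."""
    return not open_tracks(log)


# Hypothetical state partway through the PoC phase.
log = [
    Decision("language", chosen="(decided after benchmark)", evidence="bench-001"),
    Decision("model"),
    Decision("algorithm"),
    Decision("traffic"),
]

print(open_tracks(log), ready_to_specify(log))
```

The check is deliberately trivial; its value is that "everything is decided" becomes something a human confirms before running the tool, rather than something the template assumes.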

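One caveat: Claude's warning is not guaranteed

To be clear, the fact that Claude wrote 11 items of unprompted criticism this time is simply how this particular run happened to go. Trying the same thing again will not necessarily produce the same warnings. Sampling variance from temperature, a different prompt context, a different model, a different instance: change any one of these and it is entirely possible that no warning appears at all and only a broken spec that looks well-formed comes out.

There is another way in which I was lucky. The brief was packed with words that explicitly signal that things are undecided: "PoC", "technology selection", "research stage", "fixed only after an on-site survey", and so on. Claude was presumably able to detect the mismatch from those cues. Had I phrased the same content differently, for example by writing only "I want to build a real-time video analytics platform," Claude would most likely have issued no warning, silently treated the language and the model as decided, and breezed through writing the spec.

So "it's fine because Claude will stop me" does not hold. The structural problem this article set out to show, namely that the templates accept a pre-decision brief without resistance, cannot be avoided by relying on Claude's initiative; a human has to judge whether Spec Kit is applicable.

The next project to break silently is yours.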

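Conclusion

Spec Kit is not stupid. But bringing Spec Kit into a phase where things are not yet decided is, to put it bluntly, dangerous.

And because GitHub's Spec Kit documentation does not stress this precondition enough, many teams get carried away thinking they have found a silver bullet. This article is a warning label for them.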

Appendix: commands run and generated artifacts

# Install
uv tool install specify-cli --from git+https://github.com/github/spec-kit.git

# Initialize (in this directory)
specify init --here --ai claude --script sh --ignore-agent-tools --force

# Launch Claude Code and pass the brief to /speckit-specify
# → specs/001-video-analytics-poc/spec.md is generated
# Then run /speckit-plan
# → plan.md / research.md / data-model.md / quickstart.md are generated

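The actual generated artifacts are in the gist below. A gist only lists files in alphabetical order by filename, so I have added individual links in Spec Kit workflow order.

spec.md: the spec generated by /speckit-specify
plan.md: the implementation plan generated by /speckit-plan
research.md: Phase 0 research notes
data-model.md: schemas for measurement outputs and decision records
quickstart.md: benchmark reproduction procedure

Claude's comments in this article are quoted from tmux capture logs and are all verbatim.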

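Bonus: this article was written by Claude Code

For the record, the body of this article was written by Claude Code (Opus 4.6). The author (a human) set the direction and requested adjustments.

The setup was as follows: the Claude Code that wrote the article launched another Claude Code in a separate tmux pane, ran specify init → /speckit-specify → /speckit-plan there, and then quoted and summarized the generated spec.md / plan.md together with the 5-item and 6-item comments output along the way. So two "Claudes" appear in this article: one that ran Spec Kit and one that wrote the article. Both are Opus 4.6.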
