@yukikoblog8376 (yukiko, studying to become a first-rate IT training instructor)

My Personal Favorite Prompt Engineering Master Guide, as Taught by AI Mentor-chan (for Engineers): Google Edition

Posted at 2026-04-06

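🗺️ How this guide is organized

0. What is prompt engineering?
    ↓
1. Prompt structure (4 elements)
    ↓
2. Using model parameters
    ↓
3. Prompt technique catalog (12 techniques)
    ↓
4. Prompt patterns by use case
    ↓
5. Best practices
    ↓
6. Prompt management and version control
    ↓
7. Evaluation and improvement cycle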

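📚 Table of contents

| # | Section | Keywords |
|---|---------|----------|
| 0 | What is prompt engineering? | Definition, why it matters |
| 1 | The four-element prompt structure | Role / Task / Context / Format |
| 2 | Model parameters | Temperature / Top-P / Top-K |
| 3 | Technique catalog | 12 techniques, from Zero-shot to ReAct |
| 4 | Patterns by use case | Code generation, summarization, QA, RAG |
| 5 | Best practices | Google's official recommendations |
| 6 | Prompt management | Version control, team sharing |
| 7 | Evaluation and improvement cycle | Testing, metrics, iteration |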

0. What is prompt engineering?

Source: official Google Cloud definition (Updated: Jan 14, 2026)

Definition

Prompt engineering is the art and science of designing and optimizing
prompts to guide AI models, particularly LLMs, towards generating
the desired responses.
(Official Google Cloud documentation)

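Why it matters

| Aspect | Details |
|--------|---------|
| Output quality | The quality of the prompt directly determines the quality of the AI output |
| Reproducibility | A well-specified prompt yields output of consistent quality across runs |
| Cost efficiency | Adapts a model to a specific task at lower cost than fine-tuning |
| Safety | Guardrails reduce hallucinations and harmful output |
| Scale | Templates and variables let you process large volumes of tasks efficiently |

The core point for engineers

【The effectiveness of your prompt directly influences
  the quality and relevance of the AI's output.】
(From the official Google Cloud documentation)

→ How well the prompt is written directly affects the quality and relevance of the AI output.
  Treat prompt design as engineering, just as you would code design.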

1. The four-element prompt structure

Source: Google Prompt Engineering Whitepaper 2025 / Google Workspace Prompting Guide 101

Basic structure: the RTCF framework

┌──────────────────────────────┐
│  R: Role                     │
│  T: Task                     │
│  C: Context                  │
│  F: Format                   │
└──────────────────────────────┘

| Element | Description | Example |
|---------|-------------|---------|
| Role | The persona or role the AI should take on | "You are a senior Python engineer..." |
| Task | The specific thing you want done | "Review the following code and..." |
| Context | Background, constraints, and assumptions | "The codebase uses Python 3.11, FastAPI..." |
| Format | Output format, length, and structure | "Return as JSON with keys: issue, severity, fix" |

RTCF prompt template (copy-and-paste)

# Role
You are a [role / expert persona] with expertise in [domain].

# Task
Your task is to [specific action verb] the following [target].
[Optional: Think step by step.]

# Context
Background:
- [Assumption 1]
- [Assumption 2]
- [Constraints / caveats]

Input:
{input_variable}

# Format
Respond in the following format:
- [Output item 1]: ...
- [Output item 2]: ...
Language: Japanese / English
Length: [N words / sentences / bullet points]

Implementation example (code review)

# Python implementation example: parameterizing a prompt template
SYSTEM_PROMPT = """
You are a senior software engineer specializing in Python and security.
Review code strictly for bugs, security issues, and performance problems.
"""

# Doubled braces ({{ }}) are literal braces; single braces are str.format fields.
USER_PROMPT_TEMPLATE = """
# Task
Review the following Python code. Think step by step.

# Code
```python
{code}
```

# Context
Project: {project_name}
Python version: {python_version}
Focus area: {focus_area}

# Output Format
Return a JSON object with this structure:
{{
  "issues": [
    {{
      "line": <line number>,
      "severity": "critical|high|medium|low",
      "category": "security|bug|performance|style",
      "description": "",
      "fix": ""
    }}
  ],
  "summary": "",
  "score": <1-10>
}}
"""

def build_review_prompt(code: str, project: str, py_ver: str, focus: str) -> str:
    """Fill the review template with the code and project-specific values."""
    return USER_PROMPT_TEMPLATE.format(
        code=code,
        project_name=project,
        python_version=py_ver,
        focus_area=focus,
    )


---

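# 2. Using model parameters

> **Source:** Google Prompt Engineering Whitepaper 2025 (pp. 〇〇)

---

## How the three main parameters relate

【Three axes that control the randomness and creativity of the output】

Temperature ──── flattens or sharpens the whole probability distribution
Top-P ──── narrows the token candidates by cumulative probability
Top-K ──── samples only from the K most probable tokens

Low values → more deterministic, convergent output
High values → more random, diverse, creative output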


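## Parameter details

### Temperature

| Value | Behavior | Use case |
|----|------|------|
| `0` | Greedy decoding: always picks the most probable token, so output is effectively deterministic | Tasks with a single correct answer |
| `0.1-0.3` | Stable, predictable | Code generation, data extraction |
| `0.5-0.7` | Balanced | Summarization, translation, QA |
| `0.9-1.2` | Diverse, creative | Brainstorming, prose generation |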
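### Top-P (Nucleus Sampling)

Samples from the smallest set of tokens whose cumulative probability reaches P.

Example: with P=0.9
→ Collect tokens in descending order of probability until they add up to 90%, then sample from that set
→ Lowering P = fewer candidates = more deterministic

Google's suggested starting value: P = 0.95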
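
The filtering step can be sketched as follows. This is an illustrative implementation with a made-up token distribution, not the code any particular API runs; `top_p_filter` is a hypothetical name.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most-probable tokens whose cumulative mass reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    # Renormalize so the surviving candidates sum to 1 before sampling
    return {token: prob / total for token, prob in kept}

probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}  # hypothetical distribution
nucleus = top_p_filter(probs, 0.9)
print(sorted(nucleus))
```

With P = 0.9 the unlikely tail token is dropped; lowering P to 0.5 would leave only the single most probable token.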


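### Top-K

Samples from the K most probable tokens.
Note: the OpenAI API does not support Top-K; Google Gemini / Vertex AI do.

K=10 → 10 candidates → stable, convergent output
K=40 → 40 candidates → diverse, creative output

Google's suggested starting value: K = 30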
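
For comparison, Top-K filtering can be sketched in a few lines. The distribution below is again made up for illustration, and `top_k_filter` is a hypothetical helper name.

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens and renormalize their probabilities."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    return {token: prob / total for token, prob in ranked}

probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}  # hypothetical distribution
top2 = top_k_filter(probs, 2)
print(top2)
```

Unlike Top-P, the number of surviving candidates is fixed at K regardless of how the probability mass is distributed.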


## Google official recommended presets

| Use case | Temperature | Top-P | Top-K | Notes |
|------------|-------------|-------|-------|------|
| **Balanced (default)** | 0.2 | 0.95 | 30 | General purpose |
| **Code generation** | 0-0.2 | 0.9 | 20 | Deterministic |
| **Summarization / translation** | 0.3-0.5 | 0.95 | 30 | Stable |
| **High creativity** | 0.9-1.2 | 0.99 | 40 | Brainstorming, etc. |
| **Single-answer QA** | 0 | n/a | n/a | Greedy decoding |

## Implementation example (Vertex AI / OpenAI)

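```python
# Vertex AI (Gemini)
import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig

# Assumption: vertexai.init(project=..., location=...) has already been called,
# or a default project and credentials are configured in the environment.
prompt = "Explain top-p sampling in one paragraph."  # placeholder prompt

model = GenerativeModel("gemini-1.5-pro")
config = GenerationConfig(
    temperature=0.2,
    top_p=0.95,
    top_k=30,
    max_output_tokens=1024,
)
response = model.generate_content(prompt, generation_config=config)
print(response.text)

# OpenAI (Top-K not supported)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.2,
    top_p=0.95,
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```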

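3. Prompt technique catalog (12 techniques)

Source: Google Prompt Engineering Whitepaper 2025

Technique list

| # | Technique | Difficulty | Main use | Google recommendation |
|---|-----------|------------|----------|-----------------------|
| 1 | Zero-Shot | ⭐ | Simple tasks in general | ★★★★★ |
| 2 | One-Shot | ⭐⭐ | Specifying a format | ★★★★★ |
| 3 | Few-Shot | ⭐⭐ | Pattern learning, classification | ★★★★★ |
| 4 | System Prompting | ⭐⭐ | Fixing the role, safety guardrails | ★★★★★ |
| 5 | Role Prompting | ⭐⭐ | Expert-style answers | ★★★★ |
| 6 | Contextual Prompting | ⭐⭐ | Domain-specific tasks | ★★★★ |
| 7 | Chain-of-Thought (CoT) | ⭐⭐⭐ | Complex reasoning, calculation | ★★★★★ |
| 8 | Step-Back Prompting | ⭐⭐⭐ | Tasks that need abstraction | ★★★★ |
| 9 | Self-Consistency | ⭐⭐⭐ | High accuracy, fewer hallucinations | ★★★★ |
| 10 | Tree of Thoughts (ToT) | ⭐⭐⭐⭐ | Exploring multiple solutions, planning | ★★★ |
| 11 | ReAct | ⭐⭐⭐⭐ | Tool use, agents | ★★★★ |
| 12 | Automatic Prompt Eng. | ⭐⭐⭐⭐ | Automated prompt optimization | ★★★ |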

Technique details

1. Zero-Shot Prompting

【Overview】
State the task directly, without giving any examples.
Relies on the knowledge and capabilities the model has already learned.

【When to use】
- Simple tasks the model already understands well
- Prototypes and first-pass tests

【Template】
Classify the sentiment of the following text as Positive, Negative, or Neutral.

Text: "{text}"
Sentiment:

【Note from Google】
"Zero-shot prompting works best for straightforward tasks
 where the model already understands the domain."

2. One-Shot Prompting

【Overview】
Show one input → output example, then state the task.
The example conveys the format, tone, and style.

【Template】
Convert the following customer feedback to a structured report.

Example:
Input: "The product is great but shipping took too long."
Output:
  - Sentiment: Mixed
  - Positive: Product quality
  - Negative: Shipping speed
  - Priority: Medium

Now process this:
Input: "{customer_feedback}"
Output:

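3. Few-Shot Prompting

【Overview】
Show three to five input/output examples so the model picks up the pattern in context.
Especially effective for classification, conversion, and output in a specific format.

【Google's best practice】
"Provide high-quality examples. Including edge cases can boost robustness,
 but you also run the risk of the model overfitting to examples."

【Template】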
Classify SQL query complexity:

Input: "SELECT * FROM users WHERE id = 1"
Complexity: Low

Input: "SELECT u.name, COUNT(o.id) FROM users u LEFT JOIN orders o ON u.id = o.user_id GROUP BY u.id HAVING COUNT(o.id) > 5"
Complexity: High

Input: "SELECT name FROM products WHERE price < 1000"
Complexity: Low

Now classify:
Input: "{sql_query}"
Complexity:

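【Implementation tips】
- Make the examples diverse (avoid repeating the same pattern)
- Including one or two edge cases tends to improve accuracy
- The quality of the examples directly determines the quality of the output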
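
One way to keep examples easy to swap and review is to hold them as data and assemble the prompt in code. The sketch below mirrors the template above; `build_few_shot_prompt` and its parameters are hypothetical names, not part of any library.

```python
def build_few_shot_prompt(instruction, examples, query,
                          input_label="Input", output_label="Complexity"):
    """Assemble a few-shot prompt from (input, output) example pairs."""
    lines = [instruction, ""]
    for example_input, example_output in examples:
        lines.append(f'{input_label}: "{example_input}"')
        lines.append(f"{output_label}: {example_output}")
        lines.append("")
    lines.append("Now classify:")
    lines.append(f'{input_label}: "{query}"')
    lines.append(f"{output_label}:")  # left open for the model to complete
    return "\n".join(lines)

examples = [
    ("SELECT * FROM users WHERE id = 1", "Low"),
    ("SELECT u.name, COUNT(o.id) FROM users u LEFT JOIN orders o "
     "ON u.id = o.user_id GROUP BY u.id", "High"),
]
prompt = build_few_shot_prompt(
    "Classify SQL query complexity:", examples, "SELECT name FROM products"
)
print(prompt)
```

Keeping the examples in a list makes it straightforward to add an edge case or rotate examples without touching the template text.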

4. System Prompting

【Overview】
Use the system message to set the model's behavior, constraints, and role.
It acts as a set of "global rules" applied to every response.

【Google's definition】
"Give high-level, global instructions about the model's role or output format.
 Use it to enforce consistency, structure, or safety guardrails."

【Template (OpenAI / Vertex AI)】
system: |
  You are an expert API documentation writer.
  Rules:
  - Always respond in valid Markdown
  - Include code examples in every response
  - Never include information you are not confident about
  - If unclear, ask a clarifying question before answering
  - Language: Japanese unless the user writes in English

【Notes】
- Consolidate security constraints in the system prompt
- Different models may interpret a system prompt differently
- In production, always keep the system prompt under version control

5. Role Prompting

【Overview】
Give the AI a specific persona and have it respond in that character.
Tone, vocabulary, and perspective shift to match the role.

【Template】
You are a principal engineer at a top-tier tech company with 15 years of
experience in distributed systems. You are known for being direct,
pragmatic, and security-conscious.

Review the following architecture proposal and provide your honest assessment:
{architecture_description}

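【Implementation advice】
- Make the role specific ("principal engineer at Netflix" rather than "senior engineer")
- Do not give instructions that contradict the role (this causes confusion)
- Including temperament and style tends to make the behavior more stable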

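6. Contextual Prompting

【Overview】
Embed task-specific background information and domain knowledge in the prompt.
Injecting retrieval results in RAG also falls under this technique.

【Template】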
Context:
{retrieved_documents}

Based only on the context above, answer the following question.
If the answer is not in the context, say "I don't know."

Question: {user_question}
Answer:

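【RAG implementation pattern】
1. Vectorize the user question with an embedding model
2. Search a vector DB for similar documents
3. Inject the search results into the prompt as Context
4. Use "Based only on the context above" to reduce hallucinations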
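
The four steps above can be sketched end to end without any external service. In this toy version a bag-of-words count vector stands in for a real embedding model and an in-memory list stands in for the vector DB; both substitutions, and all function names, are assumptions made for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, documents, top_n=2):
    """Steps 1-2: embed the question and rank documents by similarity."""
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_n]

def build_rag_prompt(question, documents, top_n=2):
    """Steps 3-4: inject the retrieved text as context with a grounding instruction."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, top_n))
    return (
        f"Context:\n{context}\n\n"
        "Based only on the context above, answer the following question.\n"
        'If the answer is not in the context, say "I don\'t know."\n\n'
        f"Question: {question}\nAnswer:"
    )

docs = [
    "The API rate limit is 100 requests per minute.",
    "Refunds are processed within 5 business days.",
    "The office is closed on public holidays.",
]
prompt = build_rag_prompt("What is the API rate limit?", docs, top_n=1)
print(prompt)
```

In a real system `embed` would call an embedding API and `retrieve` would query a vector store, but the prompt assembly step stays essentially the same.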

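7. Chain-of-Thought (CoT)

【Overview】
Instruct the model to "think step by step" so that it generates intermediate reasoning.
Especially effective for complex reasoning, calculation, and multi-step decisions.

【Note from Google】
"Especially useful for non-reasoning models,
 but often unnecessary for reasoning-tuned ones (o1, o3, etc.)."

【Template (Zero-shot CoT)】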
Analyze the following code for potential bugs.
Think step by step before giving your final answer.

Code:
{code}

Step-by-step analysis:

【Template (Few-shot CoT)】
Q: A train travels 120km in 2 hours, then 80km in 1.5 hours. What is the average speed?
A: Let me think step by step.
   - Total distance: 120 + 80 = 200km
   - Total time: 2 + 1.5 = 3.5 hours
   - Average speed: 200 / 3.5 ≈ 57.1 km/h
   Answer: 57.1 km/h

Q: {question}
A: Let me think step by step.

【Best practices (Google)】
"Keep your 'Let's think step by step…' prompts simple."
- Keep the prompt itself simple
- The more intermediate steps, the more tokens are consumed
- With reasoning models (o1/o3), CoT is often unnecessary

8. Step-Back Prompting

【Overview】
A two-stage approach: before solving the concrete task, first ask about the
general principles or abstract knowledge behind it, then use that answer in the main prompt.

【Google's explanation】
"First ask a broad question to surface relevant background knowledge,
 then feed its answer into the main task prompt for more robust outputs."

【Two-stage implementation pattern】
# Step 1: generalization prompt
What are the general principles of {topic}?
List the key concepts that experts in this domain consider.

# Step 2: specific-task prompt (inject the Step 1 answer)
Using the following principles as background:
{step1_answer}

Now apply these principles to solve the specific problem:
{specific_task}

【Single-prompt variant】
Before solving the specific task, first recall the general principles
of {topic}. Then apply those principles to address:
{specific_task}
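
The two-stage pattern can be wired together as a small function. In this sketch `llm` is any callable that takes a prompt string and returns text; the `fake_llm` stub is a hypothetical stand-in for a real model call so the flow can be run offline.

```python
from typing import Callable

def step_back(llm: Callable[[str], str], topic: str, specific_task: str) -> str:
    """Ask for general principles first, then inject them into the task prompt."""
    step1_prompt = (
        f"What are the general principles of {topic}?\n"
        "List the key concepts that experts in this domain consider."
    )
    principles = llm(step1_prompt)
    step2_prompt = (
        "Using the following principles as background:\n"
        f"{principles}\n\n"
        "Now apply these principles to solve the specific problem:\n"
        f"{specific_task}"
    )
    return llm(step2_prompt)

calls = []  # records every prompt sent to the stub

def fake_llm(prompt: str) -> str:
    """Stub model: numbers its responses so the data flow is visible."""
    calls.append(prompt)
    return f"response #{len(calls)}"

answer = step_back(fake_llm, "database indexing", "Speed up a slow ORDER BY query")
print(answer)
```

The second prompt contains the first response verbatim, which is the whole point of the technique: the abstract answer becomes context for the concrete task.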

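9. Self-Consistency

【Overview】
Run the same prompt several times (at a high temperature) and
pick the most frequent answer by majority vote.
Helps reduce hallucinations and improve reliability.

【Google's explanation】
"Run a prompt multiple times under high-temperature sampling and
 select the most frequent final answer. This majority-vote approach
 reduces hallucinations and increases reliability."

【Implementation pattern (Python)】
from collections import Counter
from openai import OpenAI

def self_consistency(prompt: str, n: int = 5, temp: float = 0.8) -> str:
    """Sample n answers at a high temperature and return the most frequent one."""
    client = OpenAI()
    answers = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-4o",
            temperature=temp,
            messages=[{"role": "user", "content": prompt}]
        )
        # Normalize so that "Yes", "yes." and " Yes" count as the same vote
        text = resp.choices[0].message.content or ""
        answers.append(text.strip().rstrip(".").lower())

    # Return the most frequent answer (suits short categorical or numeric answers)
    counter = Counter(answers)
    return counter.most_common(1)[0][0]

# Hypothetical snippet to check; replace with the code under review
code = 'query = "SELECT * FROM users WHERE id = " + user_id'
result = self_consistency(
    prompt=f"Is this code vulnerable to SQL injection? Answer Yes or No.\n{code}",
    n=5
)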

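【Notes】
- API cost is n times that of a single call
- Especially effective for classification and yes/no judgment tasks
- Majority voting is hard for free-form tasks (answers must be clustered first)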
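
The voting step itself can be tested without calling any API. The sketch below assumes a yes/no task and extracts the label from each sampled answer before counting; the sample answers are invented, and `extract_label` and `majority_vote` are hypothetical helper names.

```python
import re
from collections import Counter

def extract_label(answer, labels=("yes", "no")):
    """Return the first allowed label that appears as a whole word, or None."""
    match = re.search(r"\b(" + "|".join(labels) + r")\b", answer.lower())
    return match.group(1) if match else None

def majority_vote(answers, labels=("yes", "no")):
    """Return (winning label, share of all samples that agreed with it)."""
    extracted = (extract_label(a, labels) for a in answers)
    votes = Counter(label for label in extracted if label)
    if not votes:
        return None, 0.0
    label, count = votes.most_common(1)[0]
    return label, count / len(answers)

samples = ["Yes.", "yes, it is vulnerable", "No", "Yes", "I am not sure"]
label, agreement = majority_vote(samples)
print(label, agreement)
```

Returning the agreement ratio alongside the label gives a rough confidence signal: a narrow majority may be a reason to sample more or escalate to a human.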

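10. Tree of Thoughts (ToT)

【Overview】
Explores several reasoning paths (branches) in parallel,
evaluating and pruning them as a tree of thoughts to arrive at a solution.

【Concept diagram】
Problem
 ├── Thought A → evaluate → expand → Solution A
 ├── Thought B → evaluate → prune (rejected)
 └── Thought C → evaluate → expand → Solution C (adopted)

【Template (simple version)】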
You are tasked with {task}.

Generate 3 different high-level approaches to solve this problem.
For each approach:
1. Describe the approach briefly
2. List pros and cons
3. Rate its feasibility (1-10)

After evaluating all approaches, recommend the best one and explain why.

【Use cases】
- Comparing multiple architecture design options
- Root-cause analysis of complex bugs
- Planning long-term projects

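11. ReAct (Reason & Act)

【Overview】
Loops between reasoning and acting.
The model solves the problem through a thought → action → observation → thought cycle
while using external tools (search, code execution, APIs).

【Google's explanation】
"Combine natural-language reasoning with external tools
 (search, code execution, etc.) in a thought–action loop."

【Structure of the ReAct loop】
Thought: decide what to do first
Action: call a tool (search / code / api)
Observation: receive the tool's result
Thought: decide the next step based on the result
Action: the next action
... (repeat)
Answer: the final answer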
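
The loop structure above can be illustrated with a small self-contained sketch. Everything here is an assumption made for demonstration: the `Action: tool[input]` text format, the toy `calculator` tool, and `scripted_llm`, which replays fixed responses in place of a real model.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def calculator(expression):
    """Toy tool: evaluate '<number> <op> <number>'."""
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*([-+*/])\s*(-?\d+(?:\.\d+)?)\s*", expression)
    if not match:
        return "error: unsupported expression"
    left, op, right = match.groups()
    return str(OPS[op](float(left), float(right)))

def react_loop(llm, tools, question, max_steps=5):
    """Alternate model steps and tool calls until the model emits 'Answer:'."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        final = re.search(r"Answer:\s*(.+)", step)
        if final:
            return final.group(1).strip(), transcript
        action = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
        if action:
            name, arg = action.groups()
            tool = tools.get(name)
            observation = tool(arg) if tool else f"error: unknown tool {name}"
            transcript += f"Observation: {observation}\n"
    return None, transcript  # gave up after max_steps

def scripted_llm(transcript):
    """Stand-in for a real model: acts first, answers once an observation exists."""
    if "Observation:" not in transcript:
        return "Thought: I need to multiply.\nAction: calculator[12 * 7]"
    return "Thought: I have the result.\nAnswer: 84"

answer, transcript = react_loop(scripted_llm, {"calculator": calculator}, "What is 12 * 7?")
print(transcript)
```

The `max_steps` cap matters in practice: without it, a model that never emits a final answer would loop indefinitely and keep consuming tokens.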

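【LangChain implementation example (outline)】
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent

# Assumption: llm, search_tool and code_interpreter are defined elsewhere
# (a chat model instance and two LangChain tools).
tools = [search_tool, code_interpreter]

# ReAct prompt template (LangChain Hub)
prompt = hub.pull("hwchase17/react")

agent = create_react_agent(
    llm=llm,
    tools=tools,
    prompt=prompt
)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
result = agent_executor.invoke({"input": "Research 〇〇 and write the code"})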

【Use cases】
- Combining information retrieval with code generation
- Automating multi-step tasks
- Building AI agents

12. Automatic Prompt Engineering (Meta-Prompting)

【Overview】
Have the model itself generate, evaluate, and improve prompts.
Instead of a human writing a good prompt, the AI does the drafting.

【Google's explanation】
"Prompt the model to generate a set of candidate prompts,
 evaluate them and select the best one."

【Pattern 1: prompt generation】
You are a prompt engineering expert.
Generate 5 different system prompts for the following use case:
Use case: {usecase}
Requirements: {requirements}

For each prompt, explain:
- What strategy it uses
- Why it would be effective
- Potential weaknesses

【Pattern 2: prompt improvement】
Here is an AI prompt:
---
{existing_prompt}
---

Analyze its weaknesses and rewrite it to be more effective.
Explain what you changed and why.

【Pattern 3: automatic evaluation】
Rate the following prompt on a scale of 1-10 for each criterion:
- Clarity: Is the instruction unambiguous?
- Completeness: Does it provide necessary context?
- Format: Is output format specified?
- Safety: Are guardrails included?

Prompt:
{prompt_to_evaluate}
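
The "evaluate them and select the best one" step can also be done with a labeled test set instead of a rubric. The following sketch scores each candidate template by exact-match accuracy; `fake_llm` is a hypothetical stub that only gives a clean one-word label when the prompt asks for one, standing in for a real model.

```python
def pick_best_prompt(candidates, test_cases, llm):
    """Score each candidate template by exact-match accuracy and return the best."""
    scores = {}
    for template in candidates:
        correct = sum(
            1
            for text, expected in test_cases
            if llm(template.format(text=text)).strip().lower() == expected
        )
        scores[template] = correct / len(test_cases)
    best = max(scores, key=scores.get)
    return best, scores

def fake_llm(prompt):
    """Stub model: answers verbosely unless the prompt asks for one word."""
    label = "positive" if "great" in prompt else "negative"
    if "one word" in prompt:
        return label
    return f"The sentiment seems {label}."

candidates = [
    "Sentiment of: {text}",
    "Classify the sentiment of: {text}\nAnswer with one word: positive or negative.",
]
cases = [("This is great", "positive"), ("This is awful", "negative")]
best, scores = pick_best_prompt(candidates, cases, fake_llm)
print(best)
print(scores)
```

Here the candidate that pins down the output format wins because its answers can be compared mechanically, which is also why format instructions matter for automated evaluation.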

4. Prompt patterns by use case

Source: official Google Cloud documentation + Google Prompt Engineering Whitepaper 2025

4-1. Code generation and review

# Code generation template
CODEGEN_PROMPT = """
You are a senior {language} engineer. Generate production-quality code.

Task: {task_description}

Requirements:
- Language: {language} {version}
- Style: {style_guide} (e.g., PEP8, Google Style)
- Include: type hints, docstrings, error handling
- Avoid: global variables, magic numbers

Return:
1. Complete, runnable code
2. Brief explanation of key design decisions
3. Example usage
"""

# Code debugging template
DEBUG_PROMPT = """
You are debugging a {language} application.

Error:
{error_message}

Code:
```{language}
{code}
```

Stack trace:
{stack_trace}

Analyze step by step:
1. What is the root cause?
2. Why does it occur?
3. How to fix it?
4. How to prevent it in the future?
"""

# Code optimization template
OPTIMIZE_PROMPT = """
Optimize the following {language} code for {optimization_goal}.
Goals: {optimization_goal} (e.g., speed / memory / readability)

Original code:
```{language}
{code}
```

Provide:
1. Optimized code
2. Explanation of changes
3. Expected improvement (quantify if possible)
4. Trade-offs introduced
"""


---

## 4-2. Documentation and summarization

````python
# Technical documentation generation
DOC_PROMPT = """
Generate technical documentation for the following code.

Code:
```{language}
{code}
```

Include:
1. Overview (2-3 sentences)
2. Parameters table (name, type, description, required)
3. Return value description
4. Exceptions raised
5. Usage example
6. Notes/warnings (if any)

Format: Markdown
Audience: Junior to mid-level engineers
"""

# Meeting summary
MEETING_SUMMARY_PROMPT = """
Summarize the following meeting transcript.

Transcript:
{transcript}

Output format:
## Summary
[2-3 sentence overview]

## Key Decisions
- [decision 1]
- [decision 2]

## Action Items
| Who | What | By When |
|-----|------|---------|
| ... | ...  | ...     |

## Open Questions
- [unresolved question]
"""
````

4-3. RAG (Retrieval-Augmented Generation)

# RAG prompt template (hallucination-reducing version)
RAG_PROMPT_TEMPLATE = """
You are a helpful assistant for {company_name}.
Answer questions based ONLY on the provided context.

Rules:
- If the answer is not in the context, say exactly:
  "I don't have enough information to answer this question."
- Do not use knowledge outside the provided context
- Cite the source document when possible
- Be concise and accurate

Context:
{retrieved_context}

Question: {user_question}

Answer:"""

# HyDE (Hypothetical Document Embeddings) for better RAG retrieval quality
HYDE_PROMPT = """
Write a hypothetical document that would perfectly answer the following question.
The document should be detailed and technical.

Question: {question}

Hypothetical document:"""
# → Embed this HyDE output and use it for the vector search

4-4. Classification and structured data extraction

# Classification prompt (few-shot)
CLASSIFY_PROMPT = """
Classify the following customer support tickets by category and priority.

Categories: Technical / Billing / Feature Request / General
Priority: P1 (urgent) / P2 (high) / P3 (medium) / P4 (low)

Examples:
Ticket: "App crashes when I upload a file larger than 10MB"
Category: Technical, Priority: P1

Ticket: "Can you add dark mode?"
Category: Feature Request, Priority: P4

Ticket: "I was charged twice this month"
Category: Billing, Priority: P1

Now classify:
Ticket: "{ticket_text}"

Return as JSON: {{"category": "...", "priority": "...", "reason": "..."}}
"""

# Structured data extraction
EXTRACT_PROMPT = """
Extract the following information from the text below.
Return ONLY valid JSON. Do not include explanations.

Fields to extract:
- name: string
- email: string or null
- phone: string or null
- company: string or null
- issue_type: string
- urgency: "low"|"medium"|"high"

Text:
{input_text}

JSON:"""
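
Even with "Return ONLY valid JSON" in the prompt, the response is worth validating before it reaches downstream code. The sketch below checks the fields listed in the extraction prompt above; `parse_extraction` is a hypothetical helper, and the sample responses are invented.

```python
import json

REQUIRED_FIELDS = {"name", "email", "phone", "company", "issue_type", "urgency"}
ALLOWED_URGENCY = {"low", "medium", "high"}

def parse_extraction(raw):
    """Return (data, errors); data is None when no JSON object can be parsed."""
    # Tolerate extra text around the object by slicing from the first { to the last }
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return None, ["no JSON object found"]
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    errors = []
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if data.get("urgency") not in ALLOWED_URGENCY:
        errors.append(f"invalid urgency: {data.get('urgency')!r}")
    return data, errors

good = ('{"name": "Sato", "email": null, "phone": null, "company": null, '
        '"issue_type": "login", "urgency": "high"}')
bad = 'Sure! {"name": "Sato", "urgency": "urgent"}'

good_data, good_errors = parse_extraction(good)
bad_data, bad_errors = parse_extraction(bad)
print(good_errors)
print(bad_errors)
```

A non-empty error list can drive a retry with the errors appended to the prompt, or route the item to manual review.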

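5. Best practices

Source: Google Prompt Engineering Whitepaper 2025 (Lee Boonstra)

12 best practices recommended by Google

| # | Best practice | Details |
|---|---------------|---------|
| 1 | Provide high-quality examples | For few-shot examples, quality beats quantity; including edge cases improves accuracy |
| 2 | Start simple | Try a minimal prompt before adding complexity |
| 3 | Be specific about the output | State length, format, and structure explicitly (e.g., "in three bullet points") |
| 4 | Use positive instructions | "Do X" tends to work better than "Don't do X" |
| 5 | Use variables (placeholders) | Abstract dynamic values as `{variable}` to turn prompts into templates |
| 6 | Specify the output format | Actively request structured output such as JSON, CSV, or Markdown |
| 7 | Test continuously | Re-test whenever the model is updated or the API changes |
| 8 | Don't forget max_tokens | Set max_tokens / max_output_tokens to control output length |
| 9 | Experiment with input formats | Changing the input format (table, JSON, bullet list) changes the results |
| 10 | Collaborate as a team | A second opinion reduces oversights |
| 11 | Keep CoT simple | "Think step by step" alone is often enough |
| 12 | Use version control | Record and track prompt changes, settings, and evaluation results |

Common failure patterns and fixes

| ❌ Failure pattern | Cause | ✅ Fix |
|-------------------|-------|-------|
| Vague instructions ("do it well") | The goal is unclear | State a specific verb plus the expected output |
| Overuse of negatives ("don't do X") | Models tend to overlook negations | Rephrase as "do X" |
| Prompt is too long | Important instructions get buried | Put key instructions at the beginning or the end |
| Biased examples | The model overfits to the examples | Prepare diverse examples |
| Fixed temperature setting | The value does not suit the task | Experiment to find the best value per task |
| No system prompt | Inconsistent output | Always set the role and constraints in a system prompt |
| Shipping to production untested | Unexpected behavior appears | Evaluate with a test set in a staging environment |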

6. Prompt management and version control

Prompt management file layout

project/
├── prompts/
│   ├── system/
│   │   ├── system_prompt_v1.0.txt
│   │   ├── system_prompt_v1.1.txt    ← versioned
│   │   └── system_prompt_current.txt ← current production version
│   ├── tasks/
│   │   ├── code_review.txt
│   │   ├── summarize.txt
│   │   └── classify.txt
│   └── few_shot/
│       └── examples_code_review.json
├── tests/
│   ├── test_prompts.py
│   └── test_cases/
│       └── code_review_cases.json
└── docs/
    └── prompt_changelog.md

Prompt management ledger (copy-and-paste)

# Prompt management ledger

## PROMPT-001: Code review

| Item | Value |
|------|------|
| ID | PROMPT-001 |
| Purpose | Automated review of Python code |
| Target models | GPT-4o / gemini-1.5-pro |
| Version | v1.2 |
| Temperature | 0.1 |
| Max Tokens | 2048 |
| Author | Name |
| Created | YYYY-MM-DD |
| Last updated | YYYY-MM-DD |
| Status | In production |
| Evaluation score | 92% (n=50) |

### System prompt (v1.2)
[Paste the full prompt text here]

### Change history
| Version | Date | Change | Accuracy change |
|----------|--------|---------|---------|
| v1.0 | YYYY-MM-DD | Initial version | 78% |
| v1.1 | YYYY-MM-DD | Added few-shot examples | +9 pts → 87% |
| v1.2 | YYYY-MM-DD | Switched output format to JSON | +5 pts → 92% |

Version-controlling prompts (Git)

# Manage prompts with Git
git add prompts/
git commit -m "feat(prompts): add few-shot examples to code_review v1.1

- Added 3 new few-shot examples for edge cases
- Fixed JSON output format specification
- Accuracy improved: 78% → 87% on test set (n=50)"

# Use tags to track production versions
git tag -a "prompt-code-review-v1.1" -m "Production: code review prompt v1.1"

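7. Evaluation and improvement cycle

Prompt evaluation cycle

1. Build a test set (with ground truth)
    ↓
2. Run the prompt (n = 30-100 cases or more)
    ↓
3. Compute metrics (accuracy, F1, BLEU, etc.)
    ↓
4. Error analysis (categorize the failure patterns)
    ↓
5. Form a hypothesis ("changing X should improve accuracy")
    ↓
6. Revise the prompt (one change at a time)
    ↓
7. A/B test (old vs. new)
    ↓
8. Confirm the improvement → bump the version
    ↓
  Loop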

Evaluation test case template

{
  "test_cases": [
    {
      "id": "TC-001",
      "input": {
        "code": "def calc(a, b):\n    return a/b"
      },
      "expected_output": {
        "issues": [
          {
            "severity": "high",
            "category": "bug",
            "description": "Division by zero when b=0"
          }
        ]
      },
      "tags": ["edge_case", "zero_division"],
      "notes": "Basic division-by-zero detection test"
    }
  ]
}

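Evaluation metrics

| Task type | Main metrics | How to measure |
|-----------|--------------|----------------|
| Classification | Accuracy / F1 / Precision / Recall | Compare against ground truth |
| Generation (summarization, etc.) | ROUGE-1 / ROUGE-2 / ROUGE-L | Compare against reference summaries |
| Code generation | Execution Success Rate / Pass@k | Run unit tests |
| QA (RAG) | Faithfulness / Answer Relevancy | RAGAS framework |
| Free-form text | LLM-as-Judge (scored by GPT-4o or similar) | Score with an evaluation prompt |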
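
For the classification row, the metrics can be computed directly from predictions and ground truth. This is a minimal sketch for a single positive class with invented labels; libraries such as scikit-learn provide the same metrics for real projects.

```python
def classification_metrics(y_true, y_pred, positive):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical ground truth and model predictions for five test cases
y_true = ["bug", "bug", "ok", "ok", "bug"]
y_pred = ["bug", "ok", "ok", "bug", "bug"]
metrics = classification_metrics(y_true, y_pred, positive="bug")
print(metrics)
```

Tracking precision and recall separately shows whether a prompt change made the model more cautious or more trigger-happy, which accuracy alone hides.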

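Experiment log template (copy-and-paste)

# Prompt experiment log

## Experiment details
| Item | Value |
|------|------|
| Experiment ID | EXP-〇〇〇 |
| Target prompt | PROMPT-001 |
| Hypothesis | Adding edge cases to the few-shot examples should improve accuracy |
| Date | YYYY-MM-DD |
| Owner | Name |
| Test set size | 〇〇 cases |

## Comparison

| Version | Change | Accuracy | F1 | Latency | Cost/req |
|----------|--------|---------|-----|----------|-----------|
| v1.0 (baseline) | n/a | 78% | 0.76 | 2.1s | $0.003 |
| v1.1 (experiment) | Added 3 edge cases | 87% | 0.85 | 2.3s | $0.004 |

## Results and discussion
- Accuracy improved by 9 percentage points (78% → 87%)
- Most of the error cases were missed boundary values
- Cost per request rose by about 33%, which is acceptable given the return

## Decision
✅ Adopt → promote v1.1 to production

## Next actions
- Add five more edge cases and try v1.2
- Add validation for the JSON output format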

📌 References and official links

| Resource | URL | Updated |
|----------|-----|---------|
| Google Prompt Engineering Whitepaper (68 pp.) | https://www.kaggle.com/whitepaper-prompt-engineering | February 2025 |
| Google Cloud: What is Prompt Engineering | https://cloud.google.com/discover/what-is-prompt-engineering | Jan 14, 2026 |
| Google Vertex AI: Introduction to Prompt Design | https://cloud.google.com/vertex-ai/generative-ai/docs/learn/introduction-prompt-design | n/a |
| PromptHub: commentary on Google's Best Practices | https://www.prompthub.us/blog/googles-prompt-engineering-best-practices | Apr 30, 2025 |

Update history

| Date | Version | Change | Owner |
|------|---------|--------|-------|
| YYYY-MM-DD | v1.0 | Initial version (based on Google's official 2025 materials) | Name |
