Qwen × Chat Template: An In-Depth Guide

Posted at 2025-10-10

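Understanding how an LLM conversation is built, down to the internal processing

🧭 Introduction

"How do you actually construct a conversation with an LLM (large language model)?"
"What is a Chat Template?"

If you have questions like these, this article uses
the Qwen series (the Qwen2 7B chat model, published as Qwen/Qwen2-7B-Instruct) as an example
and breaks down, step by step, how a conversation with an LLM works internally.

At the end, it also covers a working example of exposing the model as an API with FastAPI
and how to customize the template yourself.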

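🧩 Overall structure (the big picture)

An exchange with an LLM through a Chat Template consists of the following five steps 👇

Step	What happens	Purpose
①	Build the messages structure	Structure the conversation history
②	Apply the Chat Template	Format it the way the model was trained to read
③	Tokenize	Convert text into tokens (a sequence of integers)
④	Model inference	Generate a response from the token sequence
⑤	Decode	Turn the token sequence back into human-readable text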

① Build the messages structure

The conversation history is represented as a list of role/content pairs, like this:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, who are you?"}
]

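💡 The OpenAI API and Qwen use this format, and Gemini, Claude, and others use similar role-based structures (role names and the handling of the system prompt differ between APIs).
Each message explicitly states its role, such as "system", "user", or "assistant".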
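
As a small illustration of the structure above, the following sketch checks a messages list before it is handed to a template. The helper `validate_messages` and the `ALLOWED_ROLES` set are hypothetical names made up for this example, not part of any library, and the rules are assumptions (for instance, that a system message should come first).

```python
# Hypothetical helper (not part of transformers or any other library):
# checks that a conversation follows the role/content structure.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_messages(messages):
    """Return a list of problems; an empty list means the structure looks fine."""
    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i}: expected a dict")
            continue
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {i}: content must be a string")
        # Assumption for this sketch: the system message, if any, comes first.
        if role == "system" and i != 0:
            problems.append(f"message {i}: system message should come first")
    return problems


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, who are you?"},
]
print(validate_messages(messages))
```

Running a check like this at the boundary (for example, in an API handler) tends to surface malformed input earlier than a template rendering error would.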

② Apply the Chat Template to produce text

Next, the messages are turned into text by the Chat Template.
Qwen's template is written in Jinja2 syntax (shown here in simplified form).

{% for message in messages %}
<|im_start|>{{ message['role'] }}
{{ message['content'] }}<|im_end|>
{% endfor %}
{% if add_generation_prompt %}
<|im_start|>assistant
{% endif %}

Rendering the messages with add_generation_prompt=True yields the following text 👇

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello, who are you?<|im_end|>
<|im_start|>assistant

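The final <|im_start|>assistant line is the generation prompt: it marks where the model starts generating its reply. It is appended only when add_generation_prompt=True is passed.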
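
To make the rendering step concrete, here is a plain-Python imitation of the simplified template. `render_chatml` is a hypothetical function written for this article; it is not how transformers renders templates internally (that goes through Jinja2), and the real Qwen template includes extra logic such as a default system prompt.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages in the <|im_start|>role ... <|im_end|> layout shown above."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, who are you?"},
]
print(render_chatml(messages))
```

Without the generation prompt, the text ends at the user's <|im_end|>, and the model has no cue that an assistant turn should come next.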

③ Tokenization

The text produced by the template is converted into a token sequence the model can process.

from transformers import AutoTokenizer

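tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

# add_generation_prompt=True appends the trailing <|im_start|>assistant line
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)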
inputs = tokenizer(prompt, return_tensors="pt")

Example output:

{
  "input_ids": tensor([[151644, 8948, 198, ...]]),
  "attention_mask": tensor([[1, 1, 1, ...]])
}
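
The sketch below is a toy illustration of one property of this step: special tokens such as <|im_start|> are mapped to a single ID each, while ordinary text is split into smaller units. `toy_encode` is a made-up function; it uses raw UTF-8 bytes as a stand-in for the real BPE vocabulary, so the IDs for ordinary text are not what the Qwen tokenizer produces. The two special-token IDs are assumed to match the Qwen2 vocabulary.

```python
import re

# Assumed to match the Qwen2 vocabulary; treat as illustrative.
SPECIAL_TOKENS = {"<|im_start|>": 151644, "<|im_end|>": 151645}
_SPLIT = re.compile("(" + "|".join(re.escape(t) for t in SPECIAL_TOKENS) + ")")


def toy_encode(text):
    """Toy encoder: one ID per special token, UTF-8 bytes for everything else."""
    ids = []
    for piece in _SPLIT.split(text):
        if piece in SPECIAL_TOKENS:
            ids.append(SPECIAL_TOKENS[piece])
        else:
            # Stand-in for BPE: a real tokenizer merges bytes into subwords.
            ids.extend(piece.encode("utf-8"))
    return ids


ids = toy_encode("<|im_start|>user\nhi<|im_end|>")
attention_mask = [1] * len(ids)  # every position is real input, none is padding
print({"input_ids": ids, "attention_mask": attention_mask})
```

The attention mask only becomes interesting when several prompts of different lengths are padded into one batch; for a single prompt it is all ones.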

④ Model inference (generating output tokens)

Next, load the model and feed it the token sequence.

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",
    device_map="auto"
)

# Move the input tensors to the same device as the model
inputs = inputs.to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=128,
    temperature=0.7,
    do_sample=True
)

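Internally, roughly the following happens:

Input token sequence
   ↓
Self-attention layers analyze the context
   ↓
The next token is predicted, one token at a time
   ↓
Probabilistic sampling produces a natural-sounding sequence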
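
The sampling step in the flow above can be sketched in a few lines of plain Python. This is a simplified illustration of temperature sampling, not the code path `model.generate` actually runs; the function names and the toy logits are made up for the example.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=0.7, rng=random):
    """Pick one token index at random, weighted by the probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.1]  # made-up scores for a 3-token vocabulary
rng = random.Random(0)        # fixed seed so repeated runs match
print(softmax_with_temperature(toy_logits, temperature=0.7))
print(sample_next_token(toy_logits, temperature=0.7, rng=rng))
```

With do_sample=True, generation repeats this kind of draw once per new token, appending each chosen token to the input before predicting the next one.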

⑤ Decoding (tokens → text)

The model output is converted into text a person can read.

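# generate() returns the prompt tokens followed by the new tokens,
# so slice off the prompt before decoding
new_tokens = output[0][inputs["input_ids"].shape[-1]:]
decoded = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(decoded)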

Example output:

Hello! I am Qwen, an AI assistant. I'm here to answer your questions.
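
Because a causal language model returns the prompt tokens followed by the newly generated ones, only the tail of the output is the reply. The sketch below shows that slicing on plain lists; `extract_new_tokens` and the token IDs are hypothetical, chosen only to illustrate the idea.

```python
def extract_new_tokens(output_ids, prompt_length, eos_token_id=None):
    """Drop the prompt part of the output and cut at the first end-of-sequence ID."""
    new_ids = list(output_ids[prompt_length:])
    if eos_token_id is not None and eos_token_id in new_ids:
        new_ids = new_ids[: new_ids.index(eos_token_id)]
    return new_ids


prompt_ids = [1, 2, 3]                   # made-up prompt tokens
output_ids = prompt_ids + [10, 11, 99, 0]  # made-up output; 99 plays the role of EOS
print(extract_new_tokens(output_ids, len(prompt_ids), eos_token_id=99))
```

Decoding the full output instead would repeat the system and user text in front of the reply.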

🔁 ⑥ Continuing the conversation (keeping context)

For the next turn, keep the history so far and carry on.

messages.append({
    "role": "assistant",
    "content": "Hello! I am Qwen, an AI assistant."
})

messages.append({
    "role": "user",
    "content": "What can you do?"
})

Apply the template again and run inference the same way,
and the model responds with the earlier context taken into account.
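
The append-and-regenerate loop can be wrapped in a small helper. In the sketch below, `chat_turn`, `trim_history`, and the `generate_reply` callback are all hypothetical names; `generate_reply` stands in for steps ② to ⑤ (template, tokenize, generate, decode), and the trimming rule is one simple assumption about how to keep the prompt from growing without bound.

```python
def chat_turn(messages, user_text, generate_reply):
    """Append the user message, obtain a reply, and record it in the history."""
    messages.append({"role": "user", "content": user_text})
    reply = generate_reply(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply


def trim_history(messages, max_turn_messages):
    """Keep system messages plus the most recent user/assistant messages.

    max_turn_messages is assumed to be a positive integer.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turn_messages:]


def fake_generate(history):
    # Stand-in for the real model call: echoes the last user message.
    return "You said: " + history[-1]["content"]


history = [{"role": "system", "content": "You are a helpful assistant."}]
chat_turn(history, "Hello, who are you?", fake_generate)
chat_turn(history, "What can you do?", fake_generate)
print(trim_history(history, max_turn_messages=2))
```

Since the whole history is re-rendered and re-tokenized on every turn, long conversations eventually need some form of trimming or summarizing to stay within the model's context length.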

💡 Complete runnable sample code

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# 1. Define the conversation history
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Please introduce yourself."}
]

# 2. Apply the template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# 3. Tokenize
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# 4. Run inference
outputs = model.generate(**inputs, max_new_tokens=128)

# 5. Decode only the newly generated part
print(tokenizer.decode(outputs[0][inputs['input_ids'].shape[-1]:], skip_special_tokens=True))

🌐 Turning it into an API with FastAPI

Once you understand the Chat Template, wrapping the model in an API server is straightforward.
Below is a minimal example.

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForCausalLM

app = FastAPI()

model_name = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

class ChatInput(BaseModel):
    messages: list[dict]

@app.post("/chat")
def chat(input: ChatInput):
    prompt = tokenizer.apply_chat_template(
        input.messages,
        tokenize=False,
        add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    reply = tokenizer.decode(outputs[0][inputs['input_ids'].shape[-1]:], skip_special_tokens=True)
    return {"reply": reply}

Run (assuming the code above is saved as main.py)

uvicorn main:app --reload

Example call (curl)

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[
        {"role":"system","content":"You are a helpful assistant."},
        {"role":"user","content":"How do I build an API in Python?"}
      ]}'
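
The same call can be made from Python with only the standard library. The sketch below assumes the server from this section is listening on http://localhost:8000 and returns a JSON body with a "reply" field; `build_chat_request` and `send_chat_request` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(messages, url="http://localhost:8000/chat"):
    """Build a POST request equivalent to the curl call above."""
    body = json.dumps({"messages": messages}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request):
    """Send the request and return the "reply" field (needs the server running)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["reply"]


request = build_chat_request([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How do I build an API in Python?"},
])
print(request.get_method(), request.full_url)
# With the server running: print(send_chat_request(request))
```

Building the request separately from sending it keeps the payload easy to inspect when the server rejects it.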

🧱 Customizing the Chat Template

The Chat Template can also be changed to a format of your own.
For example, you can build in text that nudges the model toward JSON output.

{% for message in messages %}
<|im_start|>{{ message['role'] }}
{{ message['content'] }}<|im_end|>
{% endfor %}
<|im_start|>assistant
Always return the output in JSON format.

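Customization like this is a starting point for designing tool calls and structured output (e.g. function calling). Note that text placed after <|im_start|>assistant is treated as the beginning of the assistant's own reply, so an instruction written there acts as a prefill rather than a directive; putting the instruction in the system message is usually more reliable.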
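
As an alternative that leaves the template untouched, the JSON instruction can be added to the system message before the template is applied. The sketch below shows that approach in plain Python; `with_json_instruction`, `parse_json_reply`, and the instruction wording are assumptions made for this example. (When a custom template really is needed, transformers lets you assign the Jinja2 string to `tokenizer.chat_template`.)

```python
import json

# Assumed wording; adjust to the schema you actually need.
JSON_INSTRUCTION = "Always respond with a single valid JSON object."


def with_json_instruction(messages, instruction=JSON_INSTRUCTION):
    """Return a copy of messages whose system message carries the instruction."""
    updated = [dict(m) for m in messages]  # copy so the caller's list is untouched
    if updated and updated[0]["role"] == "system":
        updated[0]["content"] = updated[0]["content"] + "\n" + instruction
    else:
        updated.insert(0, {"role": "system", "content": instruction})
    return updated


def parse_json_reply(reply):
    """Parse the model's reply as JSON; return None if it is not valid JSON."""
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        return None


messages = [{"role": "user", "content": "Give me the capital of Japan."}]
print(with_json_instruction(messages))
print(parse_json_reply('{"capital": "Tokyo"}'))
```

An instruction alone does not guarantee valid JSON, which is why the reply is parsed defensively instead of being trusted as-is.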

🧭 Summary diagram of the whole flow

┌──────────────────────────────────┐
│ ① Build the messages array       │
│   [{role, content}, ...]         │
└─────────────────┬────────────────┘
                  │
                  ▼
┌──────────────────────────────────┐
│ ② Apply the Chat Template        │
│   → a single prompt string       │
└─────────────────┬────────────────┘
                  │
                  ▼
┌──────────────────────────────────┐
│ ③ Tokenize with the tokenizer    │
│   → produces input_ids           │
└─────────────────┬────────────────┘
                  │
                  ▼
┌──────────────────────────────────┐
│ ④ Run model inference            │
│   → generates response tokens    │
└─────────────────┬────────────────┘
                  │
                  ▼
┌──────────────────────────────────┐
│ ⑤ Decode and output              │
│   → human-readable text          │
└──────────────────────────────────┘

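✅ Summary

Step	What happens	Purpose
①	Build messages	Structure the conversation history
②	Apply the Chat Template	Format it the way the LLM was trained to read
③	Tokenize	Convert text into a sequence of integers
④	Model inference	Generate response tokens
⑤	Decode	Turn the response into a string
⑥	Append to messages	Keep context so the conversation can continue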

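💬 What this article set out to convey

A Chat Template is the mechanism that builds text in a structure the LLM can understand.
Once you understand it, the foundations of more advanced LLM development, such as
calling APIs, writing custom templates, and integrating tools, come into view.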
