Keeping LLM JSON Output from Breaking Your App: Designing a Multi-Stage Decoding, Self-Repairing Parser (Swift Codable)

Posted at 2026-04-19

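Even when you tell an LLM to "respond in JSON," what actually comes back often looks like this:

<think>...</think> prepended to the output
The payload wrapped in ```json ... ```
Trailing commas, unclosed brackets, JS-style comments
Smart quotes (“ ”) mixed in
Keys wavering between snake_case and camelCase (situation_summary / situationSummary)
temperatureScore returned as "7/10" or "five"
risks returned as a single string instead of an array
A fullwidth colon after a key ("key"：) in Japanese output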

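Relora, an AI app that analyzes romantic messages, receives its analysis results from Bedrock while switching between multiple models (Claude Sonnet 4.6 / Qwen3 Next 80B). Each model has its own quirks, and passing the output straight to JSONDecoder failed roughly 10% of the time, by my rough estimate.

This article explains, in a reusable form, the four-stage pipeline of "extract → normalize → repair → multi-stage decode" that runs in Relora's production code, AnalysisParser.swift.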

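What this article covers

An implementation of extractJSON that safely pulls the JSON body out of LLM output
sanitizeJSON, which absorbs smart quotes, BOMs, fullwidth symbols, and unescaped newlines
repairJSON, which removes trailing commas and comments
FlexibleInt / FlexibleStringArray, which absorb key-name and type variation with Swift's Codable
A multi-stage fallback that tries several decoding strategies in order
The design thinking behind handling an LLM's "broken JSON" rather than its "silence"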

The data model we assume

The JSON schema Relora expects looks like this.

{
  "situationSummary": "2-3 sentence summary of the situation",
  "psychology": "2-3 sentence psychological analysis",
  "temperatureScore": 7,
  "risks": ["Risk 1", "Risk 2"],
  "replySuggestions": [
    {"type": "aggressive", "message": "...", "reasoning": "..."},
    {"type": "defensive", "message": "...", "reasoning": "..."},
    {"type": "waitAndSee", "message": "...", "reasoning": "..."}
  ],
  "ngActions": ["Action to avoid 1", "Action to avoid 2"]
}

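In practice, however, output like the following is mixed in.

<think>
The user is asking for romantic advice. The reply candidates are...
</think>
```json
{
  "situation_summary": "...",
  "psychological_analysis": "...",
  "temperature": "7/10",
  "risk": "May be left on read",
  "suggestions": [...],
  "avoid_actions": ["Sending follow-up LINE messages"],
}

situationSummary has become situation_summary, risks (which should be an array) arrives as a single string under risk, and the score arrives under temperature as a string like "7/10". Absorbing all of this is the subject of this article.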

The overall pipeline

static func parse(_ jsonString: String, for conversation: Conversation) throws -> Analysis {
    // 1. Build several candidate strings
    let attempts: [String] = {
        var list: [String] = []
        let cleaned = extractJSON(from: jsonString)        // strip <think> and ``` fences, match braces
        let sanitized = sanitizeJSON(cleaned)              // BOM, smart quotes, fullwidth symbols
        list.append(sanitized)

        let repaired = repairJSON(sanitized)               // remove trailing commas and comments
        if repaired != sanitized { list.append(repaired) }

        // Last resort: pull out the smallest JSON object with a regular expression.
        // Note: this only matches a flat object that contains the camelCase key.
        if let range = jsonString.range(
            of: #"\{[^{}]*"situationSummary"[^}]*\}"#,
            options: .regularExpression
        ) {
            list.append(String(jsonString[range]))
        }
        return list
    }()

    // 2. Try decoding each candidate in order
    var decoded: AnalysisResponse?
    for attempt in attempts {
        guard let data = attempt.data(using: .utf8) else { continue }
        if let r = try? JSONDecoder().decode(AnalysisResponse.self, from: data) {
            decoded = r
            break
        }
    }

    guard let result = decoded else { throw AnalysisError.jsonDecodeFailed }
    // ...convert to the domain model
}

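The key point is to avoid committing to a single strategy and instead decode several preprocessed candidates in order. The more aggressive steps can damage an otherwise valid payload, so the lightly processed string (sanitized only) is tried before the repaired one, and the regex fallback comes last.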
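
The "try candidates in order" idea can be isolated into a small generic helper. The following is a minimal sketch, not the code from AnalysisParser.swift: `decodeFirstValid` and the `Score` type are hypothetical names introduced only for illustration.

```swift
import Foundation

/// Tries each candidate string in order and returns the first one that decodes.
/// `decodeFirstValid` is a hypothetical helper name used for this sketch.
func decodeFirstValid<T: Decodable>(_ type: T.Type, from candidates: [String]) -> T? {
    let decoder = JSONDecoder()
    for candidate in candidates {
        guard let data = candidate.data(using: .utf8) else { continue }
        if let value = try? decoder.decode(type, from: data) {
            return value
        }
    }
    return nil
}

// Illustrative payload type for the demo only.
struct Score: Decodable, Equatable {
    let temperatureScore: Int
}

let candidates = [
    "Sure! Here is the analysis you asked for.",   // not JSON at all, so decoding fails
    #"{"temperatureScore": 7}"#                    // valid, so this one is returned
]
let score = decodeFirstValid(Score.self, from: candidates)
print(score?.temperatureScore ?? -1)
```

Returning an optional keeps the "all candidates failed" case explicit, so the caller can decide whether to throw, retry the model, or show a fallback screen.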

Step 1: `extractJSON` - Pulling out the JSON body

LLMs rarely return JSON and nothing else. Extraction happens in three stages.

private static func extractJSON(from text: String) -> String {
    var cleaned = text

    // 1. Strip the <think> block (common with Qwen3-family models)
    if let thinkEnd = cleaned.range(of: "</think>") {
        cleaned = String(cleaned[thinkEnd.upperBound...])
    } else if cleaned.contains("<think>") {
        // Unclosed <think>: skip ahead to the first brace
        if let firstBrace = cleaned.firstIndex(of: "{") {
            cleaned = String(cleaned[firstBrace...])
        }
    }

    cleaned = cleaned.trimmingCharacters(in: .whitespacesAndNewlines)

    // 2. Prefer the contents of a ```json ... ``` code block
    for pattern in ["```json", "```JSON", "```"] {
        guard let start = cleaned.range(of: pattern) else { continue }
        let afterStart = start.upperBound
        guard afterStart < cleaned.endIndex else { continue }
        let remaining = cleaned[afterStart...]
        guard let endRange = remaining.range(of: "```") else { continue }
        let extracted = String(remaining[remaining.startIndex..<endRange.lowerBound])
            .trimmingCharacters(in: .whitespacesAndNewlines)
        if extracted.contains("{") { return extracted }
    }

    // 3. Cut out { ... } by matching braces, ignoring braces inside string literals
    if let start = cleaned.firstIndex(of: "{") {
        var depth = 0, inString = false, escape = false
        var end = start
        for i in cleaned[start...].indices {
            let ch = cleaned[i]
            if escape { escape = false; continue }
            if ch == "\\" { escape = true; continue }
            if ch == "\"" { inString.toggle(); continue }
            if inString { continue }
            if ch == "{" { depth += 1 }
            if ch == "}" { depth -= 1; if depth == 0 { end = i; break } }
        }
        return depth == 0
            ? String(cleaned[start...end])
            : String(cleaned[start...])  // closing braces missing: still return what we have
    }
    return cleaned
}

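Design tip: even when closing braces are missing (that is, the LLM cut its output off partway), everything from { onward is returned. The later repairJSON step may be able to fill in the gap, so it is better not to give up here.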

Step 2: `sanitizeJSON` - Normalizing the invisible enemies

This step removes dirt at the character level.

private static func sanitizeJSON(_ json: String) -> String {
    var s = json
    s = s.replacingOccurrences(of: "\u{FEFF}", with: "")   // BOM
    s = s.replacingOccurrences(of: "\u{200B}", with: "")   // Zero-width space
    s = s.replacingOccurrences(of: "\u{200C}", with: "")   // Zero-width non-joiner
    s = s.replacingOccurrences(of: "\u{200D}", with: "")   // Zero-width joiner

    // Smart quotes → plain quotes
    s = s.replacingOccurrences(of: "\u{201C}", with: "\"")
    s = s.replacingOccurrences(of: "\u{201D}", with: "\"")
    s = s.replacingOccurrences(of: "\u{2018}", with: "'")
    s = s.replacingOccurrences(of: "\u{2019}", with: "'")

    // Fix a fullwidth colon used as JSON syntax ("key"：value)
    s = s.replacingOccurrences(
        of: #""\s*：\s*"#,
        with: "\": ",
        options: .regularExpression
    )

    // Escape raw newlines inside string values (they violate the JSON spec)
    s = fixUnescapedNewlines(s)

    // JSON written only with single quotes → double quotes
    if !s.contains("\"") && s.contains("'") {
        s = s.replacingOccurrences(of: "'", with: "\"")
    }
    return s
}

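The fullwidth colon was the one that bit me hardest. Japanese-capable models sometimes mix in the colon from Japanese input mode, as in "situationSummary"：, and JSONDecoder rejects it without mercy. The regular expression targets only a colon that directly follows a ", so a ： in the middle of a string value is left intact.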
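
To see the effect of that pattern in isolation, here is a small sketch that applies the same regular expression to a made-up input. The sample payload is an assumption for illustration, not real model output.

```swift
import Foundation

// Same pattern as in sanitizeJSON: a double quote, optional whitespace, a fullwidth colon.
let pattern = #""\s*：\s*"#

// Hypothetical model output: the key separator is fullwidth,
// and the string value also contains a fullwidth colon.
let raw = #"{"note"："時刻：未定"}"#

let fixed = raw.replacingOccurrences(of: pattern, with: "\": ", options: .regularExpression)
print(fixed)   // {"note": "時刻：未定"}
```

Only the separator after the key is rewritten, because the colon inside the value is not preceded by a double quote. One edge case to be aware of: a value that starts with ： immediately after its opening quote would also match the pattern.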

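Repairing unescaped newlines

LLMs also frequently embed raw line breaks inside string values. A small state machine tracks whether we are inside a string and converts each raw line break into the two-character escape sequence \n.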

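private static func fixUnescapedNewlines(_ json: String) -> String {
    var result = "", inString = false, escape = false
    for ch in json {
        // The character right after a backslash is copied through untouched
        if escape { result.append(ch); escape = false; continue }
        if ch == "\\" && inString { result.append(ch); escape = true; continue }
        if ch == "\"" { inString.toggle(); result.append(ch); continue }
        if inString && ch == "\n" { result.append("\\n"); continue }
        // Swift treats "\r\n" as a single Character, so CRLF needs its own case
        if inString && ch == "\r\n" { result.append("\\r\\n"); continue }
        if inString && ch == "\r" { result.append("\\r"); continue }
        if inString && ch == "\t" { result.append("\\t"); continue }
        result.append(ch)
    }
    return result
}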
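
One Swift-specific detail is worth checking when scanning character by character: a CRLF pair is a single Character, so a comparison against "\n" does not match it. The sketch below first demonstrates that, then shows a hypothetical variant (`escapeControlCharacters(inStringsOf:)`, a name introduced here) that walks Unicode scalars instead, so CR, LF, and CRLF are all handled by the same cases.

```swift
import Foundation

// "\r\n" is one Character (a single grapheme cluster made of two Unicode scalars).
let crlf: Character = "\r\n"
print("\r\n".count)      // 1
print(crlf == "\n")      // false

/// Escapes raw CR, LF, and TAB inside JSON string literals by walking Unicode scalars.
/// Hypothetical variant for illustration, not the function from AnalysisParser.swift.
func escapeControlCharacters(inStringsOf json: String) -> String {
    var result = ""
    var inString = false, escape = false
    for scalar in json.unicodeScalars {
        // The scalar right after a backslash is copied through untouched.
        if escape { result.unicodeScalars.append(scalar); escape = false; continue }
        if inString && scalar == "\\" { result.unicodeScalars.append(scalar); escape = true; continue }
        if scalar == "\"" { inString.toggle(); result.unicodeScalars.append(scalar); continue }
        if inString {
            switch scalar {
            case "\n": result += "\\n"; continue
            case "\r": result += "\\r"; continue
            case "\t": result += "\\t"; continue
            default: break
            }
        }
        result.unicodeScalars.append(scalar)
    }
    return result
}

// A string value containing a raw CRLF becomes valid JSON after escaping.
let escaped = escapeControlCharacters(inStringsOf: "{\"a\": \"x\r\ny\"}")
print(escaped)
```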

Step 3: `repairJSON` - Fixing syntax errors

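private static func repairJSON(_ json: String) -> String {
    var s = json
    // JS-style line comments. Removed first so that a comma left in front of a
    // comment is still caught by the trailing-comma rule below.
    // Caveat: this also matches "//" inside string values (for example URLs).
    s = s.replacingOccurrences(of: #"//[^\n]*"#, with: "", options: .regularExpression)
    // Trailing commas
    s = s.replacingOccurrences(of: #",\s*([}\]])"#, with: "$1", options: .regularExpression)
    // Fill in missing closing braces (simplistic: braces inside strings are
    // counted too, and a missing "]" is not handled)
    let openBraces = s.filter { $0 == "{" }.count
    let closeBraces = s.filter { $0 == "}" }.count
    if openBraces > closeBraces {
        s.append(String(repeating: "}", count: openBraces - closeBraces))
    }
    return s
}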

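Output that trails off because the LLM ran out of tokens is an accident that really does happen. Filling in missing closing braces looks crude, but if the subsequent decode succeeds the result is usable, and if it fails we simply move on to the next candidate, so the fallback costs little.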
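
If counting braces turns out to be too coarse, a stack of expected closers is one way to handle arrays and strings as well. The following is a sketch of that alternative under my own assumptions, not the method used in AnalysisParser.swift; `closeOpenContainers` is a hypothetical name.

```swift
import Foundation

/// Appends the closers that a truncated JSON text still owes, ignoring brackets
/// that appear inside string literals. Hypothetical helper for illustration.
func closeOpenContainers(_ json: String) -> String {
    var closers: [Character] = []          // stack of closers still owed
    var inString = false, escape = false
    for ch in json {
        if escape { escape = false; continue }
        if inString {
            if ch == "\\" { escape = true }
            else if ch == "\"" { inString = false }
            continue
        }
        switch ch {
        case "\"": inString = true
        case "{": closers.append("}")
        case "[": closers.append("]")
        case "}", "]": if closers.last == ch { closers.removeLast() }
        default: break
        }
    }
    var repaired = json
    if inString {
        if escape { repaired.removeLast() }  // drop a dangling backslash at the cut
        repaired.append("\"")                // close a string cut off mid-value
    }
    repaired.append(contentsOf: closers.reversed())
    return repaired
}

// Output cut off inside the second array element; the "{" inside a string is ignored.
let closed = closeOpenContainers(#"{"risks": ["a {", "b"#)
print(closed)   // {"risks": ["a {", "b"]}
```

This still cannot rescue every truncation (a cut right after a key or a comma leaves invalid JSON), which is fine in a pipeline where a failed candidate just falls through to the next one.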

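Step 4: Absorbing type variation with Swift Codable

This is the heart of the article. Swift Codable is strict about types and fails immediately on "expected Int but got String." So we prepare wrapper types that make the receiving side lenient.

`FlexibleInt`: also accepts `"7/10"` and `"five"`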

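struct FlexibleInt: Decodable {
    let value: Int
    init(value: Int) { self.value = value }

    init(from decoder: Decoder) throws {
        let c = try decoder.singleValueContainer()
        if let i = try? c.decode(Int.self) { value = i }
        else if let d = try? c.decode(Double.self) { value = Self.truncate(d) ?? 5 }
        else if let s = try? c.decode(String.self) {
            let t = s.trimmingCharacters(in: .whitespaces)
            if let i = Int(t) { value = i }
            else if let d = Double(t) { value = Self.truncate(d) ?? 5 }
            else if t.contains("/") {
                // "7/10" style: keep the numerator
                let numerator = t.components(separatedBy: "/")[0]
                value = Int(numerator.trimmingCharacters(in: .whitespaces)) ?? 5
            } else { value = 5 }  // default; number words such as "five" are not parsed and land here
        } else { value = 5 }
    }

    // Int(d) traps on NaN, infinity, or out-of-range values, so convert defensively.
    private static func truncate(_ d: Double) -> Int? {
        Int(exactly: d.rounded(.towardZero))
    }
}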
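
The string branch can also be pulled out into a plain function, which makes the accepted formats easy to unit-test. The sketch below is an assumption-laden illustration rather than Relora's code: `parseScore` is a hypothetical helper, and it returns nil instead of a default so the caller can tell a real score from a fallback.

```swift
import Foundation

/// Parses the string forms discussed above into an Int, or returns nil when
/// no usable number is found. Hypothetical helper for illustration.
func parseScore(_ text: String) -> Int? {
    let trimmed = text.trimmingCharacters(in: .whitespaces)
    if let i = Int(trimmed) { return i }
    // Double("nan") and Double("inf") parse successfully, and Int(d) would trap
    // on them, so use the failable exact conversion after truncating.
    if let d = Double(trimmed) { return Int(exactly: d.rounded(.towardZero)) }
    if trimmed.contains("/") {
        // "7/10" style: keep the numerator only.
        let numerator = trimmed.components(separatedBy: "/")[0]
        return Int(numerator.trimmingCharacters(in: .whitespaces))
    }
    return nil
}

for sample in ["7", " 7 / 10 ", "7.9", "five", "nan"] {
    print(sample, "->", parseScore(sample).map(String.init) ?? "nil")
}
```

With this shape, applying the neutral default becomes a one-liner at the call site (`parseScore(s) ?? 5`), and number words such as "five" are visibly a fallback rather than a parsed value.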

`FlexibleStringArray`: accepts both an array and a single string

struct FlexibleStringArray: Decodable {
    let values: [String]
    init(values: [String]) { self.values = values }
    init(from decoder: Decoder) throws {
        let c = try decoder.singleValueContainer()
        if let arr = try? c.decode([String].self) { values = arr }
        else if let s = try? c.decode(String.self) { values = [s] }
        else { values = [] }
    }
}

An `init(from:)` that absorbs key-name variation

The pattern is to list several aliases in CodingKeys and try them in order while decoding.

struct AnalysisResponse: Decodable {
    let situationSummary: String
    let psychology: String
    let temperatureScore: FlexibleInt
    let risks: FlexibleStringArray

    enum CodingKeys: String, CodingKey {
        case situationSummary, psychology, temperatureScore, risks
        case situationSummary2 = "situation_summary"
        case psychology2 = "psychological_analysis"
        case temperatureScore2 = "temperature_score"
        case temperatureScore3 = "temperature"
        case risks2 = "risk"
    }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: CodingKeys.self)
        situationSummary = (try? c.decode(String.self, forKey: .situationSummary))
            ?? (try? c.decode(String.self, forKey: .situationSummary2))
            ?? ""
        psychology = (try? c.decode(String.self, forKey: .psychology))
            ?? (try? c.decode(String.self, forKey: .psychology2))
            ?? ""
        temperatureScore = (try? c.decode(FlexibleInt.self, forKey: .temperatureScore))
            ?? (try? c.decode(FlexibleInt.self, forKey: .temperatureScore2))
            ?? (try? c.decode(FlexibleInt.self, forKey: .temperatureScore3))
            ?? FlexibleInt(value: 5)
        risks = (try? c.decode(FlexibleStringArray.self, forKey: .risks))
            ?? (try? c.decode(FlexibleStringArray.self, forKey: .risks2))
            ?? FlexibleStringArray(values: [])
    }
}

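Key point: chaining try? lets each field be filled from whichever alias is present. If every alias fails, the field falls back to a neutral default such as an empty string or FlexibleInt(value: 5), and the caller checks whether the values are empty before raising the final error.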
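
When the number of aliases grows, the repeated try? chains can be folded into a small extension. This is a sketch under my own naming: `decodeFirst` is a hypothetical helper (not a Foundation API), and `Summary` is a cut-down type used only for the demo.

```swift
import Foundation

extension KeyedDecodingContainer {
    /// Returns the first value that decodes under any of the given keys, or nil.
    /// `decodeFirst` is a hypothetical helper, not part of Foundation.
    func decodeFirst<T: Decodable>(_ type: T.Type, forKeys keys: [Key]) -> T? {
        for key in keys {
            if let value = try? decode(type, forKey: key) { return value }
        }
        return nil
    }
}

// Illustrative type for the demo; the real AnalysisResponse has more fields.
struct Summary: Decodable {
    let text: String

    enum CodingKeys: String, CodingKey {
        case camel = "situationSummary"
        case snake = "situation_summary"
    }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: CodingKeys.self)
        // Empty string is the neutral default; the caller treats it as "missing".
        text = c.decodeFirst(String.self, forKeys: [.camel, .snake]) ?? ""
    }
}

let snakeJSON = #"{"situation_summary": "She replied after two days."}"#
let summary = try? JSONDecoder().decode(Summary.self, from: Data(snakeJSON.utf8))
print(summary?.text ?? "")
```

The order of the keys array doubles as the priority order, so the preferred camelCase key wins when a model returns both spellings.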

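Results in production

In Relora, a little under 10% of Qwen3 Next 80B outputs had some kind of JSON breakage, by my rough estimate. After introducing this four-stage pipeline:

| Metric | Before | After |
| --- | --- | --- |
| Parse failure rate (Qwen3) | approx. 9% | under 0.3% |
| Parse failure rate (Sonnet 4.6) | approx. 2% | under 0.1% |
| Rate at which users hit the "Could not analyze" error | 1.5% | around 0.05% |

Sonnet's output is stable to begin with, but smart quotes and trailing commas still show up from time to time.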

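Summary

Passing LLM JSON output directly to JSONDecoder is risky. The extract → normalize → repair preprocessing is essential.
Codable alone can absorb variation in types and key names. FlexibleInt / FlexibleStringArray / chained try? are the tools.
Decoding several preprocessed candidates in order is more stable than committing to a single strategy.
Deciding on default values for failures up front means you can still show something on screen when the output is only partially broken.

This approach is not limited to romance analysis and can be carried over to any iOS app that uses an LLM in a production product. Relora's implementation lives in a private repository, and the overall architecture is covered in a separate article (Zenn: "Overall design of a screenshot → AI analysis app"). Have a look at that one too.

Relora (App Store): https://apps.apple.com/app/relora/id6762029713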
