boto3 (high-school sophomore): "Sorry... /// did I keep you waiting...?? ♡♡♡♡" ← How this one's retries got in the way of my model fallback

Last updated at 2026-03-14 / Posted at 2026-03-14

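This article was drafted in outline by AI from my own experience and the investigation I did at the time; a human proofread it and verified that the code runs.

Now I am the grandfather, and what I give my grandchild is, of course, Werther's Original.
Because he, too, is someone special.

Hello, I'm Hashimoto (a pseudonym).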

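With Amazon Bedrock on-demand inference, exceeding the TPM (tokens per minute) quota gets you a ThrottlingException. The quota is shared across the whole account, so if TERU's requests use up the TPM, HISASHI and JIRO get rejected too. So does TAKURO, who was about to use it next.

Bedrock offers cross-region inference: an inference profile ID such as jp.anthropic.claude-sonnet-4-6 is routed automatically across multiple regions. This article is about falling back between models when you hit the quota even so.

So I set out to implement a fallback that "switches to the next model when an error occurs."
When I tried it, boto3's default retry behavior turned out to be the villain. This article is a record of that investigation.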

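Test environment and method

Item	Version
Python	3.13.12
boto3 / botocore	1.42.68
strands-agents	1.30.0

I monkey-patched botocore's URLLib3Session.send() with a mock that returns HTTP 429, then measured the number of attempts and the wait times.

Verification code (measuring the default retry behavior)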

import json
import os
import statistics
import time
from collections import Counter
from unittest.mock import MagicMock

# Set dummy credentials so this runs even without AWS credentials
os.environ.setdefault("AWS_ACCESS_KEY_ID", "testing")
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "testing")
os.environ.setdefault("AWS_DEFAULT_REGION", "ap-northeast-1")

import boto3
import botocore.config  # used explicitly below for botocore.config.Config
import botocore.endpoint
from botocore.exceptions import ClientError

RUNS = 3  # number of runs per pattern

# --- Mock setup ---
_original_send = botocore.endpoint.URLLib3Session.send
_call_count = 0
_call_timestamps: list[float] = []
_call_models: list[str] = []


def _mock_send(self, request, *args, **kwargs):
    """Mock that returns a ThrottlingException (HTTP 429) for every request."""
    global _call_count
    _call_count += 1
    _call_timestamps.append(time.monotonic())
    model_name = request.url.split("/model/")[-1].split("/")[0] if "/model/" in request.url else "unknown"
    _call_models.append(model_name)

    mock_response = MagicMock()
    mock_response.status_code = 429
    mock_response.headers = {
        "x-amzn-ErrorType": "ThrottlingException",
        "content-type": "application/json",
    }
    mock_response.content = json.dumps(
        {
            "message": "Too many requests, please wait before trying again.",
            "__type": "ThrottlingException",
        }
    ).encode()
    mock_response.text = mock_response.content.decode()
    return mock_response


botocore.endpoint.URLLib3Session.send = _mock_send


def run_benchmark(label: str, client, model_id: str, runs: int = RUNS):
    """Run the call several times with the given client and print statistics."""
    global _call_count, _call_timestamps, _call_models

    print(f"\n{'=' * 60}")
    print(f"{label}")
    print(f"{'=' * 60}")

    elapsed_times: list[float] = []
    first_models: list[str] = []
    first_gaps: list[float] = []  # stays empty if the loop body never runs

    for run in range(runs):
        _call_count = 0
        _call_timestamps.clear()
        _call_models.clear()
        start_time = time.monotonic()

        try:
            client.converse(
                modelId=model_id,
                messages=[{"role": "user", "content": [{"text": "hello"}]}],
                inferenceConfig={"maxTokens": 10},
            )
        except ClientError:
            pass

        elapsed = time.monotonic() - start_time
        elapsed_times.append(elapsed)
        if run == 0:
            first_models = list(_call_models)
            first_gaps = [
                _call_timestamps[i] - _call_timestamps[i - 1]
                for i in range(1, len(_call_timestamps))
            ]

    print(f"\nAttempts: {len(first_models)}")
    model_counts = Counter(first_models)
    print("Models called:")
    for model, count in model_counts.items():
        print(f"  {model}: {count} call(s)")
    print(f"Elapsed: mean {statistics.mean(elapsed_times):.1f} s")
    if first_gaps:
        print("Wait between attempts:")
        for i, gap in enumerate(first_gaps):
            print(f"  attempt {i + 1} → attempt {i + 2}: {gap:.1f} s")


MODEL_ID = "jp.anthropic.claude-sonnet-4-6"

# Test 1: default settings (legacy mode)
bedrock_default = boto3.client("bedrock-runtime", region_name="ap-northeast-1")
run_benchmark("Default settings (legacy mode, 5 attempts in total)", bedrock_default, MODEL_ID)

# Test 2: standard mode
config_standard = botocore.config.Config(retries={"max_attempts": 2, "mode": "standard"})
bedrock_standard = boto3.client("bedrock-runtime", region_name="ap-northeast-1", config=config_standard)
run_benchmark("standard mode (max_attempts=2 = 3 attempts in total)", bedrock_standard, MODEL_ID)

# Test 3: retries disabled
config_no_retry = botocore.config.Config(retries={"total_max_attempts": 1, "mode": "standard"})
bedrock_no_retry = boto3.client("bedrock-runtime", region_name="ap-northeast-1", config=config_no_retry)
run_benchmark("Retries disabled (total_max_attempts=1)", bedrock_no_retry, MODEL_ID)

# Remove the mock
botocore.endpoint.URLLib3Session.send = _original_send

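What happened

When a ThrottlingException comes back I want to switch to the next model immediately, but boto3 (high-school sophomore) goes ahead and retries on its own.
The default is legacy mode, "up to 5 attempts with exponential backoff," which is, so to speak, a "what do you think you're doing?" setting: in the worst case about 15 seconds per model, so 30 seconds for two models.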

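Default (legacy):
  [0 s] attempt 1 → 429 → wait up to 1 s → attempt 2 → 429 → wait up to 2 s
  → attempt 3 → 429 → wait up to 4 s → attempt 4 → 429 → wait up to 8 s
  → [15 s] attempt 5 → 429 → ClientError is finally raised

Retries disabled:
  [0 s] model 1 → 429, caught by except immediately → [0.1 s] success on model 2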

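boto3 retry modes

Mode	Default attempts	Backoff cap	Notes
legacy	5	None	The boto3 default. The biggest trap
standard	3	20 s	Has a retry quota
adaptive	3	20 s	standard + rate limiting (experimental)

Exponential backoff formula for legacy (from the botocore source): sleep = rand(0,1) × 2^(attempt-1). The nasty part is that there is no backoff cap.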
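
As a rough check of these numbers, the sketch below evaluates the worst case of that formula by taking rand(0,1) at its upper bound of 1. The function name `worst_case_backoff` and its `cap` parameter are illustrative and not part of botocore; the cap is applied the way the table describes for standard mode.

```python
from typing import Optional


def worst_case_backoff(total_attempts: int, cap: Optional[float] = None) -> list:
    """Upper bound of each wait, assuming rand(0,1) comes out as 1.0 every time.

    One wait precedes every attempt after the first, so there are
    total_attempts - 1 waits. `cap` is a hypothetical parameter standing in
    for the backoff cap column of the table (None = no cap, as in legacy).
    """
    waits = []
    for retry in range(1, total_attempts):
        delay = float(2 ** (retry - 1))
        if cap is not None:
            delay = min(delay, cap)
        waits.append(delay)
    return waits


legacy = worst_case_backoff(5)            # legacy: 5 attempts, no cap
standard = worst_case_backoff(3, cap=20)  # standard: 3 attempts, 20 s cap
print(legacy, sum(legacy))      # [1.0, 2.0, 4.0, 8.0] 15.0
print(standard, sum(standard))  # [1.0, 2.0] 3.0
```

These are upper bounds only; because of the random factor, a typical run waits less.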

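In February 2025 AWS announced a plan to change the SDK default to standard, but it was postponed over "potential concerns" and, as of March 2026, has still not happened (PR #4575 has also been closed).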

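Difference between `max_attempts` and `total_max_attempts`

from botocore.config import Config

# max_attempts: number of retries (initial call not included) → 3 attempts in total
config_a = Config(retries={"max_attempts": 2, "mode": "standard"})

# total_max_attempts: total number of attempts (initial call included) → 3 attempts in total (same as above)
config_b = Config(retries={"total_max_attempts": 3, "mode": "standard"})

If both are specified, total_max_attempts takes precedence. Note that max_attempts set through the AWS config file is the total number of attempts including the initial call: same name, different meaning.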
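
To make the two conventions concrete, here is a small sketch that works out the total number of attempts implied by a retries dict of the kind passed to Config. `total_attempts` and `DEFAULT_TOTAL_ATTEMPTS` are hypothetical names written for this illustration, not botocore APIs, and the defaults are simply the ones from the retry mode table in this article.

```python
# Default total attempts per mode, as listed in this article's table.
DEFAULT_TOTAL_ATTEMPTS = {"legacy": 5, "standard": 3, "adaptive": 3}


def total_attempts(retries: dict) -> int:
    """Total attempts (initial call included) implied by a Config retries dict."""
    if "total_max_attempts" in retries:
        # Already a total; it wins over max_attempts when both are given.
        return retries["total_max_attempts"]
    if "max_attempts" in retries:
        # In Config, max_attempts counts retries only, so add the initial call.
        return retries["max_attempts"] + 1
    return DEFAULT_TOTAL_ATTEMPTS[retries.get("mode", "legacy")]


print(total_attempts({"max_attempts": 2, "mode": "standard"}))        # 3
print(total_attempts({"total_max_attempts": 3, "mode": "standard"}))  # 3
print(total_attempts({"total_max_attempts": 1, "max_attempts": 4}))   # 1
print(total_attempts({}))                                             # 5
```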

By pattern: how should the retry settings be configured?

Pattern 0: default (no good)

import boto3
from botocore.exceptions import ClientError

# Create the client without giving it any thought
bedrock = boto3.client("bedrock-runtime", region_name="ap-northeast-1")

def call_bedrock(messages: list[dict]) -> dict:
    return bedrock.converse(
        modelId="jp.anthropic.claude-sonnet-4-6",
        messages=messages,
        inferenceConfig={"maxTokens": 1024},
    )

You get blocked for up to 15 seconds and then still get an error.

Pattern 1: standard mode (minimal improvement)

from botocore.config import Config

config = Config(
    retries={"max_attempts": 2, "mode": "standard"}
)
bedrock = boto3.client("bedrock-runtime", region_name="ap-northeast-1", config=config)

The maximum wait drops to about 3 seconds. If a retry is likely to get through, this is enough.

Pattern 2: retries disabled + model fallback (recommended)

import boto3
from botocore.config import Config
from botocore.exceptions import ClientError


no_retry_config = Config(
    retries={"total_max_attempts": 1, "mode": "standard"}
)

bedrock = boto3.client(
    "bedrock-runtime",
    region_name="ap-northeast-1",
    config=no_retry_config,
)

# Model IDs to fall back through (cross-region inference)
MODEL_IDS = [
    "jp.anthropic.claude-sonnet-4-6",
    "jp.anthropic.claude-sonnet-4-5-20250929-v1:0",
]

# Error codes treated as retryable (move on to the next model)
RETRYABLE_ERRORS = {
    "ThrottlingException",
    "ModelTimeoutException",
    "ServiceUnavailableException",
    "InternalServerException",
}


def converse_with_fallback(
    messages: list[dict],
    system: list[dict] | None = None,
    max_tokens: int = 1024,
) -> dict:
    """Call the Converse API with model fallback."""

    last_error = None

    for model_id in MODEL_IDS:
        try:
            params = {
                "modelId": model_id,
                "messages": messages,
                "inferenceConfig": {"maxTokens": max_tokens},
            }
            if system:
                params["system"] = system

            response = bedrock.converse(**params)
            return response

        except ClientError as e:
            error_code = e.response["Error"]["Code"]

            if error_code not in RETRYABLE_ERRORS:
                # Raise ValidationException, AccessDeniedException and the like immediately
                raise

            print(f"[{model_id}] {error_code}: {e}")
            last_error = e
            continue

    # Every model failed
    raise last_error

Setting total_max_attempts=1 kills the retries, and a hand-written for loop switches models.
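
The same loop can be exercised without touching AWS by injecting the call. The sketch below is a simplified stand-in, not the code above: `FakeError`, `call_with_fallback`, and `fake_call` are names made up for illustration, and the exception carries its error code directly instead of botocore's `e.response["Error"]["Code"]`.

```python
from typing import Callable

RETRYABLE_ERRORS = {"ThrottlingException", "ModelTimeoutException"}


class FakeError(Exception):
    """Hypothetical stand-in for ClientError that carries only an error code."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def call_with_fallback(call: Callable[[str], str], model_ids: list) -> str:
    """Try each model once, in order; move on only for retryable errors."""
    if not model_ids:
        raise ValueError("model_ids must not be empty")
    last_error = None
    for model_id in model_ids:
        try:
            return call(model_id)
        except FakeError as e:
            if e.code not in RETRYABLE_ERRORS:
                raise  # e.g. a validation error: another model will not help
            last_error = e
    raise last_error


def fake_call(model_id: str) -> str:
    """Pretend the first model is throttled and any other model answers."""
    if model_id == "model-a":
        raise FakeError("ThrottlingException")
    return f"answer from {model_id}"


print(call_with_fallback(fake_call, ["model-a", "model-b"]))  # answer from model-b
```

Guarding against an empty model list avoids ending on `raise None`, which would surface as a confusing TypeError.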

Pattern 3: fallback + round-based retry (for when everything is throttled)

An extension of pattern 2. If every model fails, wait a little and try again.

import time

# bedrock, MODEL_IDS, RETRYABLE_ERRORS, and the ClientError import are the same as in pattern 2

def converse_with_fallback_and_retry(
    messages: list[dict],
    system: list[dict] | None = None,
    max_tokens: int = 1024,
    max_rounds: int = 3,
    base_delay: float = 5.0,
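) -> dict:
    """
    Model fallback + round-based retry.

    One round = try every model once.
    If every model fails, wait base_delay * round_num seconds
    (5 s, then 10 s with the defaults) before the next round.
    """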

    last_error = None

    for round_num in range(max_rounds):
        if round_num > 0:
            delay = base_delay * round_num
            print(f"All models failed. Waiting {delay:.0f} s before retrying ({round_num + 1}/{max_rounds})")
            time.sleep(delay)

        for model_id in MODEL_IDS:
            try:
                params = {
                    "modelId": model_id,
                    "messages": messages,
                    "inferenceConfig": {"maxTokens": max_tokens},
                }
                if system:
                    params["system"] = system

                response = bedrock.converse(**params)
                return response

            except ClientError as e:
                error_code = e.response["Error"]["Code"]

                if error_code not in RETRYABLE_ERRORS:
                    raise

                print(f"[Round {round_num + 1}][{model_id}] {error_code}")
                last_error = e
                continue

    raise last_error
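
The wait schedule in the function above grows linearly (base_delay × round number). Here is a small sketch of the resulting sleeps, using a helper name (`round_delays`) invented for this illustration:

```python
def round_delays(max_rounds: int = 3, base_delay: float = 5.0) -> list:
    """Sleep before each round after the first: base_delay * round_num."""
    return [base_delay * round_num for round_num in range(1, max_rounds)]


delays = round_delays()
print(delays, sum(delays))  # [5.0, 10.0] 15.0
```

With the defaults the sleeps alone add up to 15 seconds, not counting the time spent on the calls themselves.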

Verification code (checking the fallback behavior)

import statistics
import time

import boto3
import botocore.config
from botocore.exceptions import ClientError

RUNS = 3
_throttled_models: set[str] = set()
_call_log: list[dict] = []

no_retry_config = botocore.config.Config(
    retries={"total_max_attempts": 1, "mode": "standard"}
)
bedrock = boto3.client(
    "bedrock-runtime", region_name="ap-northeast-1", config=no_retry_config,
)

MODEL_IDS = [
    "jp.anthropic.claude-sonnet-4-6",
    "jp.anthropic.claude-sonnet-4-5-20250929-v1:0",
]
RETRYABLE_ERRORS = {
    "ThrottlingException", "ModelTimeoutException",
    "ServiceUnavailableException", "InternalServerException",
}

_original_converse = bedrock.converse

def _mock_converse(**kwargs):
    """Raise an error for throttled models; otherwise call the real API."""
    model_id = kwargs.get("modelId", "unknown")
    entry = {"model_id": model_id, "time": time.monotonic()}
    if model_id in _throttled_models:
        entry["result"] = "ThrottlingException"
        _call_log.append(entry)
        raise ClientError(
            error_response={"Error": {"Code": "ThrottlingException", "Message": "Too many requests"}},
            operation_name="Converse",
        )
    response = _original_converse(**kwargs)
    entry["result"] = "Success"
    _call_log.append(entry)
    return response

bedrock.converse = _mock_converse

def converse_with_fallback(messages, max_tokens=100):
    last_error = None
    for model_id in MODEL_IDS:
        try:
            return bedrock.converse(
                modelId=model_id, messages=messages,
                inferenceConfig={"maxTokens": max_tokens},
            )
        except ClientError as e:
            if e.response["Error"]["Code"] not in RETRYABLE_ERRORS:
                raise
            last_error = e
            continue
    raise last_error

messages = [{"role": "user", "content": [{"text": "こんにちは"}]}]

# Case 1: model 1 is throttled → fall back to model 2 (makes a real API call)
_throttled_models = {MODEL_IDS[0]}
for run in range(RUNS):
    _call_log.clear()
    start = time.monotonic()
    response = converse_with_fallback(messages)
    elapsed = time.monotonic() - start
    if run == 0:
        print(f"Elapsed: {elapsed:.1f} s")
        for entry in _call_log:
            print(f"  {entry['model_id']}: {entry['result']}")
        text = response["output"]["message"]["content"][0]["text"]
        print(f"Response: {text[:100]}")

# Case 2: nothing is throttled (model 1 answers right away)
_throttled_models = set()
_call_log.clear()
start = time.monotonic()
response = converse_with_fallback(messages)
elapsed = time.monotonic() - start
print(f"\nModel 1 succeeded: {elapsed:.1f} s")
for entry in _call_log:
    print(f"  {entry['model_id']}: {entry['result']}")

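Which pattern to choose

Pattern	Max wait (2 models)	Good fit
0 (default)	about 30 s	Probably none
1 (standard)	about 6 s	Batch jobs; a retry is likely to get through
2 (fallback)	about 0.1 s	Web apps; response time matters most
3 (fallback + retry)	about 15 s	High-load environments where every model may be throttled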

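If you are using Strands Agents

Retries are handled for you

When you use BedrockModel with the Strands Agents SDK, there is a retry mechanism (ModelRetryStrategy) in a layer separate from boto3. By default it makes up to 6 attempts (including the initial call) with exponential backoff starting at 4 seconds (each delay capped at 128 seconds). You don't need to write throttling retries yourself.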
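
Assuming the delay simply doubles after each failed attempt and is clipped at the cap (an assumption about what "exponential backoff" means here, not something taken from the Strands source), the default schedule works out as follows. `strands_like_delays` is an illustrative name, not a Strands API.

```python
def strands_like_delays(
    max_attempts: int = 6, initial_delay: float = 4, max_delay: float = 128
) -> list:
    """Wait before each retry, assuming doubling clipped at max_delay."""
    delays = []
    delay = float(initial_delay)
    for _ in range(max_attempts - 1):  # no wait after the final attempt
        delays.append(min(delay, float(max_delay)))
        delay *= 2
    return delays


defaults = strands_like_delays()
print(defaults, sum(defaults))  # [4.0, 8.0, 16.0, 32.0, 64.0] 124.0
```

Under the same assumption, the settings used in the verification code (max_attempts=3, initial_delay=1, max_delay=10) give waits of 1 and 2 seconds.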

Verification code (measuring Strands ModelRetryStrategy)

import os
import statistics
import time
from collections import Counter
from typing import Any

os.environ.setdefault("AWS_ACCESS_KEY_ID", "testing")
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "testing")
os.environ.setdefault("AWS_DEFAULT_REGION", "ap-northeast-1")

from botocore.config import Config
from botocore.exceptions import ClientError
from strands import Agent, ModelRetryStrategy
from strands.models.bedrock import BedrockModel
from strands.types.exceptions import ModelThrottledException

RUNS = 3
no_retry_config = Config(retries={"total_max_attempts": 1, "mode": "standard"})

_call_count = 0
_call_timestamps: list[float] = []
_call_models: list[str] = []

def _throttle(**kwargs: Any):
    """Always raise a ThrottlingException."""
    global _call_count
    _call_count += 1
    _call_timestamps.append(time.monotonic())
    _call_models.append(kwargs.get("modelId", "unknown"))
    raise ClientError(
        error_response={"Error": {"Code": "ThrottlingException", "Message": "Too many requests"}},
        operation_name="Converse",
    )

# Check the behavior of ModelRetryStrategy (max_attempts=3, initial_delay=1)
elapsed_times = []
for run in range(RUNS):
    model = BedrockModel(
        model_id="jp.anthropic.claude-sonnet-4-6",
        region_name="ap-northeast-1",
        boto_client_config=no_retry_config,
        streaming=False,
    )
    agent = Agent(
        model=model,
        retry_strategy=ModelRetryStrategy(max_attempts=3, initial_delay=1, max_delay=10),
        callback_handler=None,
    )
    original = model.client.converse
    model.client.converse = _throttle
    _call_count = 0
    _call_timestamps.clear()
    _call_models.clear()
    start_time = time.monotonic()
    try:
        agent("hello")
    except (ModelThrottledException, Exception):
        pass
    elapsed = time.monotonic() - start_time
    elapsed_times.append(elapsed)
    model.client.converse = original
    if run == 0:
        first_gaps = [
            _call_timestamps[i] - _call_timestamps[i - 1]
            for i in range(1, len(_call_timestamps))
        ] if len(_call_timestamps) >= 2 else []

print(f"Attempts: {_call_count}")
print(f"Elapsed: mean {statistics.mean(elapsed_times):.1f} s")
if first_gaps:
    print("Wait between attempts:")
    for i, gap in enumerate(first_gaps):
        print(f"  attempt {i + 1} → attempt {i + 2}: {gap:.1f} s")

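But there is no model fallback

You have to implement model fallback yourself. If the default retries kick in you wait up to 124 seconds, which is fatal for a web app. Disable Strands' retries with retry_strategy=None and boto3's retries with boto_client_config.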

from botocore.config import Config
from strands import Agent
from strands.models.bedrock import BedrockModel
from strands.types.exceptions import ModelThrottledException

no_retry_config = Config(
    retries={"total_max_attempts": 1, "mode": "standard"}
)

MODEL_IDS = [
    "jp.anthropic.claude-sonnet-4-6",
    "jp.anthropic.claude-sonnet-4-5-20250929-v1:0",
]


def invoke_with_fallback(
    prompt: str,
    model_ids: list[str],
    system_prompt: str | None = None,
) -> dict:
    """Model fallback with a Strands Agent; returns the result's message (a dict, not plain text)."""
    last_error = None

    for model_id in model_ids:
        model = BedrockModel(
            model_id=model_id,
            region_name="ap-northeast-1",
            boto_client_config=no_retry_config,
        )
        agent = Agent(
            model=model,
            system_prompt=system_prompt or "",
            retry_strategy=None,  # also disable Strands' retries
        )

        try:
            result = agent(prompt)
            return result.message
        except ModelThrottledException as e:
            print(f"[{model_id}] throttled: {e}")
            last_error = e
            continue

    raise last_error

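Strands converts the ClientError into a ModelThrottledException before raising it, so that is the one to catch. In my test, when model 1 was throttled the call fell back to model 2 right away (actual latency depends on the network and server response time). With default settings, boto3's retries and Strands' retries stack, for a calculated worst case of 214 seconds (boto3 legacy 15 s × 6 Strands attempts + 124 s of Strands backoff).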
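
The stacking arithmetic can be written out as a tiny sketch. `stacked_worst_case` is a name invented for this illustration, and the inputs are the worst-case figures quoted in this article rather than new measurements.

```python
def stacked_worst_case(inner_worst: float, outer_attempts: int, outer_backoff: float) -> float:
    """Every outer attempt pays the full inner retry time, plus the outer waits."""
    return inner_worst * outer_attempts + outer_backoff


# boto3 legacy: 15 s per call; Strands defaults: 6 attempts, 124 s of backoff
print(stacked_worst_case(15, 6, 124))  # 214
# boto3 retries disabled: only the Strands backoff remains
print(stacked_worst_case(0, 6, 124))   # 124
```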

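Summary

boto3's default retries are a poor match for Bedrock model fallback. Disable them with total_max_attempts=1 and control retries yourself. With Strands Agents, don't forget retry_strategy=None as well.

So that we may call the name of ceaselessly poured love "eternity," let's tune Bedrock's retry settings properly too and deliver the best experience to our users.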
