AWS-Related Things I Learned This Year Advent Calendar 2025

Do You Know Exponential Backoff and Jitter? Explaining the Retry Logic Used Behind the Scenes in the AWS SDK

Last updated at 2025-12-24, posted at 2025-12-24

TL;DR

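Exponential Backoff and Jitter: exponential backoff combined with jitter (a randomized delay)
- The standard retry logic for cloud workloads
- The delay grows exponentially with each retry, and randomness is added to it
With a simple fixed delay or a pure exponential delay, clients come back at the same moment and cause a secondary failure (thundering herd).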

Why Exponential Backoff and Jitter

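Backoff (increasing the delay): retrying many times in a short period adds load. Waiting exponentially longer avoids concentrating that load.
Jitter (randomization): if every client retries at the same moment, a momentary load spike occurs (thundering herd). Adding randomness spreads the retries out.
This retry strategy is effective for API rate limits, throttling, brief service instability, and automatic retries after network disconnects.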

Basic Concepts and Algorithms

sleep = the wait before the next retry
base = the base unit of the retry interval
cap = the maximum retry interval
attempt = the retry count

0) Plain retry without Exponential Backoff

sleep = base

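The client always waits exactly base, regardless of how many retries have already failed.
Problem: it may retry many times in a short period and put the service under high load.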

1) Exponential Backoff

sleep = base * 2^{attempt}

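The wait time grows exponentially with the retry count.
Problem: if every client runs with the same base and attempt, they all retry at the same moments and load spikes still occur.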
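To make the problem concrete, here is a minimal sketch of the formula above. The function name `exponentialDelayMs` and the 100 ms default for base are assumptions chosen for illustration. Because nothing in the calculation is random, every client that failed at the same moment gets exactly the same schedule.

```typescript
// Plain exponential backoff without jitter: sleep = base * 2^attempt.
// `exponentialDelayMs` and the 100 ms default are illustrative assumptions.
function exponentialDelayMs(attempt: number, baseMs: number = 100): number {
  return baseMs * Math.pow(2, attempt);
}

// Two clients that start failing together compute identical delays.
const clientOne = [0, 1, 2, 3, 4].map((attempt) => exponentialDelayMs(attempt));
const clientTwo = [0, 1, 2, 3, 4].map((attempt) => exponentialDelayMs(attempt));

console.log(clientOne.join(", ")); // 100, 200, 400, 800, 1600
console.log(clientTwo.join(", ")); // 100, 200, 400, 800, 1600
```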

2) Exponential Backoff and Full Jitter

sleep = random(0, min(cap, base * 2^{attempt}))

Advantages: the degree of randomization is large, so it spreads peaks well, and it is simple to implement.
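The Full Jitter formula can be sketched as a small pure function. The name `fullJitterDelayMs` and the injectable `rng` parameter are assumptions added here so the calculation can be checked deterministically; by default it falls back to `Math.random`.

```typescript
// Full Jitter: sleep = random(0, min(cap, base * 2^attempt)).
// `rng` must return a float in [0, 1), like Math.random. It is injectable
// only so that the result can be verified with a fixed value.
function fullJitterDelayMs(
  attempt: number,
  baseMs: number,
  capMs: number,
  rng: () => number = Math.random
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return rng() * ceiling;
}

// With a fixed rng of 0.5 the delay is half of the (capped) ceiling.
console.log(fullJitterDelayMs(3, 100, 10000, () => 0.5));  // 400
console.log(fullJitterDelayMs(10, 100, 10000, () => 0.5)); // 5000 (cap applied)
```

Note that the delay can be close to zero, which is the trade-off for spreading retries across the whole interval.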

3) Exponential Backoff and Equal Jitter

\displaylines{
temp = min(cap, base * 2^{attempt}) \\
sleep = temp/2 + random(0,temp/2)
}

This is more stable than Full Jitter while keeping some randomness, but the wait time is never shorter than temp/2.
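A minimal sketch of the Equal Jitter formula follows. As before, the function name `equalJitterDelayMs` and the injectable `rng` are assumptions for illustration. Passing an `rng` that returns 0 shows the lower bound of temp/2 directly.

```typescript
// Equal Jitter: temp = min(cap, base * 2^attempt); sleep = temp/2 + random(0, temp/2).
// `rng` must return a float in [0, 1), like Math.random.
function equalJitterDelayMs(
  attempt: number,
  baseMs: number,
  capMs: number,
  rng: () => number = Math.random
): number {
  const temp = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return temp / 2 + rng() * (temp / 2);
}

// attempt = 2, base = 100 gives temp = 400.
console.log(equalJitterDelayMs(2, 100, 10000, () => 0));   // 200 (the minimum, temp/2)
console.log(equalJitterDelayMs(2, 100, 10000, () => 0.5)); // 300
```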

4) Exponential Backoff and Decorrelated Jitter

sleep = min(cap, random(base, prev_sleep * 3))

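prev_sleep starts at base. Each delay is drawn from a range derived from the previous sleep rather than from attempt, and cap keeps it from growing without bound. Its defining trait is that it softens the steep climb of the delay while preserving randomness.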
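Unlike the previous variants, Decorrelated Jitter carries state from one retry to the next. The sketch below shows one way to express that, assuming cap is at least base; the names `decorrelatedJitterDelayMs` and `decorrelatedSchedule` and the injectable `rng` are assumptions for illustration.

```typescript
// Decorrelated Jitter: sleep = min(cap, random(base, prev_sleep * 3)).
// `rng` must return a float in [0, 1), like Math.random.
function decorrelatedJitterDelayMs(
  prevSleepMs: number,
  baseMs: number,
  capMs: number,
  rng: () => number = Math.random
): number {
  const upper = prevSleepMs * 3;
  const sample = baseMs + rng() * (upper - baseMs);
  return Math.min(capMs, sample);
}

// Builds a schedule by feeding each delay back in as prev_sleep.
function decorrelatedSchedule(
  count: number,
  baseMs: number,
  capMs: number,
  rng: () => number = Math.random
): number[] {
  const delays: number[] = [];
  let prev = baseMs; // prev_sleep starts at base
  for (let i = 0; i < count; i++) {
    prev = decorrelatedJitterDelayMs(prev, baseMs, capMs, rng);
    delays.push(prev);
  }
  return delays;
}

// With a fixed rng of 0.5: 200, 350, 575, 912.5, then capped at 1000.
console.log(decorrelatedSchedule(5, 100, 1000, () => 0.5).join(", "));
```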

Implementation in the AWS SDK

According to the official documentation, the AWS SDKs implement the following exponential backoff with jitter.

\displaylines{
cap = 20 \text{ seconds} \\
sleep = min(cap,random(0,1)*2^{attempt})
}
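Written as code, the formula quoted above looks like the following. This is only a transcription of that formula under the stated cap of 20 seconds, not the SDK's actual source; the name `sdkStyleDelaySeconds` and the injectable `rng` are assumptions for illustration.

```typescript
// sleep = min(cap, random(0, 1) * 2^attempt), with cap = 20 seconds.
// A transcription of the documented formula, not the SDK's real code.
function sdkStyleDelaySeconds(
  attempt: number,
  rng: () => number = Math.random,
  capSeconds: number = 20
): number {
  return Math.min(capSeconds, rng() * Math.pow(2, attempt));
}

// With a fixed rng of 0.5:
console.log(sdkStyleDelaySeconds(3, () => 0.5));  // 4
console.log(sdkStyleDelaySeconds(10, () => 0.5)); // 20 (0.5 * 1024 exceeds the cap)
```

Here the random factor plays the role of base, so the upper bound of the delay is 2^attempt seconds until the cap takes over.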

Implementation Example (Node.js / TypeScript)

A simple Full Jitter implementation. It wraps any Promise-based async function.

// backoff.ts
export type RetryOptions = {
  baseMs?: number;   // base delay (ms)
  capMs?: number;    // max delay (ms)
  maxAttempts?: number;
  retryIf?: (err: any) => boolean; // decides whether the error is retryable
};

function randInt(max: number) {
  return Math.floor(Math.random() * max);
}

export async function retryWithFullJitter<T>(
  fn: () => Promise<T>,
  opts: RetryOptions = {}
): Promise<T> {
  const base = opts.baseMs ?? 100;
  const cap = opts.capMs ?? 10000;
  const maxAttempts = opts.maxAttempts ?? 6;
  const retryIf = opts.retryIf ?? (() => true);

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const shouldRetry = retryIf(err);
      if (!shouldRetry || attempt === maxAttempts - 1) throw err;

      const exp = Math.min(cap, base * Math.pow(2, attempt));
      const waitMs = randInt(exp + 1); // 0..exp
      await new Promise((res) => setTimeout(res, waitMs));
    }
  }
  // unreachable
  throw new Error("retryWithFullJitter: unexpected exit");
}

Usage (example):

import { retryWithFullJitter } from "./backoff";

await retryWithFullJitter(
  () => fetchSomeAwsApi(),
  { baseMs: 100, capMs: 8000, maxAttempts: 5, retryIf: (err) => isTransient(err) }
);
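The usage above calls `isTransient` without defining it. One possible shape is sketched below. It assumes that errors carry either a numeric `statusCode` or a string `code`; both property names and the lists of retryable values are hypothetical, so adapt them to the error objects your client actually throws.

```typescript
// Hypothetical predicate for retryIf. The error shape (statusCode / code)
// and the retryable values are assumptions, not something an SDK guarantees.
const RETRYABLE_STATUS = new Set<number>([429, 500, 502, 503, 504]);
const RETRYABLE_CODES = new Set<string>(["ECONNRESET", "ETIMEDOUT"]);

function isTransient(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { statusCode?: unknown; code?: unknown };
  if (typeof e.statusCode === "number" && RETRYABLE_STATUS.has(e.statusCode)) {
    return true;
  }
  return typeof e.code === "string" && RETRYABLE_CODES.has(e.code);
}

console.log(isTransient({ statusCode: 503 })); // true
console.log(isTransient({ statusCode: 400 })); // false
```

Keeping client errors such as 400 out of the retryable set matters: retrying a request that can never succeed only adds load.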

Pitfalls of Exponential Backoff and Jitter

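Retrying non-idempotent operations unconditionally: risks double billing and duplicate resource creation.
A cap that is too short / a maxAttempts that is too large: retries concentrate in a short window, or conversely callers wait needlessly long. Tune both to the characteristics of the service.
Using the same random seed on every client: the randomization becomes ineffective. Use a generator that is seeded independently in each process (for example Math.random) rather than a fixed seed.
No monitoring: a rise in retries goes unnoticed. Consider adding metrics.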
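The seed pitfall is easy to reproduce. The sketch below uses a tiny linear congruential generator purely to make the seed explicit; `seededRng` and `fullJitterSchedule` are hypothetical helpers for illustration and are not meant for production use. Two clients sharing a seed produce identical Full Jitter schedules, so their retries still collide.

```typescript
// Tiny linear congruential generator (Numerical Recipes constants),
// used only to make the seed visible. Returns floats in [0, 1).
function seededRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Full Jitter schedule: random(0, min(cap, base * 2^attempt)) per attempt.
function fullJitterSchedule(
  rng: () => number,
  attempts: number,
  baseMs: number = 100,
  capMs: number = 10000
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < attempts; attempt++) {
    const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
    delays.push(rng() * ceiling);
  }
  return delays;
}

const clientA = fullJitterSchedule(seededRng(42), 5);
const clientB = fullJitterSchedule(seededRng(42), 5); // same seed as clientA
const clientC = fullJitterSchedule(seededRng(43), 5); // different seed

console.log(clientA.join(",") === clientB.join(",")); // true: jitter has no effect
console.log(clientA[0] === clientC[0]);               // false
```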

