FastAPIでasync/awiteを適当に使うのは良くない(はず、だから検証する)

Posted at 2025-06-04

FastAPIを勉強してたの時、非同期処理をよく分からず「とりあえずやっ解けばいいのかな」という感じでそこら中にasync/await をつけていました。
awaitがないとエラーが出たりして、エラーを黙らせるためにとりあえず適当につけていた記憶があります。

しかし、実際はI/O待機や外部APIを使用しない場合はむしろasyncの処理のオーバーヘッドが発生して処理時間が遅くなることがありました。

きっかけは20~50万件とかのデータを取り込む処理に時間がかかっているという話があり、どこに処理時間がかかっているのか調査した時でした。

その問題自体はバルクインサートで一気に速くなったのですが、その過程で「そもそも何でここでasync使ってるんだっけ？」と思って、なしにして処理時間を測ったら、数秒変わるかも知れないみたいになったのがきっかけでした。

その時のことを思い出しながら、FastAPIで不要なasyncが処理を遅くすることを確認してみます。

コードの作成

今回は検証なので、1000万回ループさせて、その合計をさせるという極めて無意味なただのCPUバウンドな処理を作成しました。

main.py

from fastapi import FastAPI

app = FastAPI()


@app.get("/async-cpu-bound")
async def async_cpu_bound():
    # CPUバウンド処理を無駄にasync関数にした
    total = 0
    for i in range(10_000_000):
        total += i
    return {"total": total}


@app.get("/sync-cpu-bound")
def sync_cpu_bound():
	# asyncなしのシンプルバージョン
    total = 0
    for i in range(10_000_000):
        total += i
    return {"total": total}

この関数に対して、並列でリクエストを送って処理時間を比較してみます。
今回は計測ツールとして、Go製のheyを使ってみます。
まずは hey をインストールします(Mac環境)。

brew install hey

下記コマンドで計測することができます。

hey -n 100 -c 10 http://localhost:8000/async-cpu-bound
hey -n 100 -c 10 http://localhost:8000/sync-cpu-bound

結果は下記の通りになりました。

$ hey -n 100 -c 10 http://localhost:8000/async-cpu-bound

Summary:
  Total:        19.4955 secs
  Slowest:      5.9489 secs
  Fastest:      0.2219 secs
  Average:      1.7665 secs
  Requests/sec: 5.1294
  
  Total data:   2400 bytes
  Size/request: 24 bytes

Response time histogram:
  0.222 [1]     |■
  0.795 [7]     |■■■■■
  1.367 [17]    |■■■■■■■■■■■■■
  1.940 [51]    |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  2.513 [20]    |■■■■■■■■■■■■■■■■
  3.085 [0]     |
  3.658 [1]     |■
  4.231 [1]     |■
  4.803 [0]     |
  5.376 [1]     |■
  5.949 [1]     |■


Latency distribution:
  10% in 0.9621 secs
  25% in 1.5405 secs
  50% in 1.9271 secs
  75% in 1.9353 secs
  90% in 1.9728 secs
  95% in 2.3690 secs
  99% in 5.9489 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0003 secs, 0.2219 secs, 5.9489 secs
  DNS-lookup:   0.0002 secs, 0.0000 secs, 0.0024 secs
  req write:    0.0001 secs, 0.0000 secs, 0.0006 secs
  resp wait:    1.7660 secs, 0.2174 secs, 5.9446 secs
  resp read:    0.0000 secs, 0.0000 secs, 0.0001 secs

Status code distribution:
  [200] 100 responses


$ hey -n 100 -c 10 http://localhost:8000/sync-cpu-bound

Summary:
  Total:        20.0442 secs
  Slowest:      2.6002 secs
  Fastest:      0.6797 secs
  Average:      1.9639 secs
  Requests/sec: 4.9890
  
  Total data:   2400 bytes
  Size/request: 24 bytes

Response time histogram:
  0.680 [1]     |■
  0.872 [0]     |
  1.064 [1]     |■
  1.256 [0]     |
  1.448 [1]     |■
  1.640 [8]     |■■■■■■■■■■
  1.832 [15]    |■■■■■■■■■■■■■■■■■■
  2.024 [28]    |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  2.216 [33]    |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  2.408 [8]     |■■■■■■■■■■
  2.600 [5]     |■■■■■■


Latency distribution:
  10% in 1.6394 secs
  25% in 1.8119 secs
  50% in 1.9922 secs
  75% in 2.1274 secs
  90% in 2.3112 secs
  95% in 2.4108 secs
  99% in 2.6002 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0004 secs, 0.6797 secs, 2.6002 secs
  DNS-lookup:   0.0002 secs, 0.0000 secs, 0.0016 secs
  req write:    0.0000 secs, 0.0000 secs, 0.0003 secs
  resp wait:    1.9634 secs, 0.6796 secs, 2.6001 secs
  resp read:    0.0000 secs, 0.0000 secs, 0.0001 secs

Status code distribution:
  [200] 100 responses

この結果を読むのは大変なので、生成AIにまとめてもらいました。

指標	`/async-cpu-bound` (`async def`)	`/sync-cpu-bound` (`def`)	備考
総実行時間	19.50 sec	20.04 sec	同程度（ただし分布が違う）
平均応答時間	1.77 sec	1.96 sec	async の方が「一見」速く見える
最大応答時間	5.95 sec	2.60 sec	非同期の方がばらつきが大きく、ピークが深刻
RPS	5.13	4.99	同程度
95%tile 応答	2.37 sec	2.41 sec	almost equal
99%tile 応答	5.95 sec	2.60 sec	非同期側のtail latencyが劣悪

「99%tile 応答」とは「全リクエストのうち、99% がこの時間以内に処理されたという統計指標」です。サービスの安定的な処理時間と解釈してもいいかも知れません。

平均の処理時間を見るとasyncを使っている方が早いですが、時々めちゃくちゃ時間のかかった処理が発生します。安定的に早いわけではないということだと思います。

なんとも微妙な結果ですよね。

その後2回ずつ追加でやりましたが、同じような結果でした。

Run	Endpoint	平均応答時間 Avg	最速 Fastest	最遅 Slowest	P99 応答時間	RPS (Req/sec)	備考
1	`/async-cpu-bound`	1.7665 sec	0.2219 sec	5.9489 sec	5.9489 sec	5.1294	非同期
2	`/async-cpu-bound`	1.7447 sec	0.2094 sec	5.8057 sec	5.8057 sec	5.1879	非同期
3	`/sync-cpu-bound`	1.9540 sec	0.6493 sec	2.5768 sec	2.5768 sec	4.9828	同期
4	`/sync-cpu-bound`	1.9641 sec	1.0595 sec	2.5149 sec	2.5149 sec	5.0035	同期

1件ずつの処理や、平均時間を見ると非同期の方が早いです。
しかし、遅い場合はかなり遅く処理時間が安定していないということだと思います。
サービスの安定運用を考えると、無用な非同期処理はしない方が良いですが、最速という意味では非同期の方がCPUバウンド処理でも速くなってしまっています。

私の知識ではこの解釈がうまくできません。
原因が分かる方がいましたらぜひ教えてください。

逐次処理だとどうなるか

そういえば、私の経験したシーンでは並列でリクエストが来る感じではありませんでした。
逐次処理でも連続で使用する場合には影響が出るのでしょうか？

逐次リクエストのテストは、hey ではできないのでPythonで組んでみましょう。

import time

import requests


def benchmark_sequential(endpoint: str, n: int = 100):
    print(f"\nBenchmarking: {endpoint} with {n} sequential requests")
    start = time.time()
    for i in range(n):
        r = requests.get(endpoint)
        r.raise_for_status()
    total = time.time() - start
    print(f"Total time: {total:.2f} sec")
    print(f"Average per request: {total / n:.4f} sec")


if __name__ == "__main__":
    benchmark_sequential("http://localhost:8000/async-cpu-bound", 100)
    benchmark_sequential("http://localhost:8000/sync-cpu-bound", 100)

これを実行した結果が、下記です。

$ uv run benchmark_sequential.py

Benchmarking: http://localhost:8000/async-cpu-bound with 100 sequential requests
Total time: 20.11 sec
Average per request: 0.2011 sec

Benchmarking: http://localhost:8000/sync-cpu-bound with 100 sequential requests
Total time: 19.83 sec
Average per request: 0.1983 sec

若干の差はありますが、誤差程度でしょうか。
私の経験した環境ではもう少し差があったのですが、他の要因もあったのでしょうか？

純粋な関数で再テスト

そもそも先ほどのテストでは、API経由だったので純粋な関数での比較もした方が良いかなと思いました。コードの書き換えは生成AIにお任せです。

import asyncio
import time


# 共通の重い処理（CPUバウンド）
def cpu_bound():
    total = 0
    for i in range(10_000_000):
        total += i
    return total


# 非同期関数として定義（awaitなし）
async def async_cpu_bound():
    total = 0
    for i in range(10_000_000):
        total += i
    return total


# async を同期的に動かすためのラッパー
def run_async(func):
    return asyncio.run(func())


# ベンチマーク関数
def benchmark(func, n=10, label=""):
    start = time.perf_counter()
    for _ in range(n):
        func()
    duration = time.perf_counter() - start
    print(f"{label:<20}: {duration:.4f} sec (avg: {duration / n:.4f} sec)")


if __name__ == "__main__":
    print("CPU-bound function benchmark (pure Python)")
    benchmark(cpu_bound, n=10, label="sync def")

    benchmark(lambda: run_async(async_cpu_bound), n=10, label="async def")

5回連続で実行した結果です。

$ uv run benchmark_function.py
CPU-bound function benchmark (pure Python)
sync def            : 1.6513 sec (avg: 0.1651 sec)
async def           : 1.8348 sec (avg: 0.1835 sec)
$ uv run benchmark_function.py
CPU-bound function benchmark (pure Python)
sync def            : 1.6707 sec (avg: 0.1671 sec)
async def           : 1.7736 sec (avg: 0.1774 sec)
$ uv run benchmark_function.py
CPU-bound function benchmark (pure Python)
sync def            : 1.7524 sec (avg: 0.1752 sec)
async def           : 1.7823 sec (avg: 0.1782 sec)
$ uv run benchmark_function.py
CPU-bound function benchmark (pure Python)
sync def            : 1.6898 sec (avg: 0.1690 sec)
async def           : 1.8104 sec (avg: 0.1810 sec)
$ uv run benchmark_function.py
CPU-bound function benchmark (pure Python)
sync def            : 1.7413 sec (avg: 0.1741 sec)
async def           : 1.8276 sec (avg: 0.1828 sec)

この検証から少なくとも、純粋関数に不要なasyncをつけると遅くなるがその差は極めて軽微である（誤差レベル）。

処理件数が極端に多かったりすると差が出るのかも知れませんが、そこまで気にしなくても良い気がしてきました。
「私は明確に使うことに価値がある場合以外は不用意にasyncを使うな！」と思っていましたが、そこまで気にしなくてもという結果です。

しかしネットで調べてみると次のような慣用句もありました。

"Don't async what you can't await."
→ 待機しない処理に async をつけるな（慣用句的に使われます）

"Avoid async functions for CPU-bound operations."
→ CPUバウンドには async を使うべきではない

結論

そこまで気にしなくて良いけど、わざわざ無意味なasyncは使わない方が良い

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up