PostgreSQL: comparing the performance of WHERE IN and WHERE =

Posted at 2024-02-10

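In SQL, there are two ways to fetch a large number of records. One is to repeat single-row queries like

select * from table_name where comment_id = 1;
select * from table_name where comment_id = 2;
select * from table_name where comment_id = 3;

The other is to fetch them all at once with a single query:

select * from table_name where comment_id in (0,1,2,3,......);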

That the latter is faster is probably common knowledge, but I was curious how large the difference actually is, so I ran a small experiment.

Windows PowerShell has a cmdlet for measuring how long a command takes, Measure-Command, so I use it here.
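The contents of the script files passed to psql below are not shown in this article. As a rough sketch of how such files could be produced, assuming one statement per id in the first file and a single IN list in the second, something like the following Python would do. The table name `comments` and the output file names are hypothetical, chosen only for illustration.

```python
from pathlib import Path

TABLE = "comments"  # hypothetical table name; the article only says "table name"


def one_by_one_sql(ids, table=TABLE):
    """Build one 'where comment_id = N' statement per id."""
    return "".join(f"select * from {table} where comment_id = {i};\n" for i in ids)


def in_list_sql(ids, table=TABLE):
    """Build a single 'where comment_id in (...)' statement covering every id."""
    id_list = ",".join(str(i) for i in ids)
    return f"select * from {table} where comment_id in ({id_list});\n"


def write_scripts(n, out_dir="."):
    """Write both variants for ids 1..n and return the two file paths."""
    ids = range(1, n + 1)
    out = Path(out_dir)
    single = out / f"one_by_one_{n}.sql"  # hypothetical file names
    batch = out / f"in_list_{n}.sql"
    single.write_text(one_by_one_sql(ids), encoding="utf-8")
    batch.write_text(in_list_sql(ids), encoding="utf-8")
    return single, batch
```

Calling `write_scripts(100000)` would produce one file with 100,000 statements and one file with a single statement, which can then be passed to `psql -f` in the same way as the commands in this article.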

Fetching 100,000 rows

Single-row query * 100,000 runs

Command

Measure-Command { psql -U postgres -d sampledb -f select_comments.sql > result.txt }

Result

TotalSeconds      : 41.5504978
TotalMilliseconds : 41550.4978

100,000-row query * 1 run

Command

Measure-Command { psql -U postgres -d sampledb -f select_comments.sql > result.txt }

Result

TotalSeconds      : 15.3717338
TotalMilliseconds : 15371.7338

A difference of about 2.7x.

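I also ran the 10,000-row and 50,000-row cases.
Note, however, that the 100,000-row runs above were executed first.

When the same queries are run again, the second and later runs are faster because of caching, so these numbers do not mean much. PostgreSQL does not cache query results; what speeds up the reruns is table and index pages already held in shared_buffers and the OS page cache. There is no simple command to clear those, so I did not take the experiment any further.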
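Since the cache cannot easily be cleared, one alternative is to measure both variants with the cache deliberately warm: run each script once without recording it, then time several runs and compare medians. The sketch below is one way to do that from Python rather than Measure-Command. The psql arguments mirror the ones used in this article, the helper names are my own, and unlike the commands here it discards the output instead of writing it to a file, so the absolute numbers would not be directly comparable.

```python
import statistics
import subprocess
import sys
import time


def psql_command(sql_file):
    """Argument list for psql, mirroring the invocation used in this article."""
    return ["psql", "-U", "postgres", "-d", "sampledb", "-f", sql_file]


def time_command(cmd, runs=5, warmup=1):
    """Run cmd several times and return the wall-clock seconds of each recorded run.

    Warm-up runs are executed but not recorded, so every recorded run
    starts with the data already cached.
    """
    for _ in range(warmup):
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Median and minimum are less sensitive to one slow outlier than the mean."""
    return {"median": statistics.median(timings), "min": min(timings)}
```

For example, `summarize(time_command(psql_command("select_comments10000.sql")))` would time the 10,000-statement script five times after one warm-up run. This measures the warm-cache case only; it says nothing about cold-cache behavior.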

Single-row query * 10,000 runs

Command

Measure-Command { psql -U postgres -d sampledb -f select_comments10000.sql > result10000.txt }

Result

TotalSeconds      : 5.8080719
TotalMilliseconds : 5808.0719

10,000-row query * 1 run

Command

Measure-Command { psql -U postgres -d sampledb -f select_comments2_10000.sql > result10000_2.txt }

Result

TotalSeconds      : 3.9993424
TotalMilliseconds : 3999.3424

A difference of roughly 1.45x.

Single-row query * 50,000 runs

Command

Measure-Command { psql -U postgres -d sampledb -f select_comments50000.sql > result50000.txt }

Result

TotalSeconds      : 16.7541638
TotalMilliseconds : 16754.1638

50,000-row query * 1 run

Command

Measure-Command { psql -U postgres -d sampledb -f select_comments2_50000.sql > result50000_2.txt }

Result

TotalSeconds      : 6.1022535
TotalMilliseconds : 6102.2535

A difference of about 2.7x.
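To put the three measurements side by side, the ratios can be recomputed from the TotalSeconds values reported above. The sketch below also divides the time gap by the number of statements. I am treating that figure as a rough proxy for the per-statement cost of the one-by-one approach; that reading is my own assumption, and it also absorbs cache warmth and output writing, so it should not be taken as a precise number.

```python
# (label, seconds for N single-row queries, seconds for one IN query, N),
# copied from the TotalSeconds values reported in this article.
RESULTS = [
    ("100,000 rows", 41.5504978, 15.3717338, 100_000),
    ("10,000 rows", 5.8080719, 3.9993424, 10_000),
    ("50,000 rows", 16.7541638, 6.1022535, 50_000),
]


def speedup(one_by_one, in_list):
    """How many times faster the single IN query was."""
    return one_by_one / in_list


def extra_ms_per_query(one_by_one, in_list, n):
    """Time gap spread over the n single-row statements, in milliseconds."""
    return (one_by_one - in_list) * 1000 / n


for label, a, b, n in RESULTS:
    print(f"{label}: {speedup(a, b):.2f}x, "
          f"{extra_ms_per_query(a, b, n):.3f} ms extra per statement")
```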