0
2

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

PostgreSQL テーブル差集合のパフォーマンス

Last updated at Posted at 2024-02-10

PostgreSQL テーブル差集合のパフォーマンス

お題

次の記事を参照後、興味があってPostgreSQL におけるテーブル差集合のパフォーマンスを測定しました。

今回は灰色部分を求めます。

差集合

結論

測定結果は次のようになりました

not exists >>> [Big Wall] >>> left join / is NULL >= not in > except all

[not in]

 Planning Time: 0.047 ms
 Execution Time: 475.355 ms

[not exists]

 Planning Time: 0.024 ms
 Execution Time: 3.766 ms

[left join / is NULL]

 Planning Time: 0.033 ms
 Execution Time: 349.660 ms

[except all]

 Planning Time: 0.017 ms
 Execution Time: 718.943 ms

差集合を得たい場合は exists を使えば、パフォーマンス問題に悩まされないようです。

参考

データに応じて測定結果が変わりますが、データ量を増減させた時のパフォーマンス特性も掲載しておきます。
t2 の行数を変化させながら測定しています。

テーブル差集合パフォーマンス

PostgreSQLの9.2以降はnot in も not existsと遜色ないという記事もあるが、そんなことはありませんでした。

実験

[環境]

  • Ubuntu 22.04
  • PostgreSQL 14

psql Cli から DB 作成後にアクセス

root@masami-L ~# createdb -U postgres sample
root@masami-L /# psql -U postgres -d sample

ベンチマーク用関数用意

sample=# CREATE FUNCTION f_test(ct int, sql text)
  RETURNS void
  LANGUAGE plpgsql AS
$func$
DECLARE
   i int;
BEGIN
   FOR i IN 1 .. $1 LOOP
      EXECUTE sql;
   END LOOP;
END
$func$;
CREATE FUNCTION

データ準備

sample=# create table t1 (
id serial,
val int default random()*1000);
CREATE TABLE
sample=# create table t2 (
id serial,
val int default random()*1000);
CREATE TABLE
sample=# insert into t1
select  from generate_series(1,10000);
INSERT 0 10000
sample=# insert into t2
select  from generate_series(1,10000);
INSERT 0 10000

■not in

実行計画の確認

sample=# explain analyze select * from t1 where (val) not in (select val from t2);
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=172.12..344.25 rows=5085 width=8) (actual time=16.781..16.783 rows=0 loops=1)
   Filter: (NOT (hashed SubPlan 1))
   Rows Removed by Filter: 10000
   SubPlan 1
     ->  Seq Scan on t2  (cost=0.00..146.70 rows=10170 width=4) (actual time=0.037..4.310 rows=10000 loops=1)
 Planning Time: 1.118 ms
 Execution Time: 17.252 ms
(7 rows)

パフォーマンスの確認

sample=# EXPLAIN ANALYZE
SELECT f_test(100, 'select * from t1 where (val) not in (select val from t2);');
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Result  (cost=0.00..0.26 rows=1 width=4) (actual time=475.229..475.230 rows=1 loops=1)
 Planning Time: 0.047 ms
 Execution Time: 475.355 ms
(3 rows)

■not exists

実行計画の確認

sample=# explain analyze select * from t1 where not exists (select val from t2);
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Result  (cost=0.01..145.01 rows=10000 width=8) (actual time=0.028..0.030 rows=0 loops=1)
   One-Time Filter: (NOT $0)
   InitPlan 1 (returns $0)
     ->  Seq Scan on t2  (cost=0.00..145.00 rows=10000 width=0) (actual time=0.023..0.023 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.01..145.01 rows=10000 width=8) (never executed)
 Planning Time: 0.159 ms
 Execution Time: 0.081 ms
(7 rows)

パフォーマンスの確認

sample=# EXPLAIN ANALYZE
SELECT f_test(100, 'select * from t1 where not exists (select val from t2);');
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.26 rows=1 width=4) (actual time=3.752..3.752 rows=1 loops=1)
 Planning Time: 0.024 ms
 Execution Time: 3.766 ms
(3 rows)

■left join / is NUL

実行計画の確認

sample=# explain analyze select * from t1 left join t2 using (val) where t2.val is NULL;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Hash Anti Join  (cost=270.00..490.00 rows=1 width=12) (actual time=6.997..6.999 rows=0 loops=1)
   Hash Cond: (t1.val = t2.val)
   ->  Seq Scan on t1  (cost=0.00..145.00 rows=10000 width=8) (actual time=0.013..1.048 rows=10000 loops=1)
   ->  Hash  (cost=145.00..145.00 rows=10000 width=8) (actual time=3.554..3.555 rows=10000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 519kB
         ->  Seq Scan on t2  (cost=0.00..145.00 rows=10000 width=8) (actual time=0.008..1.291 rows=10000 loops=1)
 Planning Time: 1.331 ms
 Execution Time: 7.039 ms
(8 rows)

パフォーマンスの確認

sample=# EXPLAIN ANALYZE
SELECT f_test(100, 'select * from t1 left join t2 using (val) where t2.val is NULL;');
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Result  (cost=0.00..0.26 rows=1 width=4) (actual time=349.641..349.642 rows=1 loops=1)
 Planning Time: 0.033 ms
 Execution Time: 349.660 ms
(3 rows)

■except all

実行計画確認

sample=# explain analyze select * from t1
except all select * from t2;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 HashSetOp Except All  (cost=0.00..690.00 rows=10000 width=12) (actual time=8.851..9.891 rows=9986 loops=1)
   ->  Append  (cost=0.00..590.00 rows=20000 width=12) (actual time=0.008..5.381 rows=20000 loops=1)
         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..245.00 rows=10000 width=12) (actual time=0.008..2.070 rows=10000 loops=1)
               ->  Seq Scan on t1  (cost=0.00..145.00 rows=10000 width=8) (actual time=0.005..0.759 rows=10000 loops=1)
         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..245.00 rows=10000 width=12) (actual time=0.007..2.002 rows=10000 loops=1)
               ->  Seq Scan on t2  (cost=0.00..145.00 rows=10000 width=8) (actual time=0.006..0.747 rows=10000 loops=1)
 Planning Time: 0.055 ms
 Execution Time: 10.738 ms
(8 rows)

パフォーマンス測定

sample=# EXPLAIN ANALYZE
SELECT f_test(100, 'select * from t1
except all select * from t2;');
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Result  (cost=0.00..0.26 rows=1 width=4) (actual time=718.931..718.932 rows=1 loops=1)
 Planning Time: 0.017 ms
 Execution Time: 718.943 ms
(3 rows)
0
2
1

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
2

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?