More than 1 year has passed since last update.

PostgreSQLAdvent Calendar 2022

PostgreSQLの分析機能アレコレ

Last updated at 2022-12-08Posted at 2022-12-08

この投稿は、PostgreSQL Advent Calendar 2022の8日目です。
昨日は@quiverさんによるPostgreSQLのUPSERT時にデッドタプルはどう増える？でした。

～おことわり～

PostgreSQLは毎日触っているものの、設計、運用のことばかりで業務のSQLを書くことはほとんどない。そんな私が久しぶりにSQL（主にWindow関数）の勉強をしました。その関連で最近のPostgreSQLの分析系機能とか、気になることをつらつらと書き連ねてみます。

「んで？だからどーなのよ」みたいな、結論のない、実用性のない話題になりそうなので投稿するか3秒だけ考えましたが、今後追及したいことのメモも兼ねていこうと思います！

Window関数、完全に理解した

キッカケは、PostgreSQL 15のリリース時、プレスキットの翻訳をしていて、トップレベルでフィーチャーされてたコレ

他にも目を惹くような、ウリになりそうな話題がある中で先頭に書かれてたので、Window関数じつは書いたことないとか言ってないで、そろそろちゃんと理解せんとな～と思ったのでした。もともと、パーティショニングやパラレルクエリといった分析系のサーバーサイドの機能は検証していたので、あとはクエリ側も勉強してみよっかな、ということで。

達人に学ぶSQL徹底指南書第2版を読むと、文法規則としてはスッキリ明快に。あとは 意味のある分析をするには集計関数のラインナップを知り、数学や統計的な意味を知るってところだよな～ までは来ました。本書の中で移動平均が例にでますが、「これ、コロナの分析で偉い人が言ってるやつだ」ぐらいの知識で、調べるにつれ世の中が一段階クリアに見えてきた気がします。

この調子でダラダラ行きます。いいですか？？

PostgreSQLの分析機能

せっかくWindow関数を学んで、分析クエリを書く側の入り口に立ったので、他の分析機能と並べてみようかなと思います。タイトルにもいれた「分析機能」ですが、Window関数のようなSQL構文の話だったり、パラレルクエリのようなサーバーサイドの設定や最適化の話があります。共通するのはテーブル全体or広範囲を対象に集計をとるようなタイプの処理です。

分類	機能
SQL構文	GROUP BY、HAVING Window関数 GROUPING SETS、ROLLUP、CUBE TABLESAMPLE 各種集計関数　など
サーバーサイドの機能	パラレルクエリテーブルパーティショニング BRINインデックス　など

GROUP BYなんてしょっちゅう使うヤツじゃん、といえばその通りですが、索引を使って条件にヒットしたものを表示用に集計するのが通常のOLTP処理の使い方だとしたら、同じGROUP BYでも索引を使わずにテーブルの全件を対象にするようなものをイメージして書いています。BETWEEN条件の幅が広いのも同じ。

で、個人的にずっと気になっていた これらの機能がお互い上手く嚙み合うのか？ と言う点をもう少し探りたいと思います。Window関数使いにとって、パラレルクエリの充実は嬉しいのか？パーティションは？

ちょっとした題材で検証

最近、巨大なテーブルのパーティション設計をする仕事がありました。
まずはよく実行されるSQLを眺めWHERE条件を抜き出し、パーティションキーに指定するといい感じに絞れそうな列にあたりをつけました。
ではその列で、レンジ、リスト、ハッシュどのタイプのパーティション分割が良いか？レンジをどのぐらいの幅で切るのが良いか？を考えるために、この列にはどんなデータが、どんな分布で格納されてるかを調べます。巨大なテーブルにアドホックなSQLを投げてOKと言われ、ドキドキしながらクエリを組み立てます。

例として、HammerDBのTPROC-H Schemaの最も大きいlineitem表を題材に、いろいろな機能を組み合わせてクエリを投げてみます。この表には商品の販売履歴データ年間数百万～1千万件弱（スケールファクター10、約10GBのlineitem表の場合）が格納されています。

年	件数
1992	7606292
1993	9110910
1994	9116961
1995	9100951
1996	9116904
1997	9121710
1998	6825492

「年単位パーティションで分割するなら、まあまあ均等にデータが入りそうですね。」みたいなことをSQLで実際のデータで確かめます。

SQLの例

lineitem表のl_shipdate列にはタイムスタンプ型（YYYY-MM-DD HH:MI:SS.ususus）が入っているので、そこからdate_partで年情報のみ取り出し、その結果でGROUP BYする。

SELECT date_part('year',l_shipdate),count(*)
FROM lineitem
GROUP BY 1;

 date_part |  count
-----------+---------
      1992 | 7606292
      1993 | 9110910
      1994 | 9116961
      1995 | 9100951
      1996 | 9116904
      1997 | 9121710
      1998 | 6825492

このときの実行計画は、

                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=1479646.64..1481921.74 rows=2507 width=16) (actual time=4152.351..4154.872 rows=7 loops=1)
   Group Key: (date_part('year'::text, l_shipdate))
   ->  Gather Merge  (cost=1479646.64..1481802.66 rows=17549 width=16) (actual time=4152.340..4154.857 rows=56 loops=1)
         Workers Planned: 7
         Workers Launched: 7
         ->  Sort  (cost=1478646.52..1478652.79 rows=2507 width=16) (actual time=4147.755..4147.756 rows=7 loops=8)
               Sort Key: (date_part('year'::text, l_shipdate))
               Sort Method: quicksort  Memory: 25kB
               Worker 0:  Sort Method: quicksort  Memory: 25kB
　　　　　　　　　　（中略）
           ☆  ->  Partial HashAggregate  (cost=1478473.64..1478504.98 rows=2507 width=16) (actual time=4147.702..4147.723 rows=7 loops=8)
                     Group Key: date_part('year'::text, l_shipdate)
                     Batches: 1  Memory Usage: 121kB
                     Worker 0:  Batches: 1  Memory Usage: 121kB
　　　　　　　　　　　　（中略）
                ★   ->  Parallel Seq Scan on lineitem  (cost=0.00..1435620.89 rows=8570551 width=8) (actual time=0.098..2939.771 rows=7499902 loops=8)
 Planning Time: 0.128 ms
 Execution Time: 4154.936 ms

★で10GBの表を8ワーカーで分担してパラレルスキャンに3秒弱、☆で各ワーカーが得た結果にGROUP BYかけて7行まで結果セットを小さくできます。
☆より上はGather Mergeで各ワーカーが持つ7行 × 8ワーカーの結果を集め56行、Finalize GroupAggregateで本来の集計結果を得るための集計をかけていますが、☆で十分絞り込めているのでこのあたりの処理はミリ秒の桁です。

パラレルクエリ用のパラメータは10コア程度まで使えるように大きくしていますが、特に意識せずGROUP BYのSQLを書いただけでパラレルクエリが発動し、良好な結果になったと思います。

無茶なSQL①

最初、date_part関数の存在を忘れていて、無茶なSQLをいくつか書いていましたｗ
「1992年に含まれるデータ」をサブクエリとかを使って頑張って取り出そうとしていました。

その例がこちら

WITH "1992" AS ( SELECT '1992' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1992-01-02 00:00:00' and '1992-12-31 23:59:59')
    ,"1993" AS ( SELECT '1993' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1993-01-01 00:00:00' and '1993-12-31 23:59:59')
    ,"1994" AS ( SELECT '1994' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1994-01-01 00:00:00' and '1994-12-31 23:59:59')
    ,"1995" AS ( SELECT '1995' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1995-01-01 00:00:00' and '1995-12-31 23:59:59')
    ,"1996" AS ( SELECT '1996' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1996-01-01 00:00:00' and '1996-12-31 23:59:59')
    ,"1997" AS ( SELECT '1997' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1997-01-01 00:00:00' and '1997-12-31 23:59:59')
    ,"1998" AS ( SELECT '1998' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1998-01-01 00:00:00' and '1998-12-31 23:59:59')

SELECT year,count from "1992"
UNION ALL SELECT year,count from "1993"
UNION ALL SELECT year,count from "1994"
UNION ALL SELECT year,count from "1995"
UNION ALL SELECT year,count from "1996"
UNION ALL SELECT year,count from "1997"
UNION ALL SELECT year,count from "1998"
ORDER BY year
;

このSQLのいいところは、整然とならぶ文字列が気持ちいいところのみで、
このSQLの悪いところは、

1992～1998のデータしかないことが分かってないと書けない
1秒以下の精度でタイムスタンプが入ってるとBETWEENが使えない。（不等号ならOK）
長い

この時の実行計画がこちら

                                                                                  QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Merge  (cost=6745842.65..6745843.00 rows=3 width=40) (actual time=48330.190..48330.328 rows=7 loops=1)
   Workers Planned: 3
   Workers Launched: 3
   ->  Sort  (cost=6744842.61..6744842.61 rows=1 width=40) (actual time=42049.744..42049.750 rows=2 loops=4)
         Sort Key: ('1996'::text)
         Sort Method: quicksort  Memory: 25kB
         Worker 0:  Sort Method: quicksort  Memory: 25kB
         Worker 1:  Sort Method: quicksort  Memory: 25kB
         Worker 2:  Sort Method: quicksort  Memory: 25kB
     ☆  ->  Parallel Append  (cost=2251415.81..6744842.60 rows=1 width=40) (actual time=25118.415..42049.641 rows=2 loops=4)
               ->  Aggregate  (cost=2251999.42..2251999.43 rows=1 width=40) (actual time=21549.800..21549.801 rows=1 loops=1)
                 ★   ->  Seq Scan on lineitem  (cost=0.00..2228396.84 rows=9441033 width=0) (actual time=0.234..20920.489 rows=9116904 loops=1)
                           Filter: ((l_shipdate >= '1996-01-01 00:00:00'::timestamp without time zone) AND (l_shipdate <= '1996-12-31 23:59:59'::timestamp without time zone))
                           Rows Removed by Filter: 50882316
               ->  Aggregate  (cost=2251424.16..2251424.17 rows=1 width=40) (actual time=26309.258..26309.259 rows=1 loops=1)
                     ->  Seq Scan on lineitem lineitem_1  (cost=0.00..2228396.84 rows=9210928 width=0) (actual time=0.225..25713.021 rows=9121710 loops=1)
                           Filter: ((l_shipdate >= '1997-01-01 00:00:00'::timestamp without time zone) AND (l_shipdate <= '1997-12-31 23:59:59'::timestamp without time zone))
                           Rows Removed by Filter: 50877510
　　　　　　　　　　　　　　　　　　　　　　　：
          　　　　　　　　　　　（以下、同じSeq ScanとAggregaetの塊が7年分続く）
 Planning Time: 0.594 ms
 Execution Time: 48330.426 ms

☆Parallel AppendというのはUNIONやパーティション表の検索結果をがっちゃんこするAppendをパラレル実行したという意味で、ここ3年ぐらいの比較的新しい機能です。
ただし、★を見ると10GBの表をスキャンするのに1ワーカーしか分担しておらず、20秒以上かかっています。先ほどのGROUP BYのみの単純なスキャンでは8ワーカーで3秒でした。

無茶なSQL②

もっといい書き方あるよな～とボヤきながら小手先の手直しをすること3分ぐらい。

SELECT year,count FROM ( SELECT '1992' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1992-01-02 00:00:00' and '1992-12-31 23:59:59') AS "1992"
UNION
SELECT year,count FROM ( SELECT '1993' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1993-01-01 00:00:00' and '1993-12-31 23:59:59') AS "1993"
UNION
SELECT year,count FROM ( SELECT '1994' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1994-01-01 00:00:00' and '1994-12-31 23:59:59') AS "1994"
UNION
SELECT year,count FROM ( SELECT '1995' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1995-01-01 00:00:00' and '1995-12-31 23:59:59') AS "1995"
UNION
SELECT year,count FROM ( SELECT '1996' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1996-01-01 00:00:00' and '1996-12-31 23:59:59') AS "1996"
UNION
SELECT year,count FROM ( SELECT '1997' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1997-01-01 00:00:00' and '1997-12-31 23:59:59') AS "1997"
UNION
SELECT year,count FROM ( SELECT '1998' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1998-01-01 00:00:00' and '1998-12-31 23:59:59') AS "1998"
;

思想は変わらず。しかし、この実行計画をみてびっくり。

                                                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=10227809.26..10227809.33 rows=7 width=40) (actual time=16130.753..16133.169 rows=7 loops=1)
   Group Key: ('1992'::text), (count(*))
   Batches: 1  Memory Usage: 24kB
☆  ->  Append  (cost=1460720.90..10227809.23 rows=7 width=40) (actual time=2308.403..16133.125 rows=7 loops=1)
         ->  Finalize Aggregate  (cost=1460720.90..1460720.91 rows=1 width=40) (actual time=2308.402..2308.539 rows=1 loops=1)
               ->  Gather  (cost=1460720.17..1460720.88 rows=7 width=8) (actual time=2308.299..2308.527 rows=8 loops=1)
                     Workers Planned: 7
                     Workers Launched: 7
                     ->  Partial Aggregate  (cost=1459720.17..1459720.18 rows=1 width=8) (actual time=2303.714..2303.715 rows=1 loops=8)
                      ★   ->  Parallel Seq Scan on lineitem  (cost=0.00..1457047.26 rows=1069164 width=0) (actual time=0.054..2240.762 rows=950786 loops=8)
                                 Filter: ((l_shipdate >= '1992-01-02 00:00:00'::timestamp without time zone) AND (l_shipdate <= '1992-12-31 23:59:59'::timestamp without time zone))
                                 Rows Removed by Filter: 6549116
         ->  Finalize Aggregate  (cost=1461336.42..1461336.43 rows=1 width=40) (actual time=2323.858..2323.961 rows=1 loops=1)
               ->  Gather  (cost=1461335.69..1461336.40 rows=7 width=8) (actual time=2323.751..2323.950 rows=8 loops=1)
                     Workers Planned: 7
                     Workers Launched: 7
                     ->  Partial Aggregate  (cost=1460335.69..1460335.70 rows=1 width=8) (actual time=2318.852..2318.853 rows=1 loops=8)
                           ->  Parallel Seq Scan on lineitem lineitem_1  (cost=0.00..1457047.26 rows=1315370 width=0) (actual time=0.051..2244.974 rows=1138864 loops=8)
                                 Filter: ((l_shipdate >= '1993-01-01 00:00:00'::timestamp without time zone) AND (l_shipdate <= '1993-12-31 23:59:59'::timestamp without time zone))
                                 Rows Removed by Filter: 6361039
　　　　　　　　　　　　　　　　　　　　　　　：
          （以下、同じParallel Seq ScanとPartial Aggregateの塊が7年分続く）
 Planning Time: 0.519 ms
 Execution Time: 16133.327 ms
(62 行)

★で8並列で読み、ワーカーが持つ結果セットを集計し、☆はパラレルでない単なるAppend。結果が48秒→16秒と3倍近く早くなった。

義務教育で「UNIONは結果を集約するからオーバーヘッドになり、結果を単に並べるだけならUNION ALLでええんじゃ」と習った方は多いかと思いますが、パラレルクエリが絡むとそうでもないのかもしれません。この辺はもっと追及してみたいところ。ちなみに、上記のSQL①②はWITH句と副問い合わせですが内部では等価とみなされているようで、互いのUNIONとUNION ALLを入れ替えると、やっぱりUNIONのほうが後者の実行計画になり同等の処理時間になりました。

無茶なSQL③

実際はこの辺で最初のGROUP BYに気づいたのですが、せっかくWindow関数やったのでそれっぽく書いてみようと思って、~~ほぼ意味のない計算をやらせる~~SQLを書きました。

SELECT year,max(count) FROM
    (
     SELECT date_part('year',l_shipdate) AS year,
            count(*) OVER (PARTITION BY date_part('year',l_shipdate)) AS count
     FROM lineitem
    ) AS li
GROUP BY year;

期待は、スキャン一回で済むからGROUP BYとさほど変わらないのでは？

                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=2423560.94..11744037.00 rows=200 width=16) (actual time=13321.240..44203.630 rows=7 loops=1)
   Group Key: (date_part('year'::text, lineitem.l_shipdate))
☆ ->  WindowAgg  (cost=2423560.94..10844127.16 rows=59993856 width=16) (actual time=7647.706..39173.654 rows=59999220 loops=1)
         ->  Gather Merge  (cost=2423560.94..9794234.68 rows=59993856 width=8) (actual time=4386.070..17289.370 rows=59999220 loops=1)
               Workers Planned: 7
               Workers Launched: 7
               ->  Sort  (cost=2422560.82..2443987.20 rows=8570551 width=8) (actual time=4362.319..5116.062 rows=7499902 loops=8)
                     Sort Key: (date_part('year'::text, lineitem.l_shipdate))
                     Sort Method: quicksort  Memory: 551372kB
                     Worker 0:  Sort Method: quicksort  Memory: 543433kB
                     Worker 1:  Sort Method: quicksort  Memory: 549387kB
　　　　　　　　　　　　　　　　（中略）
                 ★  ->  Parallel Seq Scan on lineitem  (cost=0.00..1435620.89 rows=8570551 width=8) (actual time=0.087..2978.888 rows=7499902 loops=8)
 Planning Time: 0.159 ms
 Execution Time: 44270.792 ms

★スキャンは良い。パラレルクエリがちゃんと期待通り働いてる。
しかし、window関数みたいな全行対象にするものは、結果セットの集約がパラレルでできないので☆でGatherで集めた6000万行を対象に集計をかけていた。これはキツイ！

今回はウィンドウが大きすぎ、普通のGROUP BYでできることを無駄に全行分計算したことになるけど、ウィンドウフレームをずらしながら結果を得ていく本来のWindow関数の動きを考えると納得ではあります。
この他にもパラメータ変えたりいくつか検証した結果、パラレルクエリ無しだと★のスキャン部分で数十秒、さらにWindow関数を適用する部分で数十秒かかり、最終的な所要時間は倍ぐらいかかっていたので、パラレルの恩恵はあることがわかりました。

テーブル・パーティショニングを絡める

ここまでの諸々にパーティショニングしたあとの結果も付け加えておきます。

ダサいSQL②（UNION）が活きる道

年単位パーティション化したあと、set enable_partitionwise_aggregate to on;して実行

WITH "1992" AS ( SELECT '1992' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1992-01-02 00:00:00' and '1992-12-31 23:59:59')
    ,"1993" AS ( SELECT '1993' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1993-01-01 00:00:00' and '1993-12-31 23:59:59')
    ,"1994" AS ( SELECT '1994' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1994-01-01 00:00:00' and '1994-12-31 23:59:59')
    ,"1995" AS ( SELECT '1995' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1995-01-01 00:00:00' and '1995-12-31 23:59:59')
    ,"1996" AS ( SELECT '1996' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1996-01-01 00:00:00' and '1996-12-31 23:59:59')
    ,"1997" AS ( SELECT '1997' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1997-01-01 00:00:00' and '1997-12-31 23:59:59')
    ,"1998" AS ( SELECT '1998' AS year,count(*) AS count FROM lineitem WHERE l_shipdate between '1998-01-01 00:00:00' and '1998-12-31 23:59:59')

SELECT year,count from "1992"
UNION SELECT year,count from "1993"
UNION SELECT year,count from "1994"
UNION SELECT year,count from "1995"
UNION SELECT year,count from "1996"
UNION SELECT year,count from "1997"
UNION SELECT year,count from "1998"
ORDER BY year
;

実行計画＆所要時間は↓

                                                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=1551507.59..1551507.66 rows=7 width=40) (actual time=3801.693..3803.776 rows=7 loops=1)
   Group Key: ('1992'::text), (count(*))
   Batches: 1  Memory Usage: 24kB
   ->  Append  (cost=196045.59..1551507.56 rows=7 width=40) (actual time=460.998..3803.733 rows=7 loops=1)
         ->  Finalize Aggregate  (cost=196045.59..196045.60 rows=1 width=40) (actual time=460.997..461.136 rows=1 loops=1)
               ->  Gather  (cost=98420.61..196045.57 rows=10 width=8) (actual time=460.895..461.128 rows=9 loops=1)
                     Workers Planned: 5
                     Workers Launched: 5
                     ->  Parallel Append  (cost=97420.61..195044.57 rows=2 width=8) (actual time=451.896..456.650 rows=2 loops=6)
                           ->  Partial Aggregate  (cost=97623.92..97623.93 rows=1 width=8) (actual time=446.380..446.381 rows=1 loops=3)
                                 ->  Parallel Seq Scan on lineitem_1992_2 lineitem_1  (cost=0.00..95720.57 rows=761341 width=0) (actual time=0.051..380.260 rows=1269063 loops=3)
                                       Filter: ((l_shipdate >= '1992-01-02 00:00:00'::timestamp without time zone) AND (l_shipdate <= '1992-12-31 23:59:59'::timestamp without time zone))
                           ->  Partial Aggregate  (cost=97420.61..97420.62 rows=1 width=8) (actual time=233.455..233.456 rows=1 loops=6)
                                 ->  Parallel Seq Scan on lineitem_1992_1 lineitem  (cost=0.00..95521.31 rows=759723 width=0) (actual time=0.063..199.148 rows=633184 loops=6)
                                       Filter: ((l_shipdate >= '1992-01-02 00:00:00'::timestamp without time zone) AND (l_shipdate <= '1992-12-31 23:59:59'::timestamp without time zone))
         ->  Finalize Aggregate  (cost=234621.36..234621.37 rows=1 width=40) (actual time=564.965..565.065 rows=1 loops=1)
               ->  Gather  (cost=117750.26..234621.34 rows=10 width=8) (actual time=564.411..565.056 rows=9 loops=1)
                     Workers Planned: 5
                     Workers Launched: 5
                     ->  Parallel Append  (cost=116750.26..233620.34 rows=2 width=8) (actual time=559.713..560.105 rows=2 loops=6)
                           ->  Partial Aggregate  (cost=116870.05..116870.06 rows=1 width=8) (actual time=558.645..558.646 rows=1 loops=3)
                                 ->  Parallel Seq Scan on lineitem_1993_2 lineitem_3  (cost=0.00..114591.56 rows=911394 width=0) (actual time=0.050..471.799 rows=1519285 loops=3)
                                       Filter: ((l_shipdate >= '1993-01-01 00:00:00'::timestamp without time zone) AND (l_shipdate <= '1993-12-31 23:59:59'::timestamp without time zone))
                           ->  Partial Aggregate  (cost=116750.26..116750.27 rows=1 width=8) (actual time=280.777..280.778 rows=1 loops=6)
                                 ->  Parallel Seq Scan on lineitem_1993_1 lineitem_2  (cost=0.00..114474.17 rows=910436 width=0) (actual time=0.034..237.637 rows=758843 loops=6)
                                       Filter: ((l_shipdate >= '1993-01-01 00:00:00'::timestamp without time zone) AND (l_shipdate <= '1993-12-31 23:59:59'::timestamp without time zone))

　　　　　（以下、略）
 Planning Time: 1.643 ms
 Execution Time: 3814.569 ms

年単位パーティションとサブクエリの範囲が一致してるので、各表に対するスキャン時間が短縮されました。

総所要時間は16秒 → 3.8秒へ。

一方、ノーマルのGROUP BYは

SELECT date_part('year',l_shipdate),count(*) FROM lineitem
GROUP BY 1;

                                                                                  QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------
 Finalize GroupAggregate  (cost=1545894.53..1549019.19 rows=2506 width=16) (actual time=5340.199..5342.802 rows=7 loops=1)
   Group Key: (date_part('year'::text, lineitem_10.l_shipdate))
   ->  Gather Merge  (cost=1545894.53..1548865.04 rows=24565 width=16) (actual time=5340.187..5342.789 rows=22 loops=1)
         Workers Planned: 5
         Workers Launched: 5
         ->  Sort  (cost=1544894.45..1544906.73 rows=4913 width=16) (actual time=5334.671..5334.679 rows=4 loops=6)
               Sort Key: (date_part('year'::text, lineitem_10.l_shipdate))
               Sort Method: quicksort  Memory: 25kB
               Worker 0:  Sort Method: quicksort  Memory: 25kB
               Worker 1:  Sort Method: quicksort  Memory: 25kB
               Worker 2:  Sort Method: quicksort  Memory: 25kB
               Worker 3:  Sort Method: quicksort  Memory: 25kB
               Worker 4:  Sort Method: quicksort  Memory: 25kB
               ->  Parallel Append  (cost=116887.63..1544593.22 rows=4913 width=16) (actual time=2321.327..5334.633 rows=4 loops=6)
                     ->  Partial HashAggregate  (cost=117018.51..117023.08 rows=365 width=16) (actual time=2417.284..2417.286 rows=1 loops=1)
                           Group Key: date_part('year'::text, lineitem_10.l_shipdate)
                           Worker 4:  Batches: 1  Memory Usage: 37kB
                           ->  Parallel Seq Scan on lineitem_1997_1 lineitem_10  (cost=0.00..112455.22 rows=912658 width=8) (actual time=0.049..1687.392 rows=4563289 loops=1)
                     ->  Partial HashAggregate  (cost=116907.82..116912.38 rows=365　width=16) (actual time=2432.840..2432.842 rows=1 loops=1)
                           Group Key: date_part('year'::text, lineitem_5.l_shipdate)
                           Worker 3:  Batches: 1  Memory Usage: 37kB
                           ->  Parallel Seq Scan on lineitem_1994_2 lineitem_5  (cost=0.00..112348.73 rows=911818 width=8) (actual time=0.058..1706.500 rows=4559091 loops=1)

　　　　　　　　　　　　　（中略）

 Planning Time: 0.953 ms
 Execution Time: 5343.265 ms

パーティション分割されたことでパラレルクエリの並列度が8まで上がらず、分割された表を全部舐めるのでオーバーヘッドになっているようです。約4秒→5秒

表のサンプリング TABLESAMPLE句

長くなってきたのでアッサリ触れますが、大量データの集計をする際に、表の何パーセントを触るか指定し概算のみを計算するTABLESAMPLE句があります。本来、アクセスするブロック数を大幅に減らし、しかし十分大きなサンプルであれば集計結果への影響は軽微になるという期待ができます。今回例で使ったCOUNTであれば、単純に取得した割合における行数になっているので、10パーセントで取得したら本当の行数は10倍、平均値なんかはそのまま使える数字が得られるかもしれません。が、パラレルクエリが絡むとそうでもないようです。

SELECT date_part('year',l_shipdate),count(*) FROM lineitem TABLESAMPLE SYSTEM(50)
GROUP BY 1;

                                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=3482483.56..3482560.06 rows=200 width=16) (actual time=17152.488..17154.426 rows=7 loops=1)
   Group Key: (date_part('year'::text, lineitem.l_shipdate))
   ->  Gather Merge  (cost=3482483.56..3482554.56 rows=600 width=16) (actual time=17152.475..17154.412 rows=10 loops=1)
         Workers Planned: 3
         Workers Launched: 3
         ->  Sort  (cost=3481483.52..3481484.02 rows=200 width=16) (actual time=14360.652..14360.657 rows=2 loops=4)
               Sort Key: (date_part('year'::text, lineitem.l_shipdate))
               Sort Method: quicksort  Memory: 25kB
               Worker 0:  Sort Method: quicksort  Memory: 25kB
               Worker 1:  Sort Method: quicksort  Memory: 25kB
               Worker 2:  Sort Method: quicksort  Memory: 25kB
               ->  Partial HashAggregate  (cost=3481473.37..3481475.87 rows=200 width=16) (actual time=14360.621..14360.627 rows=2 loops=4)
                     Group Key: (date_part('year'::text, lineitem.l_shipdate))
                     Batches: 1  Memory Usage: 40kB
                     Worker 0:  Batches: 1  Memory Usage: 40kB
                     Worker 1:  Batches: 1  Memory Usage: 40kB
                     Worker 2:  Batches: 1  Memory Usage: 40kB
                     ->  Parallel Append  (cost=0.00..3336313.64 rows=29031946 width=8) (actual time=0.093..10723.944 rows=22505707 loops=4)
                           ->  Sample Scan on lineitem_1996_2 lineitem_10  (cost=0.00..696327.44 rows=6839955 width=8) (actual time=0.112..2840.457 rows=6823816 loops=1)
                                 Sampling: system ('50'::real)
                           ->  Sample Scan on lineitem_1993_1 lineitem_3  (cost=0.00..696236.45 rows=6838756 width=8) (actual time=0.110..2843.729 rows=6849432 loops=1)
                                 Sampling: system ('50'::real)
                           ->  Sample Scan on lineitem_1997_2 lineitem_12  (cost=0.00..696226.70 rows=6837976 width=8) (actual time=0.110..2833.507 rows=6838093 loops=1)
                                 Sampling: system ('50'::real)
                           ->  Sample Scan on lineitem_1993_2 lineitem_4  (cost=0.00..696020.82 rows=6836546 width=8) (actual time=0.122..2834.469 rows=6845848 loops=1)
                                 Sampling: system ('50'::real)
                           ->  Sample Scan on lineitem_1996_1 lineitem_9  (cost=0.00..695851.40 rows=6834512 width=8) (actual time=0.120..2802.484 rows=6837681 loops=1)
                                 Sampling: system ('50'::real)
                           ->  Sample Scan on lineitem_1994_2 lineitem_6  (cost=0.00..695848.88 rows=6833670 width=8) (actual time=0.092..2807.648 rows=6823304 loops=1)

Sample Scan という計画タイプになり、表のスキャン部分にパラレルが効きませんでした。一つ上のGROUP BYのSQLでは1パーティションに対するスキャンは1秒程度のところが、こちらは2.8秒。総所要時間も17秒まで膨らんでしまいました。

これはサンプリングするブロック数を鑑みた結果、単純にパラレルの対象外になったのか、それともパラレルできない何か制限があったのか、今後深追いしてみたいと思います。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up