More than 5 years have passed since last update.

PostgreSQLAdvent Calendar 2018

Aurora PostgreSQL 互換版（9.6.9）でフェイルオーバー時にバッファ／キャッシュの生きのこり方を雑に確かめてみた

Last updated at 2018-12-09Posted at 2018-12-06

これは PostgreSQL Advent Calendar 2018 7 日目の記事です。

昨日は kaigai さんでした。

PL/CUDAを使ってロジスティック回帰分析を実装してみた（KaiGaiの俺メモ）

わたしは PostgreSQL 7.4 あたりで PostgreSQL から離れて以来、ずっと MySQL を使ってきていて最近の PostgreSQL のことには疎いのですが、とある声の大きな方から**「Advent Calendar 書いてね」**と指名でリクエストが来てしまったので、詳しいことはよくわからないのですがとりあえずなにか書くことにしました。

なにをする？

これまで、MySQL 互換の Aurora で、

という感じでバッファキャッシュ／バッファプールの挙動を見てきたので、今回はこれらに続いて、PostgreSQL 互換版でもフェイルオーバー時のバッファ／キャッシュのウォームアップ（サバイバル）について確認してみることにしました。

具体的には

Writer / Reader 構成で Aurora PostgreSQL 9.6.9 互換版のクラスタを作成（インスタンスタイプは db.r4.large）
パラメータグループで共有バッファの容量を意図的に減らしておく（256MB）
1 行当たりの容量が共有バッファよりやや少ない程度のデータをテーブルに 2 行INSERTする
テーブルのデータをEXPLAIN (ANALYZE true, BUFFERS true) SELECTして時間を計測（↑の 2 行を交互に）
フェイルオーバーする
テーブルのデータをEXPLAIN (ANALYZE true, BUFFERS true) SELECTして時間を計測（2 行交互に）
フェイルオーバーする
テーブルのデータをEXPLAIN (ANALYZE true, BUFFERS true) SELECTして時間を計測（2 行交互に）
一旦クラスタを停止し、開始することでキャッシュを消去する
テーブルのデータをEXPLAIN (ANALYZE true, BUFFERS true) SELECTして時間を計測（2 行交互に）

というようなことを行ってみます。

※通常、MySQL ではバッファプールをできるだけ大きくとる（バッファプール部分は OS のディスクキャッシュを利用しない）のに対して、PostgreSQL では共有バッファを全メモリ容量の 1/4 程度に抑えるのが普通で、代わりに OS のディスクキャッシュも利用する形になるようです。そのため、本来であればフェイルオーバー時に OS のディスクキャッシュがサバイブするのかどうかも確かめたいところですが、明確に確かめる方法がないので、所要時間の変化から推測したいと思います。

やってみた

※途中が長いので、すぐに結果を見たい方は**こちら**へどうぞ。

準備

クラスタを作成し、DB パラメータグループのshared_buffersに32768（＝256 MB）を指定してインスタンスを 2 つ（Writer / Reader）立てます。

データベース「sb_test」を作成したら、以下の通りテストデータ（約 200MB × 2 行、TOAST 利用・無圧縮）をINSERTします（INSERT時に共有バッファに対象データが蓄積されるが、id = 2の行をINSERTしたときにid = 1の行の大部分が共有バッファから追い出される）。

データの用意

sb_test=> \timing
Timing is on.
sb_test=> SHOW shared_buffers;
 shared_buffers
----------------
 256MB
(1 row)

Time: 10.530 ms
sb_test=> CREATE TABLE txt (id INT PRIMARY KEY, txt TEXT);
CREATE TABLE
Time: 223.884 ms
sb_test=> ALTER TABLE txt ALTER txt SET STORAGE EXTERNAL;
ALTER TABLE
Time: 5.677 ms
sb_test=> INSERT INTO txt VALUES(1, REPEAT(md5(clock_timestamp()::text), 6000000));
INSERT 0 1
Time: 2862.085 ms
sb_test=> INSERT INTO txt VALUES(2, REPEAT(md5(clock_timestamp()::text), 6000000));
INSERT 0 1
Time: 2743.049 ms

テスト 1 : フェイルオーバー前

まず 1 回目のEXPLAIN ～ SELECTを行います（レコード全体を読み込むためmd5()関数を利用しました）。

EXPLAIN～SELECT(1)

sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 2;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Scan using txt_pkey on txt  (cost=0.15..8.17 rows=1 width=36) (actual time=688.886..695.673 rows=1 loops=1)
   Index Cond: (id = 2)
   Buffers: shared hit=24317
 Planning time: 30.930 ms
 Execution time: 695.699 ms
(5 rows)

Time: 760.997 ms

id = 2の行は共有バッファ内にすべてのデータが存在したので高速でした。

※hitが共有バッファからの読み込み、readがディスク（OS のディスクキャッシュを含む）からの読み込みです。

sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 1;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using txt_pkey on txt  (cost=0.15..8.17 rows=1 width=36) (actual time=42431.805..42438.657 rows=1 loops=1)
   Index Cond: (id = 1)
   Buffers: shared hit=174 read=24398
 Planning time: 0.053 ms
 Execution time: 42438.679 ms
(5 rows)

Time: 42441.572 ms

id = 1の行は大半が共有バッファ外からの読み込みとなりました。

sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 2;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using txt_pkey on txt  (cost=0.15..8.17 rows=1 width=36) (actual time=16105.989..16112.480 rows=1 loops=1)
   Index Cond: (id = 2)
   Buffers: shared hit=16285 read=8032
 Planning time: 0.053 ms
 Execution time: 16112.503 ms
(5 rows)

Time: 16115.345 ms

再度id = 2の行を読み込んだときには 1/3 ほど共有バッファ外からの読み込みとなりました。

sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 1;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using txt_pkey on txt  (cost=0.15..8.17 rows=1 width=36) (actual time=39201.909..39208.702 rows=1 loops=1)
   Index Cond: (id = 1)
   Buffers: shared hit=174 read=24398
 Planning time: 0.051 ms
 Execution time: 39208.726 ms
(5 rows)

Time: 39211.487 ms

id = 1は前回同様、大半が共有バッファ外からの読み込みとなりました。時間は前回よりもわずかに短めです。

テスト 2 : フェイルオーバー 1 回目の後

続いて、クラスタのフェイルオーバーを行います。

フェイルオーバー前の Reader が Writer に昇格しますが、昇格前に旧 Writer で更新したデータが複製されて旧 Reader（新 Writer）の共有バッファ／キャッシュに載っているはずです。

EXPLAIN～SELECT(2)

sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 2;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
SSL SYSCALL error: EOF detected
The connection to the server was lost. Attempting reset: Succeeded.
Time: 205.167 ms
sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 2;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Scan using txt_pkey on txt  (cost=0.15..8.17 rows=1 width=36) (actual time=695.151..701.585 rows=1 loops=1)
   Index Cond: (id = 2)
   Buffers: shared hit=24332 read=2
 Planning time: 13.671 ms
 Execution time: 701.616 ms
(5 rows)

Time: 759.147 ms

最初に出ているエラーはフェイルオーバーの影響です。

ほぼ全て共有バッファからの読み込みでした。

続きはまとめて…

sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 1;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using txt_pkey on txt  (cost=0.15..8.17 rows=1 width=36) (actual time=45372.097..45379.304 rows=1 loops=1)
   Index Cond: (id = 1)
   Buffers: shared hit=261 read=24311
 Planning time: 0.054 ms
 Execution time: 45379.333 ms
(5 rows)

Time: 45380.077 ms
sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 2;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using txt_pkey on txt  (cost=0.15..8.17 rows=1 width=36) (actual time=14692.881..14699.761 rows=1 loops=1)
   Index Cond: (id = 2)
   Buffers: shared hit=16194 read=8123
 Planning time: 0.074 ms
 Execution time: 14699.783 ms
(5 rows)

Time: 14700.510 ms
sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 1;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using txt_pkey on txt  (cost=0.15..8.17 rows=1 width=36) (actual time=42239.097..42246.494 rows=1 loops=1)
   Index Cond: (id = 1)
   Buffers: shared hit=172 read=24400
 Planning time: 0.053 ms
 Execution time: 42246.516 ms
(5 rows)

Time: 42247.217 ms

ばらつきはありますがフェイルオーバー前（旧 Writer への接続時）と傾向は似ています。

テスト 3 : フェイルオーバー 2 回目の後

またフェイルオーバーします。

EXPLAIN～SELECT(3)

sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 2;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
SSL SYSCALL error: EOF detected
The connection to the server was lost. Attempting reset: Succeeded.
Time: 215.207 ms
sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 2;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using txt_pkey on txt  (cost=0.15..8.17 rows=1 width=36) (actual time=47679.603..47687.290 rows=1 loops=1)
   Index Cond: (id = 2)
   Buffers: shared hit=267 read=24067
 Planning time: 24.325 ms
 Execution time: 47687.346 ms
(5 rows)

Time: 47782.782 ms
sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 1;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using txt_pkey on txt  (cost=0.15..8.17 rows=1 width=36) (actual time=27945.708..27953.298 rows=1 loops=1)
   Index Cond: (id = 1)
   Buffers: shared hit=8338 read=16234
 Planning time: 0.053 ms
 Execution time: 27953.328 ms
(5 rows)

Time: 27956.312 ms
sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 2;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using txt_pkey on txt  (cost=0.15..8.17 rows=1 width=36) (actual time=50843.007..50850.832 rows=1 loops=1)
   Index Cond: (id = 2)
   Buffers: shared hit=251 read=24066
 Planning time: 0.053 ms
 Execution time: 50850.853 ms
(5 rows)

Time: 50853.832 ms
sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 1;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using txt_pkey on txt  (cost=0.15..8.17 rows=1 width=36) (actual time=39518.475..39526.142 rows=1 loops=1)
   Index Cond: (id = 1)
   Buffers: shared hit=101 read=24471
 Planning time: 0.054 ms
 Execution time: 39526.162 ms
(5 rows)

Time: 39529.133 ms

共有バッファ外からの読み込みが増えました。

テスト 4 : クラスタ休止→再開後

一旦クラスタを休止→開始し、共有バッファ／キャッシュをクリアします。

EXPLAIN～SELECT(4)

sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 2;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
Time: 115.240 ms
sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 2;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using txt_pkey on txt  (cost=0.15..8.17 rows=1 width=36) (actual time=48224.386..48230.907 rows=1 loops=1)
   Index Cond: (id = 2)
   Buffers: shared hit=16 read=24318
 Planning time: 23.409 ms
 Execution time: 48230.935 ms
(5 rows)

Time: 48262.759 ms
sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 1;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using txt_pkey on txt  (cost=0.15..8.17 rows=1 width=36) (actual time=51553.943..51560.753 rows=1 loops=1)
   Index Cond: (id = 1)
   Buffers: shared hit=86 read=24486
 Planning time: 0.050 ms
 Execution time: 51560.781 ms
(5 rows)

Time: 51562.276 ms
sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 2;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using txt_pkey on txt  (cost=0.15..8.17 rows=1 width=36) (actual time=41591.528..41598.294 rows=1 loops=1)
   Index Cond: (id = 2)
   Buffers: shared hit=179 read=24138
 Planning time: 0.050 ms
 Execution time: 41598.312 ms
(5 rows)

Time: 41601.949 ms
sb_test=> EXPLAIN (ANALYZE true, BUFFERS true) SELECT id, md5(txt) FROM txt WHERE id = 1;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using txt_pkey on txt  (cost=0.15..8.17 rows=1 width=36) (actual time=45140.457..45147.448 rows=1 loops=1)
   Index Cond: (id = 1)
   Buffers: shared hit=86 read=24486
 Planning time: 0.050 ms
 Execution time: 45147.468 ms
(5 rows)

Time: 45151.050 ms

共有バッファ外の読み込み行数と所要時間の関係から推測すると、OS のディスクキャッシュが機能しているのかどうか微妙な感じで結果としてはよくわかりませんでした（キャッシュをクリアしたのに、思ったほど時間が掛からなかった）。

結局

フェイルオーバーで旧 Reader が新しい Writer に昇格したとき、それ以前に共有バッファに入っていたデータはそのまま生きのこっていました。

ただ、OS のディスクキャッシュについてはよくわかりませんでした。

また、フェイルオーバーを繰り返すと、共有バッファ内の有効データが削られてしまう傾向がありました。

と、ここまで書いてきてナンですが…

Aurora MySQL 互換版では「バッファプールをダーティーページの管理目的には使っていない」（そもそもダーティーページ管理自体が不要な構造になっているのが Aurora のコンセプト）など、バッファ／キャッシュの扱いが本家とは違います。

なので、PostgreSQL 互換版でも共有バッファやキャッシュの構造自体が本家と異なる可能性があります（よく知らない…）。

…という、「PostgreSQL の素人が無理やり参加した Advent Calendar」らしい、中途半端なオチで終わりたいと思います。

12/9 追記：

別の Advent Calendar の記事として続きを書きました。

Aurora PostgreSQL 互換版（9.6.9）のキャッシュについて追加で確かめてみた

明日、8 日目は sugai さんです。

PostgreSQL9.4を運用している現場がPostgreSQL11を使い始めた話

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up