PG16:Support load balancing in libpq

Last updated at 2023-04-03, posted at 2023-04-02

Introduction

Meow. I'm someone who does Postgres as a hobby.
This time I wrote about the load balancing feature added to libpq in PostgreSQL 16.

Overview

Item	Details
Title	Support load balancing in libpq
Topic	Clients
Last Modified	2023-03-29
Status	committed
commit id	7f5b19817eaf38e70ad1153db4e644ee9456853e
Summary	A connection parameter that controls load balancing behavior was added to libpq

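What changed

load_balance_hosts has been added as a connection parameter to libpq, the PostgreSQL client library.
The parameter currently accepts the following two values.

Value	Meaning
disable	No load balancing across hosts is performed. Hosts are tried in the order they are given, and addresses are tried in the order they are received from DNS or the hosts file.
random	Hosts and addresses are tried in random order. This spreads connections across multiple PostgreSQL servers.

The default is disable, which is also how PostgreSQL 15 and earlier behave.
Below, I set load_balance_hosts to random and check how load balancing behaves.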

What a change in libpq means

Because the feature lives in libpq, any client application built on libpq benefits from it.
The client applications shipped in the standard PostgreSQL packages use libpq, so they gain load balancing with no changes on the application side.
(That said, psql and pgbench may be the only ones where it is actually useful.)

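Verification

I verified this with a build from the repository as of commit 0070b66fef21e909adb283f7faa7b1978836ad75 (2023-04-01).

Test environment

I set up three database clusters (NODE1 to NODE3) on separate ports on a single EC2 instance and used psql to check whether connections are distributed among them at random.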

Database cluster	Port
NODE1	16010
NODE2	16020
NODE3	16030

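Checking with psql

When connecting with psql, you probably specify options such as --host, --port, and --username explicitly most of the time, but you can also pass a connection string in place of the database name.
The following example connects by passing a connection string as the database name.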

$ psql 'port=16010 dbname=testdb'
Null display is "(null)".
psql (16devel)
Type "help" for help.

postgres@testdb=#

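So the new load balancing parameter can be supplied through the connection string as well.
(psql therefore needs no dedicated load balancing option for this to work.)

The following example connects at random to the three nodes from the test environment and displays the port parameter of the server it reached with the SHOW command.
The connection string contains the following settings.

Keyword	Value	Meaning
host	localhost,localhost,localhost	All three nodes run on the same EC2 instance, so every entry is localhost.
port	16010,16020,16030	The three nodes listen on different ports. The list must have as many entries as host (a single port would apply to every host).
load_balance_hosts	random	Setting this to random enables load balancing.
dbname	testdb	Unlike host and port, dbname takes a single value, so the database is testdb whichever port is reached.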
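
The keyword table above can be assembled by a small helper that also checks that the host and port lists pair up. This is only a minimal POSIX sh sketch: count_items and build_conninfo are hypothetical names of my own, not something psql or libpq provides.

```shell
# count_items LIST: print how many comma-separated entries LIST has.
count_items() {
    commas=$(printf '%s' "$1" | tr -cd ',')
    echo $(( ${#commas} + 1 ))
}

# build_conninfo HOSTS PORTS DBNAME: print a libpq connection string with
# load_balance_hosts=random, or fail if the lists cannot be paired up.
build_conninfo() {
    nhosts=$(count_items "$1")
    nports=$(count_items "$2")
    # libpq accepts either one port per host or a single port for all hosts.
    if [ "$nports" -ne 1 ] && [ "$nports" -ne "$nhosts" ]; then
        echo "build_conninfo: $nhosts hosts but $nports ports" >&2
        return 1
    fi
    printf 'host=%s port=%s load_balance_hosts=random dbname=%s\n' "$1" "$2" "$3"
}

build_conninfo localhost,localhost,localhost 16010,16020,16030 testdb
```

The result can be handed to psql in place of the database name, for example psql "$(build_conninfo localhost,localhost,localhost 16010,16020,16030 testdb)" -c "SHOW port".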

$ psql 'host=localhost,localhost,localhost port=16010,16020,16030 load_balance_hosts=random dbname=testdb' -c "SHOW port"
Null display is "(null)".
 port
-------
 16010
(1 row)

$ psql 'host=localhost,localhost,localhost port=16010,16020,16030 load_balance_hosts=random dbname=testdb' -c "SHOW port"
Null display is "(null)".
 port
-------
 16030
(1 row)

$ psql 'host=localhost,localhost,localhost port=16010,16020,16030 load_balance_hosts=random dbname=testdb' -c "SHOW port"
Null display is "(null)".
 port
-------
 16020
(1 row)

$

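Connecting to a database cluster and printing its port setting confirms that the connection target is indeed chosen at random.

When a database cluster at one of the specified host/port pairs is stopped

In this case, no error occurs even if the stopped database cluster is selected.
That is because load_balance_hosts=random does not pick a single entry from the list at random; it tries the entries in random order.
If the list contains a stopped host/port and that entry comes up first (and fails immediately with a connection error), libpq simply moves on to the next entry in the list.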
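
The difference between "pick one at random" and "try in random order" can be sketched as a small simulation. This is not libpq's code, just an illustration of the idea: first_reachable and the DOWN variable are hypothetical, and the shuffle relies on shuf from GNU coreutils.

```shell
# first_reachable PORT...: visit the candidates in random order and print the
# first one that is not listed in $DOWN (space-separated ports treated as stopped).
first_reachable() {
    for port in $(printf '%s\n' "$@" | shuf); do
        case " $DOWN " in
            *" $port "*) continue ;;   # simulated connection failure: try the next entry
        esac
        echo "$port"
        return 0
    done
    return 1                           # every candidate failed
}

# Pretend the cluster on port 16020 is stopped.
DOWN="16020"
first_reachable 16010 16020 16030
```

However the list happens to be shuffled, the stopped entry is skipped and one of the remaining ports is printed. The function fails only when every entry is down.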

I stop the database cluster listening on port=16020 and connect again with the same connection string.

$ pg_ctl stop -D /data/pgdata/16-node2/ -l /tmp/pg16-node2.log
waiting for server to shut down.... done
server stopped
$ psql 'host=localhost,localhost,localhost port=16010,16020,16030 load_balance_hosts=random dbname=testdb' -c "SHOW port"
Null display is "(null)".
 port
-------
 16030
(1 row)

$ psql 'host=localhost,localhost,localhost port=16010,16020,16030 load_balance_hosts=random dbname=testdb' -c "SHOW port"
Null display is "(null)".
 port
-------
 16030
(1 row)

$ psql 'host=localhost,localhost,localhost port=16010,16020,16030 load_balance_hosts=random dbname=testdb' -c "SHOW port"
Null display is "(null)".
 port
-------
 16010
(1 row)

$ psql 'host=localhost,localhost,localhost port=16010,16020,16030 load_balance_hosts=random dbname=testdb' -c "SHOW port"
Null display is "(null)".
 port
-------
 16010
(1 row)

$

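(It is possible that port=16020 simply never came up first in these four attempts. A failed attempt is not reported to the client as an error, so there is actually no way to tell from this output...)

Now, sharp readers may have noticed a problem: what happens when a connection attempt takes a long time to return an error (for example, when the connection cannot be established at the TCP/IP level and you wait until it times out)?
So when you specify load_balance_hosts (in which case host and port will presumably be lists), it seems wise to include connect_timeout in the connection string as well.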
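
As a rough sketch of what that could look like: connect_timeout is given in seconds, and the libpq documentation says it applies separately to each host name or address tried, so the worst-case wait grows with the length of the list. The value 3 and the helper worst_case_wait are my own assumptions for illustration.

```shell
# Connection string from the test above plus connect_timeout (in seconds).
# The value 3 is an arbitrary choice for illustration.
TIMEOUT=3
CONNINFO="host=localhost,localhost,localhost port=16010,16020,16030 load_balance_hosts=random connect_timeout=$TIMEOUT dbname=testdb"

# worst_case_wait NHOSTS TIMEOUT: upper bound in seconds when every entry times out,
# assuming each host name resolves to a single address.
worst_case_wait() {
    echo $(( $1 * $2 ))
}

echo "$CONNINFO"
echo "worst case: $(worst_case_wait 3 "$TIMEOUT") s"
```

The string is used exactly like the earlier one, for example psql "$CONNINFO" -c "SHOW port".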

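Trying load balancing with pgbench

As the psql test showed, any application that uses libpq benefits from load_balance_hosts. pgbench uses libpq too, so it should be able to spread its work across multiple database servers without a separate load balancer in between, right?

pgbench itself documents no option for load balancing or for specifying multiple hosts, but like psql it takes a database name, so passing a connection string there might just work. Let's try it.

First, I initialize each database cluster on port=16010,16020,16030 (the database is named testdb on all of them) using pgbench's initialization mode.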

$ pgbench -p 16010 -i -s 10 --unlogged-table -q testdb
dropping old tables...
creating tables...
generating data (client-side)...
1000000 of 1000000 tuples (100%) done (elapsed 0.90 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 1.44 s (drop tables 0.03 s, create tables 0.01 s, client-side generate 0.93 s, vacuum 0.16 s, primary keys 0.31 s).
$ pgbench -p 16020 -i -s 10 --unlogged-table -q testdb
dropping old tables...
creating tables...
generating data (client-side)...
1000000 of 1000000 tuples (100%) done (elapsed 0.89 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 1.45 s (drop tables 0.03 s, create tables 0.01 s, client-side generate 0.92 s, vacuum 0.18 s, primary keys 0.31 s).
$ pgbench -p 16030 -i -s 10 --unlogged-table -q testdb
dropping old tables...
creating tables...
generating data (client-side)...
1000000 of 1000000 tuples (100%) done (elapsed 0.90 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 2.10 s (drop tables 0.03 s, create tables 0.01 s, client-side generate 0.92 s, vacuum 0.16 s, primary keys 0.97 s).
$

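I run a single pgbench against these three database clusters and let it distribute the work at random.
The pgbench options are as follows.

Option	Value	Notes
-b	tpcb-like	The default transaction. It also inserts one row into pgbench_history.
-C	(none)	Opens a new connection for every transaction.
-c	2	Sets the number of concurrent clients to 2.
-t	500	Runs 500 transactions per client. With 2 clients, 1000 transactions run in total.
Database name	'host=localhost,localhost,localhost port=16010,16020,16030 load_balance_hosts=random dbname=testdb'	load_balance_hosts is specified here.

Let's run pgbench with these settings.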

$ pgbench -b tpcb-like -C -c 2 -t 500 'host=localhost,localhost,localhost port=16010,16020,16030 load_balance_hosts=random dbname=testdb'
pgbench (16devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 2
number of threads: 1
maximum number of tries: 1
number of transactions per client: 500
number of transactions actually processed: 1000/1000
number of failed transactions: 0 (0.000%)
latency average = 7.161 ms
average connection time = 2.285 ms
tps = 279.275158 (including reconnection times)
$

Note that pgbench runs VACUUM at startup, but there is no telling which cluster's testdb that VACUUM ran against. (lol)
For a real measurement, it seems better to run VACUUM against each database cluster beforehand and pass -n (--no-vacuum) to pgbench.
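
One way to do that might look like the following sketch. The per-node statements mirror what the pgbench documentation describes for its default startup step (vacuum pgbench_branches and pgbench_tellers, truncate pgbench_history). The run wrapper and the DRY_RUN variable are hypothetical helpers of mine. By default the script only prints the commands, and DRY_RUN=0 actually executes them.

```shell
# run CMD...: print the command, and execute it unless DRY_RUN is 1 (the default).
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

CONNINFO='host=localhost,localhost,localhost port=16010,16020,16030 load_balance_hosts=random dbname=testdb'

# Do on every node the maintenance pgbench would otherwise do on one unpredictable node.
for port in 16010 16020 16030; do
    run psql -X -p "$port" testdb \
        -c "VACUUM pgbench_branches" \
        -c "VACUUM pgbench_tellers" \
        -c "TRUNCATE pgbench_history"
done

# -n (--no-vacuum) skips pgbench's own startup vacuum.
run pgbench -n -b tpcb-like -C -c 2 -t 500 "$CONNINFO"
```

Truncating pgbench_history on every node also means that the row counts checked afterwards start from zero everywhere.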

Now that the pgbench run has finished, let's look at the pgbench_history row count on each database cluster.

$ psql -p 16010 -U postgres testdb -c "SELECT COUNT(*) FROM pgbench_history"
Null display is "(null)".
 count
-------
   349
(1 row)

$ psql -p 16020 -U postgres testdb -c "SELECT COUNT(*) FROM pgbench_history"
Null display is "(null)".
 count
-------
   320
(1 row)

$ psql -p 16030 -U postgres testdb -c "SELECT COUNT(*) FROM pgbench_history"
Null display is "(null)".
 count
-------
   331
(1 row)

$

Oh, that looks like it was load balanced reasonably well.
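
To put numbers on "reasonably well", the three counts can be turned into shares of the total. A minimal sketch, with summarize being a hypothetical helper name and the input being the row counts reported above:

```shell
# summarize: read "PORT COUNT" lines on stdin and print each node's share of the total.
summarize() {
    awk '{ port[NR] = $1; count[NR] = $2; total += $2 }
         END {
             if (total == 0) exit
             for (i = 1; i <= NR; i++)
                 printf "%s %d %.1f%%\n", port[i], count[i], 100 * count[i] / total
         }'
}

# Row counts reported above.
printf '16010 349\n16020 320\n16030 331\n' | summarize
# prints:
# 16010 349 34.9%
# 16020 320 32.0%
# 16030 331 33.1%
```

The counts add up to the 1000 transactions pgbench reported, and each node received roughly a third of them. To feed live data instead, each line could be produced with something like psql -XAt -p 16010 -U postgres testdb -c "SELECT count(*) FROM pgbench_history", where -X skips the psqlrc file that prints the Null display line.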

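Conclusion

I looked into the libpq load balancing feature slated for PostgreSQL 16 and how to load balance with pgbench alone.
This approach also looks useful for a separate investigation of mine: benchmarking a pseudo multi-master configuration on PostgreSQL 16 with pgbench.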
