LoginSignup
3
4

More than 1 year has passed since last update.

Pgpool-II構築から自動フェイルオーバーテストまで

Last updated at Posted at 2023-02-26

1. 概要

理屈はともかくとして、上から順に設定すればとりあえず動くような手順を確立したいという動機でまとめた。細かいチューニングやパラメータの意味にはほぼ触れないため、公式の文書で補うことを推奨する。
この記事に含まれる内容は以下である。

  1. PostgreSQLのインストール・設定
  2. Pgpool-IIのインストール・設定
  3. Pgpool-IIによる自動フェイルオーバーの動作確認
  4. PostgreSQLについて、プライマリ1台、スタンバイ2台のマルチスタンバイ構成の構築
  5. Pgpool-IIについて、Watchdogを使ったPgpool-II自身の冗長化構成の構築

2. 凡例

  • #: rootユーザ
  • $: 一般ユーザ(この記事内では基本postgresユーザ)

3. 環境情報

3.1. ホストOS

仮想化ソフトウェア:Virtualbox
ネットワーク:NATネットワーク

3.2. ゲストOS

全てCentOS 7とする

hostname IP version
Pgpool-II pgpool 10.0.2.6 Pgpool-II 4.4.1
PostgreSQL Primary server1 10.0.2.7 PostgreSQL 15.1
PostgreSQL Standby server2 10.0.2.8 PostgreSQL 15.1

3.3. PostgreSQL

説明
PostgreSQL インストール先 /usr/local/pgsql
$PGHOME /home/postgres/data
PostgreSQL ログ /home/postgres/data/log
PostgreSQL WAL archive /home/postgres/data/archivedir

3.4. Pgpool-II

説明
Pgpool-IIインストール先 /usr/local/pgpool-II
Pgpool-II ログ /var/log/pgpool

4. 事前準備

全てのサーバで実施

/etc/hosts
# 追加
10.0.2.6 pgpool
10.0.2.7 server1
10.0.2.8 server2
/etc/selinux/config
SELINUX=disabled
# systemctl stop firewalld
# systemctl disable firewalld
# shutdown -r now

5. PostgreSQL

5.1. 使うコマンド・依存パッケージインストール

全てのサーバで実施

# yum install -y wget vim
# yum install -y readline-devel zlib-devel zlib gcc

5.2. PostgreSQLインストール

全てのサーバで実施

# wget -P /usr/local/src https://ftp.postgresql.org/pub/source/v15.1/postgresql-15.1.tar.gz --no-check-certificate
# cd /usr/local/src && tar -zxvf postgresql-15.1.tar.gz && cd postgresql-15.1
# ./configure && make && make install

全サーバのrootユーザで実施

~/.bash_profile
# 追加
export LD_LIBRARY_PATH=/usr/local/pgsql/lib:$LD_LIBRARY_PATH
# 変更
export PATH=/usr/local/pgsql/bin:$PATH
# source ~/.bash_profile

5.3. ユーザの追加

server1,server2で実施、ID/PASSWORD共にpostgresとする

# useradd -m -d /home/postgres postgres
# passwd postgres
Changing password for user postgres.
New password: postgres
BAD PASSWORD: The password contains the user name in some form
Retype new password: postgres
passwd: all authentication tokens updated successfully.
# su - postgres

server1,server2のpostgresユーザで実施

~/.bash_profile
# 追加
export PGDATA=/home/postgres/data
export LD_LIBRARY_PATH=/usr/local/pgsql/lib:$LD_LIBRARY_PATH
# 変更
export PATH=/usr/local/pgsql/bin:$PATH
$ source ~/.bash_profile

5.4. パスワード無しでsshできるように設定

server1で実施

$ mkdir ~/.ssh && chmod 700 ~/.ssh
$ cd ~/.ssh
$ ssh-keygen -t rsa -b 4096 -f id_rsa_pgpool
$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server1
$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server2

server2で実施

$ mkdir ~/.ssh && chmod 700 ~/.ssh
$ cd ~/.ssh
$ ssh-keygen -t rsa -b 4096 -f id_rsa_pgpool
$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server1
$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server2

5.4.1. パスワードなしでsshできるか確認

server1で実施

$ ssh postgres@server1 -i ~/.ssh/id_rsa_pgpool
$ exit
$ ssh postgres@server2 -i ~/.ssh/id_rsa_pgpool
$ exit

server2で実施

$ ssh postgres@server1 -i ~/.ssh/id_rsa_pgpool
$ exit
$ ssh postgres@server2 -i ~/.ssh/id_rsa_pgpool
$ exit

5.5. PostgreSQL初期設定

server1で実施

$ initdb -D ${PGDATA} --no-locale --encoding=UTF8
$ mkdir ${PGDATA}/archivedir

5.5.1. ネットワーク設定

ホストOSからは認証有、同一セグメントの場合はパスワード無しで接続可とする
※お試し用なので適当に設定している、構築して動くようになったらちゃんとした設定にする必要がある

${PGDATA}/pg_hba.conf
host    all             all             192.168.11.41/32        scram-sha-256
host    all             all             samenet                 trust
host    replication     all             samenet                 trust

5.5.2. PostgreSQL設定

${PGDATA}/postgresql.conf
# コメントを外して適宜変更
listen_addresses = '*'
dynamic_shared_memory_type = sysv
wal_level = replica
synchronous_commit = on
archive_mode = on
archive_command = 'cp "%p" "/home/postgres/data/archivedir/%f"'
max_wal_senders = 10
hot_standby = on
log_destination = 'stderr'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
log_rotation_age = 1d
log_rotation_size = 10MB
log_truncate_on_rotation = on

5.6. 起動・ユーザの追加

server1で実施

$ pg_ctl -D ${PGDATA} start
$ psql -c "ALTER ROLE postgres WITH PASSWORD 'postgres';"
$ psql -c "CREATE ROLE pgpool WITH PASSWORD 'pgpool' LOGIN SUPERUSER;"
$ psql -c "CREATE ROLE repl WITH PASSWORD 'repl' LOGIN REPLICATION;"
$ psql -c "GRANT pg_monitor TO pgpool;"
$ psql
psql (15.1)
Type "help" for help.

postgres=# SELECT * FROM pg_user;
 usename  | usesysid | usecreatedb | usesuper | userepl | usebypassrls |  passwd  | valuntil | useconfig
----------+----------+-------------+----------+---------+--------------+----------+----------+-----------
 postgres |       10 | t           | t        | t       | t            | ******** |          |
 pgpool   |    16387 | f           | f        | f       | f            | ******** |          |
 repl     |    24581 | f           | f        | t       | f            | ******** |          |

postgres=# \q

5.7. pgpool_recoveryインストール

Pgpool-IIを使ってオンラインリカバリを実行するために必要
server1,server2で実施

$ exit
# wget -O /usr/local/src/pgpool-II-4.4.1.tar.gz "https://www.pgpool.net/mediawiki/download.php?f=pgpool-II-4.4.1.tar.gz" --no-check-certificate
# cd /usr/local/src && tar -zxvf pgpool-II-4.4.1.tar.gz
# cd /usr/local/src/pgpool-II-4.4.1/src/sql/pgpool-recovery
# make && make install

server1で実施

# psql -f pgpool-recovery.sql template1 -U postgres
# psql -U postgres
psql (15.1)
Type "help" for help.

postgres=# \df
                                                                                          List of functions
 Schema |        Name         | Result data type |                                                             Argument data types                                                             | Type
--------+---------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------+------
 public | pgpool_pgctl        | boolean          | text, text                                                                                                                                  | func
 public | pgpool_recovery     | boolean          | script_name text, remote_host text, remote_data_directory text                                                                              | func
 public | pgpool_recovery     | boolean          | script_name text, remote_host text, remote_data_directory text, primary_port text                                                           | func
 public | pgpool_recovery     | boolean          | script_name text, remote_host text, remote_data_directory text, primary_port text, remote_node integer                                      | func
 public | pgpool_recovery     | boolean          | script_name text, remote_host text, remote_data_directory text, primary_port text, remote_node integer, remote_port text                    | func
 public | pgpool_recovery     | boolean          | script_name text, remote_host text, remote_data_directory text, primary_port text, remote_node integer, remote_port text, primary_host text | func
 public | pgpool_remote_start | boolean          | text, text                                                                                                                                  | func
 public | pgpool_switch_xlog  | text             | text                                                                                                                                        | func
(8 rows)

postgres=# \q
# su - postgres

6. Pgpool-II

6.1. 使うコマンド・依存パッケージインストール

# yum install -y wget vim git
# yum install -y readline-devel zlib-devel zlib gcc

6.2. Pgpool-IIインストール

# wget -O /usr/local/src/pgpool-II-4.4.1.tar.gz "https://www.pgpool.net/mediawiki/download.php?f=pgpool-II-4.4.1.tar.gz" --no-check-certificate
# cd /usr/local/src && tar -zxvf pgpool-II-4.4.1.tar.gz
# cd /usr/local/src/pgpool-II-4.4.1
# ./configure --prefix=/usr/local/pgpool-II && make && make install
~/.bash_profile
# 変更
export LD_LIBRARY_PATH=/usr/local/pgsql/lib:/usr/local/pgpool-II/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/pgsql/bin:/usr/local/pgpool-II/bin:$PATH
# source ~/.bash_profile

6.3. サービス起動設定

/usr/lib/systemd/system/pgpool.service
[Unit]
Description=Pgpool-II
After=syslog.target network.target

[Service]

User=root
Group=root

EnvironmentFile=-/etc/sysconfig/pgpool

ExecStart=/usr/local/pgpool-II/bin/pgpool -f /usr/local/pgpool-II/etc/pgpool.conf $OPTS
ExecStop=/usr/local/pgpool-II/bin/pgpool -f /usr/local/pgpool-II/etc/pgpool.conf $STOP_OPTS stop
ExecReload=/usr/local/pgpool-II/bin/pgpool -f /usr/local/pgpool-II/etc/pgpool.conf reload

[Install]
WantedBy=multi-user.target

-Dオプションはpgpool_statusをファイルを破棄し、以前の状態を復元しないようにする
pgpool_statusがあると所謂「キャッシュ」されたような挙動をする。キャッシュの対象は以前のPostgreSQLの状態となる
これが居ると直感的でない挙動をすることがあるため(生きてるPostgreSQLを認識しないなど)無視するようにする

/etc/sysconfig/pgpool
# Options for pgpool

# -n: don't run in daemon mode. does not detach control tty
# -d: debug mode. lots of debug information will be printed

#OPTS=" -d -n"
OPTS=" -n -D -F /usr/local/pgpool-II/etc/pcp.conf"

STOP_OPTS=" -m fast"
# chmod +x /etc/sysconfig/pgpool

6.4. pcpのパスワード設定

Pgpool-IIではオンラインリカバリするときやノードの情報を確認するときにpcpという独自のプロトコルを使用する
ユーザを指定して実行しパスワードを要求される
ユーザとパスワードのペアは以下のように作成する
ここで作成したユーザとパスワードのペアはPostgreSQLとは無関係なので注意

# echo 'pgpool:'`pg_md5 pgpool` > /usr/local/pgpool-II/etc/pcp.conf 

6.5. pcp実行時パスワードを聞かれないようにする

~/.pcppassを作ることでpcpのパスワードをスキップできる
パスワードのスキップにはコマンド実行時-wオプションを付ける必要がある
ファイルの場所は環境変数 PCPPASSFILEに設定することで場所を変更できる

# echo -e "#hostname:port:username:password\nlocalhost:9898:pgpool:pgpool" > ~/.pcppass
# chmod 600 ~/.pcppass

6.6. 設定

# mkdir /var/run/pgpool

/var/runの下は再起動時消えるので消えないよう設定

/etc/tmpfiles.d/pgpool.conf
#Type   Path                    Mode    UID     GID   Age  Argument
d       /var/run/pgpool         0755    root    root  -
# cp /usr/local/pgpool-II/etc/pgpool.conf.sample /usr/local/pgpool-II/etc/pgpool.conf
/usr/local/pgpool-II/etc/pgpool.conf
# コメントを外して適宜変更
#------------------------------------------------------------------------------
# CONNECTIONS
#------------------------------------------------------------------------------
listen_addresses = '*'
port = 9999
pcp_listen_addresses = 'localhost'
pcp_port = 9898
pcp_socket_dir = '/tmp'
# primary設定
backend_hostname0 = '10.0.2.7'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/home/postgres/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = '10.0.2.7'
# standby設定
backend_hostname1 = '10.0.2.8'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/home/postgres/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_application_name1 = '10.0.2.8'
#------------------------------------------------------------------------------
# LOGS
#------------------------------------------------------------------------------
log_destination = 'stderr'
logging_collector = on
log_directory = '/var/log/pgpool'
log_filename = 'pgpool-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 10MB
#------------------------------------------------------------------------------
# STREAMING REPLICATION MODE
#------------------------------------------------------------------------------
sr_check_period = 10
sr_check_user = 'pgpool'
sr_check_database = 'postgres'
delay_threshold = 0
delay_threshold_by_time = 5
prefer_lower_delay_standby = on
follow_primary_command = '/usr/local/pgpool-II/etc/follow_primary.sh %d %h %p %D %m %H %M %P %r %R'
#------------------------------------------------------------------------------
# HEALTH CHECK GLOBAL PARAMETERS
#------------------------------------------------------------------------------
health_check_period = 10
health_check_timeout = 20
health_check_user = 'pgpool'
health_check_database = 'postgres'
health_check_max_retries = 0
health_check_retry_delay = 1
connect_timeout = 10000
#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------
failover_command = '/usr/local/pgpool-II/etc/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S'
failover_on_backend_error = on
failover_on_backend_shutdown = on
detach_false_primary = on
search_primary_node_timeout = 5min
#------------------------------------------------------------------------------
# ONLINE RECOVERY
#------------------------------------------------------------------------------
recovery_user = 'pgpool'
recovery_1st_stage_command = 'recovery_1st_stage'
recovery_timeout = 90
client_idle_limit_in_recovery = 10
auto_failback = on
auto_failback_interval = 1min

6.7. パスワード無しでsshできるように設定

pgpoolで実施

# mkdir ~/.ssh && chmod 700 ~/.ssh
# cd ~/.ssh
# ssh-keygen -t rsa -b 4096 -f id_rsa_pgpool
# ssh-copy-id -i id_rsa_pgpool.pub postgres@server1
# ssh-copy-id -i id_rsa_pgpool.pub postgres@server2

6.7.1. パスワードなしでsshできるか確認

# ssh postgres@server1 -i ~/.ssh/id_rsa_pgpool
$ exit
Connection to server1 closed.
# ssh postgres@server2 -i ~/.ssh/id_rsa_pgpool
$ exit
Connection to server2 closed.

6.8. 自動フェイルオーバーに使用するスクリプトの準備

# cp /usr/local/pgpool-II/etc/follow_primary.sh.sample /usr/local/pgpool-II/etc/follow_primary.sh
# cp /usr/local/pgpool-II/etc/failover.sh.sample /usr/local/pgpool-II/etc/failover.sh
# cp /usr/local/pgpool-II/etc/recovery_1st_stage.sample /usr/local/pgpool-II/etc/recovery_1st_stage
# cp /usr/local/pgpool-II/etc/pgpool_remote_start.sample /usr/local/pgpool-II/etc/pgpool_remote_start
# chmod +x /usr/local/pgpool-II/etc/{follow_primary.sh,failover.sh,recovery_1st_stage,pgpool_remote_start}
/usr/local/pgpool-II/etc/follow_primary.sh
# 変更
PGHOME=/usr/local/pgsql
ARCHIVEDIR=/home/postgres/data/archivedir
PGPOOL_PATH=/usr/local/pgpool-II/bin
/usr/local/pgpool-II/etc/failover.sh
# 変更
PGHOME=/usr/local/pgsql
/usr/local/pgpool-II/etc/recovery_1st_stage
# 変更
PGHOME=/usr/local/pgsql
ARCHIVEDIR=/home/postgres/data/archivedir
SSH_OPTIONS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /home/postgres/.ssh/${SSH_KEY_FILE}"
/usr/local/pgpool-II/etc/pgpool_remote_start
# 変更
PGHOME=/usr/local/pgsql
SSH_OPTIONS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /home/postgres/.ssh/${SSH_KEY_FILE}"

6.9. スクリプトの配置

これらのスクリプトはPostgreSQL側で動かす必要があるためscpで配置

# scp -p -i ~/.ssh/id_rsa_pgpool /usr/local/pgpool-II/etc/recovery_1st_stage postgres@server1:/home/postgres/data
# scp -p -i ~/.ssh/id_rsa_pgpool /usr/local/pgpool-II/etc/pgpool_remote_start postgres@server1:/home/postgres/data

6.10. 起動・接続確認

# systemctl start pgpool
# psql -p 9999 -U postgres
psql (15.1)
Type "help" for help.

postgres=# SELECT * FROM pg_user;
 usename  | usesysid | usecreatedb | usesuper | userepl | usebypassrls |  passwd  | valuntil | useconfig
----------+----------+-------------+----------+---------+--------------+----------+----------+-----------
 postgres |       10 | t           | t        | t       | t            | ******** |          |
 pgpool   |    16387 | f           | f        | f       | f            | ******** |          |
 repl     |    24581 | f           | f        | t       | f            | ******** |          |
(3 rows)

postgres=# \q

6.11. Pgpool-IIがノード(PosgreSQL)を認識しているか確認

# pcp_node_info -v -h localhost -U pgpool -w
Hostname               : 10.0.2.7
Port                   : 5432
Status                 : 2
Weight                 : 0.500000
Status Name            : up
Backend Status Name    : up
Role                   : primary
Backend Role           : primary
Replication Delay      : 0
Replication State      : none
Replication Sync State : none
Last Status Change     : 2023-03-02 22:18:07

Hostname               : 10.0.2.8
Port                   : 5432
Status                 : 3
Weight                 : 0.500000
Status Name            : down
Backend Status Name    : down
Role                   : standby
Backend Role           : unknown
Replication Delay      : 0.000000 second
Replication State      : none
Replication Sync State : none
Last Status Change     : 2023-03-02 22:18:25

# psql -p 9999 -U postgres -c "show pool_nodes;"
 node_id | hostname | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | 10.0.2.7 | 5432 | up     | up        | 0.500000  | primary | primary | 0          | true              | 0                 |                   |                        | 2023-03-02 22:18:07
 1       | 10.0.2.8 | 5432 | down   | down      | 0.500000  | standby | unknown | 0          | false             | 0.000000 second   |                   |                        | 2023-03-02 22:18:25
(2 rows)

6.12. standbyを起動

pcp_recovery_nodeコマンドを用いstandbyを起動する

# pcp_recovery_node -h localhost -p 9898 -U pgpool -n 1 -w
pcp_recovery_node -- Command Successful

6.12.1. standbyの起動確認

# pcp_node_info -v -h localhost -U pgpool -w
Hostname               : 10.0.2.7
Port                   : 5432
Status                 : 2
Weight                 : 0.500000
Status Name            : up
Backend Status Name    : up
Role                   : primary
Backend Role           : primary
Replication Delay      : 0
Replication State      : none
Replication Sync State : none
Last Status Change     : 2023-03-02 22:18:07

Hostname               : 10.0.2.8
Port                   : 5432
Status                 : 1
Weight                 : 0.500000
Status Name            : waiting
Backend Status Name    : up
Role                   : standby
Backend Role           : standby
Replication Delay      : 0.588687 second
Replication State      : streaming
Replication Sync State : async
Last Status Change     : 2023-03-02 22:20:41
# psql -p 9999 -U postgres -c "show pool_nodes;"
 node_id | hostname | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | 10.0.2.7 | 5432 | up     | up        | 0.500000  | primary | primary | 0          | true              | 0                 |                   |                        | 2023-03-02 22:18:07
 1       | 10.0.2.8 | 5432 | up     | up        | 0.500000  | standby | standby | 0          | false             | 0.000000 second   | streaming         | async                  | 2023-03-02 22:21:13
(2 rows)

7. Pgpool-IIによる自動フェイルオーバーのテスト

7.1. ケース1

primaryであるserver1を落とし、standbyであるserver2がprimaryに昇格することを確認する

※事前にpgpoolサーバに入り、tail -f /var/log/pgpool/pgpool-yyyy-MM-dd_hhmmss.logなどでログをリアルタイムで見られるようにしておくことを推奨

7.1.1. server1を落とす

server1で実施

$ pg_ctl -D ${PGDATA} stop

7.1.2. ノード状態の確認

pgpoolで実施
無事にserver2(10.0.2.8)がprimaryになったことが確認できる

# pcp_node_info -v -h localhost -U pgpool -w
Hostname               : 10.0.2.7
Port                   : 5432
Status                 : 3
Weight                 : 0.500000
Status Name            : down
Backend Status Name    : down
Role                   : standby
Backend Role           : unknown
Replication Delay      : 0.000000 second
Replication State      : none
Replication Sync State : none
Last Status Change     : 2023-02-26 11:36:33

Hostname               : 10.0.2.8
Port                   : 5432
Status                 : 2
Weight                 : 0.500000
Status Name            : up
Backend Status Name    : up
Role                   : primary
Backend Role           : primary
Replication Delay      : 0.000000 second
Replication State      : none
Replication Sync State : none
Last Status Change     : 2023-02-26 11:12:33

# psql -p 9999 -U postgres -c "show pool_nodes;"
 node_id | hostname | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | 10.0.2.7 | 5432 | down   | down      | 0.500000  | standby | unknown | 0          | false             | 0.000000 second   |                   |                        | 2023-02-26 11:36:33
 1       | 10.0.2.8 | 5432 | up     | up        | 0.500000  | primary | primary | 0          | true              | 0                 |                   |                        | 2023-02-26 11:12:33
(2 rows)

7.1.3. server1(10.0.2.7)の復旧

pgpoolで実施
-nにはnode_idを指定する
落としたのはserver1でありnode_id=0のため0を指定

# pcp_recovery_node -h localhost -p 9898 -U pgpool -n 0 -w
pcp_recovery_node -- Command Successful

7.1.4. ノード状態の確認

pgpoolで実施
落としたserver1がstandbyとして復旧できている

# pcp_node_info -v -h localhost -U pgpool -w
Hostname               : 10.0.2.7
Port                   : 5432
Status                 : 2
Weight                 : 0.500000
Status Name            : up
Backend Status Name    : up
Role                   : standby
Backend Role           : standby
Replication Delay      : 0.000000 second
Replication State      : none
Replication Sync State : none
Last Status Change     : 2023-02-26 11:43:42

Hostname               : 10.0.2.8
Port                   : 5432
Status                 : 2
Weight                 : 0.500000
Status Name            : up
Backend Status Name    : up
Role                   : primary
Backend Role           : primary
Replication Delay      : 0.000000 second
Replication State      : none
Replication Sync State : none
Last Status Change     : 2023-02-26 11:12:33

# psql -p 9999 -U postgres -c "show pool_nodes;"
 node_id | hostname | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | 10.0.2.7 | 5432 | up     | up        | 0.500000  | standby | standby | 0          | true              | 0.000000 second   |                   |                        | 2023-02-26 11:44:50
 1       | 10.0.2.8 | 5432 | up     | up        | 0.500000  | primary | primary | 0          | false             | 0                 |                   |                        | 2023-02-26 11:12:33
(2 rows)

7.2. ケース2

primaryであるserver2を落とし、standbyであるserver1がprimaryに昇格することを確認する

7.2.1. server2を落とす

server2で実施

$ pg_ctl -D ${PGDATA} stop

7.2.2. ノード状態の確認

pgpoolで実施
無事にserver1(10.0.2.7)がprimaryになったことが確認できる

# pcp_node_info -v -h localhost -U pgpool -w
~省略~
# psql -p 9999 -U postgres -c "show pool_nodes;"
~省略~

7.2.3. server2(10.0.2.8)の復旧

pgpoolで実施
-nにはnode_idを指定する
落としたのはserver2でありnode_id=1のため1を指定

# pcp_recovery_node -h localhost -p 9898 -U pgpool -n 1 -w
Password:pgpool
pcp_recovery_node -- Command Successful

7.2.4. ノード状態の確認

pgpoolで実施
落としたserver1がstandbyとして復旧できている

# pcp_node_info -v -h localhost -U pgpool -w
~省略~
# psql -p 9999 -U postgres -c "show pool_nodes;"
~省略~

7.3. ケース3

primaryであるserver1を落とし、standbyであるserver2がprimaryに昇格後、テーブル作成を行ないserver1とserver2で差分を作ってからオンラインリカバリを実行し問題なくリカバリできることを確認

7.3.1. server1を落とす

server1で実施

$ pg_ctl -D ${PGDATA} stop

7.3.2. ノード状態の確認

pgpoolで実施
無事にserver2(10.0.2.8)がprimaryになったことが確認できる

# pcp_node_info -v -h localhost -U pgpool -w
~省略~
# psql -p 9999 -U postgres -c "show pool_nodes;"
~省略~

7.3.3. 差分を作るためにテーブル作成

pgpoolで実施

# psql -p 9999 -U postgres
postgres=# CREATE TABLE test(id INTEGER, name TEXT);
postgres=# INSERT INTO test(id, name) VALUES (1, 'taro');
postgres=# SELECT * FROM test;
 id | name
----+------
  1 | taro
(1 row)

postgres=# \q

7.3.4. server1(10.0.2.7)の復旧

pgpoolで実施

# pcp_recovery_node -h localhost -p 9898 -U pgpool -n 0 -w
pcp_recovery_node -- Command Successful

7.3.5. ノード状態の確認

pgpoolで実施
落としたserver1がstandbyとして復旧できている

# pcp_node_info -v -h localhost -U pgpool -w
~省略~
# psql -p 9999 -U postgres -c "show pool_nodes;"
~省略~

7.3.6. server1でデータの確認

server1で実施

$ psql
postgres=# SELECT * FROM test;
 id | name
----+------
  1 | taro
(1 row)

postgres=# \q

8. Appendix

8.1. PostgreSQLのstandbyをもう1台追加しマルチスタンバイにする

6.12.1. standbyの起動確認
まで終了していることとする

8.1.1. 環境情報

hostname IP version
PostgreSQL Standby2 server3 10.0.2.5 PostgreSQL 15.1

8.1.2. PostgreSQLインストール・設定

pgpool,server1,server2で実施

/etc/hosts
# 追加
10.0.2.5 server3

server3で実施

/etc/hosts
# 追加
10.0.2.6 pgpool
10.0.2.7 server1
10.0.2.8 server2
10.0.2.5 server3
# yum install -y wget vim
# yum install -y readline-devel zlib-devel zlib gcc
# wget -P /usr/local/src https://ftp.postgresql.org/pub/source/v15.1/postgresql-15.1.tar.gz --no-check-certificate
# cd /usr/local/src && tar -zxvf postgresql-15.1.tar.gz && cd postgresql-15.1
# ./configure && make && make install
# wget -O /usr/local/src/pgpool-II-4.4.1.tar.gz "https://www.pgpool.net/mediawiki/download.php?f=pgpool-II-4.4.1.tar.gz" --no-check-certificate
# cd /usr/local/src && tar -zxvf pgpool-II-4.4.1.tar.gz
# cd /usr/local/src/pgpool-II-4.4.1/src/sql/pgpool-recovery
# make && make install
# useradd -m -d /home/postgres postgres
# passwd postgres
Changing password for user postgres.
New password: postgres
BAD PASSWORD: The password contains the user name in some form
Retype new password: postgres
passwd: all authentication tokens updated successfully.
# su - postgres
~/.bash_profile
# 追加
export PGDATA=/home/postgres/data
export LD_LIBRARY_PATH=/usr/local/pgsql/lib:$LD_LIBRARY_PATH
# 変更
export PATH=/usr/local/pgsql/bin:$PATH
$ source ~/.bash_profile
$ mkdir ~/.ssh && chmod 700 ~/.ssh
$ cd ~/.ssh
$ ssh-keygen -t rsa -b 4096 -f id_rsa_pgpool
$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server1
$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server2
$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server3

server1で実施

# su - postgres
$ cd ~/.ssh
$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server3

server2で実施

# su - postgres
$ cd ~/.ssh
$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server3

pgpoolで実施

# cd ~/.ssh
# ssh-copy-id -i id_rsa_pgpool.pub postgres@server3

8.1.3. Pgpool-II設定

pgpoolで実施

/usr/local/pgpool-II/etc/pgpool.conf
backend_hostname2 = '10.0.2.5'
backend_port2 = 5432
backend_weight2 = 1
backend_data_directory2 = '/home/postgres/data'
backend_flag2 = 'ALLOW_TO_FAILOVER'
backend_application_name2 = '10.0.2.5'
# systemctl restart pgpool
# psql -p 9999 -U postgres -c "show pool_nodes;"
 node_id | hostname | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | 10.0.2.7 | 5432 | up     | up        | 0.333333  | primary | primary | 0          | true              | 0                 |                   |                        | 2023-03-02 22:23:49
 1       | 10.0.2.8 | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0.000000 second   | streaming         | async                  | 2023-03-02 22:23:49
 2       | 10.0.2.5 | 5432 | down   | down      | 0.333333  | standby | unknown | 0          | false             | 0                 |                   |                        | 2023-03-02 22:23:45
(3 rows)
# pcp_recovery_node -h localhost -p 9898 -U pgpool -n 2 -w
pcp_recovery_node -- Command Successful
# psql -p 9999 -U postgres -c "show pool_nodes;"
 node_id | hostname | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | 10.0.2.7 | 5432 | up     | up        | 0.333333  | primary | primary | 0          | false             | 0                 |                   |                        | 2023-03-02 22:23:49
 1       | 10.0.2.8 | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0.000000 second   | streaming         | async                  | 2023-03-02 22:23:49
 2       | 10.0.2.5 | 5432 | up     | up        | 0.333333  | standby | standby | 0          | true              | 0.000000 second   | streaming         | async                  | 2023-03-02 22:26:03
(3 rows)

# psql -p 9999 -U postgres -c "SELECT * FROM pg_stat_replication;"
 pid  | usesysid | usename | application_name | client_addr | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 2074 |    16385 | repl    | 10.0.2.8         | 10.0.2.8    |                 |       35632 | 2023-03-02 22:20:41.629929+09 |              | streaming | 0/10000060 | 0/10000060 | 0/10000060 | 0/10000060 |           |           |            |             0 | async      | 2023-03-02 22:26:33.830319+09
 2279 |    16385 | repl    | 10.0.2.5         | 10.0.2.5    |                 |       37692 | 2023-03-02 22:25:11.721256+09 |              | streaming | 0/10000060 | 0/10000060 | 0/10000060 | 0/10000060 |           |           |            |             0 | async      | 2023-03-02 22:26:31.869239+09
(2 rows)

8.2. Pgpool-IIをもう1台追加しWatchdogで冗長化する

8.1.3. Pgpool-II設定
まで終了していることとする

補足

本来Watchdogは3台以上の奇数台で構成することが推奨されている。backendのPostgreSQLが生きているか否かをPgpool-IIは監視するが、たまたまネットワークエラーで繋がらなかったケースをPostgreSQLがダウンしたと扱わないためにPgpool-II同士で「投票」を行なう。全てのサーバの投票が過半数より多い場合(例:Pgpool-IIが5台で動いていれば、3台がダウンに投票した場合)、そのPostgreSQLはダウンしたと見なす。偶数台の場合は完全に割れた場合(1:1や2:2)、すべてのサーバが投票しても過半数にならない。
この現象は設定値で回避することができ、投票数が半数の場合でもダウンとみなすことができる。本記事では簡単のため1台のみの追加とし、この設定を有効化する。奇数台の場合は不要となる。

8.2.1. 環境情報

hostname IP version
Pgpool-II Standby pgpool2 10.0.2.9 PostgreSQL 15.1

VIP:10.0.2.100

補足

VIP(仮想IP)はそれぞれのサーバが持っているIPとは別に、現在・未来において使わないIPアドレスを使う。使わないものかつ同一サブネットなら何でもよい。
Watchdogで冗長化した場合、Pgpool-IIへの接続は、それぞれのIPではなくVIPに対して行なう。VIPはプライマリサーバのネットワークインターフェースに2つ目のIPアドレスとしてアタッチされる。プライマリサーバのPgpool-IIがなんらかの理由でダウンした場合、スタンバイサーバのPgpool-IIはVIPを自身にアタッチする。これにより、Pgpool-IIが1台落ちても同じIPアドレスでPgpool-IIおよびPostgreSQLへアクセスすることができる。

8.2.2. Pgpool-IIインストール・設定

pgpool,server1,server2,server3で実施

/etc/hosts
# 追加
10.0.2.9 pgpool2

pgpool2で実施

/etc/hosts
# 追加
10.0.2.6 pgpool
10.0.2.9 pgpool2
10.0.2.7 server1
10.0.2.8 server2
10.0.2.5 server3
# yum install -y wget vim
# yum install -y readline-devel zlib-devel zlib gcc
# wget -P /usr/local/src https://ftp.postgresql.org/pub/source/v15.1/postgresql-15.1.tar.gz --no-check-certificate
# cd /usr/local/src && tar -zxvf postgresql-15.1.tar.gz && cd postgresql-15.1
# ./configure && make && make install
# wget -O /usr/local/src/pgpool-II-4.4.1.tar.gz "https://www.pgpool.net/mediawiki/download.php?f=pgpool-II-4.4.1.tar.gz" --no-check-certificate
# cd /usr/local/src && tar -zxvf pgpool-II-4.4.1.tar.gz
# cd /usr/local/src/pgpool-II-4.4.1
# ./configure --prefix=/usr/local/pgpool-II && make && make install
~/.bash_profile
# 変更
export LD_LIBRARY_PATH=/usr/local/pgsql/lib:/usr/local/pgpool-II/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/pgsql/bin:/usr/local/pgpool-II/bin:$PATH
# source ~/.bash_profile
/usr/lib/systemd/system/pgpool.service
[Unit]
Description=Pgpool-II
After=syslog.target network.target

[Service]

User=root
Group=root

EnvironmentFile=-/etc/sysconfig/pgpool

ExecStart=/usr/local/pgpool-II/bin/pgpool -f /usr/local/pgpool-II/etc/pgpool.conf $OPTS
ExecStop=/usr/local/pgpool-II/bin/pgpool -f /usr/local/pgpool-II/etc/pgpool.conf $STOP_OPTS stop
ExecReload=/usr/local/pgpool-II/bin/pgpool -f /usr/local/pgpool-II/etc/pgpool.conf reload

[Install]
WantedBy=multi-user.target
/etc/sysconfig/pgpool
# Options for pgpool

# -n: don't run in daemon mode. does not detach control tty
# -d: debug mode. lots of debug information will be printed

#OPTS=" -d -n"
OPTS=" -n -D -F /usr/local/pgpool-II/etc/pcp.conf"

STOP_OPTS=" -m fast"
# chmod +x /etc/sysconfig/pgpool
# echo -e "#hostname:port:username:password\nlocalhost:9898:pgpool:pgpool" > ~/.pcppass
# chmod 600 ~/.pcppass
# mkdir /var/run/pgpool
/etc/tmpfiles.d/pgpool.conf
#Type   Path                    Mode    UID     GID   Age  Argument
d       /var/run/pgpool         0755    root    root  -

8.2.3. Watchdogの設定

pgpoolで実施

ネットワークインターフェース名を確認しておく、今回はenp0s3

# ip -4 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.0.2.6/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
       valid_lft 505sec preferred_lft 505sec

enp0s3は環境に応じて適切なものに変更すること

/usr/local/pgpool-II/etc/pgpool.conf
#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------
use_watchdog = on
hostname0 = '10.0.2.6'
wd_port0 = 9000
pgpool_port0 = 9999
hostname1 = '10.0.2.9'
wd_port1 = 9000
pgpool_port1 = 9999
# - Virtual IP control Setting -
delegate_ip = '10.0.2.100'
if_up_cmd = '/sbin/ip addr add $_IP_$/24 dev enp0s3 label enp0s3:0'
if_down_cmd = '/sbin/ip addr del $_IP_$/24 dev enp0s3'
arping_cmd = '/usr/sbin/arping -U $_IP_$ -w 1 -I enp0s3'
# - Behaivor on escalation Setting -
wd_escalation_command = '/usr/local/pgpool-II/etc/escalation.sh'
# - Watchdog consensus settings for failover -
# 偶数台でも動くようにするための設定,奇数台の場合はoffもしくはコメントアウトのままでよい
enable_consensus_with_half_votes = on
# - Lifecheck Setting -
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
heartbeat_hostname0 = '10.0.2.6'
heartbeat_port0 = 9694
heartbeat_device0 = ''
heartbeat_hostname1 = '10.0.2.9'
heartbeat_port1 = 9694
heartbeat_device1 = ''
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
# cp -p /usr/local/pgpool-II/etc/escalation.sh.sample /usr/local/pgpool-II/etc/escalation.sh
# chmod +x /usr/local/pgpool-II/etc/escalation.sh
/usr/local/pgpool-II/etc/escalation.sh
POSTGRESQL_STARTUP_USER=root
SSH_OPTIONS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /root/.ssh/${SSH_KEY_FILE}"
PGPOOLS=(10.0.2.6 10.0.2.9)
VIP=10.0.2.100
DEVICE=enp0s3
        /sbin/ip addr del $VIP/24 dev $DEVICE

pgpool2で実施

# mkdir ~/.ssh && chmod 700 ~/.ssh
# cd ~/.ssh
# ssh-keygen -t rsa -b 4096 -f id_rsa_pgpool
# ssh-copy-id -i id_rsa_pgpool.pub root@pgpool
# ssh-copy-id -i id_rsa_pgpool.pub root@pgpool2
# ssh-copy-id -i id_rsa_pgpool.pub postgres@server1
# ssh-copy-id -i id_rsa_pgpool.pub postgres@server2
# ssh-copy-id -i id_rsa_pgpool.pub postgres@server3
# ssh root@pgpool -i ~/.ssh/id_rsa_pgpool
# ssh root@pgpool2 -i ~/.ssh/id_rsa_pgpool
# ssh postgres@server1 -i ~/.ssh/id_rsa_pgpool
# ssh postgres@server2 -i ~/.ssh/id_rsa_pgpool
# ssh postgres@server3 -i ~/.ssh/id_rsa_pgpool

pgpoolで実施、ディレクトリごとscp

# scp -r -p -i ~/.ssh/id_rsa_pgpool /usr/local/pgpool-II root@pgpool2:/usr/local

pgpoolで実施

# echo "0" > /usr/local/pgpool-II/etc/pgpool_node_id

pgpool2で実施

# echo "1" > /usr/local/pgpool-II/etc/pgpool_node_id

8.2.4. 起動

pgpool,pgpool2で実施

# systemctl start pgpool

8.2.5. Pgpool-II冗長化できているか確認

pgpoolで実施

# ip -4 -brief a s dev enp0s3
enp0s3           UP             10.0.2.6/24 10.0.2.100/24
# pcp_watchdog_info -h localhost -U pgpool -w
2 2 YES 10.0.2.6:9999 Linux pgpool 10.0.2.6

10.0.2.6:9999 Linux pgpool 10.0.2.6 9999 9000 4 LEADER 0 MEMBER
10.0.2.9:9999 Linux pgpool2 10.0.2.9 9999 9000 7 STANDBY 0 MEMBER

8.2.6. 自動フェイルオーバーの確認

pgpoolでtail -f /var/log/pgpool/pgpool-yyyy-MM-dd_hhmmss.logなどでログを表示しておく
server1(primary)で実施

# su - postgres
$ pg_ctl -D ${PGDATA} stop

pgpoolでログの確認

/var/log/pgpool/pgpool-yyyy-MM-dd_hhmmss.log
2023-03-04 17:39:33.404: health_check0 pid 1302: LOG:  received degenerate backend request for node_id: 0 from pid [1302]
2023-03-04 17:39:33.404: watchdog pid 1257: LOG:  watchdog received the failover command from local pgpool-II on IPC interface
2023-03-04 17:39:33.404: watchdog pid 1257: LOG:  watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2023-03-04 17:39:33.404: watchdog pid 1257: LOG:  we have got the consensus to perform the failover
2023-03-04 17:39:33.404: watchdog pid 1257: DETAIL:  1 node(s) voted in the favor
2023-03-04 17:39:33.409: health_check0 pid 1302: LOG:  signal_user1_to_parent_with_reason(0)
2023-03-04 17:39:33.409: main pid 1254: LOG:  Pgpool-II parent process received SIGUSR1
2023-03-04 17:39:33.409: main pid 1254: LOG:  Pgpool-II parent process has received failover request
2023-03-04 17:39:33.410: watchdog pid 1257: LOG:  received the failover indication from Pgpool-II on IPC interface
2023-03-04 17:39:33.410: watchdog pid 1257: LOG:  watchdog is informed of failover start by the main process
2023-03-04 17:39:33.418: main pid 1254: LOG:  === Starting degeneration. shutdown host 10.0.2.7(5432) ===
2023-03-04 17:39:33.463: main pid 1254: LOG:  Restart all children
2023-03-04 17:39:33.464: main pid 1254: LOG:  execute command: /usr/local/pgpool-II/etc/failover.sh  0 10.0.2.7  5432 /home/postgres/data  1 10.0.2.8 0 0 5432 /home/postgres/data 10.0.2.7 5432
~略~
2023-03-04 17:39:54.683: main pid 1254: LOG:  === Failback done. reconnect host 10.0.2.5(5432) ===

pgpoolで実施

# pcp_node_info -h localhost -U pgpool -n 0 -v -w
Hostname               : 10.0.2.7
Port                   : 5432
Status                 : 3
Weight                 : 0.333333
Status Name            : down
Backend Status Name    : down
Role                   : standby
Backend Role           : unknown
Replication Delay      : 0
Replication State      : none
Replication Sync State : none
Last Status Change     : 2023-03-04 17:39:37

# pcp_node_info -h localhost -U pgpool -n 1 -v -w
Hostname               : 10.0.2.8
Port                   : 5432
Status                 : 1
Weight                 : 0.333333
Status Name            : waiting
Backend Status Name    : up
Role                   : primary
Backend Role           : primary
Replication Delay      : 0.000000 second
Replication State      : none
Replication Sync State : none
Last Status Change     : 2023-03-04 17:39:37

8.2.7. 落としたPostgreSQLを復帰

# pcp_recovery_node -h localhost -U pgpool -n 0 -w
pcp_recovery_node -- Command Successful
# pcp_node_info -h localhost -U pgpool -n 0 -v -w
Hostname               : 10.0.2.7
Port                   : 5432
Status                 : 1
Weight                 : 0.333333
Status Name            : waiting
Backend Status Name    : up
Role                   : standby
Backend Role           : standby
Replication Delay      : 0.000000 second
Replication State      : streaming
Replication Sync State : async
Last Status Change     : 2023-03-04 17:43:12

8.2.8. Pgpool-IIのプライマリを落とす

pgpool2でtail -f /var/log/pgpool/pgpool-yyyy-MM-dd_hhmmss.logなどでログを表示しておく
pgpoolで実施

# systemctl stop pgpool
/var/log/pgpool/pgpool-yyyy-MM-dd_hhmmss.log
2023-03-04 18:06:34.742: heart_beat_receiver pid 1903: LOG:  set SO_REUSEPORT option to the socket
2023-03-04 18:06:34.742: heart_beat_receiver pid 1903: LOG:  creating watchdog heartbeat receive socket.
2023-03-04 18:06:34.742: heart_beat_receiver pid 1903: DETAIL:  set SO_REUSEPORT
2023-03-04 18:06:34.742: heart_beat_sender pid 1904: LOG:  set SO_REUSEPORT option to the socket
2023-03-04 18:06:34.742: heart_beat_sender pid 1904: LOG:  creating socket for sending heartbeat
2023-03-04 18:06:34.742: heart_beat_sender pid 1904: DETAIL:  set SO_REUSEPORT
2023-03-04 18:06:43.921: sr_check_worker pid 1938: LOG:  verify_backend_node_status: primary 1 owns only 1 standbys out of 2
2023-03-04 18:06:48.687: watchdog pid 1901: LOG:  remote node "10.0.2.6:9999 Linux pgpool" is shutting down
2023-03-04 18:06:48.687: watchdog pid 1901: LOG:  watchdog cluster has lost the coordinator node
2023-03-04 18:06:48.687: watchdog pid 1901: LOG:  removing the remote node "10.0.2.6:9999 Linux pgpool" from watchdog cluster leader
2023-03-04 18:06:48.689: watchdog pid 1901: LOG:  We have lost the cluster leader node "10.0.2.6:9999 Linux pgpool"
2023-03-04 18:06:48.689: watchdog pid 1901: LOG:  watchdog node state changed from [STANDBY] to [JOINING]
2023-03-04 18:06:52.697: watchdog pid 1901: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]
2023-03-04 18:06:53.699: watchdog pid 1901: LOG:  I am the only alive node in the watchdog cluster
2023-03-04 18:06:53.699: watchdog pid 1901: HINT:  skipping stand for coordinator state
2023-03-04 18:06:53.699: watchdog pid 1901: LOG:  watchdog node state changed from [INITIALIZING] to [LEADER]
2023-03-04 18:06:53.699: watchdog pid 1901: LOG:  Setting failover command timeout to 10
2023-03-04 18:06:53.699: watchdog pid 1901: LOG:  I am announcing my self as leader/coordinator watchdog node
2023-03-04 18:06:53.985: sr_check_worker pid 1938: LOG:  verify_backend_node_status: primary 1 owns only 1 standbys out of 2
2023-03-04 18:06:57.941: watchdog pid 1901: LOG:  I am the cluster leader node
2023-03-04 18:06:57.941: watchdog pid 1901: DETAIL:  our declare coordinator message is accepted by all nodes
2023-03-04 18:06:57.941: watchdog pid 1901: LOG:  setting the local node "10.0.2.9:9999 Linux pgpool2" as watchdog cluster leader
2023-03-04 18:06:57.941: watchdog pid 1901: LOG:  signal_user1_to_parent_with_reason(1)
2023-03-04 18:06:57.941: watchdog pid 1901: LOG:  I am the cluster leader node. Starting escalation process
2023-03-04 18:06:57.942: watchdog pid 1901: LOG:  escalation process started with PID:1943
2023-03-04 18:06:57.942: main pid 1898: LOG:  Pgpool-II parent process received SIGUSR1
2023-03-04 18:06:57.942: main pid 1898: LOG:  Pgpool-II parent process received watchdog state change signal from watchdog
2023-03-04 18:06:57.946: watchdog_utility pid 1943: LOG:  watchdog: escalation started
+ POSTGRESQL_STARTUP_USER=root
+ SSH_KEY_FILE=id_rsa_pgpool
+ SSH_OPTIONS='-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /root/.ssh/id_rsa_pgpool'
+ PGPOOLS=(10.0.2.6 10.0.2.9)
+ VIP=10.0.2.100
+ DEVICE=enp0s3
+ for pgpool in '"${PGPOOLS[@]}"'
+ '[' pgpool2 = 10.0.2.6 ']'
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /root/.ssh/id_rsa_pgpool root@10.0.2.6 '
        /sbin/ip addr del 10.0.2.100/24 dev enp0s3
    '
Warning: Permanently added '10.0.2.6' (ECDSA) to the list of known hosts.
RTNETLINK answers: Cannot assign requested address
+ for pgpool in '"${PGPOOLS[@]}"'
+ '[' pgpool2 = 10.0.2.9 ']'
+ ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /root/.ssh/id_rsa_pgpool root@10.0.2.9 '
        /sbin/ip addr del 10.0.2.100/24 dev enp0s3
    '
Warning: Permanently added '10.0.2.9' (ECDSA) to the list of known hosts.
RTNETLINK answers: Cannot assign requested address
+ exit 0
2023-03-04 18:06:58.336: watchdog_utility pid 1943: LOG:  watchdog escalation successful
2023-03-04 18:07:02.373: watchdog_utility pid 1943: LOG:  successfully acquired the delegate IP:"10.0.2.100"
2023-03-04 18:07:02.373: watchdog_utility pid 1943: DETAIL:  'if_up_cmd' returned with success
2023-03-04 18:07:02.374: watchdog pid 1901: LOG:  watchdog escalation process with pid: 1943 exit with SUCCESS.
  1. VIPがpgpool2にアタッチされていることを確認
  2. psqlコマンドでPostgreSQLを参照できることを確認
  3. pgpool2がLEADERになっていることを確認
# ip -4 -brief a s dev enp0s3
enp0s3           UP             10.0.2.9/24 10.0.2.100/24
# psql -h 10.0.2.100 -p 9999 -U postgres -c "show pool_nodes;"
 node_id | hostname | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node |replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | 10.0.2.7 | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0.000000 second   | streaming         | async                  | 2023-03-04 17:49:40
 1       | 10.0.2.8 | 5432 | up     | up        | 0.333333  | primary | primary | 2          | true              | 0                 |                   |                        | 2023-03-04 17:49:40
 2       | 10.0.2.5 | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0.000000 second   |                   |                        | 2023-03-04 17:49:40
(3 rows)

# pcp_watchdog_info -h localhost -U pgpool -v -w
Watchdog Cluster Information
Total Nodes              : 2
Remote Nodes             : 1
Member Remote Nodes      : 1
Alive Remote Nodes       : 0
Nodes required for quorum: 1
Quorum state             : QUORUM IS ON THE EDGE
Local node escalation    : YES
Leader Node Name         : 10.0.2.9:9999 Linux pgpool2
Leader Host Name         : 10.0.2.9

Watchdog Node Information
Node Name         : 10.0.2.9:9999 Linux pgpool2
Host Name         : 10.0.2.9
Delegate IP       : 10.0.2.100
Pgpool port       : 9999
Watchdog port     : 9000
Node priority     : 1
Status            : 4
Status Name       : LEADER
Membership Status : MEMBER

Node Name         : 10.0.2.6:9999 Linux pgpool
Host Name         : 10.0.2.6
Delegate IP       : 10.0.2.100
Pgpool port       : 9999
Watchdog port     : 9000
Node priority     : 1
Status            : 10
Status Name       : SHUTDOWN
Membership Status : MEMBER

8.2.9. 落としたPgpool-IIの復帰

pgpoolで実施

# systemctl start pgpool
# pcp_watchdog_info -h localhost -U pgpool -v -w
Watchdog Cluster Information
Total Nodes              : 2
Remote Nodes             : 1
Member Remote Nodes      : 1
Alive Remote Nodes       : 1
Nodes required for quorum: 1
Quorum state             : QUORUM EXIST
Local node escalation    : NO
Leader Node Name         : 10.0.2.9:9999 Linux pgpool2
Leader Host Name         : 10.0.2.9

Watchdog Node Information
Node Name         : 10.0.2.6:9999 Linux pgpool
Host Name         : 10.0.2.6
Delegate IP       : 10.0.2.100
Pgpool port       : 9999
Watchdog port     : 9000
Node priority     : 1
Status            : 7
Status Name       : STANDBY
Membership Status : MEMBER

Node Name         : 10.0.2.9:9999 Linux pgpool2
Host Name         : 10.0.2.9
Delegate IP       : 10.0.2.100
Pgpool port       : 9999
Watchdog port     : 9000
Node priority     : 1
Status            : 4
Status Name       : LEADER
Membership Status : MEMBER

8.3. pcpコマンドを試す

Pgpool-II 4.4.2 文書 II. PCP コマンド
を参考に1つずつコマンドを試す

8.3.1. Pgpool-II の pgpool.conf で定義されたノードの総数を表示する

pcp_node_count

# pcp_node_count -h localhost -U pgpool -v -w
Node Count
____________
 3

8.3.2. 指定されたノードの情報を表示する

pcp_node_info

# pcp_node_info -h localhost -U pgpool -n 0 -v -w
Hostname               : 10.0.2.7
Port                   : 5432
Status                 : 2
Weight                 : 0.333333
Status Name            : up
Backend Status Name    : up
Role                   : primary
Backend Role           : primary
Replication Delay      : 0
Replication State      : none
Replication Sync State : none
Last Status Change     : 2023-03-03 21:50:40

# pcp_node_info -h localhost -U pgpool -n 1 -v -w
Hostname               : 10.0.2.8
Port                   : 5432
Status                 : 2
Weight                 : 0.333333
Status Name            : up
Backend Status Name    : up
Role                   : standby
Backend Role           : standby
Replication Delay      : 0.000000 second
Replication State      : streaming
Replication Sync State : async
Last Status Change     : 2023-03-03 21:50:40

# pcp_node_info -h localhost -U pgpool -n 2 -v -w
Hostname               : 10.0.2.5
Port                   : 5432
Status                 : 2
Weight                 : 0.333333
Status Name            : up
Backend Status Name    : up
Role                   : standby
Backend Role           : standby
Replication Delay      : 0.000000 second
Replication State      : streaming
Replication Sync State : async
Last Status Change     : 2023-03-03 21:50:40

8.3.3. 与えられたノードIDのヘルスチェック統計データを表示する

pcp_health_check_stats

# pcp_health_check_stats -h localhost -U pgpool -n 0 -v -w
Node Id                       : 0
Host Name                     : 10.0.2.7
Port                          : 5432
Status                        : up
Role                          : primary
Last Status Change            : 2023-03-03 21:50:40
Total Count                   : 141
Success Count                 : 141
Fail Count                    : 0
Skip Count                    : 0
Retry Count                   : 0
Average Retry Count           : 0.000000
Max Retry Count               : 0
Max Health Check Duration     : 85
Minimum Health Check Duration : 8
Average Health Check Duration : 12.652482
Last Health Check             : 2023-03-03 21:56:50
Last Successful Health Check  : 2023-03-03 21:56:50
Last Skip Health Check        :
Last Failed Health Check      :

# pcp_health_check_stats -h localhost -U pgpool -n 1 -v -w
Node Id                       : 1
Host Name                     : 10.0.2.8
Port                          : 5432
Status                        : up
Role                          : standby
Last Status Change            : 2023-03-03 21:50:40
Total Count                   : 141
Success Count                 : 65
Fail Count                    : 1
Skip Count                    : 75
Retry Count                   : 0
Average Retry Count           : 0.000000
Max Retry Count               : 0
Max Health Check Duration     : 25
Minimum Health Check Duration : 8
Average Health Check Duration : 10.848485
Last Health Check             : 2023-03-03 21:56:49
Last Successful Health Check  : 2023-03-03 21:56:49
Last Skip Health Check        : 2023-03-03 21:45:58
Last Failed Health Check      : 2023-03-03 21:33:28

# pcp_health_check_stats -h localhost -U pgpool -n 2 -v -w
Node Id                       : 2
Host Name                     : 10.0.2.5
Port                          : 5432
Status                        : up
Role                          : standby
Last Status Change            : 2023-03-03 21:50:40
Total Count                   : 142
Success Count                 : 60
Fail Count                    : 2
Skip Count                    : 80
Retry Count                   : 0
Average Retry Count           : 0.000000
Max Retry Count               : 0
Max Health Check Duration     : 21
Minimum Health Check Duration : 2
Average Health Check Duration : 11.435484
Last Health Check             : 2023-03-03 21:56:59
Last Successful Health Check  : 2023-03-03 21:56:59
Last Skip Health Check        : 2023-03-03 21:48:28
Last Failed Health Check      : 2023-03-03 21:48:08

8.3.4. Pgpool-II の watchdog ステータスを表示します

pcp_watchdog_info

# pcp_watchdog_info -h localhost -U pgpool -v -w
Watchdog Cluster Information
Total Nodes              : 2
Remote Nodes             : 1
Member Remote Nodes      : 1
Alive Remote Nodes       : 1
Nodes required for quorum: 1
Quorum state             : QUORUM EXIST
Local node escalation    : YES
Leader Node Name         : 10.0.2.6:9999 Linux pgpool
Leader Host Name         : 10.0.2.6

Watchdog Node Information
Node Name         : 10.0.2.6:9999 Linux pgpool
Host Name         : 10.0.2.6
Delegate IP       : 10.0.2.100
Pgpool port       : 9999
Watchdog port     : 9000
Node priority     : 1
Status            : 4
Status Name       : LEADER
Membership Status : MEMBER

Node Name         : 10.0.2.9:9999 Linux pgpool2
Host Name         : 10.0.2.9
Delegate IP       : 10.0.2.100
Pgpool port       : 9999
Watchdog port     : 9000
Node priority     : 1
Status            : 7
Status Name       : STANDBY
Membership Status : MEMBER

8.3.5. Pgpool-II の子プロセスのプロセス ID を一覧表示する

pcp_proc_count

# pcp_proc_count -h localhost -U pgpool -v -w
No       |       PID
_____________________
0        |       1620

~略~

31       |       1716

Total Processes:32

8.3.6. Pgpool-II の子プロセス情報を表示する

pcp_proc_info

# pcp_proc_info -h localhost -U pgpool -a -v -P 1716 -w
Database                  :
Username                  :
Start time                : 2023-03-03 21:55:42
Client connection count   : 0
Major                     : 0
Minor                     : 0
Backend connection time   :
Client connection time    :
Client idle duration      : 0
Client disconnection time :
Pool Counter              : 0
Backend PID               : 0
Connected                 : 0
PID                       : 1716
Backend ID                : 0
Status                    : Wait for connection

~略~

Database                  :
Username                  :
Start time                : 2023-03-03 21:55:42
Client connection count   : 0
Major                     : 0
Minor                     : 0
Backend connection time   :
Client connection time    :
Client idle duration      : 0
Client disconnection time :
Pool Counter              : 0
Backend PID               : 0
Connected                 : 0
PID                       : 1716
Backend ID                : 2
Status                    : Wait for connection

8.3.7. pgpool.conf のパラメータ設定値を取得する

pcp_pool_status

# pcp_pool_status -h localhost -U pgpool -v -w
Name [  0]:     backend_clustering_mode
Value:          1
Description:    clustering mode

~略~

Name [180]:     backend_application_name2
Value:          10.0.2.5
Description:    application_name for backend #2

8.3.8. Pgpool-II からノードを切り離す

pcp_detach_node

# pcp_detach_node -h localhost -U pgpool -n 2 -w
pcp_detach_node -- Command Successful
# pcp_node_info -h localhost -U pgpool -n 2 -v -w
Hostname               : 10.0.2.5
Port                   : 5432
Status                 : 1
Weight                 : 0.333333
Status Name            : waiting
Backend Status Name    : up
Role                   : standby
Backend Role           : standby
Replication Delay      : 0.000000 second
Replication State      : streaming
Replication Sync State : async
Last Status Change     : 2023-03-03 22:11:21

8.3.9. Pgpool-II にノードを復帰させる

pcp_attach_node

# pcp_attach_node -h localhost -U pgpool -n 2 -w
pcp_attach_node -- Command Successful
# pcp_node_info -h localhost -U pgpool -n 2 -v -w
Hostname               : 10.0.2.5
Port                   : 5432
Status                 : 1
Weight                 : 0.333333
Status Name            : waiting
Backend Status Name    : up
Role                   : standby
Backend Role           : standby
Replication Delay      : 0.000000 second
Replication State      : streaming
Replication Sync State : async
Last Status Change     : 2023-03-03 22:11:21

8.3.10. Pgpool-II のノードをプライマリに昇格させる

pcp_promote_node

# pcp_promote_node -h localhost -U pgpool -n 1 -w
pcp_promote_node -- Command Successful
# pcp_node_info -h localhost -U pgpool -n 1 -v -w
Hostname               : 10.0.2.8
Port                   : 5432
Status                 : 2
Weight                 : 0.333333
Status Name            : up
Backend Status Name    : up
Role                   : primary
Backend Role           : standby
Replication Delay      : 0.000000 second
Replication State      : streaming
Replication Sync State : async
Last Status Change     : 2023-03-03 22:13:31

8.3.11. Pgpool-II を指定されたモードでシャットダウンする

pcp_stop_pgpool

# pcp_stop_pgpool -h localhost -U pgpool --mode=fast --scope=local -w
pcp_stop_pgpool -- Command Successful
# ps aux | grep [p]gpool | wc -l
0
# ip -4 a l enp0s3
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.0.2.6/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
       valid_lft 375sec preferred_lft 375sec

別サーバで確認

# ip -4 a l enp0s3
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.0.2.9/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
       valid_lft 590sec preferred_lft 590sec
    inet 10.0.2.100/24 scope global secondary enp0s3:0
       valid_lft forever preferred_lft forever

8.3.12. Pgpool-IIの設定ファイルを再読込します

pcp_reload_config

# pcp_reload_config -h localhost -U pgpool --scope=cluster -w
pcp_reload_config -- Command Successful

8.3.13. Pgpool-II のノードのデータを再同期させた上で復帰させる

pcp_recovery_node

# pcp_recovery_node -h localhost -U pgpool -n 1 -w
pcp_recovery_node -- Command Successful
# pcp_recovery_node -h localhost -U pgpool -n 2 -w
pcp_recovery_node -- Command Successful

9. 参考リンク

3
4
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
3
4