[January 2024 edition] Clustering with a VIP on Ubuntu 22.04 and Ubuntu 24.04 [pacemaker, corosync, pcs]

Last updated at 2025-01-07, posted at 2023-01-05

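Introduction

I wanted to set up clustering with a VIP on Ubuntu, so I went with pacemaker + corosync.
The HA cluster is controlled with pcs, which has become the standard tool for pacemaker.
The CRM commands (crm_xxx), the cibadmin command, and the crm command are not used.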

As of January 2024 this article covered Ubuntu 22.04; I have since run the same steps on 24.04 and confirmed that they work.

Structure of this article

Build the cluster first
Then explain what you should know about pacemaker + corosync

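Environment

Two nodes
- Node 1: ha01 (192.168.100.240)
- Node 2: ha02 (192.168.100.241)
The VIP is 192.168.100.245
The cluster name is testcluster
STONITH is not used
- Unstable nodes are not forcibly stopped or rebooted
Note that some commands have changed since the pcs I used around 2017, because the version here is newer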

$ uname -a
Linux ha01 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:56:13 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Installation

Do this on both nodes, ha01 and ha02
Install the packages, set the hacluster user's password, and enable the pcsd service

$ sudo apt install -y corosync pacemaker pcs
$ sudo passwd hacluster
New password: 
Retype new password: 
$ sudo systemctl enable --now pcsd

ha01:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5

ha02:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5

Configure hosts

sudo vim /etc/hosts

Check hosts
- When editing hosts, make sure the node's own host name is not mapped to 127.0.0.1

$ sudo cat /etc/hosts
127.0.0.1 localhost

192.168.100.240 ha01
192.168.100.241 ha02
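
The loopback caveat above can also be checked mechanically. The following is a minimal sketch and not part of the original procedure: `loopback_names` and `host_not_on_loopback` are hypothetical helpers that read a hosts-format file and report which host names are mapped to a 127.x.x.x address. Ubuntu installers often put the host name on a 127.0.1.1 line, so the check covers the whole 127.0.0.0/8 range rather than only 127.0.0.1.

```shell
# Hypothetical helper: print every host name that a hosts-format file
# (default: /etc/hosts) maps to a 127.x.x.x address, one per line.
loopback_names() {
    awk '
        { sub(/#.*/, "") }                 # strip comments
        $1 ~ /^127\./ {                    # loopback entries only
            for (i = 2; i <= NF; i++) print $i
        }
    ' "${1:-/etc/hosts}"
}

# Hypothetical helper: succeed only if host name $1 is NOT mapped to a
# loopback address in the file $2 (default: /etc/hosts).
host_not_on_loopback() {
    ! loopback_names "${2:-/etc/hosts}" | grep -qxF -- "$1"
}

# Usage on each node:
#   host_not_on_loopback ha01 && echo "ha01 is fine"
```

If the check fails, remove the host name from the loopback line before running `pcs cluster setup`.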

Cluster configuration

Run on ha01

ha01:~$ sudo pcs host auth ha01 ha02 -u hacluster
Password: 
ha01: Authorized
ha02: Authorized

ha01:~$ sudo pcs cluster setup testcluster ha01 ha02
No addresses specified for host 'ha01', using 'ha01'
No addresses specified for host 'ha02', using 'ha02'
Destroying cluster on hosts: 'ha01', 'ha02'...
ha01: Successfully destroyed cluster
ha02: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'ha01', 'ha02'
ha01: successful removal of the file 'pcsd settings'
ha02: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'ha01', 'ha02'
ha01: successful distribution of the file 'corosync authkey'
ha01: successful distribution of the file 'pacemaker authkey'
ha02: successful distribution of the file 'corosync authkey'
ha02: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'ha01', 'ha02'
ha01: successful distribution of the file 'corosync.conf'
ha02: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.

ha01:~$ sudo pcs cluster start --all
ha02: Starting Cluster...
ha01: Starting Cluster...

ha01:~$ sudo pcs property set stonith-enabled=false
ha01:~$ sudo pcs property set no-quorum-policy=ignore


ha01:~$ sudo pcs resource create VIP ocf:heartbeat:IPaddr2 ip=192.168.100.245 cidr_netmask=24 op monitor interval=10s

ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha01 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 01:09:31 2023
  * Last change:  Thu Jan  5 01:00:30 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha01

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ha01:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5
    inet 192.168.100.245/24 brd 192.168.100.255 scope global secondary enp0s5
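
Reading the address list by eye works for two nodes, but the same check is easy to script. A minimal sketch, assuming the VIP used in this article; `has_addr` is a hypothetical helper that reads `ip a` output from standard input.

```shell
# Hypothetical helper: succeed if IPv4 address $1 appears as an
# "inet ADDRESS/PREFIX" entry in `ip a` output read from standard input.
# The trailing "/" keeps 192.168.100.24 from matching 192.168.100.240.
has_addr() {
    grep -qF -- "inet $1/"
}

# Usage on a node:
#   ip a | has_addr 192.168.100.245 && echo "VIP is on this node"
```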

If the following error occurs and testcluster cannot be created, adding --force gets past it. I hit this on Ubuntu 24.04. Be careful: it forcibly destroys the cluster that currently exists on the nodes.

↓ When it fails

$ sudo pcs cluster setup testcluster ha01 ha02
No addresses specified for host 'ha01', using 'ha01'
No addresses specified for host 'ha02', using 'ha02'
Error: ha02: The host seems to be in a cluster already as the following services are found to be running: 'corosync', 'pacemaker'. If the host is not part of a cluster, stop the services and retry, use --force to override
Error: ha02: The host seems to be in a cluster already as cluster configuration files have been found on the host. If the host is not part of a cluster, run 'pcs cluster destroy' on host 'ha02' to remove those configuration files, use --force to override
Error: ha01: The host seems to be in a cluster already as the following services are found to be running: 'corosync', 'pacemaker'. If the host is not part of a cluster, stop the services and retry, use --force to override
Error: ha01: The host seems to be in a cluster already as cluster configuration files have been found on the host. If the host is not part of a cluster, run 'pcs cluster destroy' on host 'ha01' to remove those configuration files, use --force to override
Error: Some nodes are already in a cluster. Enforcing this will destroy existing cluster on those nodes. You should remove the nodes from their clusters instead to keep the clusters working properly, use --force to override
Error: Errors have occurred, therefore pcs is unable to continue

↓ With --force it succeeds

ha01:~$ sudo pcs cluster setup --force testcluster ha01 ha02
No addresses specified for host 'ha01', using 'ha01'
No addresses specified for host 'ha02', using 'ha02'
Warning: ha02: The host seems to be in a cluster already as the following services are found to be running: 'corosync', 'pacemaker'. If the host is not part of a cluster, stop the services and retry
Warning: ha02: The host seems to be in a cluster already as cluster configuration files have been found on the host. If the host is not part of a cluster, run 'pcs cluster destroy' on host 'ha02' to remove those configuration files
Destroying cluster on hosts: 'ha01', 'ha02'...
ha02: Successfully destroyed cluster
ha01: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'ha01', 'ha02'
ha02: successful removal of the file 'pcsd settings'
ha01: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'ha01', 'ha02'
ha02: successful distribution of the file 'corosync authkey'
ha02: successful distribution of the file 'pacemaker authkey'
ha01: successful distribution of the file 'corosync authkey'
ha01: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'ha01', 'ha02'
ha02: successful distribution of the file 'corosync.conf'
ha01: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
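
Because `--force` destroys whatever cluster exists on the nodes, it can help to see first which of the services named in the error are actually running. A minimal sketch, not from the original procedure: `running_cluster_services` is a hypothetical helper built on `systemctl is-active`. If a node is not supposed to belong to a cluster, the error message's own suggestion (`pcs cluster destroy` on that host) is the gentler alternative to `--force`.

```shell
# Hypothetical helper: print which of corosync / pacemaker are currently
# active on this node, one name per line (prints nothing if neither is).
running_cluster_services() {
    for svc in corosync pacemaker; do
        if systemctl is-active --quiet "$svc"; then
            echo "$svc"
        fi
    done
}

# Usage on each node before deciding on --force:
#   running_cluster_services
```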

Run on ha02

ha02:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha01 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 01:01:54 2023
  * Last change:  Thu Jan  5 01:00:30 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha01

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ha02:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5

Checking the cluster status

$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha01 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 01:09:31 2023
  * Last change:  Thu Jan  5 01:00:30 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha01

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
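
For monitoring scripts, the two facts that matter most in this output are whether the partition has quorum and which nodes are online. A minimal sketch, assuming the `pcs status` layout shown above (it may differ between pcs versions); `has_quorum` and `online_nodes` are hypothetical helpers that read the status text from standard input.

```shell
# Hypothetical helper: succeed if the status text reports
# "partition with quorum" on the "Current DC" line.
has_quorum() {
    grep -q 'partition with quorum'
}

# Hypothetical helper: print the space-separated node names from the
# "* Online: [ ... ]" line.
online_nodes() {
    sed -n 's/^ *\* Online: \[ \(.*\) \]$/\1/p'
}

# Usage (pcs status is read once and reused):
#   status=$(sudo pcs status)
#   printf '%s\n' "$status" | has_quorum && printf '%s\n' "$status" | online_nodes
```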

Switching a node from online to standby

Put ha01 into standby
ha01 changes from online to standby
The VIP moves to ha02
Run on ha01 (VIP 192.168.100.245 is assigned here)

ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 01:37:27 2023
  * Last change:  Thu Jan  5 01:37:23 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha01

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Set it to standby

ha01:~$ sudo pcs node standby ha01

Confirm that the VIP has moved

ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 01:38:05 2023
  * Last change:  Thu Jan  5 01:38:01 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Node ha01: standby
  * Online: [ ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha02

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ha01:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5

Run on ha02 (VIP 192.168.100.245 is assigned here)

ha02:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 01:40:48 2023
  * Last change:  Thu Jan  5 01:38:01 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Node ha01: standby
  * Online: [ ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha02

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ha02:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5
    inet 192.168.100.245/24 brd 192.168.100.255 scope global secondary enp0s5
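
A failover like this takes a moment, so a script that calls `pcs node standby` may want to wait until the resource has actually moved before continuing. A minimal sketch, assuming the `pcs status` layout shown in this article; `wait_for_resource` is a hypothetical helper and has to run as a user allowed to call `pcs status` (root here). The variables `tries` and `node` are plain shell globals to keep the sketch POSIX-compatible.

```shell
# Hypothetical helper: poll `pcs status` until resource $1 is reported as
# "Started" on node $2. Gives up after $3 attempts (default 30), waiting
# one second between attempts. Returns 0 on success, 1 on timeout.
wait_for_resource() {
    tries=${3:-30}
    while [ "$tries" -gt 0 ]; do
        node=$(pcs status 2>/dev/null |
            awk -v rsc="$1" '$1 == "*" && $2 == rsc && $(NF - 1) == "Started" { print $NF }')
        [ "$node" = "$2" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Usage (as root):
#   pcs node standby ha01 && wait_for_resource VIP ha02
```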

Switching a node from standby to online

Confirm that the VIP does not move

ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 02:20:20 2023
  * Last change:  Thu Jan  5 02:20:15 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Node ha01: standby
  * Online: [ ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha02

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ha01:~$ sudo pcs node unstandby ha01

ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 02:21:14 2023
  * Last change:  Thu Jan  5 02:21:06 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha02

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Moving the VIP

Move the VIP from ha02, where it currently is, to ha01
Check the VIP

ha02:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5
    inet 192.168.100.245/24 brd 192.168.100.255 scope global secondary enp0s5

ha02:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 02:42:01 2023
  * Last change:  Thu Jan  5 02:32:05 2023 by root via crm_resource on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha02

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ha01:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5

ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 02:44:24 2023
  * Last change:  Thu Jan  5 02:32:05 2023 by root via crm_resource on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha02

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Move the VIP to ha01

$ sudo pcs resource move VIP ha01

Check the VIP

ha01:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5
    inet 192.168.100.245/24 brd 192.168.100.255 scope global secondary enp0s5

ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 02:47:44 2023
  * Last change:  Thu Jan  5 02:44:48 2023 by root via crm_resource on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha01

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ha02:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5

ha02:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 02:48:08 2023
  * Last change:  Thu Jan  5 02:44:48 2023 by root via crm_resource on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha01

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
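
Depending on the pcs version, `pcs resource move` may leave a location constraint behind that keeps the resource preferring the target node: older releases keep it until it is cleared, newer ones remove it once the move has completed. While it remains, the VIP returns to ha01 whenever ha01 is available. A minimal sketch for checking this, not part of the original procedure: `has_move_constraint` is a hypothetical helper, and the `cli-prefer-` id prefix it looks for is the one Pacemaker's `crm_resource` gives to move constraints. Treat the exact output format of `pcs constraint --full` as an assumption.

```shell
# Hypothetical helper: succeed if `pcs constraint --full` output (read
# from standard input) still contains the move constraint for resource $1.
has_move_constraint() {
    grep -qF -- "cli-prefer-$1"
}

# Usage (as root): clear the leftover constraint only if it exists.
#   pcs constraint --full | has_move_constraint VIP && pcs resource clear VIP
```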

Stopping the cluster on a specific node

Stop the cluster on ha01
ha01 goes from online to offline
The VIP moves to ha02
On ha01, even the cluster status can no longer be checked

ha01:~$ sudo pcs cluster stop ha01
ha01: Stopping Cluster (pacemaker)...
ha01: Stopping Cluster (corosync)...

Confirm that the VIP has moved
Run on ha01 (VIP 192.168.100.245 has been removed)

ha01:~$ sudo pcs status
Error: error running crm_mon, is pacemaker running?
  crm_mon: Error: cluster is not available on this node

ha01:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5

Run on ha02 (VIP 192.168.100.245 is assigned here)

ha02:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 01:17:39 2023
  * Last change:  Thu Jan  5 01:00:30 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha02 ]
  * OFFLINE: [ ha01 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha02

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ha02:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5
    inet 192.168.100.245/24 brd 192.168.100.255 scope global secondary enp0s5

Stopping the whole cluster

$ sudo pcs cluster stop --all
ha02: Stopping Cluster (pacemaker)...
ha01: Stopping Cluster (pacemaker)...
ha01: Stopping Cluster (corosync)...
ha02: Stopping Cluster (corosync)...

Starting the cluster on a specific node

$ sudo pcs cluster start ha01
ha01: Starting Cluster...
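
`pcs cluster start` only starts the services for the current boot. Every `pcs status` output in this article reports corosync and pacemaker as `active/disabled`, which means they will not come up by themselves after a reboot; `pcs cluster enable --all` changes that if automatic start is wanted. A minimal sketch for spotting this from a script, assuming the "Daemon Status" layout shown in this article; `disabled_daemons` is a hypothetical helper that reads `pcs status` output from standard input.

```shell
# Hypothetical helper: print the daemons listed under "Daemon Status:"
# whose boot-time state is "disabled", one name per line.
disabled_daemons() {
    awk -F': *' '
        /^Daemon Status:/ { in_section = 1; next }
        in_section && $2 ~ /\/disabled/ {
            gsub(/^ +/, "", $1)            # trim the leading indent
            print $1
        }
    '
}

# Usage:
#   sudo pcs status | disabled_daemons
```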

Deleting the cluster

$ sudo pcs cluster destroy --all
Warning: Unable to load CIB to get guest and remote nodes from it, those nodes will not be deconfigured.
ha02: Stopping Cluster (pacemaker)...
ha01: Stopping Cluster (pacemaker)...
ha01: Successfully destroyed cluster
ha02: Successfully destroyed cluster

Ports that must be opened in the firewall

When specifying ports with ufw
- 2224/tcp
- 3121/tcp
- 5403/tcp
- 5404/udp
- 5405/udp
- 21064/tcp
- 9929/tcp
- 9929/udp
When using firewall-cmd
- firewall-cmd --add-service=high-availability
- firewall-cmd --runtime-to-permanent
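
For ufw, the port list above translates into one `ufw allow` rule per entry. A minimal sketch; `ha_ufw_rules` is a hypothetical helper that only prints the commands, so they can be reviewed before anything is changed on the node.

```shell
# Hypothetical helper: print one `ufw allow` command for each
# port/protocol pair in the list above. Nothing is executed here.
ha_ufw_rules() {
    for rule in 2224/tcp 3121/tcp 5403/tcp 5404/udp 5405/udp \
                21064/tcp 9929/tcp 9929/udp; do
        echo "ufw allow $rule"
    done
}

# Review the output, then apply it on both nodes:
#   ha_ufw_rules
#   ha_ufw_rules | sudo sh
```

Printing instead of executing is a deliberate choice: it leaves room to tighten the rules (for example to the cluster subnet only) before they are applied.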

pcs command reference

Work in progress...

References

Note: these mostly cover older pacemaker versions, so the commands differ

Conclusion

That was easy, wasn't it?
