[January 2024 edition] Clustering with a VIP on Ubuntu 22.04 and Ubuntu 24.04 [pacemaker, corosync, pcs]

Posted at 2023-01-05

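Introduction

  • I wanted to set up clustering with a VIP on Ubuntu, so I went with pacemaker + corosync
  • The HA cluster is controlled with pcs, which has become the standard tool for pacemaker
  • The CRM commands (crm_xxx), the cibadmin command, and the crm command are not used

As of January 2024 this was an article for Ubuntu 22.04, but I have confirmed that the same steps also work on 24.04.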

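Outline of this article

  • Build the cluster first
  • Then explain what you need to know about pacemaker + corosync

Environment

  • Two nodes
    • Node 1: ha01 (192.168.100.240)
    • Node 2: ha02 (192.168.100.241)
  • The VIP is 192.168.100.245
  • The cluster name is testcluster
  • STONITH is not used
    • Unstable nodes are not forcibly stopped or rebooted
  • Note that this pcs version is newer than the one I used around 2017, and several commands have changed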
$ uname -a
Linux ha01 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:56:13 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Installation

  • Perform these steps on both nodes, ha01 and ha02
  • Install the packages, set a password for the hacluster user, and enable the pcsd service
$ sudo apt install -y corosync pacemaker pcs
$ sudo passwd hacluster
New password: 
Retype new password: 
$ sudo systemctl enable --now pcsd
ha01:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5
ha02:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5
  • Configure hosts
$ sudo vim /etc/hosts
  • Check hosts
    • When editing hosts, make sure the node's own hostname is not mapped to 127.0.0.1
$ sudo cat /etc/hosts
127.0.0.1 localhost

192.168.100.240 ha01
192.168.100.241 ha02
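
The hostname warning above can be checked mechanically. The following is a minimal sketch and not part of the original procedure: `loopback_hostnames` is a hypothetical helper name, and it flags any 127.x.x.x mapping, which is slightly broader than the 127.0.0.1 case mentioned above (Debian-based installers often add a 127.0.1.1 entry for the hostname).

```shell
# Print every hosts entry that maps one of the given hostnames to a loopback
# address (127.x.x.x). Usage: loopback_hostnames FILE HOST...
# Returns 0 when no such entry exists, 1 otherwise.
loopback_hostnames() {
    hosts_file="$1"
    shift
    awk -v names="$*" '
        BEGIN { n = split(names, want, " ") }
        /^[ \t]*#/ { next }                  # skip comment lines
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # stop at a trailing comment
                for (j = 1; j <= n; j++)
                    if ($i == want[j]) { print $1 " " $i; bad = 1 }
            }
        }
        END { exit bad ? 1 : 0 }
    ' "$hosts_file"
}
```

For example, `loopback_hostnames /etc/hosts ha01 ha02 && echo "hosts looks fine"` prints the message only when neither node name points at a loopback address.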

Cluster configuration

  • Run on ha01
ha01:~$ sudo pcs host auth ha01 ha02 -u hacluster
Password: 
ha01: Authorized
ha02: Authorized

ha01:~$ sudo pcs cluster setup testcluster ha01 ha02
No addresses specified for host 'ha01', using 'ha01'
No addresses specified for host 'ha02', using 'ha02'
Destroying cluster on hosts: 'ha01', 'ha02'...
ha01: Successfully destroyed cluster
ha02: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'ha01', 'ha02'
ha01: successful removal of the file 'pcsd settings'
ha02: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'ha01', 'ha02'
ha01: successful distribution of the file 'corosync authkey'
ha01: successful distribution of the file 'pacemaker authkey'
ha02: successful distribution of the file 'corosync authkey'
ha02: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'ha01', 'ha02'
ha01: successful distribution of the file 'corosync.conf'
ha02: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.

ha01:~$ sudo pcs cluster start --all
ha02: Starting Cluster...
ha01: Starting Cluster...

ha01:~$ sudo pcs property set stonith-enabled=false
ha01:~$ sudo pcs property set no-quorum-policy=ignore


ha01:~$ sudo pcs resource create VIP ocf:heartbeat:IPaddr2 ip=192.168.100.245 cidr_netmask=24 op monitor interval=10s

ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha01 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 01:09:31 2023
  * Last change:  Thu Jan  5 01:00:30 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha01

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ha01:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5
    inet 192.168.100.245/24 brd 192.168.100.255 scope global secondary enp0s5
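
The `ip a | grep inet` check above is read by eye. For scripts, an exact match on the address is safer than a plain grep, because a pattern such as 192.168.100.24 would also match 192.168.100.245. A minimal sketch follows; `has_address` is a hypothetical helper name, and it only looks at IPv4 `inet` lines.

```shell
# Succeed if the given IPv4 address appears as an "inet" entry in
# `ip a`-style output read from stdin. Usage: ip a | has_address ADDRESS
has_address() {
    awk -v addr="$1" '
        $1 == "inet" {
            split($2, parts, "/")            # "192.168.100.245/24" -> address part
            if (parts[1] == addr) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}
```

On the node that holds the VIP, `ip a | has_address 192.168.100.245 && echo "VIP is here"` should print the message; on the other node it should print nothing.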

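If the following error occurs and testcluster cannot be created, adding --force makes it work. I ran into this on Ubuntu 24.04. Be careful: this forcibly destroys the cluster that currently exists on the nodes.

↓ When it fails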

$ sudo pcs cluster setup testcluster ha01 ha02
No addresses specified for host 'ha01', using 'ha01'
No addresses specified for host 'ha02', using 'ha02'
Error: ha02: The host seems to be in a cluster already as the following services are found to be running: 'corosync', 'pacemaker'. If the host is not part of a cluster, stop the services and retry, use --force to override
Error: ha02: The host seems to be in a cluster already as cluster configuration files have been found on the host. If the host is not part of a cluster, run 'pcs cluster destroy' on host 'ha02' to remove those configuration files, use --force to override
Error: ha01: The host seems to be in a cluster already as the following services are found to be running: 'corosync', 'pacemaker'. If the host is not part of a cluster, stop the services and retry, use --force to override
Error: ha01: The host seems to be in a cluster already as cluster configuration files have been found on the host. If the host is not part of a cluster, run 'pcs cluster destroy' on host 'ha01' to remove those configuration files, use --force to override
Error: Some nodes are already in a cluster. Enforcing this will destroy existing cluster on those nodes. You should remove the nodes from their clusters instead to keep the clusters working properly, use --force to override
Error: Errors have occurred, therefore pcs is unable to continue
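
The error output above names each host that pcs considers part of an existing cluster, and the messages themselves suggest running `pcs cluster destroy` on those hosts. As a small sketch for pulling the host names out of that output (`hosts_in_cluster` is a hypothetical helper name, and it assumes the message wording shown above):

```shell
# Read `pcs cluster setup` output from stdin and print each host that pcs
# reports as already belonging to a cluster, one per line, without duplicates.
hosts_in_cluster() {
    awk -F': ' '/The host seems to be in a cluster already/ { print $2 }' | sort -u
}
```

It could be used as `sudo pcs cluster setup testcluster ha01 ha02 2>&1 | hosts_in_cluster` to see which nodes need attention before deciding whether to clean them up by hand or override the check.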

With --force it works

ha01:~$ sudo pcs cluster setup --force testcluster ha01 ha02
No addresses specified for host 'ha01', using 'ha01'
No addresses specified for host 'ha02', using 'ha02'
Warning: ha02: The host seems to be in a cluster already as the following services are found to be running: 'corosync', 'pacemaker'. If the host is not part of a cluster, stop the services and retry
Warning: ha02: The host seems to be in a cluster already as cluster configuration files have been found on the host. If the host is not part of a cluster, run 'pcs cluster destroy' on host 'ha02' to remove those configuration files
Destroying cluster on hosts: 'ha01', 'ha02'...
ha02: Successfully destroyed cluster
ha01: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'ha01', 'ha02'
ha02: successful removal of the file 'pcsd settings'
ha01: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'ha01', 'ha02'
ha02: successful distribution of the file 'corosync authkey'
ha02: successful distribution of the file 'pacemaker authkey'
ha01: successful distribution of the file 'corosync authkey'
ha01: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'ha01', 'ha02'
ha02: successful distribution of the file 'corosync.conf'
ha01: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
  • Run on ha02
ha02:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha01 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 01:01:54 2023
  * Last change:  Thu Jan  5 01:00:30 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha01

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ha02:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5

Checking the cluster status

$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha01 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 01:09:31 2023
  * Last change:  Thu Jan  5 01:00:30 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha01

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Switching a node from online to standby

  • Put ha01 into standby

  • ha01 changes from online to standby

  • The VIP moves to ha02

  • Run on ha01 (the VIP 192.168.100.245 is assigned here)

ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 01:37:27 2023
  * Last change:  Thu Jan  5 01:37:23 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha01

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
  • Put the node into standby
ha01:~$ sudo pcs node standby ha01
  • Confirm that the VIP has moved
ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 01:38:05 2023
  * Last change:  Thu Jan  5 01:38:01 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Node ha01: standby
  * Online: [ ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha02

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ha01:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5
  • Run on ha02 (the VIP 192.168.100.245 is assigned here)
ha02:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 01:40:48 2023
  * Last change:  Thu Jan  5 01:38:01 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Node ha01: standby
  * Online: [ ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha02

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ha02:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5
    inet 192.168.100.245/24 brd 192.168.100.255 scope global secondary enp0s5
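
To confirm a failover like this from a script rather than by reading the listing, the node that runs a resource can be extracted from `pcs status` output. This is only a sketch: `resource_node` is a hypothetical helper name, and it assumes the "Full List of Resources" line format shown above (pacemaker 2.1.2), which other versions may print differently.

```shell
# Read `pcs status` output from stdin and print the node on which the named
# resource is reported as Started. Prints nothing if it is not started.
# Usage: sudo pcs status | resource_node RESOURCE
resource_node() {
    awk -v res="$1" '$1 == "*" && $2 == res && $4 == "Started" { print $5 }'
}
```

After the standby command above, `sudo pcs status | resource_node VIP` should print ha02.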

Switching a node from standby to online

  • Confirm that the VIP does not move
ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 02:20:20 2023
  * Last change:  Thu Jan  5 02:20:15 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Node ha01: standby
  * Online: [ ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha02

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ha01:~$ sudo pcs node unstandby ha01

ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 02:21:14 2023
  * Last change:  Thu Jan  5 02:21:06 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha02

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Moving the VIP

  • Move the VIP from ha02, where it is currently assigned, to ha01

  • Check the VIP

ha02:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5
    inet 192.168.100.245/24 brd 192.168.100.255 scope global secondary enp0s5

ha02:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 02:42:01 2023
  * Last change:  Thu Jan  5 02:32:05 2023 by root via crm_resource on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha02

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ha01:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5

ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 02:44:24 2023
  * Last change:  Thu Jan  5 02:32:05 2023 by root via crm_resource on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha02

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
  • Move the VIP to ha01
$ sudo pcs resource move VIP ha01
  • Check the VIP
ha01:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5
    inet 192.168.100.245/24 brd 192.168.100.255 scope global secondary enp0s5

ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 02:47:44 2023
  * Last change:  Thu Jan  5 02:44:48 2023 by root via crm_resource on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha01

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ha02:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5

ha02:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 02:48:08 2023
  * Last change:  Thu Jan  5 02:44:48 2023 by root via crm_resource on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha01 ha02 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha01

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
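
One thing worth checking after a move: as far as I know, with the pcs 0.10 series (the one shipped in Ubuntu 22.04) `pcs resource move` works by adding a location constraint that stays in place until it is cleared, while pcs 0.11 removes it automatically. Treat those version details as an assumption and look at the output of `pcs constraint` on your own cluster. A leftover constraint pins the VIP to ha01, so a sketch that moves and then clears might look like this (`move_resource` and the `PCS` variable are hypothetical names introduced here):

```shell
# Command used to invoke pcs. Set PCS=echo beforehand to preview the commands
# without touching the cluster.
: "${PCS:=sudo pcs}"

# Move a resource to a node, wait for the move to finish, then drop any
# location constraint the move left behind. Usage: move_resource RESOURCE NODE
move_resource() {
    $PCS resource move "$1" "$2" --wait || return 1
    $PCS resource clear "$1"
}
```

`move_resource VIP ha01` runs the same move as above followed by `pcs resource clear VIP`; where the constraint has already been removed, the clear step should simply have nothing to do.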

Stopping the cluster on a specific node

  • Stop the cluster on ha01
  • ha01 changes from online to offline
  • The VIP moves to ha02
  • The cluster status can no longer be checked from ha01
ha01:~$ sudo pcs cluster stop ha01
ha01: Stopping Cluster (pacemaker)...
ha01: Stopping Cluster (corosync)...
  • Confirm that the VIP has moved

  • Run on ha01 (the VIP 192.168.100.245 has been removed)

ha01:~$ sudo pcs status
Error: error running crm_mon, is pacemaker running?
  crm_mon: Error: cluster is not available on this node

ha01:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5
  • Run on ha02 (the VIP 192.168.100.245 is assigned here)
ha02:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Jan  5 01:17:39 2023
  * Last change:  Thu Jan  5 01:00:30 2023 by root via cibadmin on ha01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ha02 ]
  * OFFLINE: [ ha01 ]

Full List of Resources:
  * VIP	(ocf:heartbeat:IPaddr2):	 Started ha02

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ha02:~$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5
    inet 192.168.100.245/24 brd 192.168.100.255 scope global secondary enp0s5

Stopping the entire cluster

$ sudo pcs cluster stop --all
ha02: Stopping Cluster (pacemaker)...
ha01: Stopping Cluster (pacemaker)...
ha01: Stopping Cluster (corosync)...
ha02: Stopping Cluster (corosync)...

Starting the cluster on a specific node

$ sudo pcs cluster start ha01
ha01: Starting Cluster...
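
Every `pcs status` listing in this article shows corosync and pacemaker as active/disabled: the daemons are running but not enabled at boot, so a rebooted node stays out of the cluster until `pcs cluster start` is run as above. If automatic start is wanted, `sudo pcs cluster enable --all` is the pcs command for it (it is not part of the original steps, and whether it is desirable in a two-node setup without STONITH is a judgment call). A small sketch for spotting the disabled daemons, with `disabled_daemons` as a hypothetical helper name:

```shell
# Read `pcs status` output from stdin and print the daemons whose boot-time
# state is "disabled" (the part after the slash in "active/disabled").
disabled_daemons() {
    awk '
        /^Daemon Status:/ { in_daemons = 1; next }
        in_daemons && $2 ~ /\/disabled$/ { sub(/:$/, "", $1); print $1 }
    '
}
```

Usage would be `sudo pcs status | disabled_daemons`.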

Deleting the cluster

$ sudo pcs cluster destroy --all
Warning: Unable to load CIB to get guest and remote nodes from it, those nodes will not be deconfigured.
ha02: Stopping Cluster (pacemaker)...
ha01: Stopping Cluster (pacemaker)...
ha01: Successfully destroyed cluster
ha02: Successfully destroyed cluster

Ports that must be opened in the firewall

  • When specifying ports with ufw

    • 2224/tcp
    • 3121/tcp
    • 5403/tcp
    • 5404/udp
    • 5405/udp
    • 21064/tcp
    • 9929/tcp
    • 9929/udp
  • When using firewall-cmd

    • firewall-cmd --add-service=high-availability
    • firewall-cmd --runtime-to-permanent
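
For the ufw case, the port list above can be turned into commands. The sketch below only prints the `ufw allow` lines so they can be reviewed first; `ha_ufw_rules` is a hypothetical helper name, and the rules open each port to any source, which you may want to narrow to the cluster subnet.

```shell
# Print one `ufw allow` command per port from the list above so the rules can
# be reviewed before they are applied.
ha_ufw_rules() {
    for port in 2224/tcp 3121/tcp 5403/tcp 5404/udp 5405/udp 21064/tcp 9929/tcp 9929/udp; do
        echo "sudo ufw allow $port"
    done
}
```

Run `ha_ufw_rules` to inspect the commands, and `ha_ufw_rules | sh` on both nodes to apply them.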

pcs command reference

Work in progress...

References

Note: the pacemaker versions covered are generally old, so the commands differ

Closing

  • That was easy, wasn't it?