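Introduction
- I wanted to cluster Ubuntu hosts behind a VIP, so let's use pacemaker + corosync
- The HA cluster is controlled with pcs, which has become the standard tool for pacemaker
- The CRM-related commands (crm_xxx), the cibadmin command, and the crm command are not used
As of January 2024, this was an article for Ubuntu 22.04, but I have since run the same steps on 24.04 and confirmed that they work.
Flow of this article
- Build the cluster first
- Explain what you should know about pacemaker + corosync
Environment
- Two nodes
- Node 1: ha01 (192.168.100.240)
- Node 2: ha02 (192.168.100.241)
- The VIP is 192.168.100.245
- The cluster name is testcluster
- STONITH is not used
- Unstable nodes are not forcibly stopped or rebooted
- Note that this pcs is newer than the version I used around 2017, and several commands have changed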
$ uname -a
Linux ha01 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:56:13 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
Installation
- Perform these steps on both nodes, ha01 and ha02
- Install the packages, set a password for the hacluster user, and enable the pcsd service
$ sudo apt install -y corosync pacemaker pcs
$ sudo passwd hacluster
New password:
Retype new password:
$ sudo systemctl enable --now pcsd
ha01:~$ ip a | grep inet
inet 127.0.0.1/8 scope host lo
inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5
ha02:~$ ip a | grep inet
inet 127.0.0.1/8 scope host lo
inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5
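- Configure hosts
sudo vim /etc/hosts
- Check hosts
- When editing hosts, make sure the node's own hostname is not mapped to 127.0.0.1 (also check for a 127.0.1.1 entry, which Ubuntu installers commonly add)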
$ sudo cat /etc/hosts
127.0.0.1 localhost
192.168.100.240 ha01
192.168.100.241 ha02
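The loopback caveat above can also be checked mechanically. The following is a minimal sketch and not part of the original procedure: the function name `check_hosts_loopback` is hypothetical, and it assumes the usual hosts format (an address followed by one or more names, with `#` comments).

```shell
# Hypothetical helper: report hosts-file lines that map a hostname
# to a loopback address (127.x.x.x).
# Usage: check_hosts_loopback <hosts-file> <hostname>
# Exit status: 0 = no loopback mapping found, 1 = at least one found.
check_hosts_loopback() {
    awk -v name="$2" '
        { sub(/#.*/, "") }              # strip comments before matching
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print "loopback mapping: " $0; found = 1 }
        }
        END { exit (found + 0) }
    ' "$1"
}
```

On each node it could be called as `check_hosts_loopback /etc/hosts "$(hostname)"`; any line it prints should be removed or changed before running `pcs cluster setup`.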
Cluster configuration
- Run on ha01
ha01:~$ sudo pcs host auth ha01 ha02 -u hacluster
Password:
ha01: Authorized
ha02: Authorized
ha01:~$ sudo pcs cluster setup testcluster ha01 ha02
No addresses specified for host 'ha01', using 'ha01'
No addresses specified for host 'ha02', using 'ha02'
Destroying cluster on hosts: 'ha01', 'ha02'...
ha01: Successfully destroyed cluster
ha02: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'ha01', 'ha02'
ha01: successful removal of the file 'pcsd settings'
ha02: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'ha01', 'ha02'
ha01: successful distribution of the file 'corosync authkey'
ha01: successful distribution of the file 'pacemaker authkey'
ha02: successful distribution of the file 'corosync authkey'
ha02: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'ha01', 'ha02'
ha01: successful distribution of the file 'corosync.conf'
ha02: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
ha01:~$ sudo pcs cluster start --all
ha02: Starting Cluster...
ha01: Starting Cluster...
ha01:~$ sudo pcs property set stonith-enabled=false
ha01:~$ sudo pcs property set no-quorum-policy=ignore
ha01:~$ sudo pcs resource create VIP ocf:heartbeat:IPaddr2 ip=192.168.100.245 cidr_netmask=24 op monitor interval=10s
ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
* Stack: corosync
* Current DC: ha01 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Thu Jan 5 01:09:31 2023
* Last change: Thu Jan 5 01:00:30 2023 by root via cibadmin on ha01
* 2 nodes configured
* 1 resource instance configured
Node List:
* Online: [ ha01 ha02 ]
Full List of Resources:
* VIP (ocf:heartbeat:IPaddr2): Started ha01
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
ha01:~$ ip a | grep inet
inet 127.0.0.1/8 scope host lo
inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5
inet 192.168.100.245/24 brd 192.168.100.255 scope global secondary enp0s5
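Instead of reading the status output by eye, the node that currently holds the VIP can be extracted from it. This is a rough sketch under the assumption that resource lines look like the ones shown above (`* VIP (ocf:heartbeat:IPaddr2): Started ha01`); the format may differ between pacemaker versions, and `resource_node` is a hypothetical name.

```shell
# Hypothetical helper: print the node on which a resource is reported as Started.
# Reads `pcs status` text on stdin and assumes lines of the form
#   * VIP (ocf:heartbeat:IPaddr2): Started ha01
resource_node() {
    awk -v res="$1" '$1 == "*" && $2 == res && $(NF-1) == "Started" { print $NF }'
}

# Example using the status line from this article (no cluster required):
printf '%s\n' 'Full List of Resources:' '* VIP (ocf:heartbeat:IPaddr2): Started ha01' \
    | resource_node VIP
```

On a live cluster the same function would be used as `sudo pcs status | resource_node VIP`.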
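If the following error occurs and testcluster cannot be created, adding --force makes the setup succeed. I ran into this on Ubuntu 24.04. Be careful: --force destroys whatever cluster currently exists on the nodes.
↓ When it fails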
$ sudo pcs cluster setup testcluster ha01 ha02
No addresses specified for host 'ha01', using 'ha01'
No addresses specified for host 'ha02', using 'ha02'
Error: ha02: The host seems to be in a cluster already as the following services are found to be running: 'corosync', 'pacemaker'. If the host is not part of a cluster, stop the services and retry, use --force to override
Error: ha02: The host seems to be in a cluster already as cluster configuration files have been found on the host. If the host is not part of a cluster, run 'pcs cluster destroy' on host 'ha02' to remove those configuration files, use --force to override
Error: ha01: The host seems to be in a cluster already as the following services are found to be running: 'corosync', 'pacemaker'. If the host is not part of a cluster, stop the services and retry, use --force to override
Error: ha01: The host seems to be in a cluster already as cluster configuration files have been found on the host. If the host is not part of a cluster, run 'pcs cluster destroy' on host 'ha01' to remove those configuration files, use --force to override
Error: Some nodes are already in a cluster. Enforcing this will destroy existing cluster on those nodes. You should remove the nodes from their clusters instead to keep the clusters working properly, use --force to override
Error: Errors have occurred, therefore pcs is unable to continue
↓ Succeeds with --force
ha01:~$ sudo pcs cluster setup --force testcluster ha01 ha02
No addresses specified for host 'ha01', using 'ha01'
No addresses specified for host 'ha02', using 'ha02'
Warning: ha02: The host seems to be in a cluster already as the following services are found to be running: 'corosync', 'pacemaker'. If the host is not part of a cluster, stop the services and retry
Warning: ha02: The host seems to be in a cluster already as cluster configuration files have been found on the host. If the host is not part of a cluster, run 'pcs cluster destroy' on host 'ha02' to remove those configuration files
Destroying cluster on hosts: 'ha01', 'ha02'...
ha02: Successfully destroyed cluster
ha01: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'ha01', 'ha02'
ha02: successful removal of the file 'pcsd settings'
ha01: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'ha01', 'ha02'
ha02: successful distribution of the file 'corosync authkey'
ha02: successful distribution of the file 'pacemaker authkey'
ha01: successful distribution of the file 'corosync authkey'
ha01: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'ha01', 'ha02'
ha02: successful distribution of the file 'corosync.conf'
ha01: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
- Run on ha02
ha02:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
* Stack: corosync
* Current DC: ha01 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Thu Jan 5 01:01:54 2023
* Last change: Thu Jan 5 01:00:30 2023 by root via cibadmin on ha01
* 2 nodes configured
* 1 resource instance configured
Node List:
* Online: [ ha01 ha02 ]
Full List of Resources:
* VIP (ocf:heartbeat:IPaddr2): Started ha01
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
ha02:~$ ip a | grep inet
inet 127.0.0.1/8 scope host lo
inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5
Checking the cluster status
$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
* Stack: corosync
* Current DC: ha01 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Thu Jan 5 01:09:31 2023
* Last change: Thu Jan 5 01:00:30 2023 by root via cibadmin on ha01
* 2 nodes configured
* 1 resource instance configured
Node List:
* Online: [ ha01 ha02 ]
Full List of Resources:
* VIP (ocf:heartbeat:IPaddr2): Started ha01
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
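In the Daemon Status block above, corosync and pacemaker are shown as `active/disabled`: they are running now but are not enabled to start at boot, so a rebooted node does not rejoin the cluster by itself. The sketch below lists such daemons from the status text; `disabled_daemons` is a hypothetical name and the parsing assumes the layout shown above.

```shell
# Hypothetical helper: list daemons that `pcs status` reports as not enabled at boot.
# Reads `pcs status` text on stdin and looks only at lines after "Daemon Status:".
disabled_daemons() {
    awk '
        /^Daemon Status:/ { in_section = 1; next }
        in_section && $2 ~ /\/disabled$/ { sub(/:$/, "", $1); print $1 }
    '
}

# Example using the Daemon Status block from this article:
printf '%s\n' 'Daemon Status:' 'corosync: active/disabled' \
    'pacemaker: active/disabled' 'pcsd: active/enabled' | disabled_daemons
```

If automatic start at boot is wanted, `sudo pcs cluster enable --all` enables the cluster services on every node. Whether a node should rejoin automatically after a failure is a design decision, so this article leaves the default as is.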
Putting a node from online into standby
- Put ha01 into standby
- ha01 changes from online to standby
- The VIP is assigned to ha02
- Run on ha01 (VIP 192.168.100.245 is assigned)
ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
* Stack: corosync
* Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Thu Jan 5 01:37:27 2023
* Last change: Thu Jan 5 01:37:23 2023 by root via cibadmin on ha01
* 2 nodes configured
* 1 resource instance configured
Node List:
* Online: [ ha01 ha02 ]
Full List of Resources:
* VIP (ocf:heartbeat:IPaddr2): Started ha01
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
- Set the node to standby
ha01:~$ sudo pcs node standby ha01
- Confirm that the VIP has moved
ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
* Stack: corosync
* Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Thu Jan 5 01:38:05 2023
* Last change: Thu Jan 5 01:38:01 2023 by root via cibadmin on ha01
* 2 nodes configured
* 1 resource instance configured
Node List:
* Node ha01: standby
* Online: [ ha02 ]
Full List of Resources:
* VIP (ocf:heartbeat:IPaddr2): Started ha02
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
ha01:~$ ip a | grep inet
inet 127.0.0.1/8 scope host lo
inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5
- Run on ha02 (VIP 192.168.100.245 is assigned)
ha02:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
* Stack: corosync
* Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Thu Jan 5 01:40:48 2023
* Last change: Thu Jan 5 01:38:01 2023 by root via cibadmin on ha01
* 2 nodes configured
* 1 resource instance configured
Node List:
* Node ha01: standby
* Online: [ ha02 ]
Full List of Resources:
* VIP (ocf:heartbeat:IPaddr2): Started ha02
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
ha02:~$ ip a | grep inet
inet 127.0.0.1/8 scope host lo
inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5
inet 192.168.100.245/24 brd 192.168.100.255 scope global secondary enp0s5
Bringing a node from standby back online
- Confirm that the VIP does not move
ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
* Stack: corosync
* Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Thu Jan 5 02:20:20 2023
* Last change: Thu Jan 5 02:20:15 2023 by root via cibadmin on ha01
* 2 nodes configured
* 1 resource instance configured
Node List:
* Node ha01: standby
* Online: [ ha02 ]
Full List of Resources:
* VIP (ocf:heartbeat:IPaddr2): Started ha02
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
ha01:~$ sudo pcs node unstandby ha01
ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
* Stack: corosync
* Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Thu Jan 5 02:21:14 2023
* Last change: Thu Jan 5 02:21:06 2023 by root via cibadmin on ha01
* 2 nodes configured
* 1 resource instance configured
Node List:
* Online: [ ha01 ha02 ]
Full List of Resources:
* VIP (ocf:heartbeat:IPaddr2): Started ha02
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Moving the VIP
- Move the VIP from ha02, where it is currently assigned, to ha01
- Check the VIP
ha02:~$ ip a | grep inet
inet 127.0.0.1/8 scope host lo
inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5
inet 192.168.100.245/24 brd 192.168.100.255 scope global secondary enp0s5
ha02:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
* Stack: corosync
* Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Thu Jan 5 02:42:01 2023
* Last change: Thu Jan 5 02:32:05 2023 by root via crm_resource on ha01
* 2 nodes configured
* 1 resource instance configured
Node List:
* Online: [ ha01 ha02 ]
Full List of Resources:
* VIP (ocf:heartbeat:IPaddr2): Started ha02
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
ha01:~$ ip a | grep inet
inet 127.0.0.1/8 scope host lo
inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5
ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
* Stack: corosync
* Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Thu Jan 5 02:44:24 2023
* Last change: Thu Jan 5 02:32:05 2023 by root via crm_resource on ha01
* 2 nodes configured
* 1 resource instance configured
Node List:
* Online: [ ha01 ha02 ]
Full List of Resources:
* VIP (ocf:heartbeat:IPaddr2): Started ha02
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
- Move the VIP to ha01
$ sudo pcs resource move VIP ha01
- Check the VIP
ha01:~$ ip a | grep inet
inet 127.0.0.1/8 scope host lo
inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5
inet 192.168.100.245/24 brd 192.168.100.255 scope global secondary enp0s5
ha01:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
* Stack: corosync
* Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Thu Jan 5 02:47:44 2023
* Last change: Thu Jan 5 02:44:48 2023 by root via crm_resource on ha01
* 2 nodes configured
* 1 resource instance configured
Node List:
* Online: [ ha01 ha02 ]
Full List of Resources:
* VIP (ocf:heartbeat:IPaddr2): Started ha01
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
ha02:~$ ip a | grep inet
inet 127.0.0.1/8 scope host lo
inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5
ha02:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
* Stack: corosync
* Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Thu Jan 5 02:48:08 2023
* Last change: Thu Jan 5 02:44:48 2023 by root via crm_resource on ha01
* 2 nodes configured
* 1 resource instance configured
Node List:
* Online: [ ha01 ha02 ]
Full List of Resources:
* VIP (ocf:heartbeat:IPaddr2): Started ha01
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
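One thing to verify after a move: depending on the pcs version, `pcs resource move` may leave a location constraint behind (its id starts with `cli-prefer-`), which keeps the resource pinned to the target node until the constraint is cleared. Newer pcs releases are understood to remove it automatically, so treat this as an assumption to check on your version. The sketch below only scans constraint listing text for such ids; `leftover_move_constraints` is a hypothetical name and the sample line is illustrative, not captured output.

```shell
# Hypothetical helper: print move/ban constraint ids (cli-prefer-*, cli-ban-*)
# found in constraint listing text on stdin.
leftover_move_constraints() {
    grep -Eo 'cli-(prefer|ban)-[A-Za-z0-9_.-]+' | sort -u
}

# Illustrative input only; the real listing format depends on the pcs version.
printf '%s\n' 'Resource: VIP' '  Enabled on: ha01 (score:INFINITY) (id:cli-prefer-VIP)' \
    | leftover_move_constraints
```

On a live cluster this would be fed with `sudo pcs constraint --full | leftover_move_constraints`. If an id is reported, `sudo pcs resource clear VIP` removes the constraint so that the VIP can fail over freely again.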
Stopping the cluster on a specific node
- Stop the cluster on ha01
- ha01 changes from online to offline
- The VIP is assigned to ha02
- On ha01, even the cluster status can no longer be displayed
ha01:~$ sudo pcs cluster stop ha01
ha01: Stopping Cluster (pacemaker)...
ha01: Stopping Cluster (corosync)...
- Confirm that the VIP has moved
- Run on ha01 (VIP 192.168.100.245 has been removed)
ha01:~$ sudo pcs status
Error: error running crm_mon, is pacemaker running?
crm_mon: Error: cluster is not available on this node
ha01:~$ ip a | grep inet
inet 127.0.0.1/8 scope host lo
inet 192.168.100.240/24 brd 192.168.100.255 scope global enp0s5
- Run on ha02 (VIP 192.168.100.245 is assigned)
ha02:~$ sudo pcs status
Cluster name: testcluster
Cluster Summary:
* Stack: corosync
* Current DC: ha02 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Thu Jan 5 01:17:39 2023
* Last change: Thu Jan 5 01:00:30 2023 by root via cibadmin on ha01
* 2 nodes configured
* 1 resource instance configured
Node List:
* Online: [ ha02 ]
* OFFLINE: [ ha01 ]
Full List of Resources:
* VIP (ocf:heartbeat:IPaddr2): Started ha02
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
ha02:~$ ip a | grep inet
inet 127.0.0.1/8 scope host lo
inet 192.168.100.241/24 brd 192.168.100.255 scope global enp0s5
inet 192.168.100.245/24 brd 192.168.100.255 scope global secondary enp0s5
Stopping the whole cluster
$ sudo pcs cluster stop --all
ha02: Stopping Cluster (pacemaker)...
ha01: Stopping Cluster (pacemaker)...
ha01: Stopping Cluster (corosync)...
ha02: Stopping Cluster (corosync)...
Starting the cluster on a specific node
$ sudo pcs cluster start ha01
ha01: Starting Cluster...
Destroying the cluster
$ sudo pcs cluster destroy --all
Warning: Unable to load CIB to get guest and remote nodes from it, those nodes will not be deconfigured.
ha02: Stopping Cluster (pacemaker)...
ha01: Stopping Cluster (pacemaker)...
ha01: Successfully destroyed cluster
ha02: Successfully destroyed cluster
Ports that must be opened in the firewall
- When specifying ports with ufw
  - 2224/tcp
  - 3121/tcp
  - 5403/tcp
  - 5404/udp
  - 5405/udp
  - 21064/tcp
  - 9929/tcp
  - 9929/udp
- When using firewall-cmd
  - firewall-cmd --add-service=high-availability
  - firewall-cmd --runtime-to-permanent
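For ufw, the port list above can be turned into commands with a small loop. This is a minimal sketch: it only prints the commands (a dry run) so they can be reviewed first, and the names `HA_PORTS` and `print_ufw_rules` are hypothetical.

```shell
# Ports taken from the ufw list above.
HA_PORTS="2224/tcp 3121/tcp 5403/tcp 5404/udp 5405/udp 21064/tcp 9929/tcp 9929/udp"

# Hypothetical helper: print one `ufw allow` command per port; nothing is changed.
print_ufw_rules() {
    for port in $HA_PORTS; do
        echo "sudo ufw allow $port"
    done
}

print_ufw_rules
```

After checking the printed lines, they can be run one by one on both nodes. These rules open the ports to any source; restricting them to the cluster subnet is a reasonable hardening step but is outside what this article tested.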
pcs command reference
Work in progress...
References
* Note: most of these cover older pacemaker versions, so the commands differ
- 試して覚える Pacemaker-2.0入門 『シェアードナッシングな高可用クラスタの実現』
- 試して覚える Pacemaker-2.0入門 『構築・リソース設定』
- Pacemakerで学ぶ HAクラスタ
- Pacemakerと HAProxyではじめる 高可用ロードバランサ入門
- 試して覚えるPacemaker入門 排他制御編
- 試して覚えるPacemaker入門 PG-REX(Pacemaker + PostgreSQLによるシェアードナッシングHA構成)構築(PDF)
- 試して覚えるPacemaker入門 PG-REX(Pacemaker + PostgreSQLによるシェアードナッシングHA構成)運用(PDF)
- 高信頼システム構築標準教科書 - 仮想化と高可用性 -
- 徹底攻略LPIC Level3 304教科書+問題集[Version 2.0]対応 徹底攻略シリーズ
- LPI 304 Virtualization & High Availability Practice Exam
- HAクラスタをDRBDとPacemakerで作ってみよう [Pacemaker編]
- 最強WEB問題集LinuC Lv3-304(Ver2.0)
- オライリー サーバ負荷分散技術
- [24時間365日]サーバ/インフラを支える技術 ……スケーラビリティ、ハイパフォーマンス、省力運用 WEB+DB PRESS plus
- DRBD Tech Info - 使っておぼえるシリーズ
- DRBD Tech Info - ドキュメント
- LINUX-HA JAPAN
- LINUX-HA JAPAN - Pacemakerの概要
- LINUX-HA JAPAN - マニュアル
- Heartbeat+Pacemaker+DRBDで高可用Linux 体験! 新しくなったLinux-HA
- VirtualBox と Rocky Linux 8 で始める Pacemaker ~ VirtualBox でも STONITH 機能が試せる! VirtualBMCの活用
- Red Hat Enterprise Linux8 > 高可用性クラスターの設定および管理 > 第2章 Pacemaker の使用の開始
- Red Hat Enterprise Linux 8 高可用性クラスターの設定および管理
- 【入門】 PostgreSQL+Pacemaker+DRBDで 高可用性構成を構築してみよう
- Oracle Linux 8 - 高可用性クラスタリングの設定
- MIRACLE Linux - DRBD、Heartbeat、Pacemaker による Zabbix サーバの HA クラスタ構築
- インフラ構築手順書 - PaceMaker+DRBD構築
- PacemakerのMaster/Slave構成の基本と事例紹介(DRBD、PostgreSQLレプリケーション)
Closing
- That was easy, wasn't it?