The On-Prem Old Guy's Container Challenge (Part 2: Pacemaker)

Posted at 2023-08-12

Why on earth did I call myself an "old guy"? I'm embarrassed about it now. If anything, it makes me sound even more like one.

Installing Pacemaker

Pacemaker : Install
- Thank you, Server World. This really helps.

$ sudo apt-get -d install pacemaker pcs resource-agents
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
:
Need to get 21.1 MB of archives.
After this operation, 88.1 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
:
Fetched 21.1 MB in 23s (910 kB/s)
Download complete and in download only mode

$ sudo mkdir -p ~/pkgs/02_pacemaker
$ sudo mv /var/cache/apt/archives/*deb ~/pkgs/02_pacemaker/.
$ cd ~/pkgs/02_pacemaker
$ sudo apt -y install ./*deb

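When installing a local .deb file you have to prefix the path with ./ because apt treats an argument without a slash as a package name. By the way, is it OK to pronounce .deb as "dee-ee-bee"? After all, rpm is "ar-pee-em". Nobody reads it as "debby", right?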
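The ./ rule can be sketched as a small helper. This is only an illustration: the function name as_apt_path and the file names are made up, and the helper prints paths without calling apt.

```shell
#!/bin/sh
# as_apt_path: hypothetical helper, not part of apt.
# Prints the argument unchanged if it already contains a slash,
# otherwise prefixes it with ./ so apt would read it as a local file.
as_apt_path() {
    case "$1" in
        */*) printf '%s\n' "$1" ;;
        *)   printf './%s\n' "$1" ;;
    esac
}

# Example with made-up file names; nothing is installed here.
for f in pacemaker.deb ./pcs.deb /tmp/resource-agents.deb; do
    as_apt_path "$f"
done
```

Only the first name gets a ./ prefix; the other two already contain a slash and pass through untouched.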

# The services start right after installation, so stop them for now
$ sudo systemctl stop pacemaker corosync

# Keep pcsd, the daemon behind the pcs command, running
$ sudo systemctl enable --now pcsd

# Change the password of the hacluster account, which is created during installation
$ sudo grep hacluster /etc/passwd
hacluster:x:114:119::/var/lib/pacemaker:/usr/sbin/nologin

$ sudo passwd hacluster
New password:
Retype new password:
passwd: password updated successfully

Building the peer node

Repeat everything up to this point on the second server.

Establishing authentication between the nodes

(Run on both nodes)
$ sudo vi /etc/hosts

Edit it as follows
---
127.0.0.1 localhost
192.168.56.103  node1
192.168.56.104  node2
---
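Since the same edit has to be made on both nodes, it could also be scripted. The following is a minimal sketch, assuming a helper I am calling add_host_entry (a hypothetical name); it works on a scratch file so it can be tried without touching /etc/hosts.

```shell
#!/bin/sh
# add_host_entry FILE IP NAME: append "IP  NAME" to FILE unless a
# non-comment line already maps NAME. Hypothetical helper.
add_host_entry() {
    file=$1 ip=$2 name=$3
    # Look for NAME as a whole field after the address column.
    if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
        return 0
    fi
    printf '%s  %s\n' "$ip" "$name" >> "$file"
}

# Try it on a scratch copy first rather than on /etc/hosts itself.
hosts_file=$(mktemp)
printf '127.0.0.1 localhost\n' > "$hosts_file"
add_host_entry "$hosts_file" 192.168.56.103 node1
add_host_entry "$hosts_file" 192.168.56.104 node2
add_host_entry "$hosts_file" 192.168.56.103 node1   # second call adds nothing
cat "$hosts_file"
```

Running the function twice for the same name leaves a single entry, which makes the script safe to rerun on either node.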

$ sudo ufw allow http
$ sudo ufw allow 2224
$ sudo ufw allow 3121
$ sudo ufw reload
$ sudo ufw status
Status: active

To                         Action      From
--                         ------      ----
22/tcp                     ALLOW       Anywhere
80/tcp                     ALLOW       Anywhere
2224                       ALLOW       Anywhere
3121                       ALLOW       Anywhere
22/tcp (v6)                ALLOW       Anywhere (v6)
80/tcp (v6)                ALLOW       Anywhere (v6)
2224 (v6)                  ALLOW       Anywhere (v6)
3121 (v6)                  ALLOW       Anywhere (v6)
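The rules above accept connections from anywhere. A tighter variant would limit the cluster ports to the nodes' own network. The sketch below only prints the ufw commands so they can be reviewed first; the subnet 192.168.56.0/24 is my assumption based on the node addresses, and print_cluster_rules is a hypothetical helper name.

```shell
#!/bin/sh
# Print (do not run) ufw rules limiting the cluster ports to one subnet.
# CLUSTER_NET is an assumption derived from the node addresses above.
CLUSTER_NET=192.168.56.0/24

# print_cluster_rules NET: emit one rule per port used in this article
# (2224 and 3121), restricted to source network NET.
print_cluster_rules() {
    for port in 2224 3121; do
        printf 'ufw allow from %s to any port %s proto tcp\n' "$1" "$port"
    done
}

print_cluster_rules "$CLUSTER_NET"
```

Each printed line could then be run with sudo after checking that the subnet matches the actual environment.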

(Run on node1 only)
# Authenticate
$ sudo pcs host auth node1 node2
Username: hacluster
Password:
node1: Authorized
node2: Authorized

Building the cluster

(Run on node1 only)
# Set up the cluster
$ sudo pcs cluster setup ha_cluster node1 node2 --force
:
node1: successful distribution of the file 'corosync.conf'
node2: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.

# Start the services the cluster needs
$ sudo pcs cluster start --all
node2: Starting Cluster...
node1: Starting Cluster...

# Enable start on boot
$ sudo pcs cluster enable --all
node1: Cluster Enabled
node2: Cluster Enabled

# Check the status
$ sudo pcs cluster status
Cluster Status:
 Cluster Summary:
   * Stack: corosync
   * Current DC: node2 (version 2.1.2-ada5c3b36e2) - partition with quorum
   * Last updated: Sat Aug 12 12:53:44 2023
   * Last change:  Sat Aug 12 12:53:12 2023 by hacluster via crmd on node2
   * 2 nodes configured
   * 0 resource instances configured
 Node List:
   * Online: [ node1 node2 ]

PCSD Status:
  node1: Online
  node2: Online

# Check the corosync status
$ sudo pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         1          1 node1 (local)
         2          1 node2
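Checking the membership by eye works for two nodes, but the same check could be scripted. This is a rough sketch that counts member rows in the output format shown above; count_members is a hypothetical helper, and the sample text is copied from this article so the script runs without a cluster.

```shell
#!/bin/sh
# count_members: read `pcs status corosync` output on stdin and print
# the number of member rows (lines whose first two fields are numeric:
# the node ID and the vote count).
count_members() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { n++ } END { print n + 0 }'
}

# Sample input copied from the output above. On a real node you would
# pipe the command instead:  sudo pcs status corosync | count_members
sample='
Membership information
----------------------
    Nodeid      Votes Name
         1          1 node1 (local)
         2          1 node2
'
members=$(printf '%s\n' "$sample" | count_members)
if [ "$members" -eq 2 ]; then
    echo "both nodes are members"
else
    echo "expected 2 members, found $members"
fi
```

The parsing assumes the column layout stays as shown here; a different pcs or corosync version may format the table differently.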

Wow, this really takes me back.

STONITH configuration

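I'm putting this on hold for now. The production environment is a private cloud platform, so I would need a way for one node to physically power off another (the equivalent of the virsh command under KVM). What do I do if I'm told that's not allowed? I've heard there is something like fence_scsi, but that appears to stop a node logically, by cutting off its access to the shared disk through SCSI reservations, rather than powering it off.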

What I want instead is a physical shutdown, to cover cases such as an OS hang where the IP address is still alive but the node does not respond. How can that be done?

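...That's what I always agonize over, but can it actually happen here? This time there is no shared disk, and each PostgreSQL server holds its own copy of the database, so failover is just a matter of promoting the replica. It looks like simply detecting the fact that the peer is not responding would be enough.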
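Detecting "the peer is not responding" could look roughly like the sketch below. It is not what Pacemaker does internally; peer_unresponsive is a hypothetical helper, and true and false stand in for a real probe so the script runs anywhere.

```shell
#!/bin/sh
# peer_unresponsive TRIES CMD...: run the probe CMD up to TRIES times.
# Returns 0 only if every attempt fails, i.e. the peer looks dead.
# A single successful probe means the peer is alive (returns 1).
peer_unresponsive() {
    tries=$1
    shift
    i=0
    while [ "$i" -lt "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}

# Stand-in probes. A real probe might be something like
#   pg_isready -h node2 -t 2
# which is an assumption, not part of this article's setup.
if peer_unresponsive 3 false; then
    echo "peer unresponsive: candidate for promotion"
fi
if ! peer_unresponsive 3 true; then
    echo "peer alive: do nothing"
fi
```

Requiring several failed probes in a row avoids reacting to a single dropped packet. A probe like this still cannot tell a dead peer from a broken network link, so it does not replace fencing.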

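Alternatives to split-brain countermeasures
- See slide 27 of the material above. In virtual environments it relies on things like libvirtd, which means you cannot use it without first knowing the environment.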

Next steps

The overall cluster framework is in place, so next I'll build the containers that will actually run on it, one at a time.
