Pacemaker + Corosync + ElasticIP on AmazonLinux

Posted at 2017-08-31, last updated at 2017-08-31

Architecture

The cluster is configured so that the ElasticIP switches over to the secondary server when the primary server goes down.

Installed Versions

$ cat /etc/system-release
Amazon Linux AMI release 2017.03

$ pacemakerd --version
Pacemaker 1.1.16-1.el6
Written by Andrew Beekhof

$ corosync -v
Corosync Cluster Engine, version '2.4.2'
Copyright (c) 2006-2009 Red Hat, Inc.

$ pcs --version
0.9.156

$ crm --version
2.1.9-1.el6 (Build unknown)

Install

I used the "Pacemaker-1.1 (Pacemaker-1.1 series + Corosync 2 series) - Pacemaker-1.1.16-1.1 repository package" distributed by Linux-HA Japan.

On Amazon Linux 2017.03 the system default is python2.7, so to make the pcs and crm commands usable I first followed "Amazon Linux Python 2.7 (default) -> 2.6 switch".

yum install -y http://jaist.dl.sourceforge.jp/linux-ha/67818/pacemaker-repo-1.1.16-1.1.el6.x86_64.rpm

yum -c /etc/yum.repos.d/pacemaker.repo install -y pacemaker pcs python26-lxml
yum install -y crmsh

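Note: installing pacemaker also installs corosync.

The package comes with the following supplementary notes, so take care:

Heartbeat will be deprecated going forward, so it is not included in the package.

Two user-interface components, crmsh and pcs, are bundled, but as with the 1.0 series, crmsh is recommended.

Some crmsh features that rely on pssh (crm resource secret, crm cluster, etc.) are unavailable (due to a limitation of the community version of pssh). An improvement is being considered for a future upgrade (crmsh-2.2 series or later).

pcs is included in this package as a "technology preview" (trial version), and using it for production purposes is not recommended. Using it also requires installing additional dependency packages.

Two STONITH plugin components, cluster-glue and fence-agents, are bundled, but as with the 1.0 series, the STONITH plugins included in cluster-glue are recommended.

The pm_kvm_tools and vm-ctl components that were included in the 1.0.13 repository package do not support this version and are therefore not included in the package.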

Configure corosync.conf

corosync.conf(5) manual - Thirdware Tech Info

$ cat /etc/corosync/corosync.conf
totem {
    version: 2

    crypto_cipher: none
    crypto_hash: none

    interface {
        ringnumber: 0
        # bindnetaddr must be an IP address configured on the system, or a network address.
        # On the secondary, set 10.0.20.10
        bindnetaddr: 10.0.10.10
        mcastport: 5405
        ttl: 1
    }
    # use unicast communication, do not multicast.
    transport: udpu
}

quorum {
     provider: corosync_votequorum
     # `expected_votes` must be configured.
     expected_votes: 2
     two_node: 1
     # `two_node: 1` automatically enables `wait_for_all: 1`
     # wait_for_all: 0
}

nodelist {
    node {
        ring0_addr: 10.0.10.10
        nodeid: 1
    }
    node {
        ring0_addr: 10.0.20.10
        nodeid: 2
    }
}

service {
     # Load the Pacemaker Cluster Resource Manager
     name: pacemaker
     ver: 1
}

logging {
    fileline: off
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
}
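Only bindnetaddr differs between the two nodes in the configuration above, so the interface block can be produced from a single template. The following is a minimal sketch of that idea, not part of the original setup; the helper name render_totem_interface is hypothetical, and the port, ttl and addresses are simply the values used in this article.

```shell
#!/bin/sh
# Hypothetical helper: print the totem "interface" block for one node.
# Only bindnetaddr differs between the primary and the secondary.
render_totem_interface() {
    bindnetaddr="$1"
    cat <<EOF
    interface {
        ringnumber: 0
        bindnetaddr: ${bindnetaddr}
        mcastport: 5405
        ttl: 1
    }
EOF
}

# Example: the block for the secondary node.
render_totem_interface 10.0.20.10
```

The rest of corosync.conf (quorum, nodelist, logging) is identical on both nodes and can be copied as-is.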

Start corosync

$ sudo service corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]

$ sudo corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
	id	= 10.0.10.10
	status	= ring 0 active with no faults
# When checked on the secondary, id is 10.0.20.10

$ sudo corosync-cmapctl | grep member
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.0.10.10) 
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.0.20.10) 
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined

$ sudo pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         1          1 10.0.10.10 (local)
         2          1 10.0.20.10
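The membership check above can also be scripted, for example in a provisioning step that should wait until both nodes have joined. This is a small sketch under the assumption that the corosync-cmapctl output has the same shape as shown in this article; count_joined_members is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count cluster members whose status is "joined".
# Reads `corosync-cmapctl` output on stdin.
count_joined_members() {
    grep -c 'members\.[0-9]*\.status (str) = joined'
}

# Sample input copied from the output above; prints 2.
count_joined_members <<'EOF'
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
EOF
```

On a cluster node the same function could be fed with `sudo corosync-cmapctl | count_joined_members`.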

Configure pacemaker

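When ocf:heartbeat:awseip is used as a Pacemaker resource, the agent calls the aws command, so a region must be specified. For that reason, AWS_DEFAULT_REGION=ap-northeast-1 is appended to the end of /etc/sysconfig/pacemaker.
For details on how the resource is controlled, see the implementation in /usr/lib/ocf/resource.d/heartbeat/awseip.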

$ pcs resource describe ocf:heartbeat:awseip
ocf:heartbeat:awseip

description

Resource options:
  awscli: command line tools for aws services
  elastic_ip (required): reserved elastic ip for ec2 instance
  allocation_id: reserved allocation id for ec2 instance
  private_ip_address: predefined private ip address for ec2 instance
  api_delay: a short delay between API calls, to avoid sending API too quick

Default operations:
  start: interval=0s timeout=10
  stop: interval=0s timeout=10
  monitor: interval=20 timeout=10

$ tail -n 4 /etc/sysconfig/pacemaker 
# VALGRIND_OPTS="--leak-check=full --trace-children=no --num-callers=25 --log-file=/var/lib/pacemaker/valgrind-%p --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions --gen-suppressions=all"

# Set the region so that Pacemaker's ocf:heartbeat:awseip can use the aws command
AWS_DEFAULT_REGION=ap-northeast-1
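If the region line is added by a script rather than by hand, it helps to make the step safe to re-run. The sketch below is one way to do that; the helper name ensure_aws_region is hypothetical, and it only appends the line when no AWS_DEFAULT_REGION entry exists yet.

```shell
#!/bin/sh
# Hypothetical helper: append AWS_DEFAULT_REGION to a sysconfig file
# only if no such line exists yet, so that re-running is harmless.
ensure_aws_region() {
    file="$1"
    region="$2"
    if ! grep -q '^AWS_DEFAULT_REGION=' "$file" 2>/dev/null; then
        printf 'AWS_DEFAULT_REGION=%s\n' "$region" >> "$file"
    fi
}

# Usage on a cluster node (run as root):
#   ensure_aws_region /etc/sysconfig/pacemaker ap-northeast-1
```

Note that an existing entry with a different region is left untouched, which is a deliberate choice in this sketch.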

Start pacemaker

$ sudo service pacemaker start
Starting Pacemaker Cluster Manager                         [  OK  ]

# Create the ElasticIP as a resource; replace "XXX.XXX.XXX.XXX" with the ElasticIP
$ sudo pcs resource create elastic-ip ocf:heartbeat:awseip \
    elastic_ip="XXX.XXX.XXX.XXX" awscli="$(which aws)" \
    op start   timeout="60s" interval="0s"  on-fail="stop" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop    timeout="60s" interval="0s"  on-fail="block"

$ sudo pcs resource show
 elastic-ip	(ocf::heartbeat:awseip):	Started ip-10-0-10-10

$ sudo pcs resource show elastic-ip
 Resource: elastic-ip (class=ocf provider=heartbeat type=awseip)
  Attributes: awscli=/usr/bin/aws elastic_ip=XXX.XXX.XXX.XXX
  Operations: monitor interval=10s on-fail=restart timeout=60s (elastic-ip-monitor-interval-10s)
              start interval=0s on-fail=stop timeout=60s (elastic-ip-start-interval-0s)
              stop interval=0s on-fail=block timeout=60s (elastic-ip-stop-interval-0s)

$ sudo crm_mon -1
Stack: corosync
Current DC: ip-10-0-20-10 (version 1.1.16-1.el6-94ff4df) - partition with quorum
Last updated: Fri Sep  1 07:01:33 2017
Last change: Fri Sep  1 06:25:10 2017 by root via crm_resource on ip-10-0-10-10

2 nodes configured
1 resource configured

Online: [ ip-10-0-10-10 ip-10-0-20-10 ]

Active resources:

 elastic-ip	(ocf::heartbeat:awseip):	Started ip-10-0-10-10

Failover ElasticIP

# Move the resource manually
$ sudo pcs resource move elastic-ip ip-10-0-20-10

# Confirm that the resource has moved
$ sudo pcs resource show
 elastic-ip	(ocf::heartbeat:awseip):	Started ip-10-0-20-10
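`pcs resource move` works by adding a location constraint that pins the resource to the target node, and that constraint stays in place until it is removed with `pcs resource clear`. The following sketch polls until the move has completed so that the constraint can be cleared afterwards. The helper name wait_until_started_on is hypothetical, and the parsing assumes the `pcs resource show` output format shown above.

```shell
#!/bin/sh
# Hypothetical helper: poll `pcs resource show` until the resource is
# reported as Started on the expected node, or give up after N tries.
wait_until_started_on() {
    res="$1"; node="$2"; tries="${3:-30}"
    while [ "$tries" -gt 0 ]; do
        if pcs resource show | awk -v r="$res" -v n="$node" \
            '$1 == r && $(NF-1) == "Started" && $NF == n { found = 1 }
             END { exit !found }'; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Usage on a cluster node (run as root):
#   pcs resource move elastic-ip ip-10-0-20-10
#   wait_until_started_on elastic-ip ip-10-0-20-10 && pcs resource clear elastic-ip
```

Clearing the constraint lets Pacemaker place the resource freely again, which matters if the target node later fails.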

UnicodeDecodeError in pcs status

Running pcs status raised a UnicodeDecodeError. Looking into it, I found "'pcs status' produces a traceback ending with error "UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 32: ordinal not in range(128)" in a RHEL 6 High Availability cluster", so this appears to be a known issue. Digging a little further, I found "remove import unicode_literals": the bug fix has not been released yet, but it is already in the latest repository.

$ sudo pcs status
Cluster name: 
WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
Stack: corosync
Current DC: ip-10-0-20-10 (version 1.1.16-1.el6-94ff4df) - partition with quorum
Last updated: Thu Aug 31 14:05:33 2017
Last change: Wed Aug 30 18:40:27 2017 by root via cibadmin on ip-10-0-20-10

2 nodes configured
1 resource configured

Online: [ ip-10-0-10-10 ip-10-0-20-10 ]

Full list of resources:

 elastic-ip	(ocf::heartbeat:awseip):	Started ip-10-0-20-10

Daemon Status:
Traceback (most recent call last):
  File "/usr/sbin/pcs", line 9, in <module>
    load_entry_point('pcs==0.9.156', 'console_scripts', 'pcs')()
  File "/usr/lib/python2.6/site-packages/pcs/app.py", line 191, in main
    cmd_map[command](argv)
  File "/usr/lib/python2.6/site-packages/pcs/status.py", line 24, in status_cmd
    full_status()
  File "/usr/lib/python2.6/site-packages/pcs/status.py", line 109, in full_status
    utils.serviceStatus("  ")
  File "/usr/lib/python2.6/site-packages/pcs/utils.py", line 2111, in serviceStatus
    running = is_service_running(cmd_runner(), service)
  File "/usr/lib/python2.6/site-packages/pcs/lib/external.py", line 259, in is_service_running
    [_service, service, "status"]
  File "/usr/lib/python2.6/site-packages/pcs/lib/external.py", line 436, in run
    out_err=out_err
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 22: ordinal not in range(128)

I created a patch from that diff, edited it, and applied it, which fixed the problem.

$ sudo pcs status
Cluster name: 
WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
Stack: corosync
Current DC: ip-10-0-20-10 (version 1.1.16-1.el6-94ff4df) - partition with quorum
Last updated: Fri Sep  1 06:28:05 2017
Last change: Fri Sep  1 06:25:10 2017 by root via crm_resource on ip-10-0-10-10

2 nodes configured
1 resource configured

Online: [ ip-10-0-10-10 ip-10-0-20-10 ]

Full list of resources:

 elastic-ip	(ocf::heartbeat:awseip):	Started ip-10-0-10-10

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: inactive/disabled
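To check whether an installed pcs still carries the imports that the upstream fix removes, the Python sources can be searched directly. This is only a sketch: it assumes, based on the commit title "remove import unicode_literals", that the fix amounts to deleting those imports, and the helper name list_unicode_literals_imports is hypothetical. The directory in the usage comment is the one that appears in the traceback.

```shell
#!/bin/sh
# Hypothetical helper: list Python files under a directory that still
# mention unicode_literals (i.e. files a patch may not have covered).
list_unicode_literals_imports() {
    find "$1" -name '*.py' -exec grep -l 'unicode_literals' {} +
}

# Usage on a cluster node:
#   list_unicode_literals_imports /usr/lib/python2.6/site-packages/pcs
```

An empty result suggests that no file under the directory mentions unicode_literals any more.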
