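Since crm is no longer available as of CentOS 7, this is a record of my attempt to build a redundant Zabbix server setup with pcs.
A monitoring server that cannot monitor for a long time and takes ages to recover is pretty much unacceptable. If the primary dies I want the standby to take over right away, I would hate to be called out late at night or early in the morning for a lengthy recovery, and a mistake there is dangerous, so as I see it automation is unavoidable.
Sorry, but I have reached the age where covering things with manual operation or brute-force ops is no longer possible. Skipping sleep feels like it would shorten my life. Health really matters.
・Reset the hostnames (a name set with set-hostname persists across reboots; run the first command on node 01 and the second on node 02)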
# hostnamectl set-hostname monitor-mng-cent7-test-komi01
# hostnamectl set-hostname monitor-mng-cent7-test-komi02
(This is a test environment, so I am unapologetically running as root; otherwise prefix the commands with sudo.)
・Fix hosts
# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
# ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.3.104 monitor-mng-cent7-test-komi02 mon7-2
192.168.3.103 monitor-mng-cent7-test-komi01 mon7-1 localhost
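Adding anything to the first line caused odd behaviour.
I don't think I fully verified whether localhost needs to be written on the node's own line (a line other than the first).
・Check connectivity, just in case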
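The first-line problem turned out, later in this record, to be the cluster node name sitting on the 127.0.0.1 line. A minimal sketch of a check for that, assuming a helper named `loopback_names_node` (my own name, not part of pcs or corosync):

```shell
# Report loopback lines in a hosts file that also carry a given cluster node name.
# Usage: loopback_names_node <hosts-file> <node-name>
# Exit status 0 means the node name was found on a loopback line (a problem).
# loopback_names_node is a hypothetical helper name used for illustration.
loopback_names_node() {
    local hosts_file=$1 node=$2
    awk -v node="$node" '
        /^[[:space:]]*#/ { next }                    # skip comment lines
        $1 == "127.0.0.1" || $1 == "::1" {
            for (i = 2; i <= NF; i++)
                if ($i == node) { print; found = 1 } # show the offending line
        }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"
}
# Example (not executed here):
#   loopback_names_node /etc/hosts monitor-mng-cent7-test-komi01 && echo "fix /etc/hosts first"
```

Running it on both nodes before creating the cluster would probably have saved me the detour further down.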
# ping mon7-1
# ping mon7-2
・Install the cluster control software
# yum clean all
# yum install --enablerepo base pcs fence-agents-all
・Update the password of the user that was created automatically (the password is used later when authenticating the nodes)
# passwd hacluster
****
・Start and enable pcsd
# systemctl start pcsd
# systemctl enable pcsd
# systemctl status pcsd
● pcsd.service - PCS GUI and remote configuration interface
Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2016-07-12 01:26:04 UTC; 55s ago
Main PID: 3679 (pcsd)
CGroup: /system.slice/pcsd.service
├─3679 /bin/sh /usr/lib/pcsd/pcsd start
├─3683 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/...
└─3684 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
Jul 12 01:26:04 monitor-mng-cent7-test-komi01 systemd[1]: Starting PCS GUI and remote configuration i....
Jul 12 01:26:04 monitor-mng-cent7-test-komi01 systemd[1]: Started PCS GUI and remote configuration in...e.
Hint: Some lines were ellipsized, use -l to show in full.
・Authenticate the nodes
# pcs cluster auth monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02 -u hacluster -p ***** --force
monitor-mng-cent7-test-komi01: Authorized
monitor-mng-cent7-test-komi02: Authorized
・Create the cluster
Run this once, on one node only
# pcs cluster setup --name zabbixcluster monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop pacemaker.service
Redirecting to /bin/systemctl stop corosync.service
Killing any remaining services...
Removing all cluster configuration files...
monitor-mng-cent7-test-komi01: Succeeded
monitor-mng-cent7-test-komi02: Succeeded
Synchronizing pcsd certificates on nodes monitor-mng-cent7-test-komi01, monitor-mng-cent7-test-komi02...
monitor-mng-cent7-test-komi01: Success
monitor-mng-cent7-test-komi02: Success
Restaring pcsd on the nodes in order to reload the certificates...
monitor-mng-cent7-test-komi01: Success
monitor-mng-cent7-test-komi02: Success
・Start the cluster
# pcs cluster start --all
monitor-mng-cent7-test-komi02: Starting Cluster...
monitor-mng-cent7-test-komi01: Starting Cluster...
・Verify operation
# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 127.0.0.1
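status = ring 0 active with no faults
If "status" is "active" with "no faults", communication is working. Note, however, that the id shown here is 127.0.0.1, which turns out to be the hosts problem described later.
The following commands also show how corosync is doing.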
# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(127.0.0.1)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 monitor-mng-cent7-test-komi01 (local)
# ps aux|egrep -a 'PID|coro|pace'|grep -v grep
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 4206 1.0 3.7 190580 38432 ? Ssl 01:35 0:08 corosync
root 4221 0.0 0.7 130488 7240 ? Ss 01:35 0:00 /usr/sbin/pacemakerd -f
haclust+ 4222 0.0 1.3 131316 14016 ? Ss 01:35 0:00 /usr/libexec/pacemaker/cib
root 4223 0.0 0.6 132924 6952 ? Ss 01:35 0:00 /usr/libexec/pacemaker/stonithd
root 4224 0.0 0.4 102940 5000 ? Ss 01:35 0:00 /usr/libexec/pacemaker/lrmd
haclust+ 4225 0.0 0.6 124760 6720 ? Ss 01:35 0:00 /usr/libexec/pacemaker/attrd
haclust+ 4226 0.0 2.0 150972 20896 ? Ss 01:35 0:00 /usr/libexec/pacemaker/pengine
haclust+ 4227 0.0 1.0 184184 11068 ? Ss 01:35 0:00 /usr/libexec/pacemaker/crmd
# pcs status
Cluster name: zabbixcluster
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Tue Jul 12 01:50:28 2016 Last change: Tue Jul 12 01:35:24 2016 by hacluster via crmd on monitor-mng-cent7-test-komi01
Stack: corosync
Current DC: monitor-mng-cent7-test-komi01 (version 1.1.13-10.el7_2.2-44eb2dd) - partition WITHOUT quorum
2 nodes and 0 resources configured
Node monitor-mng-cent7-test-komi02: UNCLEAN (offline)
Online: [ monitor-mng-cent7-test-komi01 ]
Full list of resources:
PCSD Status:
monitor-mng-cent7-test-komi01: Online
monitor-mng-cent7-test-komi02: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Note: Online listing only the local host in the brackets above is wrong, as mentioned in the hosts step at the beginning.
・Check the configuration
# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
・Disable stonith (this clears the error messages above)
pcs property set stonith-enabled=false
crm_verify -L -V
・Example of adding a resource (virtual IP)
Resource ID: VIP
Resource agent (RA): "ocf:heartbeat:IPaddr2"
IP address: 192.168.56.10/24 (host-only network address)
Monitor interval: 10 sec
# pcs resource create VIP ocf:heartbeat:IPaddr2 \
ip=192.168.56.10 cidr_netmask=24 op monitor interval=10s
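It seems to go like this.
So, using this as a reference, I will reinstall zabbix later and then add its resource.
For now, reinstall zabbix.
Before that, install the MySQL client.
・Install the mysql client (the data itself is kept in RDS, so only the client is installed)
Note: the way I installed it until now did not get along with chef, so I am going to install the client from the community repository instead.
I don't think MariaDB can be used with RDS, so remove its libraries (as of CentOS 7 the MariaDB libs appear to be installed by default)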
yum remove mariadb-libs
==========================================================================================================
Package Arch Version Repository Size
==========================================================================================================
Removing:
mariadb-libs x86_64 1:5.5.44-2.el7.centos installed 4.4 M
Removing for dependencies:
php-mysql x86_64 5.4.16-36.1.el7_2.1 @updates 232 k
postfix x86_64 2:2.10.1-6.el7 installed 12 M
zabbix-server-mysql x86_64 3.0.3-1.el7 @zabbix 3.3 M
zabbix-web noarch 3.0.3-1.el7 @zabbix 29 M
zabbix-web-japanese noarch 3.0.3-1.el7 @zabbix 0.0
zabbix-web-mysql noarch 3.0.3-1.el7 @zabbix 0.0
Transaction Summary
==========================================================================================================
Remove 1 Package (+6 Dependent packages)
If zabbix is installed first, it gets removed wholesale as a dependency, so be careful. The *db-libs swap should be done first, before installing zabbix.
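To see what a removal would drag along before confirming it, the transaction table can be filtered. This is a rough sketch, assuming the table layout shown above; `dependent_removals` is a name I made up:

```shell
# Print the package names listed under "Removing for dependencies:" in
# the yum transaction table read from standard input.
# dependent_removals is a hypothetical helper name used for illustration.
dependent_removals() {
    awk '
        /^Removing for dependencies:/ { in_deps = 1; next }
        /^Transaction Summary/        { in_deps = 0 }
        in_deps && NF >= 2            { print $1 }   # first column is the package name
    '
}
# Example (not executed here; assumes the yum on this host supports --assumeno):
#   yum remove --assumeno mariadb-libs | dependent_removals
```

If zabbix-server-mysql shows up in that list, the order of the steps is wrong.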
rm -rf /var/lib/mysql/
yum localinstall http://dev.mysql.com/get/mysql57-community-release-el7-7.noarch.rpm
# yum info mysql-community-server
Name : mysql-community-server
Arch : x86_64
Version : 5.7.13
Release : 1.el7
Size : 151 M
Repo : mysql57-community/x86_64
Summary : A very fast and reliable SQL database server
URL : http://www.mysql.com/
License : Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved. Under GPLv2
: license as shown in the Description field.
Description : The MySQL(TM) software delivers a very fast, multi-threaded, multi-user,
: and robust SQL (Structured Query Language) database server. MySQL Server
: is intended for mission-critical, heavy-load production systems as well
: as for embedding into mass-deployed software. MySQL is a trademark of
: Oracle and/or its affiliates
:
: The MySQL software has Dual Licensing, which means you can use the MySQL
: software free of charge under the GNU General Public License
: (http://www.gnu.org/licenses/). You can also purchase commercial MySQL
: licenses from Oracle and/or its affiliates if you do not wish to be bound by the terms of
: the GPL. See the chapter "Licensing and Support" in the manual for
: further info.
:
: The MySQL web site (http://www.mysql.com/) provides the latest news and
: information about the MySQL software. Also please see the documentation
: and the manual for more information.
:
: This package includes the MySQL server binary as well as related utilities
: to run and administer a MySQL server.
# yum install mysql-community-client mysql-community-libs mysql-community-devel mysql-community-common
# vi /etc/my.cnf
[mysqldump]
quick
max_allowed_packet = 16M
default-character-set = utf8
[mysql]
no-auto-rehash
default-character-set = utf8
・Remove zabbix 2.2 because it conflicted
yum remove zabbix-release zabbix-agent zabbix
rpm -e zabbix-release
rpm -qa|grep zabbix
・Register the zabbix 3.0 repository
rpm -ivh http://repo.zabbix.com/zabbix/3.0/rhel/7/x86_64/zabbix-release-3.0-1.el7.noarch.rpm
・Install the server itself
yum -y install zabbix-server-mysql zabbix-web-mysql zabbix-web-japanese
==========================================================================================================
Package Arch Version Repository Size
==========================================================================================================
Installing:
zabbix-server-mysql x86_64 3.0.3-1.el7 zabbix 1.7 M
zabbix-web-japanese noarch 3.0.3-1.el7 zabbix 4.4 k
zabbix-web-mysql noarch 3.0.3-1.el7 zabbix 3.9 k
Installing for dependencies:
OpenIPMI-libs x86_64 2.0.19-11.el7 base 501 k
dejavu-fonts-common noarch 2.33-6.el7 base 64 k
dejavu-sans-fonts noarch 2.33-6.el7 base 1.4 M
fontpackages-filesystem noarch 1.44-8.el7 base 9.9 k
fping x86_64 3.10-1.el7 zabbix-non-supported 40 k
httpd x86_64 2.4.6-40.el7.centos.1 updates 2.7 M
httpd-tools x86_64 2.4.6-40.el7.centos.1 updates 82 k
iksemel x86_64 1.4-2.el7.centos zabbix-non-supported 49 k
libXpm x86_64 3.5.11-3.el7 base 54 k
libzip x86_64 0.10.1-8.el7 base 48 k
php x86_64 5.4.16-36.1.el7_2.1 updates 1.4 M
php-bcmath x86_64 5.4.16-36.1.el7_2.1 updates 56 k
php-cli x86_64 5.4.16-36.1.el7_2.1 updates 2.7 M
php-common x86_64 5.4.16-36.1.el7_2.1 updates 563 k
php-gd x86_64 5.4.16-36.1.el7_2.1 updates 126 k
php-ldap x86_64 5.4.16-36.1.el7_2.1 updates 51 k
php-mbstring x86_64 5.4.16-36.1.el7_2.1 updates 503 k
php-mysql x86_64 5.4.16-36.1.el7_2.1 updates 99 k
php-pdo x86_64 5.4.16-36.1.el7_2.1 updates 97 k
php-xml x86_64 5.4.16-36.1.el7_2.1 updates 124 k
t1lib x86_64 5.1.2-14.el7 base 166 k
unixODBC x86_64 2.3.1-11.el7 base 413 k
vlgothic-p-fonts noarch 20130607-2.el7 base 2.2 M
zabbix-web noarch 3.0.3-1.el7 zabbix 3.5 M
Transaction Summary
==========================================================================================================
Install 3 Packages (+24 Dependent packages)
Installed:
zabbix-server-mysql.x86_64 0:3.0.3-1.el7 zabbix-web-japanese.noarch 0:3.0.3-1.el7 zabbix-web-mysql.noarch 0:3.0.3-1.el7
Dependency Installed:
OpenIPMI-libs.x86_64 0:2.0.19-11.el7 dejavu-fonts-common.noarch 0:2.33-6.el7
dejavu-sans-fonts.noarch 0:2.33-6.el7 fontpackages-filesystem.noarch 0:1.44-8.el7
fping.x86_64 0:3.10-4.el7 httpd.x86_64 0:2.4.6-40.el7.centos.1
httpd-tools.x86_64 0:2.4.6-40.el7.centos.1 iksemel.x86_64 0:1.4-6.el7
libXpm.x86_64 0:3.5.11-3.el7 libzip.x86_64 0:0.10.1-8.el7
php.x86_64 0:5.4.16-36.1.el7_2.1 php-bcmath.x86_64 0:5.4.16-36.1.el7_2.1
php-cli.x86_64 0:5.4.16-36.1.el7_2.1 php-common.x86_64 0:5.4.16-36.1.el7_2.1
php-gd.x86_64 0:5.4.16-36.1.el7_2.1 php-ldap.x86_64 0:5.4.16-36.1.el7_2.1
php-mbstring.x86_64 0:5.4.16-36.1.el7_2.1 php-mysql.x86_64 0:5.4.16-36.1.el7_2.1
php-pdo.x86_64 0:5.4.16-36.1.el7_2.1 php-xml.x86_64 0:5.4.16-36.1.el7_2.1
t1lib.x86_64 0:5.1.2-14.el7 unixODBC.x86_64 0:2.3.1-11.el7
vlgothic-p-fonts.noarch 0:20130607-2.el7 zabbix-web.noarch 0:3.0.3-1.el7
Complete!
・Install the agent and other tools
# yum -y install zabbix-agent
Running transaction
Installing : zabbix-agent-3.0.3-1.el7.x86_64 1/1
Verifying : zabbix-agent-3.0.3-1.el7.x86_64 1/1
Installed:
zabbix-agent.x86_64 0:3.0.3-1.el7
# yum -y install zabbix-get
Installed:
zabbix-get.x86_64 0:3.0.3-1.el7
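・Prepare RDS and create the database for zabbix
Note down the id, password, schema and host (they go into zabbix_server.conf later)
Check the connection with the mysql client installed above, then run the SQL shipped with zabbix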
mysql -u mn***_op -p -h monitor-test.****.rds.amazonaws.com
cd /usr/share/doc/zabbix-server-mysql-3.0.3/
zcat create.sql.gz | mysql -u mn***_op -p -h monitor-test.****.rds.amazonaws.com zabbix
・Configure zabbix_server.conf and other files
# vim /etc/zabbix/zabbix_server.conf
Edit DBHost and the related settings
# vim /etc/httpd/conf.d/zabbix.conf
Change the timezone to Asia/Tokyo
# vi /etc/httpd/conf.d/status.conf
ExtendedStatus on
<Location /server-status>
SetHandler server-status
Require ip 127.0.0.1
</Location>
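This exposes the extended status that the ocf web server agent needs
・Access the URL and do the initial setup
Enter the DB details and just continue through the wizard
・Fetch the existing configuration in order to register the resources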
$ sudo crm configure show
The plan is to run postfix active/active, take it out of the HA resources, and use that slot for apache instead.
I won't fail over a virtual IP on a public cloud. An ELB goes in front instead.
primitive p_zabbix ocf:heartbeat:zabbixserver \
params binary="/usr/sbin/zabbix_server" pid="/var/run/zabbix/zabbix_server.pid" \
op start interval="0" timeout="20s" \
op monitor interval="10s" timeout="20s" \
op stop interval="0" timeout="20s" \
meta target-role="Started"
group grpZabbixServer p_zabbix p_postfix
property $id="cib-bootstrap-options" \
dc-version="1.0.13-30bb726" \
cluster-infrastructure="Heartbeat" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1466682065"
・Register the resources
↓With ocf, pcs complains that the resource agent does not exist, so for now zabbix uses systemd.
pcs resource create p_zabbix ocf:heartbeat:zabbixserver \
params binary="/usr/sbin/zabbix_server" pid="/var/run/zabbix/zabbix_server.pid" \
op monitor interval="10s" timeout="20s" \
op stop interval="0" timeout="20s" \
meta target-role="Started"
Error: Unable to create resource 'ocf:heartbeat:zabbixserver', it is not installed on this system (use --force to override)
# pcs resource show
Resource Group: grpZabbixServer
zabbix_server (systemd:zabbix-server): Stopped
WebSite (ocf::heartbeat:apache): Stopped
Go with systemd for zabbix
pcs resource create web_server systemd:httpd op monitor interval=10s
pcs resource group add grpZabbixServer web_server
pcs constraint colocation add web_server zabbix_server INFINITY
# pcs resource show
Resource Group: grpZabbixServer
zabbix_server (systemd:zabbix-server): Started monitor-mng-cent7-test-komi01
web_server (systemd:httpd): Started monitor-mng-cent7-test-komi01
・Check the status
# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 06:39:36 2016 Last change: Tue Jul 12 06:39:17 2016 by root via crm_attribute on monitor-mng-cent7-test-komi01
Stack: corosync
Current DC: monitor-mng-cent7-test-komi01 (version 1.1.13-10.el7_2.2-44eb2dd) - partition WITHOUT quorum
2 nodes and 2 resources configured
Online: [ monitor-mng-cent7-test-komi01 ]
OFFLINE: [ monitor-mng-cent7-test-komi02 ]
Full list of resources:
Resource Group: grpZabbixServer
zabbix_server (systemd:zabbix-server): Started monitor-mng-cent7-test-komi01
WebSite (ocf::heartbeat:apache): Stopped
Failed Actions:
* WebSite_start_0 on monitor-mng-cent7-test-komi01 'unknown error' (1): call=13, status=Timed Out, exitreason='none',
last-rc-change='Tue Jul 12 06:39:09 2016', queued=0ms, exec=40002ms
PCSD Status:
monitor-mng-cent7-test-komi01: Online
monitor-mng-cent7-test-komi02: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
・Check the config
# pcs config show
Cluster Name: zabbixcluster
Corosync Nodes:
monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02
Pacemaker Nodes:
monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02
Resources:
Group: grpZabbixServer
Resource: zabbix_server (class=systemd type=zabbix-server)
Operations: monitor interval=10s (zabbix_server-monitor-interval-10s)
Resource: WebSite (class=ocf provider=heartbeat type=apache)
Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status
Operations: start interval=0s timeout=40s (WebSite-start-interval-0s)
stop interval=0s timeout=60s (WebSite-stop-interval-0s)
monitor interval=1min (WebSite-monitor-interval-1min)
Stonith Devices:
Fencing Levels:
Location Constraints:
Ordering Constraints:
Colocation Constraints:
WebSite with zabbix_server (score:INFINITY) (id:colocation-WebSite-zabbix_server-INFINITY)
Resources Defaults:
resource-stickiness: INFINITY
migration-threshold: 1
Operations Defaults:
No defaults set
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: zabbixcluster
dc-version: 1.1.13-10.el7_2.2-44eb2dd
default-action-timeout: 240
default-resource-stickiness: 200
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false
# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(127.0.0.1) ← this looks like the problem
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
It seems that putting the hostname used for cluster registration on the first line of hosts is what breaks it
# vi /etc/hosts
127.0.0.1 monitor-mng-cent7-test-komi01 localhost localhost.localdomain localhost4 localhost4.localdomain4
↓
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
Fix it, then
# pcs cluster stop --all
# pcs cluster start --all
# pcs status
After repeating this, it was fixed and both nodes now appear Online.
# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.3.103)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.3.104)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
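The loopback symptom above can be spotted mechanically from the same output. A small sketch, with `loopback_members` being a helper name of my own and the line format assumed to match what is shown above:

```shell
# Read `corosync-cmapctl | grep members` output on stdin and print the node id
# of every member whose ring address is a loopback address (127.x.x.x).
# Exit status 0 means at least one such member was found.
# loopback_members is a hypothetical helper name used for illustration.
loopback_members() {
    awk '
        /members\.[0-9]+\.ip / && /ip\(127\./ {
            n = split($1, parts, "[.]")   # key looks like ...members.<id>.ip
            print parts[n - 1]            # the numeric node id
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
# Example (not executed here):
#   corosync-cmapctl | grep members | loopback_members && echo "a member is bound to loopback"
```

For the broken output earlier this prints 1, and for the fixed output it prints nothing.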
Incidentally, changing the node names in /etc/corosync/corosync.conf while the cluster is running seems to make pcs cluster stop impossible.
The apache side also showed a failure in the status when using ocf
Failed Actions:
* WebSite_start_0 on monitor-mng-cent7-test-komi01 'unknown error' (1): call=12, status=Timed Out, exitreason='none',
last-rc-change='Tue Jul 12 07:27:47 2016', queued=0ms, exec=40003ms
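In any case, as long as this failure is showing, a switchover cannot happen.
・Take a look at the cib
# pcs cluster cib
(output omitted because it is long)
# cibadmin --query
shows the same thing.
・How to clear the fail count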
# pcs resource cleanup WebSite
Waiting for 4 replies from the CRMd.... OK
Cleaning up zabbix_server on monitor-mng-cent7-test-komi01, removing fail-count-zabbix_server
Cleaning up zabbix_server on monitor-mng-cent7-test-komi02, removing fail-count-zabbix_server
Cleaning up WebSite on monitor-mng-cent7-test-komi01, removing fail-count-WebSite
Cleaning up WebSite on monitor-mng-cent7-test-komi02, removing fail-count-WebSite
# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 08:52:30 2016 Last change: Tue Jul 12 08:52:16 2016 by hacluster via crmd on monitor-mng-cent7-test-komi01
Stack: corosync
Current DC: monitor-mng-cent7-test-komi02 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Online: [ monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02 ]
Full list of resources:
Resource Group: grpZabbixServer
zabbix_server (systemd:zabbix-server): Started monitor-mng-cent7-test-komi02
WebSite (ocf::heartbeat:apache): Started monitor-mng-cent7-test-komi02
PCSD Status:
monitor-mng-cent7-test-komi01: Online
monitor-mng-cent7-test-komi02: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
It cleared nicely.
・Switchover tests from here (long)
The following did not move anything at all.
# pcs resource move zabbix_server monitor-mng-cent7-test-komi01
Putting node 02 into standby did move the resources
# pcs cluster standby monitor-mng-cent7-test-komi02
# pcs cluster unstandby --all
With this, while web is treated as failed on 01, the resources move to 02 and all of them start
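To find out which resources need a cleanup in the first place, the Failed Actions section of the status output can be picked apart. A sketch under the assumption that the section looks like the outputs in this record; `failed_resources` is a hypothetical name:

```shell
# Read `pcs status` output on stdin and print each resource that appears in the
# "Failed Actions:" section, once per resource.
# failed_resources is a hypothetical helper name used for illustration.
failed_resources() {
    awk '
        /^Failed Actions:/ { in_failed = 1; next }
        in_failed && $1 == "*" {
            name = $2
            sub(/_[a-z]+_[0-9]+$/, "", name)    # WebSite_start_0 -> WebSite
            if (!(name in seen)) { seen[name] = 1; print name }
            next
        }
        # a following section header such as "PCSD Status:" ends the section
        in_failed && /^[[:alpha:]][^=]*:[[:space:]]*$/ { in_failed = 0 }
    '
}
# Example (not executed here):
#   pcs status | failed_resources | while read -r r; do pcs resource cleanup "$r"; done
```

The loop in the comment only wraps the cleanup command used above.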
# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 09:25:28 2016 Last change: Tue Jul 12 09:23:35 2016 by root via crm_attribute on monitor-mng-cent7-test-komi01
Stack: corosync
Current DC: monitor-mng-cent7-test-komi02 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Online: [ monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02 ]
Full list of resources:
Resource Group: grpZabbixServer
zabbix_server (systemd:zabbix-server): Started monitor-mng-cent7-test-komi02
WebSite (ocf::heartbeat:apache): Started monitor-mng-cent7-test-komi02
Failed Actions:
* WebSite_start_0 on monitor-mng-cent7-test-komi01 'unknown error' (1): call=42, status=Timed Out, exitreason='none',
last-rc-change='Tue Jul 12 09:21:27 2016', queued=0ms, exec=40002ms
PCSD Status:
monitor-mng-cent7-test-komi01: Online
monitor-mng-cent7-test-komi02: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
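The Failed Action appears because the ocf agent was checking server-status on localhost.
Sure enough, fetching server-status with curl returns 404, so that seems to have been the cause.
I decided it does not need to look as deep as the status page, chickened out, and switched to systemd.
curl http://localhost/server-status
Delete the resource and register a systemd resource instead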
# pcs resource delete WebSite
# pcs resource show
Resource Group: grpZabbixServer
zabbix_server (systemd:zabbix-server): Started monitor-mng-cent7-test-komi01
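ocf is tricky for apache too, so go with systemd. With systemd, the service apparently cannot be restarted by hand while the cluster holds the resource, which seems like a good thing.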
pcs resource create web_server systemd:httpd op monitor interval=10s
pcs resource group add grpZabbixServer web_server
pcs constraint colocation add web_server zabbix_server INFINITY
# pcs resource show
Resource Group: grpZabbixServer
zabbix_server (systemd:zabbix-server): Started monitor-mng-cent7-test-komi01
web_server (systemd:httpd): Started monitor-mng-cent7-test-komi01
# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 10:05:04 2016 Last change: Tue Jul 12 10:02:54 2016 by root via cibadmin on monitor-mng-cent7-test-komi01
Stack: corosync
Current DC: monitor-mng-cent7-test-komi02 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Node monitor-mng-cent7-test-komi02: standby
Online: [ monitor-mng-cent7-test-komi01 ]
Full list of resources:
Resource Group: grpZabbixServer
zabbix_server (systemd:zabbix-server): Started monitor-mng-cent7-test-komi01
web_server (systemd:httpd): Started monitor-mng-cent7-test-komi01
PCSD Status:
monitor-mng-cent7-test-komi01: Online
monitor-mng-cent7-test-komi02: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
・Release standby
# pcs cluster unstandby --all
# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 10:05:52 2016 Last change: Tue Jul 12 10:05:50 2016 by root via crm_attribute on monitor-mng-cent7-test-komi01
Stack: corosync
Current DC: monitor-mng-cent7-test-komi02 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Online: [ monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02 ]
Full list of resources:
Resource Group: grpZabbixServer
zabbix_server (systemd:zabbix-server): Started monitor-mng-cent7-test-komi01
web_server (systemd:httpd): Started monitor-mng-cent7-test-komi01
PCSD Status:
monitor-mng-cent7-test-komi01: Online
monitor-mng-cent7-test-komi02: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Node 01:
# netstat -lnpt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1020/sshd
tcp 0 0 0.0.0.0:10050 0.0.0.0:* LISTEN 8723/zabbix_agentd
tcp 0 0 0.0.0.0:10051 0.0.0.0:* LISTEN 19346/zabbix_server
tcp6 0 0 :::80 :::* LISTEN 20467/httpd
tcp6 0 0 :::2224 :::* LISTEN 4141/ruby
tcp6 0 0 :::22 :::* LISTEN 1020/sshd
tcp6 0 0 :::10050 :::* LISTEN 8723/zabbix_agentd
tcp6 0 0 :::10051 :::* LISTEN 19346/zabbix_server
Node 02:
# netstat -lnpt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1026/sshd
tcp6 0 0 :::2224 :::* LISTEN 4268/ruby
tcp6 0 0 :::22 :::* LISTEN 1026/sshd
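This looks better.
With ocf, even after a stop via pcs, netstat showed the process still running (since the plan is to use an ELB rather than move a virtual IP along with it, this felt like risky behaviour that could lead to duplicated data).
Try a quick switchover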
# pcs cluster stop monitor-mng-cent7-test-komi01
monitor-mng-cent7-test-komi01: Stopping Cluster (pacemaker)...
monitor-mng-cent7-test-komi01: Stopping Cluster (corosync)...
The resources moved without trouble.
# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 10:18:01 2016 Last change: Tue Jul 12 10:05:50 2016 by root via crm_attribute on monitor-mng-cent7-test-komi01
Stack: corosync
Current DC: monitor-mng-cent7-test-komi02 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Online: [ monitor-mng-cent7-test-komi02 ]
OFFLINE: [ monitor-mng-cent7-test-komi01 ]
Full list of resources:
Resource Group: grpZabbixServer
zabbix_server (systemd:zabbix-server): Started monitor-mng-cent7-test-komi02
web_server (systemd:httpd): Started monitor-mng-cent7-test-komi02
PCSD Status:
monitor-mng-cent7-test-komi01: Online
monitor-mng-cent7-test-komi02: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Start the node that was stopped and check the status (to confirm that merely starting it does not move the resources)
# pcs cluster start monitor-mng-cent7-test-komi01
# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 10:19:19 2016 Last change: Tue Jul 12 10:05:50 2016 by root via crm_attribute on monitor-mng-cent7-test-komi01
Stack: corosync
Current DC: monitor-mng-cent7-test-komi02 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Online: [ monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02 ]
Full list of resources:
Resource Group: grpZabbixServer
zabbix_server (systemd:zabbix-server): Started monitor-mng-cent7-test-komi02
web_server (systemd:httpd): Started monitor-mng-cent7-test-komi02
PCSD Status:
monitor-mng-cent7-test-komi01: Online
monitor-mng-cent7-test-komi02: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
No particular problems.
[root@monitor-mng-cent7-test-komi02 zabbix-server-mysql-3.0.3]# netstat -lnpt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1026/sshd
tcp 0 0 0.0.0.0:10051 0.0.0.0:* LISTEN 21196/zabbix_server
tcp6 0 0 :::80 :::* LISTEN 21236/httpd
tcp6 0 0 :::2224 :::* LISTEN 4268/ruby
tcp6 0 0 :::22 :::* LISTEN 1026/sshd
tcp6 0 0 :::10051 :::* LISTEN 21196/zabbix_server
[root@monitor-mng-cent7-test-komi01 yum.repos.d]# netstat -lnpt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1020/sshd
tcp 0 0 0.0.0.0:10050 0.0.0.0:* LISTEN 8723/zabbix_agentd
tcp6 0 0 :::2224 :::* LISTEN 4141/ruby
tcp6 0 0 :::22 :::* LISTEN 1020/sshd
tcp6 0 0 :::10050 :::* LISTEN 8723/zabbix_agentd
Now put 02 into standby to move the resources
# pcs cluster standby monitor-mng-cent7-test-komi02
# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 10:22:36 2016 Last change: Tue Jul 12 10:22:23 2016 by root via crm_attribute on monitor-mng-cent7-test-komi02
Stack: corosync
Current DC: monitor-mng-cent7-test-komi02 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Node monitor-mng-cent7-test-komi02: standby
Online: [ monitor-mng-cent7-test-komi01 ]
Full list of resources:
Resource Group: grpZabbixServer
zabbix_server (systemd:zabbix-server): Started monitor-mng-cent7-test-komi01
web_server (systemd:httpd): Started monitor-mng-cent7-test-komi01
PCSD Status:
monitor-mng-cent7-test-komi01: Online
monitor-mng-cent7-test-komi02: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
# pcs cluster unstandby --all
# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 10:24:34 2016 Last change: Tue Jul 12 10:24:31 2016 by root via crm_attribute on monitor-mng-cent7-test-komi02
Stack: corosync
Current DC: monitor-mng-cent7-test-komi02 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Online: [ monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02 ]
Full list of resources:
Resource Group: grpZabbixServer
zabbix_server (systemd:zabbix-server): Started monitor-mng-cent7-test-komi01
web_server (systemd:httpd): Started monitor-mng-cent7-test-komi01
PCSD Status:
monitor-mng-cent7-test-komi01: Online
monitor-mng-cent7-test-komi02: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Manual switchover testing is done for now. The procedure for clearing failures is also pinned down.
(Now I need to turn this into chef...)
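As a first step toward automating these checks, eyeballing the status output could be replaced by a small filter. A sketch only, assuming the resource lines keep the "name (type): Started node" shape seen above; `resource_node` is a name I picked:

```shell
# Read `pcs status` output on stdin and print the node on which the given
# resource is reported as Started (prints nothing if it is not started).
# resource_node is a hypothetical helper name used for illustration.
resource_node() {
    local resource=$1
    awk -v res="$resource" '
        NF >= 3 && $1 == res && $(NF - 1) == "Started" { print $NF; exit }
    '
}
# Example (not executed here):
#   pcs cluster standby monitor-mng-cent7-test-komi02
#   [ "$(pcs status | resource_node zabbix_server)" = "monitor-mng-cent7-test-komi01" ] \
#       && echo "moved to 01"
```

In practice a short wait before the comparison is probably needed, because the move is not instantaneous.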
・Logs
/var/log/cluster/corosync.log
contains what looks like the relevant log
How to extract a specific time range
# pcs cluster report --from "2016-07-12 09:00:00" /tmp/dest
# ll /tmp
total 60
-rw-r--r-- 1 root root 54519 Jul 12 10:35 dest.tar.bz2
It comes out like this. A --to can also be given, as shown below.
report [--from "YYYY-M-D H:M:S" [--to "YYYY-M-D H:M:S"]] dest
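Typing the --from timestamp by hand is error-prone, so it could be generated. A minimal sketch, assuming a date command that accepts -d "@epoch" (GNU date on CentOS 7 does); `hours_ago` is a made-up helper name:

```shell
# Print a "YYYY-MM-DD HH:MM:SS" timestamp for N hours before now,
# in the shape the --from option is documented to take.
# hours_ago is a hypothetical helper name used for illustration.
hours_ago() {
    local epoch=$(( $(date +%s) - $1 * 3600 ))    # seconds since the epoch, N hours back
    date -d "@$epoch" '+%Y-%m-%d %H:%M:%S'
}
# Example (not executed here):
#   pcs cluster report --from "$(hours_ago 2)" /tmp/dest
```

The timestamp is in the local timezone of the host, which matched the logs in my case (UTC).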
・Check the fail count
# pcs resource failcount show zabbix_server
No failcounts for zabbix_server
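To watch what is going on in real time,
watch pcs status
or something like it seems to be the only way (judging from a skim of the help)
That's all for now. References below.
References: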
http://kan3aa.hatenablog.com/entry/2015/06/05/135150
https://blog.apar.jp/zabbix/4177/
http://weblabo.oscasierra.net/installing-mysql57-centos7-yum/
http://doruby.kbmj.com/taka/20131018/yum_update_crm_pcs_
http://infra.blog.shinobi.jp/Entry/107/
http://blog.matsumoto-r.jp/?p=3482
https://ericsysmin.com/2016/02/18/configuring-high-availability-ha-zabbix-server-on-centos-7/
https://gist.github.com/kogoto/92eef35b28ae632a11f7
# pcs cluster --help
http://yomon.hatenablog.com/entry/2016/04/13/110035
http://qiita.com/tukiyo3/items/27dcc7f15a8b4e4cf8a5
# pcs resource --help
https://access.redhat.com/documentation/ja-JP/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/ch-clustresources-HAAR.html