Testing an HA cluster build with zabbix-server + pcs + corosync

Posted at 2016-07-22

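crm is no longer available from CentOS 7 onward, so this is a record of my attempt to build a redundant zabbix server with pcs instead.
A monitoring server that cannot monitor for a long stretch, or that takes a long time to recover, is hardly acceptable. If the primary dies I want the standby to take over right away. I also really do not want to be called out late at night or early in the morning for a slow recovery where a mistake would be dangerous, so as I see it automation is unavoidable.
Sorry, but I am past the age where covering things by hand and brute-force operations is feasible. Skipping sleep feels like it shortens my life. Health matters a lot.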

・Rename the hosts, one command per node (a name set with set-hostname persists across reboots)

# hostnamectl set-hostname monitor-mng-cent7-test-komi01
# hostnamectl set-hostname monitor-mng-cent7-test-komi02

(This is a test environment, so I am unapologetically working as root. Otherwise, prefix the commands with sudo.)

・Fix /etc/hosts

# vi /etc/hosts
127.0.0.1  localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.3.104   monitor-mng-cent7-test-komi02 mon7-2
192.168.3.103   monitor-mng-cent7-test-komi01 mon7-1 localhost

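Adding anything to the first line caused odd behavior.
I don't think I fully verified whether localhost needs to be written on the node's own line (a line other than the first).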

・Check connectivity, just in case

# ping mon7-1
# ping mon7-2

・Install the cluster management software

# yum clean all
# yum install --enablerepo base pcs fence-agents-all

・Change the password of the user that was created automatically (the password is needed later when authenticating the nodes)

# passwd hacluster
****

・Start and enable pcsd

# systemctl start pcsd
# systemctl enable pcsd

# systemctl status pcsd
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-07-12 01:26:04 UTC; 55s ago
 Main PID: 3679 (pcsd)
   CGroup: /system.slice/pcsd.service
           ├─3679 /bin/sh /usr/lib/pcsd/pcsd start
           ├─3683 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/...
           └─3684 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb

Jul 12 01:26:04 monitor-mng-cent7-test-komi01 systemd[1]: Starting PCS GUI and remote configuration i.....
Jul 12 01:26:04 monitor-mng-cent7-test-komi01 systemd[1]: Started PCS GUI and remote configuration in...e.
Hint: Some lines were ellipsized, use -l to show in full.

・Authenticate the nodes

# pcs cluster auth monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02 -u hacluster -p ***** --force
monitor-mng-cent7-test-komi01: Authorized
monitor-mng-cent7-test-komi02: Authorized

・Create the cluster
Run this once, on one node only

# pcs cluster setup --name zabbixcluster monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
monitor-mng-cent7-test-komi01: Succeeded
monitor-mng-cent7-test-komi02: Succeeded
Synchronizing pcsd certificates on nodes monitor-mng-cent7-test-komi01, monitor-mng-cent7-test-komi02...
monitor-mng-cent7-test-komi01: Success
monitor-mng-cent7-test-komi02: Success

Restaring pcsd on the nodes in order to reload the certificates...
monitor-mng-cent7-test-komi01: Success
monitor-mng-cent7-test-komi02: Success

・Start the cluster

# pcs cluster start --all
monitor-mng-cent7-test-komi02: Starting Cluster...
monitor-mng-cent7-test-komi01: Starting Cluster...

・Verify operation

# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 127.0.0.1
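        status  = ring 0 active with no faults

If "status" reports "active" with "no faults", communication is working. (The id of 127.0.0.1 shown here is nevertheless a symptom of the hosts problem described later.)
The following commands also show how corosync is doing.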

# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(127.0.0.1) 
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
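
As a rough aid for reading this output, the member addresses can be pulled out with a small filter. This is only a sketch: the function names `member_ips` and `no_loopback_members` are made up for illustration, and it assumes the `r(0) ip(...)` line format shown above. A member bound to 127.0.0.1, as in this output, turns out later in this article to be a sign that /etc/hosts is wrong.

```bash
# Read `corosync-cmapctl | grep members` output on stdin
# and print one member IP address per line.
member_ips() {
    sed -n 's/.*\.ip (str) = r(0) ip(\([^)]*\)).*/\1/p'
}

# Succeed only when no member address is a loopback address.
no_loopback_members() {
    ! member_ips | grep -q '^127\.'
}
```

On a cluster node it could be used as `corosync-cmapctl | grep members | no_loopback_members || echo "a member is on loopback, check /etc/hosts"`.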

# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         1          1 monitor-mng-cent7-test-komi01 (local)

# ps aux|egrep -a 'PID|coro|pace'|grep -v grep
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      4206  1.0  3.7 190580 38432 ?        Ssl  01:35   0:08 corosync
root      4221  0.0  0.7 130488  7240 ?        Ss   01:35   0:00 /usr/sbin/pacemakerd -f
haclust+  4222  0.0  1.3 131316 14016 ?        Ss   01:35   0:00 /usr/libexec/pacemaker/cib
root      4223  0.0  0.6 132924  6952 ?        Ss   01:35   0:00 /usr/libexec/pacemaker/stonithd
root      4224  0.0  0.4 102940  5000 ?        Ss   01:35   0:00 /usr/libexec/pacemaker/lrmd
haclust+  4225  0.0  0.6 124760  6720 ?        Ss   01:35   0:00 /usr/libexec/pacemaker/attrd
haclust+  4226  0.0  2.0 150972 20896 ?        Ss   01:35   0:00 /usr/libexec/pacemaker/pengine
haclust+  4227  0.0  1.0 184184 11068 ?        Ss   01:35   0:00 /usr/libexec/pacemaker/crmd


# pcs status
Cluster name: zabbixcluster
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Tue Jul 12 01:50:28 2016          Last change: Tue Jul 12 01:35:24 2016 by hacluster via crmd on monitor-mng-cent7-test-komi01
Stack: corosync
Current DC: monitor-mng-cent7-test-komi01 (version 1.1.13-10.el7_2.2-44eb2dd) - partition WITHOUT quorum
2 nodes and 0 resources configured

Node monitor-mng-cent7-test-komi02: UNCLEAN (offline)
Online: [ monitor-mng-cent7-test-komi01 ]

Full list of resources:


PCSD Status:
  monitor-mng-cent7-test-komi01: Online
  monitor-mng-cent7-test-komi02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Note: as mentioned in the hosts step at the beginning, it is wrong for only the local node to show up in the Online brackets above.

・Check the configuration

# crm_verify -L -V
   error: unpack_resources:     Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:     Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:     NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid

・Disable stonith (this clears the error messages above)

pcs property set stonith-enabled=false
crm_verify -L -V

・Example of adding a resource (a virtual IP)

Resource ID: VIP
Resource agent (RA): "ocf:heartbeat:IPaddr2"
IP address: 192.168.56.10/24 (host-only network address)
Monitor interval: 10sec

# pcs resource create VIP ocf:heartbeat:IPaddr2 \
    ip=192.168.56.10 cidr_netmask=24 op monitor interval=10s

That seems to be the general idea.
I'll use it as a reference and add the zabbix resources after reinstalling zabbix.

For now, reinstall zabbix.
Before that, install the MySQL client.

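・Install the MySQL client (the data itself lives in RDS, so only the client is installed)

Note: the installation method I had used until now did not play well with chef, so I decided to install the client from the community repository.

I don't think MariaDB can be used with RDS, so remove its library (CentOS 7 appears to ship the MariaDB libs by default)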

yum remove mariadb-libs
==========================================================================================================
 Package                       Arch             Version                         Repository           Size
==========================================================================================================
Removing:
 mariadb-libs                  x86_64           1:5.5.44-2.el7.centos           installed           4.4 M
Removing for dependencies:
 php-mysql                     x86_64           5.4.16-36.1.el7_2.1             @updates            232 k
 postfix                       x86_64           2:2.10.1-6.el7                  installed            12 M
 zabbix-server-mysql           x86_64           3.0.3-1.el7                     @zabbix             3.3 M
 zabbix-web                    noarch           3.0.3-1.el7                     @zabbix              29 M
 zabbix-web-japanese           noarch           3.0.3-1.el7                     @zabbix             0.0  
 zabbix-web-mysql              noarch           3.0.3-1.el7                     @zabbix             0.0  

Transaction Summary
==========================================================================================================
Remove  1 Package (+6 Dependent packages)

Note that if zabbix is installed first, it gets removed wholesale as a dependency. Swap out the *db-libs package before installing zabbix.

rm -rf /var/lib/mysql/

yum localinstall http://dev.mysql.com/get/mysql57-community-release-el7-7.noarch.rpm

# yum info mysql-community-server

Name        : mysql-community-server
Arch        : x86_64
Version     : 5.7.13
Release     : 1.el7
Size        : 151 M
Repo        : mysql57-community/x86_64
Summary     : A very fast and reliable SQL database server
URL         : http://www.mysql.com/
License     : Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved. Under GPLv2
            : license as shown in the Description field.
Description : The MySQL(TM) software delivers a very fast, multi-threaded, multi-user,
            : and robust SQL (Structured Query Language) database server. MySQL Server
            : is intended for mission-critical, heavy-load production systems as well
            : as for embedding into mass-deployed software. MySQL is a trademark of
            : Oracle and/or its affiliates
            : 
            : The MySQL software has Dual Licensing, which means you can use the MySQL
            : software free of charge under the GNU General Public License
            : (http://www.gnu.org/licenses/). You can also purchase commercial MySQL
            : licenses from Oracle and/or its affiliates if you do not wish to be bound by the terms of
            : the GPL. See the chapter "Licensing and Support" in the manual for
            : further info.
            : 
            : The MySQL web site (http://www.mysql.com/) provides the latest news and
            : information about the MySQL software.  Also please see the documentation
            : and the manual for more information.
            : 
            : This package includes the MySQL server binary as well as related utilities
            : to run and administer a MySQL server.


# yum install mysql-community-client mysql-community-libs mysql-community-devel mysql-community-common

# vi /etc/my.cnf
[mysqldump]
quick
max_allowed_packet = 16M
default-character-set = utf8

[mysql]
no-auto-rehash
default-character-set = utf8

・Remove zabbix 2.2 because it conflicted

yum remove zabbix-relase zabbix-agent zabbix
rpm -e zabbix-release
rpm -qa|grep zabbix

・Register the zabbix 3.0 repository

rpm -ivh http://repo.zabbix.com/zabbix/3.0/rhel/7/x86_64/zabbix-release-3.0-1.el7.noarch.rpm

・Install the server packages

yum -y install zabbix-server-mysql zabbix-web-mysql zabbix-web-japanese
==========================================================================================================
 Package                       Arch         Version                      Repository                  Size
==========================================================================================================
Installing:
 zabbix-server-mysql           x86_64       3.0.3-1.el7                  zabbix                     1.7 M
 zabbix-web-japanese           noarch       3.0.3-1.el7                  zabbix                     4.4 k
 zabbix-web-mysql              noarch       3.0.3-1.el7                  zabbix                     3.9 k
Installing for dependencies:
 OpenIPMI-libs                 x86_64       2.0.19-11.el7                base                       501 k
 dejavu-fonts-common           noarch       2.33-6.el7                   base                        64 k
 dejavu-sans-fonts             noarch       2.33-6.el7                   base                       1.4 M
 fontpackages-filesystem       noarch       1.44-8.el7                   base                       9.9 k
 fping                         x86_64       3.10-1.el7                   zabbix-non-supported        40 k
 httpd                         x86_64       2.4.6-40.el7.centos.1        updates                    2.7 M
 httpd-tools                   x86_64       2.4.6-40.el7.centos.1        updates                     82 k
 iksemel                       x86_64       1.4-2.el7.centos             zabbix-non-supported        49 k
 libXpm                        x86_64       3.5.11-3.el7                 base                        54 k
 libzip                        x86_64       0.10.1-8.el7                 base                        48 k
 php                           x86_64       5.4.16-36.1.el7_2.1          updates                    1.4 M
 php-bcmath                    x86_64       5.4.16-36.1.el7_2.1          updates                     56 k
 php-cli                       x86_64       5.4.16-36.1.el7_2.1          updates                    2.7 M
 php-common                    x86_64       5.4.16-36.1.el7_2.1          updates                    563 k
 php-gd                        x86_64       5.4.16-36.1.el7_2.1          updates                    126 k
 php-ldap                      x86_64       5.4.16-36.1.el7_2.1          updates                     51 k
 php-mbstring                  x86_64       5.4.16-36.1.el7_2.1          updates                    503 k
 php-mysql                     x86_64       5.4.16-36.1.el7_2.1          updates                     99 k
 php-pdo                       x86_64       5.4.16-36.1.el7_2.1          updates                     97 k
 php-xml                       x86_64       5.4.16-36.1.el7_2.1          updates                    124 k
 t1lib                         x86_64       5.1.2-14.el7                 base                       166 k
 unixODBC                      x86_64       2.3.1-11.el7                 base                       413 k
 vlgothic-p-fonts              noarch       20130607-2.el7               base                       2.2 M
 zabbix-web                    noarch       3.0.3-1.el7                  zabbix                     3.5 M

Transaction Summary
==========================================================================================================
Install  3 Packages (+24 Dependent packages)

Installed:
  zabbix-server-mysql.x86_64 0:3.0.3-1.el7       zabbix-web-japanese.noarch 0:3.0.3-1.el7        zabbix-web-mysql.noarch 0:3.0.3-1.el7         

Dependency Installed:
  OpenIPMI-libs.x86_64 0:2.0.19-11.el7          dejavu-fonts-common.noarch 0:2.33-6.el7       
  dejavu-sans-fonts.noarch 0:2.33-6.el7         fontpackages-filesystem.noarch 0:1.44-8.el7   
  fping.x86_64 0:3.10-4.el7                     httpd.x86_64 0:2.4.6-40.el7.centos.1          
  httpd-tools.x86_64 0:2.4.6-40.el7.centos.1    iksemel.x86_64 0:1.4-6.el7                    
  libXpm.x86_64 0:3.5.11-3.el7                  libzip.x86_64 0:0.10.1-8.el7                  
  php.x86_64 0:5.4.16-36.1.el7_2.1              php-bcmath.x86_64 0:5.4.16-36.1.el7_2.1       
  php-cli.x86_64 0:5.4.16-36.1.el7_2.1          php-common.x86_64 0:5.4.16-36.1.el7_2.1       
  php-gd.x86_64 0:5.4.16-36.1.el7_2.1           php-ldap.x86_64 0:5.4.16-36.1.el7_2.1         
  php-mbstring.x86_64 0:5.4.16-36.1.el7_2.1     php-mysql.x86_64 0:5.4.16-36.1.el7_2.1        
  php-pdo.x86_64 0:5.4.16-36.1.el7_2.1          php-xml.x86_64 0:5.4.16-36.1.el7_2.1          
  t1lib.x86_64 0:5.1.2-14.el7                   unixODBC.x86_64 0:2.3.1-11.el7                
  vlgothic-p-fonts.noarch 0:20130607-2.el7      zabbix-web.noarch 0:3.0.3-1.el7               

Complete!

・Install the agent and other tools

# yum -y install zabbix-agent
Running transaction
  Installing : zabbix-agent-3.0.3-1.el7.x86_64                                            1/1
  Verifying  : zabbix-agent-3.0.3-1.el7.x86_64                                            1/1
Installed:
  zabbix-agent.x86_64 0:3.0.3-1.el7 

# yum -y install zabbix-get

Installed:
  zabbix-get.x86_64 0:3.0.3-1.el7     

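・Prepare RDS and create the database for zabbix

Note down the user ID, password, schema, and host (they go into zabbix_server.conf later).
Check the connection with the MySQL client installed above, then run the SQL bundled with zabbix.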

mysql -u mn***_op -p -h monitor-test.****.rds.amazonaws.com

cd /usr/share/doc/zabbix-server-mysql-3.0.3/
zcat create.sql.gz | mysql -u mn***_op -p -h monitor-test.****.rds.amazonaws.com zabbix

・Configure zabbix_server.conf and related files

# vim /etc/zabbix/zabbix_server.conf

Edit DBHost and the other DB settings

# vim /etc/httpd/conf.d/zabbix.conf

Set the time zone to Asia/Tokyo

# vi /etc/httpd/conf.d/status.conf
ExtendedStatus on
<Location /server-status>
SetHandler server-status
Require ip 127.0.0.1
</Location>

This exposes the extended status that the ocf resource agent for the web server needs.

・Open the URL and run the initial setup

Enter the DB information and continue with the defaults

・Pull the existing configuration from the current crm-managed cluster in order to register the resources

$ sudo crm configure show

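The plan is to run postfix active/active and take it out of the HA resources, and to use that slot for apache instead.
I will not fail over a virtual IP on the public cloud. Instead, an ELB goes in front.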

primitive p_zabbix ocf:heartbeat:zabbixserver \
        params binary="/usr/sbin/zabbix_server" pid="/var/run/zabbix/zabbix_server.pid" \
        op start interval="0" timeout="20s" \
        op monitor interval="10s" timeout="20s" \
        op stop interval="0" timeout="20s" \
        meta target-role="Started"
group grpZabbixServer p_zabbix p_postfix
property $id="cib-bootstrap-options" \
        dc-version="1.0.13-30bb726" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1466682065"

・Register the resources

↓With ocf, pcs says the resource agent does not exist, so for now zabbix will be managed through systemd.

pcs resource create p_zabbix ocf:heartbeat:zabbixserver \
 params binary="/usr/sbin/zabbix_server" pid="/var/run/zabbix/zabbix_server.pid" \
 op monitor interval="10s" timeout="20s" \
 op stop interval="0" timeout="20s" \
 meta target-role="Started"
Error: Unable to create resource 'ocf:heartbeat:zabbixserver', it is not installed on this system (use --force to override)

# pcs resource show
 Resource Group: grpZabbixServer
     zabbix_server      (systemd:zabbix-server):        Stopped
     WebSite    (ocf::heartbeat:apache):        Stopped

zabbix will be handled through systemd

pcs resource create web_server systemd:httpd op monitor interval=10s
pcs resource group add grpZabbixServer web_server
pcs constraint colocation add web_server zabbix_server INFINITY

# pcs resource show
 Resource Group: grpZabbixServer
     zabbix_server      (systemd:zabbix-server):        Started monitor-mng-cent7-test-komi01
     web_server (systemd:httpd):        Started monitor-mng-cent7-test-komi01

・Check the status

# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 06:39:36 2016          Last change: Tue Jul 12 06:39:17 2016 by root via crm_attribute on monitor-mng-cent7-test-komi01
Stack: corosync
Current DC: monitor-mng-cent7-test-komi01 (version 1.1.13-10.el7_2.2-44eb2dd) - partition WITHOUT quorum
2 nodes and 2 resources configured

Online: [ monitor-mng-cent7-test-komi01 ]
OFFLINE: [ monitor-mng-cent7-test-komi02 ]

Full list of resources:

 Resource Group: grpZabbixServer
     zabbix_server      (systemd:zabbix-server):        Started monitor-mng-cent7-test-komi01
     WebSite    (ocf::heartbeat:apache):        Stopped

Failed Actions:
* WebSite_start_0 on monitor-mng-cent7-test-komi01 'unknown error' (1): call=13, status=Timed Out, exitreason='none',
    last-rc-change='Tue Jul 12 06:39:09 2016', queued=0ms, exec=40002ms


PCSD Status:
  monitor-mng-cent7-test-komi01: Online
  monitor-mng-cent7-test-komi02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

・Check the config

# pcs config show
Cluster Name: zabbixcluster
Corosync Nodes:
 monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02 
Pacemaker Nodes:
 monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02 

Resources: 
 Group: grpZabbixServer
  Resource: zabbix_server (class=systemd type=zabbix-server)
   Operations: monitor interval=10s (zabbix_server-monitor-interval-10s)
  Resource: WebSite (class=ocf provider=heartbeat type=apache)
   Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status 
   Operations: start interval=0s timeout=40s (WebSite-start-interval-0s)
               stop interval=0s timeout=60s (WebSite-stop-interval-0s)
               monitor interval=1min (WebSite-monitor-interval-1min)

Stonith Devices: 
Fencing Levels: 

Location Constraints:
Ordering Constraints:
Colocation Constraints:
  WebSite with zabbix_server (score:INFINITY) (id:colocation-WebSite-zabbix_server-INFINITY)

Resources Defaults:
 resource-stickiness: INFINITY
 migration-threshold: 1
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: zabbixcluster
 dc-version: 1.1.13-10.el7_2.2-44eb2dd
 default-action-timeout: 240
 default-resource-stickiness: 200
 have-watchdog: false
 no-quorum-policy: ignore
 stonith-enabled: false
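
The commands that created the zabbix_server and WebSite resources in this dump are not shown in the article. Reading the dump back, they were probably close to the following. This is a reconstruction from the `pcs config show` output, not the author's recorded commands, and it leaves out some of the cluster properties. The `setup_zabbix_resources` wrapper and the `PCS` dry-run variable are illustrative additions so that the sketch only prints the commands unless told otherwise.

```bash
# Dry run by default: commands are printed, not executed.
# Set PCS=pcs to apply them on one cluster node.
setup_zabbix_resources() {
    pcs_cmd=${PCS:-echo pcs}

    # zabbix-server as a systemd resource, created directly in the group
    $pcs_cmd resource create zabbix_server systemd:zabbix-server \
        op monitor interval=10s --group grpZabbixServer

    # Apache through the OCF agent, which polls statusurl to judge health
    $pcs_cmd resource create WebSite ocf:heartbeat:apache \
        configfile=/etc/httpd/conf/httpd.conf \
        statusurl=http://localhost/server-status \
        op monitor interval=1min --group grpZabbixServer

    # Keep WebSite on the same node as zabbix_server
    $pcs_cmd constraint colocation add WebSite with zabbix_server INFINITY

    # Defaults and properties that appear in the dump
    $pcs_cmd resource defaults resource-stickiness=INFINITY migration-threshold=1
    $pcs_cmd property set no-quorum-policy=ignore
}

setup_zabbix_resources
```

With migration-threshold=1, a single failure is enough to push a resource off a node, which is worth keeping in mind when reading the Failed Actions entries in this article.
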
# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(127.0.0.1) ← this looks like the problem
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined

It seems that writing the hostname used for cluster registration on the first line of hosts is the problem

# vi /etc/hosts
127.0.0.1  monitor-mng-cent7-test-komi01  localhost localhost.localdomain localhost4 localhost4.localdomain4
 ↓
127.0.0.1    localhost localhost.localdomain localhost4 localhost4.localdomain4

After fixing it:

# pcs cluster stop --all
# pcs cluster start --all
# pcs status

After repeating this a few times it was fixed, and both nodes now appear to be Online.

# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.3.103) 
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.3.104) 
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
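
The hosts mistake behind this can be caught before starting the cluster. The following is a minimal sketch, assuming the problem is exactly what was observed here: the node name used for cluster registration appearing on a loopback line. The function name `loopback_clash` is hypothetical.

```bash
# Usage: loopback_clash HOSTS_FILE NODE_NAME
# Prints every loopback line in HOSTS_FILE that also names NODE_NAME.
# Returns 1 if such a line exists, 0 otherwise.
loopback_clash() {
    awk -v node="$2" '
        /^[ \t]*#/ { next }                    # ignore comment lines
        $1 == "127.0.0.1" || $1 == "::1" {
            for (i = 2; i <= NF; i++) {
                if ($i == node) { print; found = 1; break }
            }
        }
        END { exit (found ? 1 : 0) }
    ' "$1"
}
```

For example, `loopback_clash /etc/hosts monitor-mng-cent7-test-komi01 || echo "fix /etc/hosts first"` would flag the bad first line shown above.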

Incidentally, changing the node names in /etc/corosync/corosync.conf
while the cluster is running seems to make pcs cluster stop unusable.

The apache resource, still on ocf at this point, was also showing a failure in the status

Failed Actions:
* WebSite_start_0 on monitor-mng-cent7-test-komi01 'unknown error' (1): call=12, status=Timed Out, exitreason='none',
    last-rc-change='Tue Jul 12 07:27:47 2016', queued=0ms, exec=40003ms

As long as this failure is showing, a switchover cannot happen.

・Look at the CIB

# pcs cluster cib
The output is long, so it is omitted

cibadmin --query prints the same thing.

・How to clear the fail count

# pcs resource cleanup WebSite
Waiting for 4 replies from the CRMd.... OK
Cleaning up zabbix_server on monitor-mng-cent7-test-komi01, removing fail-count-zabbix_server
Cleaning up zabbix_server on monitor-mng-cent7-test-komi02, removing fail-count-zabbix_server
Cleaning up WebSite on monitor-mng-cent7-test-komi01, removing fail-count-WebSite
Cleaning up WebSite on monitor-mng-cent7-test-komi02, removing fail-count-WebSite
# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 08:52:30 2016          Last change: Tue Jul 12 08:52:16 2016 by hacluster via crmd on monitor-mng-cent7-test-komi01
Stack: corosync
Current DC: monitor-mng-cent7-test-komi02 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured

Online: [ monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02 ]

Full list of resources:

 Resource Group: grpZabbixServer
     zabbix_server      (systemd:zabbix-server):        Started monitor-mng-cent7-test-komi02
     WebSite    (ocf::heartbeat:apache):        Started monitor-mng-cent7-test-komi02

PCSD Status:
  monitor-mng-cent7-test-komi01: Online
  monitor-mng-cent7-test-komi02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

It was cleared nicely.

・Switchover tests from here (long)

The following did not move the resource at all.

# pcs resource move zabbix_server monitor-mng-cent7-test-komi01

Running the command to put node 02 into standby did move the resources

# pcs cluster standby monitor-mng-cent7-test-komi02
# pcs cluster unstandby --all

With this, when web is considered down on 01, the resources move to 02 and all of them start

# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 09:25:28 2016          Last change: Tue Jul 12 09:23:35 2016 by root via crm_attribute on monitor-mng-cent7-test-komi01
Stack: corosync
Current DC: monitor-mng-cent7-test-komi02 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured

Online: [ monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02 ]

Full list of resources:

 Resource Group: grpZabbixServer
     zabbix_server      (systemd:zabbix-server):        Started monitor-mng-cent7-test-komi02
     WebSite    (ocf::heartbeat:apache):        Started monitor-mng-cent7-test-komi02

Failed Actions:
* WebSite_start_0 on monitor-mng-cent7-test-komi01 'unknown error' (1): call=42, status=Timed Out, exitreason='none',
    last-rc-change='Tue Jul 12 09:21:27 2016', queued=0ms, exec=40002ms


PCSD Status:
  monitor-mng-cent7-test-komi01: Online
  monitor-mng-cent7-test-komi02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

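The Failed Action appears because the ocf agent checks server-status on localhost.
Indeed, fetching server-status with curl returns 404, so that seems to have been the cause.
I decided I did not need it to check the status that closely, took the easy way out, and switched to systemd.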

curl http://localhost/server-status

Delete the resource and register a systemd resource instead

# pcs resource delete WebSite
# pcs resource show
 Resource Group: grpZabbixServer
     zabbix_server      (systemd:zabbix-server):        Started monitor-mng-cent7-test-komi01

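The ocf agent is tricky for apache too, so I'll go with systemd. With systemd, the service apparently cannot be restarted by hand while the cluster holds the resource, which seems like a good thing.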

pcs resource create web_server systemd:httpd op monitor interval=10s
pcs resource group add grpZabbixServer web_server
pcs constraint colocation add web_server zabbix_server INFINITY

# pcs resource show
 Resource Group: grpZabbixServer
     zabbix_server      (systemd:zabbix-server):        Started monitor-mng-cent7-test-komi01
     web_server (systemd:httpd):        Started monitor-mng-cent7-test-komi01
# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 10:05:04 2016          Last change: Tue Jul 12 10:02:54 2016 by root via cibadmin on monitor-mng-cent7-test-komi01
Stack: corosync
Current DC: monitor-mng-cent7-test-komi02 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured

Node monitor-mng-cent7-test-komi02: standby
Online: [ monitor-mng-cent7-test-komi01 ]

Full list of resources:

 Resource Group: grpZabbixServer
     zabbix_server      (systemd:zabbix-server):        Started monitor-mng-cent7-test-komi01
     web_server (systemd:httpd):        Started monitor-mng-cent7-test-komi01

PCSD Status:
  monitor-mng-cent7-test-komi01: Online
  monitor-mng-cent7-test-komi02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

・Release standby

# pcs cluster unstandby --all
# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 10:05:52 2016          Last change: Tue Jul 12 10:05:50 2016 by root via crm_attribute on monitor-mng-cent7-test-komi01
Stack: corosync
Current DC: monitor-mng-cent7-test-komi02 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured

Online: [ monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02 ]

Full list of resources:

 Resource Group: grpZabbixServer
     zabbix_server      (systemd:zabbix-server):        Started monitor-mng-cent7-test-komi01
     web_server (systemd:httpd):        Started monitor-mng-cent7-test-komi01

PCSD Status:
  monitor-mng-cent7-test-komi01: Online
  monitor-mng-cent7-test-komi02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

On 01:
# netstat -lnpt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1020/sshd           
tcp        0      0 0.0.0.0:10050           0.0.0.0:*               LISTEN      8723/zabbix_agentd  
tcp        0      0 0.0.0.0:10051           0.0.0.0:*               LISTEN      19346/zabbix_server 
tcp6       0      0 :::80                   :::*                    LISTEN      20467/httpd         
tcp6       0      0 :::2224                 :::*                    LISTEN      4141/ruby           
tcp6       0      0 :::22                   :::*                    LISTEN      1020/sshd           
tcp6       0      0 :::10050                :::*                    LISTEN      8723/zabbix_agentd  
tcp6       0      0 :::10051                :::*                    LISTEN      19346/zabbix_server 

On 02:
# netstat -lnpt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1026/sshd           
tcp6       0      0 :::2224                 :::*                    LISTEN      4268/ruby           
tcp6       0      0 :::22                   :::*                    LISTEN      1026/sshd      

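This looks better.
With ocf, even after stopping through pcs, netstat showed the service still running. Since the design relies on an ELB rather than moving a virtual IP along with the resources, this seemed risky because data could end up being written twice.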

Try a switchover

# pcs cluster stop monitor-mng-cent7-test-komi01
monitor-mng-cent7-test-komi01: Stopping Cluster (pacemaker)...
monitor-mng-cent7-test-komi01: Stopping Cluster (corosync)...

The resources moved without trouble.

# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 10:18:01 2016          Last change: Tue Jul 12 10:05:50 2016 by root via crm_attribute on monitor-mng-cent7-test-komi01
Stack: corosync
Current DC: monitor-mng-cent7-test-komi02 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured

Online: [ monitor-mng-cent7-test-komi02 ]
OFFLINE: [ monitor-mng-cent7-test-komi01 ]

Full list of resources:

 Resource Group: grpZabbixServer
     zabbix_server      (systemd:zabbix-server):        Started monitor-mng-cent7-test-komi02
     web_server (systemd:httpd):        Started monitor-mng-cent7-test-komi02

PCSD Status:
  monitor-mng-cent7-test-komi01: Online
  monitor-mng-cent7-test-komi02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Start the node that was stopped and check the status (to confirm that starting it does not, by itself, move the resources back)

# pcs cluster start monitor-mng-cent7-test-komi01
# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 10:19:19 2016          Last change: Tue Jul 12 10:05:50 2016 by root via crm_attribute on monitor-mng-cent7-test-komi01
Stack: corosync
Current DC: monitor-mng-cent7-test-komi02 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured

Online: [ monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02 ]

Full list of resources:

 Resource Group: grpZabbixServer
     zabbix_server      (systemd:zabbix-server):        Started monitor-mng-cent7-test-komi02
     web_server (systemd:httpd):        Started monitor-mng-cent7-test-komi02

PCSD Status:
  monitor-mng-cent7-test-komi01: Online
  monitor-mng-cent7-test-komi02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

No particular problems.

[root@monitor-mng-cent7-test-komi02 zabbix-server-mysql-3.0.3]# netstat -lnpt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1026/sshd           
tcp        0      0 0.0.0.0:10051           0.0.0.0:*               LISTEN      21196/zabbix_server 
tcp6       0      0 :::80                   :::*                    LISTEN      21236/httpd         
tcp6       0      0 :::2224                 :::*                    LISTEN      4268/ruby           
tcp6       0      0 :::22                   :::*                    LISTEN      1026/sshd           
tcp6       0      0 :::10051                :::*                    LISTEN      21196/zabbix_server 
[root@monitor-mng-cent7-test-komi01 yum.repos.d]# netstat -lnpt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1020/sshd           
tcp        0      0 0.0.0.0:10050           0.0.0.0:*               LISTEN      8723/zabbix_agentd  
tcp6       0      0 :::2224                 :::*                    LISTEN      4141/ruby           
tcp6       0      0 :::22                   :::*                    LISTEN      1020/sshd           
tcp6       0      0 :::10050                :::*                    LISTEN      8723/zabbix_agentd  

Now put 02 into standby to move the resources

# pcs cluster standby monitor-mng-cent7-test-komi02
# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 10:22:36 2016          Last change: Tue Jul 12 10:22:23 2016 by root via crm_attribute on monitor-mng-cent7-test-komi02
Stack: corosync
Current DC: monitor-mng-cent7-test-komi02 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured

Node monitor-mng-cent7-test-komi02: standby
Online: [ monitor-mng-cent7-test-komi01 ]

Full list of resources:

 Resource Group: grpZabbixServer
     zabbix_server      (systemd:zabbix-server):        Started monitor-mng-cent7-test-komi01
     web_server (systemd:httpd):        Started monitor-mng-cent7-test-komi01

PCSD Status:
  monitor-mng-cent7-test-komi01: Online
  monitor-mng-cent7-test-komi02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

# pcs cluster unstandby --all
# pcs status
Cluster name: zabbixcluster
Last updated: Tue Jul 12 10:24:34 2016          Last change: Tue Jul 12 10:24:31 2016 by root via crm_attribute on monitor-mng-cent7-test-komi02
Stack: corosync
Current DC: monitor-mng-cent7-test-komi02 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured

Online: [ monitor-mng-cent7-test-komi01 monitor-mng-cent7-test-komi02 ]

Full list of resources:

 Resource Group: grpZabbixServer
     zabbix_server      (systemd:zabbix-server):        Started monitor-mng-cent7-test-komi01
     web_server (systemd:httpd):        Started monitor-mng-cent7-test-komi01

PCSD Status:
  monitor-mng-cent7-test-komi01: Online
  monitor-mng-cent7-test-komi02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Manual switchover testing is done for now. The procedure for clearing failures is also pinned down.
(Now I need to turn this into chef...)
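
When repeating these switchover steps, it helps to read off where a resource is running without scanning the whole status screen. The following is a rough sketch that assumes the `pcs status` resource line format shown throughout this article. `resource_node` and `wait_for_move` are hypothetical helper names, and the 2-second polling interval is an arbitrary choice.

```bash
# Read `pcs status` output on stdin and print the node where
# resource $1 is Started. Prints nothing if it is stopped or absent.
resource_node() {
    awk -v res="$1" '$1 == res && $(NF - 1) == "Started" { print $NF }'
}

# Poll until resource $1 runs on node $2.
# Gives up after $3 attempts (default 30), returning 1.
wait_for_move() {
    tries=${3:-30}
    while [ "$tries" -gt 0 ]; do
        [ "$(pcs status | resource_node "$1")" = "$2" ] && return 0
        tries=$((tries - 1))
        sleep 2
    done
    return 1
}
```

A standby test could then read: `pcs cluster standby monitor-mng-cent7-test-komi02 && wait_for_move zabbix_server monitor-mng-cent7-test-komi01 && pcs cluster unstandby --all`.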

・Logs

/var/log/cluster/corosync.log has the relevant entries

How to extract a specific time range

# pcs cluster report --from "2016-07-12 09:00:00" /tmp/dest
# ll /tmp
total 60
-rw-r--r-- 1 root root 54519 Jul 12 10:35 dest.tar.bz2

The result comes out like this. A "to" time can also be given, as follows.

report [--from "YYYY-M-D H:M:S" [--to "YYYY-M-D H:M:S"]] dest
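
Typing timestamps by hand is error-prone, so the --from value can be generated instead. This is a small sketch that assumes GNU date (the `-d` option) is available, as it is on CentOS 7. `report_time` is a made-up helper name.

```bash
# Format a GNU date expression (for example "1 hour ago")
# in the "YYYY-M-D H:M:S" shape that --from and --to expect.
report_time() {
    date -d "$1" '+%Y-%m-%d %H:%M:%S'
}

# Example, to be run on a cluster node (writes /tmp/dest.tar.bz2):
#   pcs cluster report --from "$(report_time '1 hour ago')" /tmp/dest
```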

・Check the fail count

#  pcs resource failcount show zabbix_server
No failcounts for zabbix_server

To watch things in real time,
something like watch pcs status seems to be the only option (judging from a look through the help).

That's all for now. References follow.

References:
http://kan3aa.hatenablog.com/entry/2015/06/05/135150
https://blog.apar.jp/zabbix/4177/
http://weblabo.oscasierra.net/installing-mysql57-centos7-yum/
http://doruby.kbmj.com/taka/20131018/yum_update_crm_pcs_
http://infra.blog.shinobi.jp/Entry/107/
http://blog.matsumoto-r.jp/?p=3482
https://ericsysmin.com/2016/02/18/configuring-high-availability-ha-zabbix-server-on-centos-7/
https://gist.github.com/kogoto/92eef35b28ae632a11f7
# pcs cluster --help
http://yomon.hatenablog.com/entry/2016/04/13/110035
http://qiita.com/tukiyo3/items/27dcc7f15a8b4e4cf8a5
# pcs resource --help
https://access.redhat.com/documentation/ja-JP/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/ch-clustresources-HAAR.html
