参考資料
- https://code.google.com/p/mysql-master-ha/wiki/TableOfContents?tm=6
- http://mizzy.org/blog/2013/02/06/1/
構成
- MHA manager-node
- 10.10.0.64
- MHA node
- 10.10.0.63
- 10.10.0.65
- 10.10.0.66
MHA install
共通設定
- mysqlインストール (managerサーバはmysqlクライアントだけでok)
yum install mysql-server mysql -y
service mysqld start
mysql> grant all privileges on *.* to mha@'10.10.0.%' identified by 'mhapassword';
mysql> flush privileges;
- sshログイン設定
# mha manager
ssh-keygen -t rsa -f /root/.ssh/id_rsa -q -N ""
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
# mha node
mkdir /root/.ssh/
vim /root/.ssh/id_rsa
vim /root/.ssh/authorized_keys
chmod 400 /root/.ssh/id_rsa
chmod 600 /root/.ssh/authorized_keys
chmod 700 /root/.ssh/
- replication設定
server1をマスター、server2,3をスレーブとして設定する。
vim /etc/my.cnf
log-bin
server-id=1
service mysqld restart
mysql> grant replication slave on *.* to repl@'10.10.0.%' identified by 'replpassword';
mysql> show master status;
+-------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| mysqld-bin.000001 | 106 | | |
+-------------------+----------+--------------+------------------+
mysql> change master to
-> master_host='10.10.0.65',
-> master_user='repl',
-> master_password='replpassword',
-> master_log_file='mysqld-bin.000001',
-> master_log_pos=106;
mysql> start slave;
MHA node
- MHA node install
wget https://72003f4c60f5cc941cd1c7d448fc3c99e0aebaa8.googledrive.com/host/0B1lu97m8-haWeHdGWXp0YVVUSlk/mha4mysql-node-0.56-0.el6.noarch.rpm
yum localinstall mha4mysql-node-0.56-0.el6.noarch.rpm
MHA manager
- MHA manger install
wget https://72003f4c60f5cc941cd1c7d448fc3c99e0aebaa8.googledrive.com/host/0B1lu97m8-haWeHdGWXp0YVVUSlk/mha4mysql-node-0.56-0.el6.noarch.rpm
wget https://72003f4c60f5cc941cd1c7d448fc3c99e0aebaa8.googledrive.com/host/0B1lu97m8-haWeHdGWXp0YVVUSlk/mha4mysql-manager-0.56-0.el6.noarch.rpm
yum install epel-release
yum localinstall mha4mysql-node-0.56-0.el6.noarch.rpm mha4mysql-manager-0.56-0.el6.noarch.rpm
- 設定ファイル
vim /etc/mha.conf
[server default]
user=root
password=password
manager_workdir=/var/lib/mha
manager_log=/var/log/mha.log
remote_workdir=/var/lib/mha
repl_user=repl
repl_password=repl
ssh_user=root
[server1]
hostname=10.10.0.65
[server2]
hostname=10.10.0.66
[server3]
hostname=10.10.0.63
- SSHの相互接続確認
masterha_check_ssh --conf=/etc/mha.conf
Sat Oct 25 13:47:57 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Oct 25 13:47:57 2014 - [info] Reading application default configuration from /etc/mha.conf..
Sat Oct 25 13:47:57 2014 - [info] Reading server configuration from /etc/mha.conf..
Sat Oct 25 13:47:57 2014 - [info] Starting SSH connection tests..
Sat Oct 25 13:47:58 2014 - [debug]
Sat Oct 25 13:47:57 2014 - [debug] Connecting via SSH from root@10.10.0.65(10.10.0.65:22) to root@10.10.0.66(10.10.0.66:22)..
Sat Oct 25 13:47:58 2014 - [debug] ok.
Sat Oct 25 13:47:58 2014 - [debug] Connecting via SSH from root@10.10.0.65(10.10.0.65:22) to root@10.10.0.63(10.10.0.63:22)..
Sat Oct 25 13:47:58 2014 - [debug] ok.
Sat Oct 25 13:47:59 2014 - [debug]
Sat Oct 25 13:47:57 2014 - [debug] Connecting via SSH from root@10.10.0.66(10.10.0.66:22) to root@10.10.0.65(10.10.0.65:22)..
Sat Oct 25 13:47:58 2014 - [debug] ok.
Sat Oct 25 13:47:58 2014 - [debug] Connecting via SSH from root@10.10.0.66(10.10.0.66:22) to root@10.10.0.63(10.10.0.63:22)..
Sat Oct 25 13:47:59 2014 - [debug] ok.
Sat Oct 25 13:47:59 2014 - [debug]
Sat Oct 25 13:47:58 2014 - [debug] Connecting via SSH from root@10.10.0.63(10.10.0.63:22) to root@10.10.0.65(10.10.0.65:22)..
Sat Oct 25 13:47:59 2014 - [debug] ok.
Sat Oct 25 13:47:59 2014 - [debug] Connecting via SSH from root@10.10.0.63(10.10.0.63:22) to root@10.10.0.66(10.10.0.66:22)..
Sat Oct 25 13:47:59 2014 - [debug] ok.
Sat Oct 25 13:47:59 2014 - [info] All SSH connection tests passed successfully.
- replication設定確認
masterha_check_repl --conf=/etc/mha.conf
Sat Oct 25 14:20:30 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Oct 25 14:20:30 2014 - [info] Reading application default configuration from /etc/mha.conf..
Sat Oct 25 14:20:30 2014 - [info] Reading server configuration from /etc/mha.conf..
Sat Oct 25 14:20:30 2014 - [info] MHA::MasterMonitor version 0.56.
Sat Oct 25 14:20:30 2014 - [info] GTID failover mode = 0
Sat Oct 25 14:20:30 2014 - [info] Dead Servers:
Sat Oct 25 14:20:30 2014 - [info] Alive Servers:
Sat Oct 25 14:20:30 2014 - [info] 10.10.0.65(10.10.0.65:3306)
Sat Oct 25 14:20:30 2014 - [info] 10.10.0.66(10.10.0.66:3306)
Sat Oct 25 14:20:30 2014 - [info] 10.10.0.63(10.10.0.63:3306)
Sat Oct 25 14:20:30 2014 - [info] Alive Slaves:
Sat Oct 25 14:20:30 2014 - [info] 10.10.0.66(10.10.0.66:3306) Version=5.1.73-log (oldest major version between slaves) log-bin:enabled
Sat Oct 25 14:20:30 2014 - [info] Replicating from 10.10.0.65(10.10.0.65:3306)
Sat Oct 25 14:20:30 2014 - [info] 10.10.0.63(10.10.0.63:3306) Version=5.1.73-log (oldest major version between slaves) log-bin:enabled
Sat Oct 25 14:20:30 2014 - [info] Replicating from 10.10.0.65(10.10.0.65:3306)
Sat Oct 25 14:20:30 2014 - [info] Current Alive Master: 10.10.0.65(10.10.0.65:3306)
Sat Oct 25 14:20:30 2014 - [info] Checking slave configurations..
Sat Oct 25 14:20:30 2014 - [info] read_only=1 is not set on slave 10.10.0.66(10.10.0.66:3306).
Sat Oct 25 14:20:30 2014 - [warning] relay_log_purge=0 is not set on slave 10.10.0.66(10.10.0.66:3306).
Sat Oct 25 14:20:30 2014 - [info] read_only=1 is not set on slave 10.10.0.63(10.10.0.63:3306).
Sat Oct 25 14:20:30 2014 - [warning] relay_log_purge=0 is not set on slave 10.10.0.63(10.10.0.63:3306).
Sat Oct 25 14:20:30 2014 - [info] Checking replication filtering settings..
Sat Oct 25 14:20:30 2014 - [info] binlog_do_db= , binlog_ignore_db=
Sat Oct 25 14:20:30 2014 - [info] Replication filtering check ok.
Sat Oct 25 14:20:30 2014 - [info] GTID (with auto-pos) is not supported
Sat Oct 25 14:20:30 2014 - [info] Starting SSH connection tests..
Sat Oct 25 14:20:33 2014 - [info] All SSH connection tests passed successfully.
Sat Oct 25 14:20:33 2014 - [info] Checking MHA Node version..
Sat Oct 25 14:20:34 2014 - [info] Version check ok.
Sat Oct 25 14:20:34 2014 - [info] Checking SSH publickey authentication settings on the current master..
Sat Oct 25 14:20:34 2014 - [info] HealthCheck: SSH to 10.10.0.65 is reachable.
Sat Oct 25 14:20:34 2014 - [info] Master MHA Node version is 0.56.
Sat Oct 25 14:20:34 2014 - [info] Checking recovery script configurations on 10.10.0.65(10.10.0.65:3306)..
Sat Oct 25 14:20:34 2014 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/lib/mha/save_binary_logs_test --manager_version=0.56 --start_file=mysqld-bin.000001
Sat Oct 25 14:20:34 2014 - [info] Connecting to root@10.10.0.65(10.10.0.65:22)..
Creating /var/lib/mha if not exists.. Creating directory /var/lib/mha.. done.
ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysqld-bin.000001
Sat Oct 25 14:20:35 2014 - [info] Binlog setting check done.
Sat Oct 25 14:20:35 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat Oct 25 14:20:35 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.10.0.66 --slave_ip=10.10.0.66 --slave_port=3306 --workdir=/var/lib/mha --target_version=5.1.73-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sat Oct 25 14:20:35 2014 - [info] Connecting to root@10.10.0.66(10.10.0.66:22)..
Creating directory /var/lib/mha.. done.
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Oct 25 14:20:35 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.10.0.63 --slave_ip=10.10.0.63 --slave_port=3306 --workdir=/var/lib/mha --target_version=5.1.73-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sat Oct 25 14:20:35 2014 - [info] Connecting to root@10.10.0.63(10.10.0.63:22)..
Creating directory /var/lib/mha.. done.
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Oct 25 14:20:36 2014 - [info] Slaves settings check done.
Sat Oct 25 14:20:36 2014 - [info]
10.10.0.65(10.10.0.65:3306) (current master)
+--10.10.0.66(10.10.0.66:3306)
+--10.10.0.63(10.10.0.63:3306)
Sat Oct 25 14:20:36 2014 - [info] Checking replication health on 10.10.0.66..
Sat Oct 25 14:20:36 2014 - [info] ok.
Sat Oct 25 14:20:36 2014 - [info] Checking replication health on 10.10.0.63..
Sat Oct 25 14:20:36 2014 - [info] ok.
Sat Oct 25 14:20:36 2014 - [warning] master_ip_failover_script is not defined.
Sat Oct 25 14:20:36 2014 - [warning] shutdown_script is not defined.
Sat Oct 25 14:20:36 2014 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
MHA start
- 起動
masterha_manager --conf=/etc/mha.conf
Sat Oct 25 14:25:10 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Oct 25 14:25:10 2014 - [info] Reading application default configuration from /etc/mha.conf..
Sat Oct 25 14:25:10 2014 - [info] Reading server configuration from /etc/mha.conf..
- マスター落としてみる
[root@MHAnode1 ~]# mysql -u root -ppassword -e "show processlist;"
+----+------+-----------------------------------------------------+------+-------------+------+----------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------------------------------------------------+------+-------------+------+----------------------------------------------------------------+------------------+
| 4 | repl | ip-10-10-0-66.ap-northeast-1.compute.internal:35765 | NULL | Binlog Dump | 702 | Has sent all binlog to slave; waiting for binlog to be updated | NULL |
| 5 | repl | ip-10-10-0-63.ap-northeast-1.compute.internal:59459 | NULL | Binlog Dump | 691 | Has sent all binlog to slave; waiting for binlog to be updated | NULL |
| 12 | mha | ip-10-10-0-64.ap-northeast-1.compute.internal:58189 | NULL | Sleep | 2 | | NULL |
| 21 | root | localhost | NULL | Query | 0 | NULL | show processlist |
+----+------+-----------------------------------------------------+------+-------------+------+----------------------------------------------------------------+------------------+
[root@MHAnode2 ~]# mysql -u root -ppassword -e "show processlist;"
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+
| 3 | system user | | NULL | Connect | 712 | Waiting for master to send event | NULL |
| 4 | system user | | NULL | Connect | 712 | Has read all relay log; waiting for the slave I/O thread to update it | NULL |
| 16 | root | localhost | NULL | Query | 0 | NULL | show processlist |
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+
[root@MHAnode3 ~]# mysql -u root -ppassword -e "show processlist;"
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+
| 3 | system user | | NULL | Connect | 707 | Waiting for master to send event | NULL |
| 4 | system user | | NULL | Connect | 707 | Has read all relay log; waiting for the slave I/O thread to update it | NULL |
| 15 | root | localhost | NULL | Query | 0 | NULL | show processlist |
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+
pkill mysql
masterha_managerが下記ログ吐いて停止した。
/var/lib/mha/mha.failover.complete
ファイルが作成されてた。これがあると次回フェイルオーバーが行われない。
マスターが切り替わって、スレーブも作成されてた!
Creating /var/lib/mha if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysqld-bin.000001
Sat Oct 25 14:30:41 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Oct 25 14:30:41 2014 - [info] Reading application default configuration from /etc/mha.conf..
Sat Oct 25 14:30:41 2014 - [info] Reading server configuration from /etc/mha.conf..
[root@MHAnode2 ~]# mysql -u root -ppassword -e "show processlist;"
+----+------+-----------------------------------------------------+------+-------------+------+----------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------------------------------------------------+------+-------------+------+----------------------------------------------------------------+------------------+
| 23 | repl | ip-10-10-0-63.ap-northeast-1.compute.internal:36111 | NULL | Binlog Dump | 512 | Has sent all binlog to slave; waiting for binlog to be updated | NULL |
| 32 | root | localhost | NULL | Query | 0 | NULL | show processlist |
+----+------+-----------------------------------------------------+------+-------------+------+----------------------------------------------------------------+------------------+
[root@MHAnode3 ~]# mysql -u root -ppassword -e "show processlist;"
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+
| 21 | system user | | NULL | Connect | 638 | Waiting for master to send event | NULL |
| 22 | system user | | NULL | Connect | 638 | Has read all relay log; waiting for the slave I/O thread to update it | NULL |
| 34 | root | localhost | NULL | Query | 0 | NULL | show processlist |
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+
-
mha.conf
に書かれているserver1が起動しておらず、そのままだとmasterha_manager
が起動できないので、server1を手動で組み込んで起動する必要がある。 - 落ちたマスターを、切り替わったあと自動でスレーブとして組み込む機能はない。手動で組み込む必要がある。
- フェイルオーバー発生後の処理は手動で行う必要がある。
- consul(DNSベース)で行いたい。
追記(2015/02/16)
フェイルオーバー後、/var/lib/mha/mha.failover.complete
ファイルが作成されるが、一定時間(デフォルト:8時間)経過後であれば正常に再フェイルオーバー可能だった。
この時間は --last_failover_minute=(minutes)
オプションで変更可能。
また、 --ignore_last_failover
オプションをつけることで無効化できる。