前文
歳のせいか、業界の動向に疎くなっているようで。。。いつの間にか、SQL Server on Linux 2017でした(涙)
- https://www.microsoft.com/ja-jp/sql-server/sql-server-2017-linux
正式にMSのPacemaker用のRA(fci)も用意されているようです。
- https://github.com/Microsoft/mssql-server-ha
今のバージョンがどの程度、安定性があるのか不明ですが、LinuxのDatabaseの選択子の一つになっていくかも知れません。
環境
- RHEL7.4の単ノード
- Pacemakerは2.0-rc3
- SQL Server 2017
[root@test~]# rpm -qa | grep sql
msodbcsql17-17.1.0.1-1.x86_64
mssql-server-ha-14.0.3025.34-3.x86_64
mssql-tools-17.1.0.1-1.x86_64
mssql-server-14.0.3025.34-3.x86_64
設定
「SQL Server on Linux 2017」のインストールや初期設定については、関連リンク先に詳しく書かれているので割愛します。
systemd管理と、fci管理の2つのパターンで、STONITH無で設定しています。単ノードで管理するRAは1つのみにしています。
実際には、クラスタ構成するのであれば、アクセス用のVIP(IPaddr2)や、DB領域用の共有(もしくは、DRBDなどの同期可能なディスク)が必要ですが、省いています。
動作確認
systemd管理
- Pacemakerを起動してcrmファイルを流し込みます。
property no-quorum-policy="ignore" \
stonith-enabled="false" \
startup-fencing="false" \
rsc_defaults resource-stickiness="INFINITY" \
migration-threshold="1"
#
primitive prmMssqlServer systemd:mssql-server \
op start interval="0s" timeout="60s" on-fail="restart" \
op monitor interval="10s" timeout="60s" on-fail="restart" \
op stop interval="0s" timeout="60s" on-fail="block"
- crm_monの様子
[root@rh74-test ~]# crm_mon -1 -Af
Stack: corosync
Current DC: rh74-test (version 2.0.0-2c83c6872a) - partition WITHOUT quorum
Last updated: Tue May 15 15:26:25 2018
Last change: Tue May 15 15:26:19 2018 by root via cibadmin on rh74-test
1 node configured
1 resource configured
Online: [ rh74-test ]
Active resources:
prmMssqlServer (systemd:mssql-server): Started rh74-test
Node Attributes:
* Node rh74-test:
Migration Summary:
* Node rh74-test:
- psとstatusで確認してみます
[root@rh74-test ~]# ps -ef | grep mssql
mssql 5853 1 0 15:26 ? 00:00:01 /opt/mssql/bin/sqlservr
mssql 5855 5853 9 15:26 ? 00:00:10 /opt/mssql/bin/sqlservr
root 6083 25955 0 15:28 pts/0 00:00:00 grep --color=auto mssql
[root@rh74-test ~]# systemctl status mssql-server.service
● mssql-server.service - Cluster Controlled mssql-server
Loaded: loaded (/usr/lib/systemd/system/mssql-server.service; enabled; vendor preset: disabled)
Drop-In: /run/systemd/system/mssql-server.service.d
└─50-pacemaker.conf
Active: active (running) since 火 2018-05-15 15:26:19 JST; 2min 9s ago
Docs: https://docs.microsoft.com/en-us/sql/linux
Main PID: 5853 (sqlservr)
Memory: 693.7M
CGroup: /system.slice/mssql-server.service
├─5853 /opt/mssql/bin/sqlservr
└─5855 /opt/mssql/bin/sqlservr
5月 15 15:26:28 rh74-test sqlservr[5853]: 2018-05-15 15:26:28.41 Server 2018-05-15 15:26:28.42 Server Server is listening on [ ::1 <ipv6> 1434].
5月 15 15:26:28 rh74-test sqlservr[5853]: 2018-05-15 15:26:28.42 Server 2018-05-15 15:26:28.42 Server Server is listening on [ 127.0.0.1 <ipv4> 1434].
5月 15 15:26:28 rh74-test sqlservr[5853]: 2018-05-15 15:26:28.42 Server 2018-05-15 15:26:28.42 Server Dedicated admin connection support was established for listening locally on port 1434.
5月 15 15:26:28 rh74-test sqlservr[5853]: 2018-05-15 15:26:28.44 spid19s SQL Server 2018-05-15 15:26:28.44 spid19s SQL Server is now ready for client connections. This is an informat...n is required.
5月 15 15:26:28 rh74-test sqlservr[5853]: 2018-05-15 15:26:28.98 spid9s 2018-05-15 15:26:28.98 spid9s Starting up database 'tempdb'.
5月 15 15:26:29 rh74-test sqlservr[5853]: 2018-05-15 15:26:29.26 spid9s tempdb 2018-05-15 15:26:29.26 spid9s The tempdb database has 1 data file(s).
5月 15 15:26:29 rh74-test sqlservr[5853]: 2018-05-15 15:26:29.27 spid24s Service Broker 2018-05-15 15:26:29.27 spid24s The Service Broker endpoint is in disabled or stopped state.
5月 15 15:26:29 rh74-test sqlservr[5853]: 2018-05-15 15:26:29.27 spid24s Database Mirroring 2018-05-15 15:26:29.27 spid24s The Database Mirroring endpoint is in disabled or stopped state.
5月 15 15:26:29 rh74-test sqlservr[5853]: 2018-05-15 15:26:29.29 spid24s Service Broker 2018-05-15 15:26:29.29 spid24s Service Broker manager has started.
5月 15 15:26:29 rh74-test sqlservr[5853]: 2018-05-15 15:26:29.45 spid5s 2018-05-15 15:26:29.45 spid5s Recovery is complete. This is an informational message only. No user action is required.
Hint: Some lines were ellipsized, use -l to show in full.
- mssqlプロセスに疑似故障を起こしてみます。故障検知し単ノードの為、停止します。
[root@rh74-test opt]# ps -ef |grep mssql
mssql 5853 1 0 15:26 ? 00:00:01 /opt/mssql/bin/sqlservr
mssql 5855 5853 6 15:26 ? 00:00:12 /opt/mssql/bin/sqlservr
root 6106 26259 0 15:29 pts/1 00:00:00 grep --color=auto mssql
[root@rh74-test opt]# kill -9 5853
[root@rh74-test opt]# ps -ef |grep mssql
root 6109 26259 0 15:29 pts/1 00:00:00 grep --color=auto mssql
[root@rh74-test ~]# crm_mon -1 -Af
Stack: corosync
Current DC: rh74-test (version 2.0.0-2c83c6872a) - partition WITHOUT quorum
Last updated: Tue May 15 15:30:03 2018
Last change: Tue May 15 15:26:19 2018 by root via cibadmin on rh74-test
1 node configured
1 resource configured
Online: [ rh74-test ]
No active resources
Node Attributes:
* Node rh74-test:
Migration Summary:
* Node rh74-test:
prmMssqlServer: migration-threshold=1 fail-count=1 last-failure='Tue May 15 15:29:52 2018'
Failed Actions:
* prmMssqlServer_monitor_10000 on rh74-test 'not running' (7): call=7, status=complete, exitreason='',
last-rc-change='Tue May 15 15:29:52 2018', queued=0ms, exec=0ms
[root@rh74-test ~]# systemctl status mssql-server.service
● mssql-server.service - Microsoft SQL Server Database Engine
Loaded: loaded (/usr/lib/systemd/system/mssql-server.service; enabled; vendor preset: disabled)
Active: failed (Result: signal) since 火 2018-05-15 15:29:44 JST; 1min 1s ago
Docs: https://docs.microsoft.com/en-us/sql/linux
Main PID: 5853 (code=killed, signal=KILL)
5月 15 15:26:28 rh74-test sqlservr[5853]: 2018-05-15 15:26:28.44 spid19s SQL Server 2018-05-15 15:26:28.44 spid19s SQL Server is now ready for client connections. This is an informat...n is required.
5月 15 15:26:28 rh74-test sqlservr[5853]: 2018-05-15 15:26:28.98 spid9s 2018-05-15 15:26:28.98 spid9s Starting up database 'tempdb'.
5月 15 15:26:29 rh74-test sqlservr[5853]: 2018-05-15 15:26:29.26 spid9s tempdb 2018-05-15 15:26:29.26 spid9s The tempdb database has 1 data file(s).
5月 15 15:26:29 rh74-test sqlservr[5853]: 2018-05-15 15:26:29.27 spid24s Service Broker 2018-05-15 15:26:29.27 spid24s The Service Broker endpoint is in disabled or stopped state.
5月 15 15:26:29 rh74-test sqlservr[5853]: 2018-05-15 15:26:29.27 spid24s Database Mirroring 2018-05-15 15:26:29.27 spid24s The Database Mirroring endpoint is in disabled or stopped state.
5月 15 15:26:29 rh74-test sqlservr[5853]: 2018-05-15 15:26:29.29 spid24s Service Broker 2018-05-15 15:26:29.29 spid24s Service Broker manager has started.
5月 15 15:26:29 rh74-test sqlservr[5853]: 2018-05-15 15:26:29.45 spid5s 2018-05-15 15:26:29.45 spid5s Recovery is complete. This is an informational message only. No user action is required.
5月 15 15:29:44 rh74-test systemd[1]: mssql-server.service: main process exited, code=killed, status=9/KILL
5月 15 15:29:44 rh74-test systemd[1]: Unit mssql-server.service entered failed state.
5月 15 15:29:44 rh74-test systemd[1]: mssql-server.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
fci管理
- Pacemakerを起動してcrmファイルを流し込みます。
property no-quorum-policy="ignore" \
stonith-enabled="false" \
startup-fencing="false" \
rsc_defaults resource-stickiness="INFINITY" \
migration-threshold="1"
#
primitive rh74-01 ocf:mssql:fci \
op start interval="0s" timeout="60s" on-fail="restart" \
op monitor interval="10s" timeout="60s" on-fail="restart" \
op stop interval="0s" timeout="60s" on-fail="block"
- crm_monの様子
[root@rh74-test ~]# crm_mon -1 -Af
Stack: corosync
Current DC: rh74-test (version 2.0.0-2c83c6872a) - partition WITHOUT quorum
Last updated: Tue May 15 15:39:50 2018
Last change: Tue May 15 15:39:31 2018 by root via cibadmin on rh74-test
1 node configured
1 resource configured
Online: [ rh74-test ]
Active resources:
rh74-01 (ocf::mssql:fci): Started rh74-test
Node Attributes:
* Node rh74-test:
Migration Summary:
* Node rh74-test:
- psとstatusで確認してみます。systemd管理ではないので、systemdのstatusでは確認されません。
[root@rh74-test ~]# ps -ef |grep mssql
mssql 9475 1 3 15:44 ? 00:00:00 /opt/mssql/bin/sqlservr
mssql 9486 9475 73 15:44 ? 00:00:09 /opt/mssql/bin/sqlservr
root 9730 9353 0 15:44 ? 00:00:00 /bin/bash /usr/lib/ocf/resource.d/mssql/fci monitor
root 9757 9730 0 15:44 ? 00:00:00 /bin/bash /usr/lib/ocf/resource.d/mssql/fci monitor
root 9758 9757 4 15:44 ? 00:00:00 /usr/lib/ocf/lib/mssql/fci-helper --port 1433 --credentials-file /var/opt/mssql/secrets/passwd --application-name monitor-rh74-01 --connection-timeout 20 --health-threshold 3 --action monitor --virtual-server-name rh74-01
root 9759 9757 0 15:44 ? 00:00:00 /bin/bash /usr/lib/ocf/resource.d/mssql/fci monitor
root 9785 25955 0 15:44 pts/0 00:00:00 grep --color=auto mssql
[root@rh74-test ~]# systemctl status mssql-server.service
● mssql-server.service - Microsoft SQL Server Database Engine
Loaded: loaded (/usr/lib/systemd/system/mssql-server.service; enabled; vendor preset: disabled)
Active: inactive (dead) since 火 2018-05-15 15:43:10 JST; 1min 20s ago
Docs: https://docs.microsoft.com/en-us/sql/linux
Process: 9110 ExecStart=/opt/mssql/bin/sqlservr (code=exited, status=0/SUCCESS)
Main PID: 9110 (code=exited, status=0/SUCCESS)
5月 15 15:43:00 rh74-test sqlservr[9110]: 2018-05-15 15:43:00.21 spid24s Service Broker 2018-05-15 15:43:00.21 spid24s The Service Broker endpoint is in disabled or stopped state.
5月 15 15:43:00 rh74-test sqlservr[9110]: 2018-05-15 15:43:00.21 spid24s Database Mirroring 2018-05-15 15:43:00.22 spid24s The Database Mirroring endpoint is in disabled or stopped state.
5月 15 15:43:00 rh74-test sqlservr[9110]: 2018-05-15 15:43:00.23 spid24s Service Broker 2018-05-15 15:43:00.23 spid24s Service Broker manager has started.
5月 15 15:43:00 rh74-test sqlservr[9110]: 2018-05-15 15:43:00.28 spid6s 2018-05-15 15:43:00.28 spid6s Recovery is complete. This is an informational message only. No user action is required.
5月 15 15:43:00 rh74-test systemd[1]: Stopping Microsoft SQL Server Database Engine...
5月 15 15:43:00 rh74-test sqlservr[9110]: 2018-05-15 15:43:00.47 spid6s Always On: SQL Server is shutting down 2018-05-15 15:43:00.48 spid6s Always On: The availability replica man...n is required.
5月 15 15:43:00 rh74-test sqlservr[9110]: 2018-05-15 15:43:00.48 spid6s 2018-05-15 15:43:00.48 spid6s SQL Server is terminating in response to a 'stop' request from Service Control...n is required.
5月 15 15:43:00 rh74-test sqlservr[9110]: 2018-05-15 15:43:00.59 spid24s Service Broker 2018-05-15 15:43:00.59 spid24s Service Broker manager has shut down.
5月 15 15:43:10 rh74-test sqlservr[9110]: 2018-05-15 15:43:10.30 spid6s 2018-05-15 15:43:10.30 spid6s SQL Trace was stopped due to server shutdown. Trace ID = '1'. This is an infor...n is required.
5月 15 15:43:10 rh74-test systemd[1]: Stopped Microsoft SQL Server Database Engine.
Hint: Some lines were ellipsized, use -l to show in full.
- mssqlプロセスに疑似故障を起こしてみます。故障検知し単ノードの為、停止します。
[root@rh74-test ~]# ps -ef |grep mssql
mssql 9475 1 1 15:44 ? 00:00:00 /opt/mssql/bin/sqlservr
mssql 9486 9475 25 15:44 ? 00:00:10 /opt/mssql/bin/sqlservr
root 9876 25955 0 15:44 pts/0 00:00:00 grep --color=auto mssql
[root@rh74-test ~]# kill -9 9475
[root@rh74-test ~]# ps -ef |grep mssql
root 9995 25955 0 15:44 pts/0 00:00:00 grep --color=auto mssql
[root@rh74-test ~]# crm_mon -1 -Af
Stack: corosync
Current DC: rh74-test (version 2.0.0-2c83c6872a) - partition WITHOUT quorum
Last updated: Tue May 15 15:45:05 2018
Last change: Tue May 15 15:44:01 2018 by root via cibadmin on rh74-test
1 node configured
1 resource configured
Online: [ rh74-test ]
No active resources
Node Attributes:
* Node rh74-test:
Migration Summary:
* Node rh74-test:
rh74-01: migration-threshold=1 fail-count=1 last-failure='Tue May 15 15:44:48 2018'
Failed Actions:
* rh74-01_monitor_10000 on rh74-test 'unknown error' (1): call=7, status=complete, exitreason='2018/05/15 15:44:48 Unexpected error: Unable to open tcp connection with host 'localhost:1433': dial tcp 127.0.0.1:1433: getsock',
last-rc-change='Tue May 15 15:44:48 2018', queued=0ms, exec=0ms
気になる点
- systemd管理時に疑似故障時にcrm_monのexitreasonが何も入っていないので、何かエラー文字列を埋めた方が良いと思われます。(もしかしたら、systemdリソースの場合は、exitreasonは設定出来ないのかも??)
- fci管理の場合、fciリソースの名称を変更する必要があるようです。(詳細をみていませんが、設定などで回避する方法があるのかも知れません。。。)
[root@rh74-test ~]# crm_mon -1 -Af
Stack: corosync
Current DC: rh74-test (version 2.0.0-2c83c6872a) - partition WITHOUT quorum
Last updated: Tue May 15 15:35:54 2018
Last change: Tue May 15 15:35:39 2018 by root via cibadmin on rh74-test
1 node configured
1 resource configured
Online: [ rh74-test ]
Active resources:
prmMssqlServer (ocf::mssql:fci): FAILED rh74-test
Node Attributes:
* Node rh74-test:
Migration Summary:
* Node rh74-test:
prmMssqlServer: migration-threshold=1 fail-count=1000000 last-failure='Tue May 15 15:35:53 2018'
Failed Actions:
* prmMssqlServer_start_0 on rh74-test 'invalid parameter' (2): call=6, status=complete, exitreason='2018/05/15 15:35:53 Expected local server name to be prmMssqlServer but it was rh74-test',
last-rc-change='Tue May 15 15:35:40 2018', queued=0ms, exec=12836ms
(fci-helperの該当エラー箇所)
(snip)
// Function: monitor
//
// Description:
// Implements the OCF "monitor" action
//
func monitor(db *sql.DB, virtualServerName string, stdout *log.Logger) (mssqlcommon.OcfExitCode, error) {
stdout.Println("Querying local server name...")
currentServerName, err := mssqlcommon.GetLocalServerName(db)
if err != nil {
return mssqlcommon.OCF_ERR_GENERIC, fmt.Errorf("Could not query local server name: %s", err)
}
stdout.Printf("Local server name is %s\n", currentServerName)
if !strings.EqualFold(currentServerName, virtualServerName) {
return mssqlcommon.OCF_ERR_ARGS, fmt.Errorf("Expected local server name to be %s but it was %s", virtualServerName, currentServerName)
}
return mssqlcommon.OCF_SUCCESS, nil
}
(snip)
fciのRA内の$OCF_RESOURCE_INSTANCEを渡している模様。。。
- fci管理で起動するとmonitorの都度、ログが大量に出て、かなり鬱陶しい感じです。実際にPacemakerでクラスタする場合には、改善(ログをdebugなどに落とす)が必要でしょう。
(snip)
May 15 15:06:44 rh74-test fci(rh74-test)[4149]: INFO: mssql_validate
May 15 15:06:44 rh74-test fci(rh74-test)[4149]: INFO: Resource agent invoked with: monitor
May 15 15:06:44 rh74-test fci(rh74-test)[4149]: INFO: mssql_monitor
May 15 15:06:44 rh74-test fci(rh74-test)[4149]: INFO: monitor: 2018/05/15 15:06:44 fci-helper invoked with hostname [localhost]; port [1433]; credentials-file [/var/opt/mssql/secrets/passwd]; application-name [monitor-rh74-test]; connection-timeout [20]; health-threshold [3]; action [monitor]
May 15 15:06:44 rh74-test fci(rh74-test)[4149]: INFO: monitor: 2018/05/15 15:06:44 fci-helper invoked with virtual-server-name [rh74-test]
May 15 15:06:44 rh74-test fci(rh74-test)[4149]: INFO: monitor: 2018/05/15 15:06:44 Attempt 1 to connect to the instance at localhost:1433 and run sp_server_diagnostics
May 15 15:06:45 rh74-test fci(rh74-test)[4149]: INFO: monitor: 2018/05/15 15:06:45 Connected to the instance at localhost:1433
May 15 15:06:50 rh74-test fci(rh74-test)[4149]: INFO: monitor: 2018/05/15 15:06:50 Querying local server name...
May 15 15:06:50 rh74-test fci(rh74-test)[4149]: INFO: monitor: 2018/05/15 15:06:50 Local server name is rh74-test
May 15 15:06:50 rh74-test fci(rh74-test)[4149]: INFO: rh74-test monitor : 0
May 15 15:07:00 rh74-test fci(rh74-test)[4250]: INFO: mssql_validate
May 15 15:07:00 rh74-test fci(rh74-test)[4250]: INFO: Resource agent invoked with: monitor
May 15 15:07:00 rh74-test fci(rh74-test)[4250]: INFO: mssql_monitor
May 15 15:07:00 rh74-test fci(rh74-test)[4250]: INFO: monitor: 2018/05/15 15:07:00 fci-helper invoked with hostname [localhost]; port [1433]; credentials-file [/var/opt/mssql/secrets/passwd]; application-name [monitor-rh74-test]; connection-timeout [20]; health-threshold [3]; action [monitor]
May 15 15:07:00 rh74-test fci(rh74-test)[4250]: INFO: monitor: 2018/05/15 15:07:00 fci-helper invoked with virtual-server-name [rh74-test]
May 15 15:07:00 rh74-test fci(rh74-test)[4250]: INFO: monitor: 2018/05/15 15:07:00 Attempt 1 to connect to the instance at localhost:1433 and run sp_server_diagnostics
May 15 15:07:00 rh74-test fci(rh74-test)[4250]: INFO: monitor: 2018/05/15 15:07:00 Connected to the instance at localhost:1433
May 15 15:07:05 rh74-test fci(rh74-test)[4250]: INFO: monitor: 2018/05/15 15:07:05 Querying local server name...
May 15 15:07:05 rh74-test fci(rh74-test)[4250]: INFO: monitor: 2018/05/15 15:07:05 Local server name is rh74-test
May 15 15:07:05 rh74-test fci(rh74-test)[4250]: INFO: rh74-test monitor : 0
(snip)
関連リンク
以下に既にPacemakerで設定されている記載がありました。参考にさせて頂きました。
- https://qiita.com/harukano/items/6488c15e0845402fafc3
- https://www.server-world.info/query?os=CentOS_7&p=mssql2017&f=1