Zabbix Monitoring Template (Scality 6 Supervisor)

Posted at 2018-03-19, last updated at 2018-04-24

I created a Zabbix monitoring template for the Scality 6 Supervisor.
It has been verified with Zabbix 3.0 and Scality RING 6.4.5.4 (Mithrandir).

Prerequisites

The configuration under /etc/logrotate.d uses reload rather than restart.
The Zabbix agent configuration file sets the parameter "Include=/etc/zabbix/zabbix_agentd.d/".
The Zabbix agent configuration file sets the parameter "ServerActive=<IP address of the Zabbix server>".
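
The two agent parameters above can be checked before importing the template. The following is a minimal sketch, not part of the original template: the function name check_agent_conf is hypothetical, and the path of the main agent configuration file is an assumption that you should adjust to your installation.

```shell
#!/bin/sh
# Check that a Zabbix agent config file sets Include and ServerActive.
# check_agent_conf is a hypothetical helper, not part of Zabbix itself.
check_agent_conf() {
    conf="$1"
    rc=0
    for param in Include ServerActive; do
        # Match an uncommented "Param=value" line with a non-empty value.
        if grep -Eq "^[[:space:]]*${param}=.+" "$conf"; then
            echo "OK: ${param} is set"
        else
            echo "MISSING: ${param}"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `check_agent_conf /etc/zabbix/zabbix_agentd.conf` (assumed path) prints one line per parameter and returns a non-zero status if either one is missing or commented out.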

Scality 6 Supervisor

Template

Template App Scality 6 Supervisor Service (linked templates: none)

Application

Scality 6 Supervisor service

Items

#	Item name	Trigger	Key	Data type	Unit	Multiplier	Update interval (sec)	History (days)	Trends (days)	Type	Application	Description	Remarks
1	RING Administrator HTTP service is running	1	net.tcp.listen[3080]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the status of TCP port 3080 (http)	Not needed if the TCP port (http) is not used
2	RING Administrator HTTPS service is running	1	net.tcp.listen[3443]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the status of TCP port 3443 (https)	Not needed if the TCP port (https) is not used
3	Supervisor sagentd service is running	1	net.tcp.listen[7084]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the status of TCP port 7084 (sagentd)	Not needed if the TCP port (sagentd) is not used
4	Supervisor supv2 service is running	1	net.tcp.listen[12345]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the status of TCP port 12345 (supv2)	Not needed if the TCP port (supv2) is not used
5	Supervisor uwsgi service is running	1	net.tcp.listen[4443]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the status of TCP port 4443 (uwsgi)	Not needed if the TCP port (uwsgi) is not used
6	Supervisor grafana-server service is running	1	net.tcp.listen[3000]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the status of TCP port 3000 (grafana-server)	Not needed if the TCP port (grafana-server) is not used
7	Supervisor bizstoresup service is running	1	net.tcp.listen[2443]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the status of TCP port 2443 (bizstoresup)	Not needed if the TCP port (bizstoresup) is not used
8	Supervisor bizstoresup service is running	1	net.tcp.listen[5580]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the status of TCP port 5580 (bizstoresup)	Not needed if the TCP port (bizstoresup) is not used
9	Supervisor salt-master service is running	1	net.tcp.listen[4505]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the status of TCP port 4505 (salt-master)	Not needed if the TCP port (salt-master) is not used
10	Supervisor salt-master service is running	1	net.tcp.listen[4506]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the status of TCP port 4506 (salt-master)	Not needed if the TCP port (salt-master) is not used
11	Number of SSD failures	1	custom.scality.supervisor.ringsh.ringstatus.meta.ssd.ng	Numeric	-	-	60	7	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of SSDs recognized as being in an error state
12	Number of HDD failures	1	custom.scality.supervisor.ringsh.ringstatus.data.disk.ng	Numeric	-	-	60	7	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of HDDs recognized as being in an error state
13	Number of running httpd processes	1	proc.num[httpd,root,,"-DFOREGROUND"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of httpd processes
14	Number of running bizstoresup processes	1	proc.num[,root,,"/usr/bin/bizstoresup"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of bizstoresup processes
15	Number of running sagentd processes	1	proc.num[,root,,"/usr/bin/sagentd"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of sagentd processes
16	Number of running sagentd/core processes	1	proc.num[,root,,"sagentd/core"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of sagentd/core processes
17	Number of running sagentd/heartbeat processes	1	proc.num[,root,,"sagentd/heartbeat"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of sagentd/heartbeat processes
18	Number of running sagentd/poll processes	1	proc.num[,root,,"sagentd/poll"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of sagentd/poll processes
19	Number of running sagentd/scheduler processes	1	proc.num[,root,,"sagentd/scheduler"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of sagentd/scheduler processes
20	Number of running sagentd/webclient processes	1	proc.num[,root,,"sagentd/webclient"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of sagentd/webclient processes
21	Number of running sagentd/webserver processes	1	proc.num[,root,,"sagentd/webserver"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of sagentd/webserver processes
22	Number of running supv2 processes	1	proc.num[,root,,"/usr/bin/supv2"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of supv2 processes
23	Number of running supv2/core processes	1	proc.num[,root,,"supv2/core"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of supv2/core processes
24	Number of running supv2/scheduler processes	1	proc.num[,root,,"supv2/scheduler"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of supv2/scheduler processes
25	Number of running supv2/webclient processes	1	proc.num[,root,,"supv2/webclient"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of supv2/webclient processes
26	Number of running supv2/webserver processes	1	proc.num[,root,,"supv2/webserver"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of supv2/webserver processes
27	Number of running salt-master processes	1	proc.num[,root,,"salt-master"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of salt-master processes
28	Number of running salt-minion processes	1	proc.num[,root,,"salt-minion"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of salt-minion processes
29	Number of running uwsgi processes	1	proc.num[,,,"/usr/sbin/uwsgi"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of uwsgi processes
30	Number of running grafana-server processes	1	proc.num[,grafana,,"/usr/sbin/grafana-server"]	Numeric	-	-	30	90	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of grafana-server processes
31	Number of online Nodes	0	custom.scality.supervisor.ringsh.serverlist.online	Numeric	-	-	60	7	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of nodes recognized as online
32	Number of offline Nodes	1	custom.scality.supervisor.ringsh.serverlist.offline	Numeric	-	-	60	7	365	Zabbix agent	Scality 6 Supervisor service	Gets the number of nodes recognized as offline
33	Free disk space on DATA	0	custom.scality.supervisor.ringsh.ringstorage.data.disk.avail	Numeric	B	1099511627776	60	7	365	Zabbix agent	Scality 6 Supervisor service	Total free DATA capacity across all nodes
34	Used disk space on DATA	0	custom.scality.supervisor.ringsh.ringstorage.data.disk.used	Numeric	B	1099511627776	60	7	365	Zabbix agent	Scality 6 Supervisor service	Total used DATA capacity across all nodes
35	Supervisor detected error log in cluster.log	1	log[/var/log/scality-supervisor/cluster.log,,,,skip]	Log	-	-	30	7	365	Zabbix agent (active)	Scality 6 Supervisor service	Gets strings from the log file

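History is the retention period for each collected value.
Trends is the retention period for the hourly minimum, maximum, average, and value count of numeric data.
In Zabbix 3.0, items have no retry count, retry interval, or timeout settings.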
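
The multiplier 1099511627776 on items 33 and 34 equals 1024^4, the number of bytes in 1 TiB. That suggests the custom keys return a value in TiB, which Zabbix then stores in bytes; this is an inference from the multiplier, not something stated by the ringsh output itself. A small sketch of the arithmetic, with tib_to_bytes as a hypothetical helper name:

```shell
#!/bin/sh
# 1 TiB = 1024^4 bytes, the multiplier used for items 33 and 34.
TIB=$((1024 * 1024 * 1024 * 1024))

# Convert an integer number of TiB to bytes (hypothetical helper).
tib_to_bytes() {
    echo $(( $1 * TIB ))
}

echo "$TIB"        # prints 1099511627776
tib_to_bytes 250   # prints 274877906944000
```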

Triggers

#	Severity	Trigger	Expression	Category	Description	Remarks
1	Average	RING Administrator HTTP service is down on {HOST.NAME}	net.tcp.listen[3080].max(#3)=0	Port	The maximum return value of the last 3 status checks of TCP port 3080 (RING Administrator HTTP) is 0 (closed)	Not needed if TCP port 3080 (RING Administrator HTTP) is not used
2	Average	RING Administrator HTTPS service is down on {HOST.NAME}	net.tcp.listen[3443].max(#3)=0	Port	The maximum return value of the last 3 status checks of TCP port 3443 (RING Administrator HTTPS) is 0 (closed)	Not needed if TCP port 3443 (RING Administrator HTTPS) is not used
3	Average	Supervisor bizstoresup service is down on {HOST.NAME}	net.tcp.listen[2443].max(#3)=0	Port	The maximum return value of the last 3 status checks of TCP port 2443 (bizstoresup) is 0 (closed)	Not needed if TCP port 2443 (bizstoresup) is not used
4	Average	Supervisor bizstoresup service is down on {HOST.NAME}	net.tcp.listen[5580].max(#3)=0	Port	The maximum return value of the last 3 status checks of TCP port 5580 (bizstoresup) is 0 (closed)	Not needed if TCP port 5580 (bizstoresup) is not used
5	Average	Supervisor supv2 service is down on {HOST.NAME}	net.tcp.listen[12345].max(#3)=0	Port	The maximum return value of the last 3 status checks of TCP port 12345 (supv2) is 0 (closed)	Not needed if TCP port 12345 (supv2) is not used
6	Average	Supervisor sagentd service is down on {HOST.NAME}	net.tcp.listen[7084].max(#3)=0	Port	The maximum return value of the last 3 status checks of TCP port 7084 (sagentd) is 0 (closed)	Not needed if TCP port 7084 (sagentd) is not used
7	Average	Supervisor uwsgi service is down on {HOST.NAME}	net.tcp.listen[4443].max(#3)=0	Port	The maximum return value of the last 3 status checks of TCP port 4443 (uwsgi) is 0 (closed)	Not needed if TCP port 4443 (uwsgi) is not used
8	Average	Supervisor grafana-server service is down on {HOST.NAME}	net.tcp.listen[3000].max(#3)=0	Port	The maximum return value of the last 3 status checks of TCP port 3000 (grafana-server) is 0 (closed)	Not needed if TCP port 3000 (grafana-server) is not used
9	Average	Supervisor salt-master service is down on {HOST.NAME}	net.tcp.listen[4505].max(#3)=0	Port	The maximum return value of the last 3 status checks of TCP port 4505 (salt-master) is 0 (closed)	Not needed if TCP port 4505 (salt-master) is not used
10	Average	Supervisor salt-master service is down on {HOST.NAME}	net.tcp.listen[4506].max(#3)=0	Port	The maximum return value of the last 3 status checks of TCP port 4506 (salt-master) is 0 (closed)	Not needed if TCP port 4506 (salt-master) is not used
11	High	Number of detected a SSD Failures	custom.scality.supervisor.ringsh.ringstatus.meta.ssd.ng.max(#3)<>0	Disk	The maximum number of SSDs recognized as being in an error state over the last 3 checks is non-zero
12	High	Number of detected a HDD Failures	custom.scality.supervisor.ringsh.ringstatus.data.disk.ng.max(#3)<>0	Disk	The maximum number of HDDs recognized as being in an error state over the last 3 checks is non-zero
13	High	httpd process is not running on {HOST.NAME}	proc.num[httpd,root,,"-DFOREGROUND"].last(0)<1	Process	The latest count of running httpd processes is less than 1
14	High	bizstoresup process is not running on {HOST.NAME}	proc.num[,root,,"/usr/bin/bizstoresup"].last(0)<2	Process	The latest count of running bizstoresup processes is less than 2
15	High	sagentd process is not running on {HOST.NAME}	proc.num[,root,,"/usr/bin/sagentd"].last(0)<1	Process	The latest count of running sagentd processes is less than 1
16	High	sagentd/core process is not running on {HOST.NAME}	proc.num[,root,,"sagentd/core"].last(0)<1	Process	The latest count of running sagentd/core processes is less than 1
17	High	sagentd/heartbeat process is not running on {HOST.NAME}	proc.num[,root,,"sagentd/heartbeat"].last(0)<1	Process	The latest count of running sagentd/heartbeat processes is less than 1
18	High	sagentd/poll process is not running on {HOST.NAME}	proc.num[,root,,"sagentd/poll"].last(0)<1	Process	The latest count of running sagentd/poll processes is less than 1
19	High	sagentd/scheduler process is not running on {HOST.NAME}	proc.num[,root,,"sagentd/scheduler"].last(0)<1	Process	The latest count of running sagentd/scheduler processes is less than 1
20	High	sagentd/webclient process is not running on {HOST.NAME}	proc.num[,root,,"sagentd/webclient"].last(0)<1	Process	The latest count of running sagentd/webclient processes is less than 1
21	High	sagentd/webserver process is not running on {HOST.NAME}	proc.num[,root,,"sagentd/webserver"].last(0)<1	Process	The latest count of running sagentd/webserver processes is less than 1
22	High	supv2 process is not running on {HOST.NAME}	proc.num[,root,,"/usr/bin/supv2"].max(#10)<1	Process	The maximum count of running supv2 processes over the last 10 checks is less than 1
23	High	supv2/core process is not running on {HOST.NAME}	proc.num[,root,,"supv2/core"].last(0)<5	Process	The latest count of running supv2/core processes is less than 5
24	High	supv2/scheduler process is not running on {HOST.NAME}	proc.num[,root,,"supv2/scheduler"].last(0)<1	Process	The latest count of running supv2/scheduler processes is less than 1
25	High	supv2/webclient process is not running on {HOST.NAME}	proc.num[,root,,"supv2/webclient"].last(0)<1	Process	The latest count of running supv2/webclient processes is less than 1
26	High	supv2/webserver process is not running on {HOST.NAME}	proc.num[,root,,"supv2/webserver"].last(0)<5	Process	The latest count of running supv2/webserver processes is less than 5
27	High	salt-master process is not running on {HOST.NAME}	proc.num[,root,,"salt-master"].last(0)<12	Process	The latest count of running salt-master processes is less than 12
28	High	salt-minion process is not running on {HOST.NAME}	proc.num[,root,,"salt-minion"].last(0)<3	Process	The latest count of running salt-minion processes is less than 3
29	High	uwsgi process is not running on {HOST.NAME}	proc.num[,,,"/usr/sbin/uwsgi"].last(0)<12	Process	The latest count of running uwsgi processes is less than 12
30	High	grafana-server process is not running on {HOST.NAME}	proc.num[,grafana,,"/usr/sbin/grafana-server"].last(0)<1	Process	The latest count of running grafana-server processes is less than 1
31	High	Number of detected a offline Nodes	custom.scality.supervisor.ringsh.serverlist.offline.max(#3)<>0	Communication	The maximum number of nodes recognized as offline over the last 3 checks is non-zero
32	High	Supervisor detected error log in cluster.log on {HOST.NAME}	log[/var/log/scality-supervisor/cluster.log,,,,skip].str("ERROR")=1 and log[/var/log/scality-supervisor/cluster.log,,,,skip].count(1h,"ERROR")>4 and log[/var/log/scality-supervisor/cluster.log,,,,skip].nodata(1m)=0	Log	The latest log value contains the string "ERROR", "ERROR" appeared more than 4 times in the last hour, and data was received within the last 60 seconds
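
The port triggers use max(#3)=0, which becomes true only when all of the three most recent samples are 0, so a single failed check does not raise a problem. The sketch below emulates that evaluation outside Zabbix; max_of and port_trigger_state are hypothetical helper names, and this is an illustration of the logic rather than how the Zabbix server implements it.

```shell
#!/bin/sh
# Print the maximum of the given integer samples.
max_of() {
    max="$1"
    shift
    for v in "$@"; do
        if [ "$v" -gt "$max" ]; then
            max="$v"
        fi
    done
    echo "$max"
}

# Emulate "max(#3)=0": PROBLEM only if all three samples are 0.
port_trigger_state() {
    if [ "$(max_of "$1" "$2" "$3")" -eq 0 ]; then
        echo "PROBLEM"
    else
        echo "OK"
    fi
}

port_trigger_state 0 0 0   # prints PROBLEM
port_trigger_state 0 1 0   # prints OK
```

With a 30-second update interval, three consecutive zeros mean the port has been closed for roughly a minute before the trigger fires.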

Installing the configuration

/etc/zabbix/zabbix_agentd.d/userparameter_scality.conf

# Custom Monitoring Items
UserParameter=custom.scality.supervisor.ringsh.ringstatus.meta.ssd.ng,/usr/bin/ringsh supervisor ringStatus META | grep 'ssd' | grep 'NG' | wc -l
UserParameter=custom.scality.supervisor.ringsh.ringstatus.data.disk.ng,/usr/bin/ringsh supervisor ringStatus DATA | grep 'disk' | grep 'NG' | wc -l
UserParameter=custom.scality.supervisor.ringsh.serverlist.online,/usr/bin/ringsh supervisor serverList | grep 'online' | wc -l
UserParameter=custom.scality.supervisor.ringsh.serverlist.offline,/usr/bin/ringsh supervisor serverList | grep 'offline' | wc -l
UserParameter=custom.scality.supervisor.ringsh.ringstorage.data.disk.avail,/usr/bin/ringsh supervisor ringStorage DATA | grep 'Disk avail' | cut -d ' ' -f 4 | awk '{ printf("%d",$1 + 0.5) }'
UserParameter=custom.scality.supervisor.ringsh.ringstorage.data.disk.used,/usr/bin/ringsh supervisor ringStorage DATA | grep 'Disk used' | cut -d ' ' -f 4 | awk '{ printf("%d",$1 + 0.5) }'
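
The pipelines above rely on two small techniques: counting lines that match two patterns, and rounding a decimal value to the nearest integer with awk. The sketch below isolates both so they can be tried without a RING installation. The function names and the sample input lines are made up for illustration; they are not real ringsh output, whose exact format is not shown in this article.

```shell
#!/bin/sh
# Count lines containing both patterns, like "grep A | grep B | wc -l".
count_lines() {
    grep "$1" | grep -c "$2"
}

# Round to the nearest integer the same way the awk step above does:
# add 0.5, then let %d truncate.
round_half_up() {
    awk '{ printf("%d\n", $1 + 0.5) }'
}

# Made-up sample lines, NOT real ringsh output.
printf 'disk1 ssd OK\ndisk2 ssd NG\ndisk3 hdd NG\n' | count_lines ssd NG   # prints 1
echo 12.6 | round_half_up                                                  # prints 13
```

Using grep -c in place of a trailing wc -l gives the same count with one process fewer; either form works in a UserParameter.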

Restarting zabbix-agent (CentOS 5, 6)

[root@localhost ~]# /etc/init.d/zabbix-agent restart

Restarting zabbix-agent (CentOS 7)

[root@localhost ~]# systemctl restart zabbix-agent
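
If the same procedure is scripted for hosts on different CentOS releases, the choice between the two restart commands can be wrapped in a small function. This is only a sketch: restart_command is a hypothetical helper, and the major version is passed in as an argument rather than detected.

```shell
#!/bin/sh
# Print the zabbix-agent restart command for a CentOS major version.
restart_command() {
    case "$1" in
        5|6) echo "/etc/init.d/zabbix-agent restart" ;;
        7)   echo "systemctl restart zabbix-agent" ;;
        *)   echo "unsupported CentOS version: $1" >&2; return 1 ;;
    esac
}

restart_command 6   # prints /etc/init.d/zabbix-agent restart
restart_command 7   # prints systemctl restart zabbix-agent
```

After the restart, a custom key can be checked locally on the agent host with `zabbix_agentd -t custom.scality.supervisor.ringsh.serverlist.online`, which prints the value the agent would report for that key.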

Zabbix Monitoring Template (Scality 6 RING)
Zabbix Monitoring Template (Scality 6 SOFS Connector)
