Ping-monitoring servers with blackbox_exporter + Prometheus + grafana #prometheus

Introduction

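I set things up so that an alert message is sent to Slack when Ping connectivity to my local PC is lost.
In real operation, this is meant for situations such as server liveness monitoring.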

Infrastructure

blackbox_exporter
- Listens on a port so that Prometheus can run Ping (ICMP) reachability checks against the monitored nodes
prometheus
- Collects metrics
grafana
- Visualization + Slack alert notifications

In real operation, the monitoring node and the monitored nodes would be separate machines.
Here, to keep things simple, everything was set up on a local MacBook.

Environment setup

Installing blackbox_exporter

Install blackbox_exporter so that Prometheus can Ping-monitor the monitored nodes.

Download the Blackbox Exporter binary from the Prometheus download page and extract it to a suitable location.
Confirm that blackbox.yml contains the following (it is included by default):

modules:
  icmp:
    prober: icmp

  # If IPv6 is not available in your environment, add the following module.
  icmp_ipv4:
    prober: icmp
    icmp:
      preferred_ip_protocol: ip4

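Move to the directory where you extracted the binary and start blackbox_exporter:

$ sudo ./blackbox_exporter

root/sudo privileges are required because the ICMP prober needs access to a raw socket.
Check Ping reachability to localhost: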

$ curl "http://localhost:9115/probe?module=icmp&target=localhost"

# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.000636406
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.001000538
# HELP probe_icmp_duration_seconds Duration of icmp request by phase
# TYPE probe_icmp_duration_seconds gauge
probe_icmp_duration_seconds{phase="resolve"} 0.000636406
probe_icmp_duration_seconds{phase="rtt"} 0.000144149
probe_icmp_duration_seconds{phase="setup"} 7.2599e-05
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1

If the output shows probe_success 1, the probe succeeded.
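The same check can be scripted instead of read by eye. The following is a minimal sketch, not part of the original setup: the helper names `probe_url`, `parse_metrics`, and `probe_succeeded` are hypothetical, and the parser assumes the simple text format shown above (one `name value` pair per line, no trailing timestamps). The sample text is a subset of the output above.

```python
from urllib.parse import urlencode


def probe_url(base_url, module, target):
    """Build the /probe URL used in the curl command above."""
    query = urlencode({"module": module, "target": target})
    return f"{base_url}/probe?{query}"


def parse_metrics(text):
    """Parse metric lines into a {series: value} dict.

    Simplification: comment lines (# HELP / # TYPE) are skipped and
    each remaining line is split at its last space.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        metrics[series] = float(value)
    return metrics


def probe_succeeded(text):
    """Return True when the probe reported probe_success 1."""
    return parse_metrics(text).get("probe_success") == 1.0


# A subset of the curl output shown above.
SAMPLE = """\
# HELP probe_icmp_duration_seconds Duration of icmp request by phase
# TYPE probe_icmp_duration_seconds gauge
probe_icmp_duration_seconds{phase="rtt"} 0.000144149
probe_icmp_duration_seconds{phase="setup"} 7.2599e-05
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
"""

if __name__ == "__main__":
    print(probe_url("http://localhost:9115", "icmp", "localhost"))
    print(probe_succeeded(SAMPLE))
```

To run this against a live exporter, the response body could be fetched with `urllib.request.urlopen(probe_url(...)).read().decode()` and passed to `probe_succeeded`.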

Installing prometheus

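Download the prometheus binary from the Prometheus download page and extract it to a suitable location.
On a Mac, it can also be installed with brew: brew install prometheus
Move to the directory where you extracted the binary and edit prometheus.yml as follows: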

global:
  scrape_interval:     15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
        - localhost
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115 # the blackbox_exporter endpoint
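
The relabel_configs block is the least obvious part of this configuration, so here is a rough Python sketch of what the three rules do to the target `localhost`. This is a simplified model written for illustration, not Prometheus code: the functions `relabel` and `scrape_url` are hypothetical, only the default `replace` action is modelled, and the starting labels (`__scheme__`, `__metrics_path__`, `__param_module`) are assumed to be what Prometheus derives from the job settings above.

```python
import re
from urllib.parse import urlencode


def relabel(labels, rules):
    """Apply 'replace'-style relabel rules to a copy of the label set."""
    labels = dict(labels)
    for rule in rules:
        sources = rule.get("source_labels", [])
        # Source label values are joined with ';' (the default separator).
        value = ";".join(labels.get(name, "") for name in sources)
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        if match is None:
            continue
        # Rules use $1 for capture groups; Python's expand() expects \1.
        template = re.sub(r"\$(\d+)", r"\\\1", rule.get("replacement", "$1"))
        labels[rule["target_label"]] = match.expand(template)
    return labels


def scrape_url(labels):
    """Build the URL that would be scraped from the final label set."""
    params = {
        name[len("__param_"):]: value
        for name, value in labels.items()
        if name.startswith("__param_")
    }
    return (
        f"{labels['__scheme__']}://{labels['__address__']}"
        f"{labels['__metrics_path__']}?{urlencode(params)}"
    )


# The three rules from prometheus.yml above.
RULES = [
    {"source_labels": ["__address__"], "target_label": "__param_target"},
    {"source_labels": ["__param_target"], "target_label": "instance"},
    {"target_label": "__address__", "replacement": "127.0.0.1:9115"},
]

# Assumed starting labels for the target 'localhost'.
INITIAL = {
    "__scheme__": "http",
    "__metrics_path__": "/probe",
    "__param_module": "icmp",
    "__address__": "localhost",
}

if __name__ == "__main__":
    final = relabel(INITIAL, RULES)
    print(final["instance"])
    print(scrape_url(final))
```

In this model the monitored host ends up in the `target` query parameter and the `instance` label, while the scrape itself goes to blackbox_exporter at 127.0.0.1:9115. That matches the curl request used earlier to test the exporter by hand.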