Prometheus


Last updated at 2024-05-24, posted at 2024-05-22

I'm trying out a monitoring tool called Prometheus.
These are my notes to myself after working through "Getting started | Prometheus".

Quick usage overview

First, create a configuration file named prometheus.yml:

global:
  scrape_interval: 15s # global default: scrape every 15 seconds
  external_labels:
    monitor: 'codelab-monitor'

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s # override for this job only: scrape every 5 seconds
    static_configs:
      - targets: ['localhost:9090'] # monitor Prometheus itself

Then start Prometheus with Docker as follows.

docker run --rm -p 9090:9090 -v ./prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus

You can confirm that it is running at http://localhost:9090/.

Prometheus is now monitoring itself. The metrics that Prometheus exposes to whatever scrapes it can be viewed at http://localhost:9090/metrics.
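The /metrics page is plain text with one sample per line, so it is easy to read programmatically. The following is a minimal sketch that parses a simplified subset of that format (no trailing timestamps, no escape handling beyond quoted label values). The function name parse_metrics, the HELP text, and the metric example_metric_without_labels are made up for illustration.

```python
import re

# Matches lines such as:  name{label="value",...} 1.5   or   name 1.5
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse a simplified subset of the text format into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser does not understand
        name, raw_labels, raw_value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# Hand-written sample input; the values are taken from the query output in this article,
# while the HELP text and the last metric are hypothetical.
SAMPLE_TEXT = """\
# HELP prometheus_http_requests_total Illustrative help text.
# TYPE prometheus_http_requests_total counter
prometheus_http_requests_total{code="200",handler="/graph"} 20
prometheus_http_requests_total{code="200",handler="/metrics"} 2526
example_metric_without_labels 1
"""

for name, labels, value in parse_metrics(SAMPLE_TEXT):
    print(name, labels, value)
```

To try it against the running container, the text could be fetched with urllib.request.urlopen("http://localhost:9090/metrics").read().decode() and passed to parse_metrics.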

Entering prometheus_target_interval_length_seconds as the query at http://localhost:9090/graph shows the actual scrape intervals. Switching the tab from Table to Graph shows a graph instead of numbers.

prometheus_target_interval_length_seconds{instance="localhost:9090", interval="5s", job="prometheus", quantile="0.01"} 4.995322586
prometheus_target_interval_length_seconds{instance="localhost:9090", interval="5s", job="prometheus", quantile="0.05"} 4.9957928769999995
prometheus_target_interval_length_seconds{instance="localhost:9090", interval="5s", job="prometheus", quantile="0.5"} 5.000041253
prometheus_target_interval_length_seconds{instance="localhost:9090", interval="5s", job="prometheus", quantile="0.9"} 5.003112086
prometheus_target_interval_length_seconds{instance="localhost:9090", interval="5s", job="prometheus", quantile="0.99"} 5.004889794
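The same query can also be sent without the web UI, through the Prometheus HTTP API endpoint /api/v1/query. The sketch below assumes the server from this article is listening on http://localhost:9090; the helper names (build_query_url, extract_vector, run_query) and the timestamp in the sample response are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build the URL for an instant query against the HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def extract_vector(payload):
    """Turn an instant-vector response into a list of (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % payload.get("error"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector, got %s" % data["resultType"])
    # Each "value" is [unix_timestamp, "sample value as a string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def run_query(base_url, promql):
    """Send the query to a running server (needs network access, not called here)."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=5) as response:
        return extract_vector(json.load(response))


# Trimmed, hand-written response in the shape the API returns; the timestamp is hypothetical.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "prometheus_target_interval_length_seconds",
                    "quantile": "0.5",
                },
                "value": [1716336000, "5.000041253"],
            }
        ],
    },
}

print(build_query_url("http://localhost:9090", "prometheus_target_interval_length_seconds"))
print(extract_vector(SAMPLE_RESPONSE))
```

With the container running, run_query("http://localhost:9090", "prometheus_target_interval_length_seconds") should return one entry per quantile.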

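Seeing five rows with different quantile values is confusing at first. A quantile, which is called a percentile when expressed as a percentage, indicates a position in the values sorted from lowest to highest (see "SD scores and percentiles"). So the top row, quantile="0.01", means that 1% of the results were 4.995322586 seconds or less. Even the shortest intervals are very close to 5 seconds, so the spread is small.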

This implies that the quantiles are computed over some number of measurements, but I could not figure out how many measurements that is. Sorry.
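To make the definition of a quantile concrete, here is a small worked example using the nearest-rank method on a list of made-up interval measurements. This only illustrates the definition; Prometheus may well compute its reported values differently, and the numbers below are hypothetical.

```python
import math


def nearest_rank_quantile(values, q):
    """Return the smallest value such that at least a fraction q of the values are <= it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))  # 1-based position in the sorted list
    return ordered[rank - 1]


# Ten hypothetical scrape intervals in seconds, all close to the configured 5s.
intervals = [5.000, 4.996, 5.003, 4.995, 5.001, 5.004, 4.998, 5.000, 5.005, 5.002]

for q in (0.01, 0.5, 0.99):
    print(q, nearest_rank_quantile(intervals, q))
```

With only ten values, the 0.01 quantile is simply the smallest measurement and the 0.99 quantile is the largest, which is why the answer depends on how many measurements are included.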

Queries can also be written in other ways, such as these.

prometheus_target_interval_length_seconds{quantile="0.99"}: shows only the 99th percentile.
count(prometheus_target_interval_length_seconds): shows the number of time series. Here the result is 5, because there are five quantile values.
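What these two queries do can be imitated on an in-memory copy of the five rows shown earlier. This is only a mental model written in Python, not how Prometheus evaluates queries; the function name select is made up for illustration.

```python
NAME = "prometheus_target_interval_length_seconds"

# The five rows from the query result above, as (metric name, labels, value).
SERIES = [
    (NAME, {"instance": "localhost:9090", "interval": "5s", "job": "prometheus", "quantile": "0.01"}, 4.995322586),
    (NAME, {"instance": "localhost:9090", "interval": "5s", "job": "prometheus", "quantile": "0.05"}, 4.9957928769999995),
    (NAME, {"instance": "localhost:9090", "interval": "5s", "job": "prometheus", "quantile": "0.5"}, 5.000041253),
    (NAME, {"instance": "localhost:9090", "interval": "5s", "job": "prometheus", "quantile": "0.9"}, 5.003112086),
    (NAME, {"instance": "localhost:9090", "interval": "5s", "job": "prometheus", "quantile": "0.99"}, 5.004889794),
]


def select(series, name, **matchers):
    """Keep the series with this metric name whose labels equal every given matcher."""
    return [
        (metric, labels, value)
        for metric, labels, value in series
        if metric == name and all(labels.get(key) == want for key, want in matchers.items())
    ]


# Like prometheus_target_interval_length_seconds{quantile="0.99"}
print(select(SERIES, NAME, quantile="0.99"))

# Like count(prometheus_target_interval_length_seconds)
print(len(select(SERIES, NAME)))
```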

Let's try another query. prometheus_http_requests_total returns the cumulative number of HTTP requests.

prometheus_http_requests_total{code="200", handler="/graph", instance="localhost:9090", job="prometheus"} 20
prometheus_http_requests_total{code="200", handler="/manifest.json", instance="localhost:9090", job="prometheus"} 20
prometheus_http_requests_total{code="200", handler="/metrics", instance="localhost:9090", job="prometheus"} 2526

Terminology

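Time series: the data that Prometheus works with, that is, measured values recorded over time.
Metric name: the name of the measured quantity, such as prometheus_target_interval_length_seconds in the example above.
Metric label: a key-value pair, such as instance or interval in the example above, that distinguishes variants of the same metric name. In this example, five variants of prometheus_target_interval_length_seconds are recorded, one per quantile value.
Samples: the measured values. Each sample consists of a float64 value and a timestamp with millisecond precision.
Target: the thing being measured. A target is added in the section below.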

In other words, at a given point in time a time series holds data like the following.

metric_name{label1="hoge", label2="fuga", label3="hage"} 2.718
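The terms above can be sketched as a tiny data structure: a time series is identified by its metric name plus its set of labels, and holds a list of samples. This is a rough illustration of the data model only, not Prometheus's actual storage; the names Sample, series_key, and render, and the timestamps, are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    timestamp_ms: int  # millisecond-precision timestamp
    value: float       # float64 value


def series_key(name, labels):
    """Identify a time series by metric name plus labels, independent of label order."""
    return (name, tuple(sorted(labels.items())))


def render(name, labels, value):
    """Format one sample in the notation used in this article."""
    body = ", ".join('%s="%s"' % (key, val) for key, val in sorted(labels.items()))
    return "%s{%s} %s" % (name, body, value)


labels = {"label1": "hoge", "label2": "fuga", "label3": "hage"}

# A store mapping each series to its samples; the timestamps are hypothetical.
store = {
    series_key("metric_name", labels): [
        Sample(timestamp_ms=1716336000000, value=2.5),
        Sample(timestamp_ms=1716336005000, value=2.718),
    ]
}

latest = store[series_key("metric_name", labels)][-1]
print(render("metric_name", labels, latest.value))
```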

Adding a target

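As an example of adding a target, I'll add node-exporter in a quick-and-dirty way (the instructions warn that this ends up measuring the inside of the container, so take care before using it for real).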

docker run -p 9100:9100 quay.io/prometheus/node-exporter:latest

Add a scrape configuration for node-exporter to the prometheus.yml from earlier.

global:
  scrape_interval: 15s # global default: scrape every 15 seconds
  external_labels:
    monitor: 'codelab-monitor'

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s # override for this job only: scrape every 5 seconds
    static_configs:
      - targets: ['localhost:9090'] # monitor Prometheus itself

  - job_name: 'node'
    scrape_interval: 5s
    static_configs:
      - targets: ['host.docker.internal:9100'] # monitor node-exporter
        labels:
          group: 'production' # attach the metric label group="production"

Restart Prometheus and query, for example, node_cpu_seconds_total. The result shows that the newly added node-exporter is being monitored.

Whether the targets are being monitored successfully can be checked at http://localhost:9090/targets.
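The same health information is also available as JSON from the HTTP API endpoint /api/v1/targets. The sketch below summarizes such a response; the sample payload is hand-written and trimmed to a few fields, its health values are hypothetical, and the helper names are made up for illustration.

```python
import json
import urllib.request


def summarize_targets(payload):
    """Return (job, scrape URL, health) for each active target in the response."""
    return [
        (target["labels"].get("job"), target["scrapeUrl"], target["health"])
        for target in payload["data"]["activeTargets"]
    ]


def unhealthy_targets(payload):
    """Return only the targets whose health is not "up"."""
    return [row for row in summarize_targets(payload) if row[2] != "up"]


def fetch_targets(base_url):
    """Fetch the target list from a running server (needs network access, not called here)."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/v1/targets", timeout=5) as response:
        return json.load(response)


# Trimmed, hand-written payload; real responses contain more fields per target.
SAMPLE_TARGETS = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"instance": "localhost:9090", "job": "prometheus"},
                "scrapeUrl": "http://localhost:9090/metrics",
                "health": "up",
            },
            {
                "labels": {"group": "production", "instance": "host.docker.internal:9100", "job": "node"},
                "scrapeUrl": "http://host.docker.internal:9100/metrics",
                "health": "down",
            },
        ]
    },
}

for job, url, health in summarize_targets(SAMPLE_TARGETS):
    print(job, url, health)
```

With the containers from this article running, unhealthy_targets(fetch_targets("http://localhost:9090")) should return an empty list once both jobs are being scraped.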

References

Data model | Prometheus
Getting started | Prometheus
