
Scraping node-exporter metrics at 1s intervals with Prometheus

Last updated at 2024-01-22. Posted at 2023-12-28.

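Introduction

This article explains how to change the configuration of Prometheus running on OpenShift so that it scrapes metrics from node-exporter at 1s intervals.

In OpenShift v4.14, the cluster-monitoring-operator is set up by OpenShift and automatically deploys a monitoring stack built on Prometheus and related components.

I got stuck when I wanted to set the Prometheus scrape interval myself, so this article records the steps.

For more details on the OpenShift monitoring stack, see the official documentation.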

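Caveats

This procedure changes OpenShift cluster settings manually, so the cluster falls outside Red Hat support.
Scraping metrics at an interval as short as 1s puts more load on the OpenShift cluster than usual.
The official documentation describes how to change the Prometheus data retention period and the amount of resources allocated to Prometheus.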

Prerequisites

OpenShift v4.14

Steps

Log in to the OpenShift cluster

Change the ClusterVersion resource

Set the two Operators that manage the OpenShift monitoring stack to "Unmanaged".

The two Operators to set to unmanaged are cluster-monitoring-operator and prometheus-operator.

# Edit the cluster version
oc edit clusterversion

apiVersion: v1
items:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    metadata:
        ...
    spec:
      channel: stable-4.12
      clusterID: xxxxx
      overrides: # added: start
        - group: apps
          kind: Deployment
          name: cluster-monitoring-operator
          namespace: openshift-monitoring
          unmanaged: true
        - group: apps
          kind: Deployment
          name: prometheus-operator
          namespace: openshift-monitoring
          unmanaged: true # added: end
    status:
      availableUpdates:
        - channels:
            - candidate-4.12
            - candidate-4.13
            - eus-4.12
            - fast-4.12
...

Add the overrides section to the manifest, as marked above.
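As a non-interactive alternative to `oc edit`, the same overrides can probably be applied with `oc patch`. The following is a minimal sketch, assuming the ClusterVersion object has the default name `version` and that `spec.overrides` is not set yet (the JSON Patch `add` operation would replace an existing list). The function names `overrides_patch` and `apply_overrides` are hypothetical.

```shell
#!/bin/sh
# Print a JSON Patch that sets spec.overrides for the two Operators.
# Assumption: spec.overrides does not exist yet; "add" replaces it if it does.
overrides_patch() {
  cat <<'EOF'
[{"op": "add", "path": "/spec/overrides", "value": [
  {"group": "apps", "kind": "Deployment", "name": "cluster-monitoring-operator",
   "namespace": "openshift-monitoring", "unmanaged": true},
  {"group": "apps", "kind": "Deployment", "name": "prometheus-operator",
   "namespace": "openshift-monitoring", "unmanaged": true}
]}]
EOF
}

# Apply the patch. Assumption: the ClusterVersion object is named "version".
apply_overrides() {
  oc patch clusterversion version --type json -p "$(overrides_patch)"
}
```

After logging in, calling `apply_overrides` applies the patch, and `oc get clusterversion version -o jsonpath='{.spec.overrides}'` should show the two entries.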

Scale down the Operators

Scale down the two Operators that were set to unmanaged.

# Scale down cluster-monitoring-operator
oc -n openshift-monitoring scale deployments cluster-monitoring-operator --replicas=0

# Scale down prometheus-operator
oc -n openshift-monitoring scale deployments prometheus-operator --replicas=0

If these Operators are not scaled down, they will revert the edits made in the following steps.
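Because a running Operator silently undoes the later edits, it may help to confirm the replica counts right after scaling down. The sketch below wraps the two commands above in a loop and prints the resulting `spec.replicas` of each Deployment; the function name is hypothetical.

```shell
#!/bin/sh
# Scale both unmanaged Operators to zero, then print the replica count of each.
scale_down_monitoring_operators() {
  for name in cluster-monitoring-operator prometheus-operator; do
    oc -n openshift-monitoring scale deployment "$name" --replicas=0
    # Print "<name> <replicas>" so that a non-zero value stands out.
    printf '%s ' "$name"
    oc -n openshift-monitoring get deployment "$name" -o jsonpath='{.spec.replicas}'
    echo
  done
}
```

Each Deployment should report `0`; any other value suggests the scale-down was reverted.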

Modify the Prometheus configuration file

The Prometheus configuration file is stored in gzip format in secret/prometheus-k8s.

Updating the file inside this secret updates the Prometheus configuration file.


# Retrieve the configuration file (keep an unedited copy as a backup)
oc get -n openshift-monitoring secrets prometheus-k8s -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gunzip -c > ./prometheus.yaml

# Edit the file (change scrape_interval of the node-exporter job to 1s)
vi ./prometheus.yaml

...
- job_name: serviceMonitor/openshift-monitoring/node-exporter/0
  scrape_interval: 1s # edit this value
...

# Compress the file and print it base64-encoded
gzip -c ./prometheus.yaml | base64 -w 0

# Update the secret (replace the value of prometheus.yaml.gz with the compressed value of the edited file)
oc -n openshift-monitoring edit secrets prometheus-k8s
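The edit, compress, and update steps above can also be scripted, which avoids copy-and-paste mistakes with the long base64 string. This is a minimal sketch, assuming the configuration file is laid out as in the snippet above (each job starts with a `- job_name:` line and has its own `scrape_interval:` line) and that GNU `base64` with `-w` is available. The function names are hypothetical.

```shell
#!/bin/sh
JOB='serviceMonitor/openshift-monitoring/node-exporter/0'

# Rewrite scrape_interval only inside the node-exporter job.
# Reads the configuration on stdin and writes the edited version to stdout.
set_node_exporter_interval() {
  awk -v job="$JOB" -v interval="$1" '
    # A new job block starts: remember whether it is the node-exporter job.
    /^ *- job_name:/ { in_job = ($3 == job) }
    # Inside that job, replace the value and keep the indentation.
    in_job && /^ +scrape_interval:/ {
      sub(/scrape_interval:.*/, "scrape_interval: " interval)
    }
    { print }
  '
}

# Compress and base64-encode a file into one line, as the secret expects.
encode_config() {
  gzip -c "$1" | base64 -w 0
}

# Write the encoded value into the secret without opening an editor.
patch_prometheus_secret() {
  oc -n openshift-monitoring patch secret prometheus-k8s --type merge \
    -p "{\"data\":{\"prometheus.yaml.gz\":\"$(encode_config "$1")\"}}"
}
```

A possible usage is `set_node_exporter_interval 1s < ./prometheus.yaml > ./prometheus.edited.yaml` followed by `patch_prometheus_secret ./prometheus.edited.yaml`, which leaves the retrieved `./prometheus.yaml` untouched as a backup.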

Confirm that the modified configuration file is applied

There is no need to restart the Prometheus pods.

If the edit made earlier also appears in the configuration file inside each pod, it worked.

# Confirm that the edited secret is correctly reflected in pod prometheus-k8s-0
oc -n openshift-monitoring debug pod/prometheus-k8s-0 -- cat /etc/prometheus/config_out/prometheus.env.yaml | grep -A 7 'node-exporter/0'

# Confirm that the edited secret is correctly reflected in pod prometheus-k8s-1
oc -n openshift-monitoring debug pod/prometheus-k8s-1 -- cat /etc/prometheus/config_out/prometheus.env.yaml | grep -A 7 'node-exporter/0'
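Instead of reading the `grep` output by eye, the check can be reduced to a pass/fail result per replica. The sketch below assumes the same file layout as the snippet shown earlier and the two replica names used above; both function names are hypothetical.

```shell
#!/bin/sh
# Read a Prometheus configuration on stdin and succeed only if the
# node-exporter job has scrape_interval set to the expected value.
node_exporter_interval_is() {
  awk -v want="$1" '
    /^ *- job_name:/ { in_job = ($3 == "serviceMonitor/openshift-monitoring/node-exporter/0") }
    in_job && $1 == "scrape_interval:" { found = ($2 == want) }
    END { exit(found ? 0 : 1) }
  '
}

# Run the check against both Prometheus replicas.
check_both_replicas() {
  for i in 0 1; do
    if oc -n openshift-monitoring debug "pod/prometheus-k8s-$i" -- \
         cat /etc/prometheus/config_out/prometheus.env.yaml | node_exporter_interval_is 1s; then
      echo "prometheus-k8s-$i: OK"
    else
      echo "prometheus-k8s-$i: not updated"
    fi
  done
}
```

Calling `check_both_replicas` prints one line per pod. A "not updated" result shortly after changing the secret may only mean the new configuration has not been propagated yet.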

Change the ServiceMonitor settings

Next, edit the settings of the node-exporter ServiceMonitor.

# Edit the ServiceMonitor
oc edit servicemonitor/node-exporter -n openshift-monitoring

spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    bearerTokenSecret:
      key: ""
    interval: 1s # change this value

Confirm that metrics are scraped at 1s intervals

# Port-forward to the Prometheus service
oc port-forward -n openshift-monitoring svc/prometheus-k8s 9091:9091

# Check the authentication token
oc whoami -t

# API request (set the date and time to match your environment)
curl -ks 'https://localhost:9091/api/v1/query_range' \
  -H 'Authorization: Bearer <replace with the authentication token obtained from oc whoami -t>' \
  --data-urlencode "query=node_cpu_seconds_total" \
  --data-urlencode 'start=2024-01-22T02:45:00Z' \
  --data-urlencode "end=2024-01-22T02:46:00Z" \
  --data-urlencode "step=1s"

If the API response shows the metric values changing every 1s, it worked.
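The `start` and `end` values in the request above are fixed dates that have to be adjusted by hand. As a sketch, they can be derived from the current time instead, here covering the last 60 seconds. This assumes a `date` command that accepts `-d @<epoch>` (GNU coreutils and BusyBox do; BSD/macOS `date` does not), and the function names are hypothetical.

```shell
#!/bin/sh
# Print an RFC 3339 UTC timestamp for "now minus N seconds".
# Assumption: date supports -d @<epoch> (GNU coreutils or BusyBox).
utc_seconds_ago() {
  date -u -d "@$(( $(date +%s) - $1 ))" +%Y-%m-%dT%H:%M:%SZ
}

# Query the last minute of node_cpu_seconds_total at 1s resolution.
# $1 is the token printed by "oc whoami -t".
query_last_minute() {
  curl -ks 'https://localhost:9091/api/v1/query_range' \
    -H "Authorization: Bearer $1" \
    --data-urlencode 'query=node_cpu_seconds_total' \
    --data-urlencode "start=$(utc_seconds_ago 60)" \
    --data-urlencode "end=$(utc_seconds_ago 0)" \
    --data-urlencode 'step=1s'
}
```

With the port-forward still running, `query_last_minute "$(oc whoami -t)"` sends the same request for the most recent minute.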

With that, Prometheus now scrapes metrics from node-exporter at 1s intervals.

References

This procedure was written with reference to the following article.

https://www.redhat.com/en/blog/openshift-monitoring-stack-playing-with-prometheus-performance-and-scraping-intervals
- Section "Hacking the monitoring stack to change scrap intervals"
