
Amazon Web Services / Professional Services

Exporting Pod resource usage data from Prometheus to CSV with Python

Posted at 2019-01-16, last updated at 2019-01-30

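These are my notes on exporting data to CSV with Python from the Prometheus instance bundled with IBM Cloud Private.
The assumed scenario is a good old-fashioned system where performance reports, such as per-Pod CPU utilization, are compiled in Excel and submitted.

Verified on ICP v3.1.0. The commands were run on a Mac, so tools such as curl and date may behave slightly differently on Linux.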

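Prometheus queries

This article does not cover Prometheus queries themselves, but the following link is recommended reading.

Prometheusクエリ道場 (Prometheus Query Dojo)

For example, a query like the following retrieves the CPU utilization of each Pod in a Namespace.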

sum(rate(container_cpu_usage_seconds_total{namespace="default"}[5m])) by (pod_name) * 100

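Broken down step by step, the query works as follows. The data types and the rate() function are hard to grasp, but they are the key points.

container_cpu_usage_seconds_total is the cumulative CPU time consumed by a container. Querying it returns the value for each of multiple containers at a single timestamp (an instant vector)
container_cpu_usage_seconds_total{namespace="default"} narrows the data by label (an instant vector)
container_cpu_usage_seconds_total{namespace="default"}[5m] specifies a time range, so it returns data for multiple containers across multiple timestamps (a range vector)
rate(container_cpu_usage_seconds_total{namespace="default"}[5m]) computes the average per-second increase over the specified range (an instant vector)
- If CPU time grows by 0.5 seconds per second, the CPU utilization is 0.5 (50%)
sum(rate(container_cpu_usage_seconds_total{namespace="default"}[5m])) by (pod_name) sums the series for each Pod (an instant vector)
- A Pod can contain multiple containers, which is why the values are summed per Pod
Finally, * 100 converts the per-Pod value to a percentage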
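To make the rate() step concrete, here is a minimal sketch of the same idea in Python. It is a simplification: Prometheus's rate() also handles counter resets and extrapolates to the window boundaries, which this sketch does not. The function name simple_rate and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Approximate the per-second increase of a counter over a window.

    samples: list of (unix_timestamp, counter_value) pairs, oldest first.
    Simplification: no counter-reset handling and no extrapolation,
    both of which Prometheus's rate() performs.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


# Hypothetical samples: the counter grows by 150 CPU-seconds in 300 seconds.
window = [(0, 1000.0), (60, 1030.0), (120, 1060.0), (180, 1090.0),
          (240, 1120.0), (300, 1150.0)]
cpu_percent = simple_rate(window) * 100
print(cpu_percent)  # 50.0
```

CPU time growing by 0.5 seconds per second gives 0.5, and multiplying by 100 gives 50%, matching the explanation above.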

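Exporting from Grafana

Grafana dashboards support exporting to CSV per panel through the GUI. On the dashboard, choose the time range and, where the dashboard defines them, the variables (interval and Namespace in this example) so that the data you want is displayed. Then click the panel title to export it as CSV. It appears that JavaScript running in the browser converts the JSON data the browser has already downloaded into CSV.

A few settings can also be adjusted at export time.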

curl

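Next, let's fetch the data with curl.

Authentication

Accessing the Prometheus API requires a token.

You might expect the ID token to be required, but note that the Prometheus API needs the access token, not the ID token.

If you have already run cloudctl login locally, you can get the token with cloudctl tokens.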

ACCESS_TOKEN=$(LANG=C cloudctl tokens | grep "Access token:" | awk '{print ($4)}')

Alternatively, you can get it from the API by passing a username and password.

USERNAME="admin"
PASSWORD="admin"
ACCESS_TOKEN=$(curl -s -k -H "Content-Type: application/x-www-form-urlencoded;charset=UTF-8" \
  -d "grant_type=password&username=${USERNAME}&password=${PASSWORD}&scope=openid" \
  https://mycluster.icp:8443/idprovider/v1/auth/identitytoken | jq -r '.access_token')
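One caveat with the curl call above: -d sends the string as written, so a password containing characters such as & or = would corrupt the form body. As a minimal sketch, the same body can be built in Python with the standard library, which percent-encodes each value. The function name build_token_form and the sample password are hypothetical; the field names come from the curl example.

```python
from urllib.parse import urlencode


def build_token_form(username, password):
    """Return the form-encoded body for the password-grant request."""
    return urlencode({
        'grant_type': 'password',
        'username': username,
        'password': password,
        'scope': 'openid',
    })


# Hypothetical password with characters that must be escaped.
body = build_token_form('admin', 'p&ss=word')
print(body)
```

The resulting string can be sent as the request body with the same Content-Type header as in the curl example.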

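Checking the query

Clicking Edit from the panel title in Grafana shows which Prometheus query is being used. $interval and $namespace are passed in from the variable controls defined on the dashboard.

Opening the Query Inspector also shows what HTTP request is sent.

The GET request must be given the query, start, end, and step parameters. start and end are epoch times, and step is the interval between data points (in seconds when given as a plain number).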
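To see how start, end, and step interact, the following sketch counts how many data points a range query evaluates per series, assuming evaluation happens at start, start + step, and so on up to end. The function name expected_points is hypothetical; the numbers are the ones used in the curl example in this article.

```python
def expected_points(start, end, step):
    """Count the evaluation timestamps start, start+step, ... up to end."""
    if step <= 0 or end < start:
        raise ValueError('step must be positive and end must not precede start')
    return int((end - start) // step) + 1


# Epoch times and step taken from the curl example in this article.
print(expected_points(1547517120, 1547527950, 30))  # 362
```

A smaller step means more rows in the exported CSV, so it is worth choosing step with the report's granularity in mind.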

Running curl

The curl request looks like this. The -H option adds the authentication header, --data-urlencode URL-encodes each parameter, and -G makes curl send them with GET instead of POST.

curl -k -s -G -H "Authorization:Bearer $ACCESS_TOKEN" \
  https://mycluster.icp:8443/prometheus/api/v1/query_range \
  --data-urlencode 'query=sum(rate(container_cpu_usage_seconds_total{namespace="default"}[5m])) by (pod_name) * 100' \
  --data-urlencode "start=1547517120" \
  --data-urlencode "end=1547527950" \
  --data-urlencode "step=30" | jq .

With the parameters pulled out into variables, it looks like this.

NAMESPACE="default"
INTERVAL="5m"
QUERY="sum(rate(container_cpu_usage_seconds_total{namespace=\"${NAMESPACE}\"}[${INTERVAL}])) by (pod_name) * 100"
START=$(date -v -1d +%s)  # epoch time one day ago (BSD/macOS date; with GNU date use: date -d '1 day ago' +%s)
END=$(date +%s)           # current epoch time
STEP=30
curl -k -s -G -H "Authorization:Bearer $ACCESS_TOKEN" \
  https://mycluster.icp:8443/prometheus/api/v1/query_range \
  --data-urlencode "query=${QUERY}" \
  --data-urlencode "start=${START}" \
  --data-urlencode "end=${END}" \
  --data-urlencode "step=${STEP}" | jq .
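For comparison, here is a rough Python equivalent of what -G with --data-urlencode does: the parameters are percent-encoded and appended to the URL as a query string. This is only a sketch using the standard library; build_query_range_url is a hypothetical helper, and the host name is the one used in the curl examples.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = 'https://mycluster.icp:8443/prometheus/api/v1/query_range'


def build_query_range_url(namespace, interval, start, end, step):
    """Build the GET URL that curl -G --data-urlencode would request."""
    # Doubled braces produce literal { and } in the formatted query.
    query = ('sum(rate(container_cpu_usage_seconds_total'
             '{{namespace="{}"}}[{}])) by (pod_name) * 100'
             ).format(namespace, interval)
    params = {'query': query, 'start': start, 'end': end, 'step': step}
    return BASE_URL + '?' + urlencode(params)


url = build_query_range_url('default', '5m', 1547517120, 1547527950, 30)
print(url)

# Decoding the query string again recovers the original parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded['query'][0])
```

Encoding matters here because the query contains braces, quotes, brackets, and spaces that are not safe to put in a URL as-is.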

Running it returns JSON like the following.

$ curl -k -s -G -H "Authorization:Bearer $ACCESS_TOKEN" \
>   https://mycluster.icp:8443/prometheus/api/v1/query_range \
>   --data-urlencode "query=${QUERY}" \
>   --data-urlencode "start=${START}" \
>   --data-urlencode "end=${END}" \
>   --data-urlencode "step=${STEP}" | jq .
{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": {
          "pod_name": "infra-test-nodeport-cust-0"
        },
        "values": [
          [
            1547537972,
            "2.5491993487228775"
          ],
          [
            1547538002,
            "2.300661640484626"
          ],
(omitted)
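In this response, each entry in values is a pair of a numeric timestamp and a value encoded as a string. The following sketch walks a response of that shape and converts each sample into a datetime and a float. The payload is an abbreviated copy of the output above, and iter_samples is a hypothetical helper name.

```python
import datetime
import json

# Abbreviated response in the shape shown above (one Pod, two samples).
payload = json.loads('''
{"status": "success",
 "data": {"resultType": "matrix",
          "result": [{"metric": {"pod_name": "infra-test-nodeport-cust-0"},
                      "values": [[1547537972, "2.5491993487228775"],
                                 [1547538002, "2.300661640484626"]]}]}}
''')


def iter_samples(payload):
    """Yield (pod_name, UTC datetime, float value) for each sample."""
    if payload.get('status') != 'success':
        raise ValueError('query failed: {}'.format(payload))
    for series in payload['data']['result']:
        pod_name = series['metric'].get('pod_name', '')
        for timestamp, value in series['values']:
            when = datetime.datetime.fromtimestamp(
                timestamp, tz=datetime.timezone.utc)
            yield pod_name, when, float(value)


samples = list(iter_samples(payload))
for sample in samples:
    print(sample)
```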

Python

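Converting JSON to CSV is a bit painful in a shell script, so let's do it in Python.

Writing the code

I wrote the following procedural code. I don't yet know how to write good code, so this is in the spirit of "Automate the Boring Stuff with Python".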

export_pod_cpu.py

import argparse
import collections
import csv
import datetime
import logging
import re
import subprocess

import pprint
import requests
import urllib3


formatter = '%(asctime)s %(name)-12s %(levelname)-8s %(message)s'
logging.basicConfig(level=logging.WARNING, format=formatter)
logger = logging.getLogger(__name__)


# Suppress the InsecureRequestWarning raised because the request uses verify=False
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


# Parse the command-line arguments
parser = argparse.ArgumentParser(description='Export Pod CPU utilization to CSV.')
parser.add_argument('-f', '--filename',
                    action='store',
                    type=str,
                    help='Output file name')
parser.add_argument('-n', '--namespace',
                    action='store',
                    type=str,
                    default='default',
                    help='Namespace to query')
parser.add_argument('--interval',
                    action='store',
                    type=str,
                    default='5m',
                    help='Range used to compute CPU utilization (e.g. 1h, 5m)')
parser.add_argument('--start',
                    action='store',
                    type=str,
                    help='Start time of the data (e.g. 20190101-1000)')
parser.add_argument('--end',
                    action='store',
                    type=str,
                    help='End time of the data (e.g. 20190102-1000)')
parser.add_argument('--step',
                    action='store',
                    type=int,
                    help='Interval between data points, in seconds')
args = parser.parse_args()

filepath = args.filename
namespace = args.namespace
interval = args.interval
step = args.step
logger.debug('filepath: {}'.format(filepath))
logger.debug('interval: {}'.format(interval))
logger.debug('step: {}'.format(step))

# Convert the start and end arguments to Unix time (they are interpreted as local time)
start_str = args.start
end_str = args.end
start_dt = datetime.datetime.strptime(start_str, '%Y%m%d-%H%M')
end_dt = datetime.datetime.strptime(end_str, '%Y%m%d-%H%M')
start_unix = start_dt.timestamp()
end_unix = end_dt.timestamp()
logger.debug('start_dt: {}'.format(start_dt))
logger.debug('end_dt: {}'.format(end_dt))
logger.debug('start_unix: {}'.format(start_unix))
logger.debug('end_unix: {}'.format(end_unix))

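# Run the command in a subprocess and extract the access token from its output
completed_process = subprocess.run(['cloudctl', 'tokens'], stdout=subprocess.PIPE)
result_str = completed_process.stdout.decode('utf-8')
match = re.search(r'(.*)\s+Bearer\s+(.*)', result_str)
if match is None:
    # No token in the output, most likely because cloudctl login has not been run
    raise SystemExit('Access token not found; run cloudctl login first')
access_token = match.group(2)
logger.debug('access_token: {}'.format(access_token))

# Prometheus query
# Get the CPU utilization of each Pod in the given Namespace, computed over the given interval
# sum(rate(container_cpu_usage_seconds_total{namespace="$namespace"}[$interval])) by (pod_name) * 100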
query = 'sum(rate(container_cpu_usage_seconds_total{{namespace="{}"}}[{}])) ' \
        'by (pod_name) * 100'.format(namespace, interval)
logger.debug('query: {}'.format(query))

# Request parameters
url = 'https://mycluster.icp:8443/prometheus/api/v1/query_range'
headers = {'Authorization': 'Bearer {}'.format(access_token)}
params = {'query': query,
          'start': start_unix,
          'end': end_unix,
          'step': step}
logger.debug('url: {}'.format(url))
logger.debug('headers: {}'.format(headers))
logger.debug('params: {}'.format(params))


# Send the request
response = requests.get(url, verify=False, headers=headers, params=params)
response.raise_for_status()
logger.debug('response: {}'.format(response))

# The response looks like this
# pprint.pprint(response.json())
# {'data': {'result': [{'metric': {'pod_name': 'infra-test-nodeport-cust-0'},
#                       'values': [[1547528400, '2.64939279124293'],
#                                  [1547532000, '2.5820633706497045'],
#                                  [1547535600, '2.562417181158173'],
#                                  [1547539200, '2.4563804665536724'],

# Strip the outer wrapper and take out the inner list
results = response.json()['data']['result']

# The extracted data looks like this
# pprint.pprint(results)
# [{'metric': {'pod_name': 'infra-test-nodeport-cust-0'},
#   'values': [[1547528400, '2.64939279124293'],
#              [1547532000, '2.5820633706497045'],
#              [1547535600, '2.562417181158173'],
#              [1547539200, '2.4563804665536724'],
#
# Group this data by timestamp into a dictionary like the following
#
# {1547464889.632: {'infra-test-nodeport-cust-0': '3.1518179124293577',
#                   'infra-test-nodeport-cust-1': '1.530811175762711',
#                   'infra-test-nodeport2-cus-0': '3.0063879859887037',
#                   'infra-test-nodeport2-cus-1': '1.5241500936723127'},
#  1547468489.632: {'infra-test-nodeport-cust-0': '3.161739384943495',
#                   'infra-test-nodeport-cust-1': '1.5393470943785368',
#                   'infra-test-nodeport2-cus-0': '2.8831145322598943',
#                   'infra-test-nodeport2-cus-1': '1.578976048757047'},

# Dictionary holding the data for each timestamp
time_series = collections.defaultdict(dict)

# Set of Pod names
pod_names = set()

for result in results:
    # Take the Pod name and add it to the set
    pod_name = result['metric']['pod_name']
    pod_names.add(pod_name)
    for value in result['values']:
        # Using the timestamp as the key groups samples that share a timestamp
        # defaultdict avoids a KeyError when the key does not exist yet
        time_series[value[0]][pod_name] = value[1]

# pprint.pprint(time_series)

# The CSV header is the timestamp followed by the Pod names
fieldnames = ['timestamp']
fieldnames.extend(pod_names)

# Save to a CSV file (the csv module expects files opened with newline='')
with open(filepath, 'w', newline='') as csv_file:

    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()

    # Loop over the per-timestamp data in chronological order
    for timestamp, values in sorted(time_series.items()):
        # Add the time column to the row
        row = {'timestamp': datetime.datetime.fromtimestamp(timestamp)}
        # values is a dictionary like this
        # {'infra-test-nodeport-cust-0': '2.467225521553685',
        #  'infra-test-nodeport-cust-1': '1.5932590068361583',
        #  'infra-test-nodeport2-cus-0': '2.2811341803954917',
        #  'infra-test-nodeport2-cus-1': '1.6517850743220521'},
        # Loop over the Pod names collected earlier
        for pod_name in pod_names:
            # Use an empty value when values has no data for this Pod
            row[pod_name] = values.get(pod_name, '')
        writer.writerow(row)
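The pivot step in the script can also be pulled out into small functions that are easy to check without a cluster. The following is a sketch of one possible variation, not the script itself: pivot_by_timestamp and to_csv_text are hypothetical names, columns keep first-seen order, rows are sorted by timestamp, and DictWriter's restval fills in Pods that have no sample at a given timestamp.

```python
import collections
import csv
import io


def pivot_by_timestamp(results):
    """Group samples by timestamp; return (pod_names, time_series)."""
    time_series = collections.defaultdict(dict)
    pod_names = []  # a list keeps the column order stable
    for result in results:
        pod_name = result['metric']['pod_name']
        if pod_name not in pod_names:
            pod_names.append(pod_name)
        for timestamp, value in result['values']:
            time_series[timestamp][pod_name] = value
    return pod_names, time_series


def to_csv_text(results):
    """Render the pivoted data as CSV text, one row per timestamp."""
    pod_names, time_series = pivot_by_timestamp(results)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=['timestamp'] + pod_names,
                            restval='', lineterminator='\n')
    writer.writeheader()
    for timestamp in sorted(time_series):
        row = {'timestamp': timestamp}
        row.update(time_series[timestamp])
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical data: pod-b has no sample at the first timestamp.
results = [
    {'metric': {'pod_name': 'pod-a'}, 'values': [[100, '1.5'], [200, '2.5']]},
    {'metric': {'pod_name': 'pod-b'}, 'values': [[200, '3.5']]},
]
csv_text = to_csv_text(results)
print(csv_text)
```

The row for timestamp 100 ends with an empty cell for pod-b, which is the same behavior the script gets by falling back to an empty string.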

Example run

The requests module is required.

pip install requests

The script runs cloudctl tokens in a subprocess to get the token, so the cloudctl command must be installed and you must log in beforehand.

Running it produces a CSV like this.

$ python export_pod_cpu.py --help
usage: export_pod_cpu.py [-h] [-f FILENAME] [-n NAMESPACE]
                         [--interval INTERVAL] [--start START] [--end END]
                         [--step STEP]

Export Pod CPU utilization to CSV.

optional arguments:
  -h, --help            show this help message and exit
  -f FILENAME, --filename FILENAME
                        Output file name
  -n NAMESPACE, --namespace NAMESPACE
                        Namespace to query
  --interval INTERVAL   Range used to compute CPU utilization (e.g. 1h, 5m)
  --start START         Start time of the data (e.g. 20190101-1000)
  --end END             End time of the data (e.g. 20190102-1000)
  --step STEP           Interval between data points, in seconds
$ python export_pod_cpu.py -f test.csv -n default --interval 5m --start 20190115-1600 --end 20190116-1600 --step 600
$ cat test.csv
timestamp,infra-test-nodeport-cust-1,infra-test-nodeport2-cus-1,infra-test-nodeport2-cus-0,infra-test-nodeport-cust-0
2019-01-15 16:00:00,1.7765650066665255,1.5227466104164478,2.4642478804166026,2.7171226995831903
2019-01-15 16:10:00,1.4900129837500724,1.4548672362502657,2.3264455262498513,2.7998607379167124
2019-01-15 16:20:00,1.7629794254168016,1.6523557370834396,2.231762218333415,2.040916205000182
2019-01-15 16:30:00,1.87828389291667,1.8159218674998103,1.7364743683333472,2.4042730716670726
2019-01-15 16:40:00,1.6692312762499266,1.7964510670833533,2.264148609108464,2.300661640484626
2019-01-15 16:50:00,1.7293341049999826,1.7319731500000066,1.8862790512499334,2.091235875833111

(omitted)

2019-01-16 15:00:00,1.4205530700000204,1.450829464166835,2.229421241249838,2.6200906879167483
2019-01-16 15:10:00,1.207286078750182,1.4032802437501837,1.8004293182339135,2.120460568511992
2019-01-16 15:20:00,1.2547518466667875,1.5286131908334255,1.8340893729164995,2.258688518749826
2019-01-16 15:30:00,1.2122690649999868,1.3929129920832868,2.2814286763049063,2.8766292608645045
2019-01-16 15:40:00,1.4557575473232511,1.2478161367336864,1.8957547083334705,2.535229864166316
2019-01-16 15:50:00,1.0475173487499962,1.2800040254167773,1.750940527916403,2.7946370483334704
2019-01-16 16:00:00,1.4323884024997824,1.3961431675001752,2.2708808758333516,2.626497514166885
$

TODO

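Dates and times are currently specified in local time, but being able to specify them in UTC might be more convenient
Only CPU is covered so far; memory usage is needed as well
It might be worth going as far as producing an Excel file with something like the openpyxl module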
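For the first item, a minimal sketch of parsing the same YYYYmmdd-HHMM format as UTC instead of local time could look like this. parse_utc is a hypothetical helper and is not part of the script above.

```python
import datetime


def parse_utc(text):
    """Parse 'YYYYmmdd-HHMM' as UTC and return a Unix timestamp."""
    naive = datetime.datetime.strptime(text, '%Y%m%d-%H%M')
    # Attach the UTC timezone so timestamp() does not assume local time.
    return naive.replace(tzinfo=datetime.timezone.utc).timestamp()


print(parse_utc('20190101-1000'))  # 1546336800.0
```

Because the timezone is attached explicitly, the result does not depend on the timezone of the machine running the script.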
