
Continuous Profiling with Grafana Agent (golang pull edition)

Posted at 2023-12-07, last updated at 2023-12-08

Introduction

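Grafana Pyroscope is a product for continuously collecting profiling data (Continuous Profiling) and putting it to use. As shown in the diagram in the official documentation referenced below, two ways of sending profiles to it are provided: the Pyroscope SDK and Grafana Agent.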

ref https://grafana.com/docs/pyroscope/latest/configure-client/#sending-profiles-from-your-application

Pyroscope was originally an independent project. When it became a Grafana Labs product, it was merged with Grafana Phlare to become today's Grafana Pyroscope.

The basic way to collect profiles for Grafana Pyroscope is to use its dedicated SDK (the Pyroscope SDK), which requires modifying the target application.

To avoid that, a feature was added that collects profiles through Grafana Agent without any changes to the target application's source code (Auto-Instrumentation).

This article explains the Grafana Agent approach, which collects profiles without adding any code to the target application.

Some parts of this feature still feel somewhat experimental, but I personally think it can be useful in certain situations.

Collecting profiles with Grafana Agent

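eBPF
- Collects CPU profiles only
- Available on kernel version >= 4.9 with BPF_PROG_TYPE_PERF_EVENT enabled
- Can also collect data from interpreted languages (Python, Ruby, JavaScript), but the information obtained is limited
golang pull
- Targets Go only
- Available when Grafana Agent can connect to the application's pprof endpoint
- Collects pprof data, so memory and other profiles are available in addition to CPU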

The rest of this article explains the golang pull method.
eBPF is covered in a separate article.

Configuring Grafana Agent

Requirements

Grafana Agent must be able to connect to the pprof endpoint of the Go application.

Compared with the eBPF approach, this requirement is modest, so it should be easier to adopt.
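For golang pull, "able to connect" concretely means the agent can fetch the pprof HTTP paths on each target. The sketch below lists the paths that, as far as I understand the `pyroscope.scrape` documentation for v0.38, the agent requests by default for each profile type, and builds the URLs for one target. The address `10.0.0.5:6060` is hypothetical, and the path table is an assumption to verify against your agent version.

```go
package main

import "fmt"

// defaultPaths maps the profile blocks of pyroscope.scrape's profiling_config to the
// HTTP paths the agent requests by default.
// ASSUMPTION: taken from my reading of the Grafana Agent v0.38 docs; verify for your version.
var defaultPaths = map[string]string{
	"process_cpu": "/debug/pprof/profile",
	"memory":      "/debug/pprof/allocs",
	"goroutine":   "/debug/pprof/goroutine",
	"block":       "/debug/pprof/block",
	"mutex":       "/debug/pprof/mutex",
	"fgprof":      "/debug/fgprof",
}

// profileURL returns the URL the agent must reach for one profile type on one
// target, and false if the profile type is not in the table.
func profileURL(scheme, address, profile string) (string, bool) {
	path, ok := defaultPaths[profile]
	if !ok {
		return "", false
	}
	return fmt.Sprintf("%s://%s%s", scheme, address, path), true
}

func main() {
	// Hypothetical Pod IP and pprof port.
	for _, p := range []string{"process_cpu", "memory", "goroutine", "block", "mutex", "fgprof"} {
		u, _ := profileURL("http", "10.0.0.5:6060", p)
		fmt.Println(u)
	}
}
```

On the application side the `/debug/pprof/...` paths are usually served by importing `net/http/pprof`, while `/debug/fgprof` additionally needs the fgprof handler. Block and mutex profiles stay empty unless the application calls `runtime.SetBlockProfileRate` and `runtime.SetMutexProfileFraction`.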

Running it on Kubernetes

Versions tested
- Grafana Agent: v0.38.0
- Grafana Pyroscope: 1.2.0

How the pprof data is collected

Before explaining the Grafana Agent configuration, let me describe the conditions under which Grafana Agent collects pprof data from Go applications in this example.

The scrape configuration is fairly flexible and can be written in many ways. Here, I collect a Pod's pprof data and store it in Grafana Pyroscope when the Pod has annotations like the following:

annotations:
    profiles.grafana.com/cpu.scrape: "true"
    profiles.grafana.com/goroutine.scrape: "true"
    profiles.grafana.com/memory.scrape: "true"
    profiles.grafana.com/fgprof.scrape: "true"
    profiles.grafana.com/block.scrape: "true"
    profiles.grafana.com/mutex.scrape: "true"
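Grafana Agent does not see these annotation keys verbatim. Its Kubernetes discovery follows the Prometheus convention of exposing each Pod annotation as a `__meta_kubernetes_pod_annotation_<name>` label, where every character not allowed in a label name (such as `.`, `/`, and `-`) becomes `_`. The small Go sketch below reproduces that naming rule so you can work out which label to reference in relabel rules; it illustrates the convention and is not code taken from Grafana Agent.

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidLabelChars matches every character that is not allowed in a
// Prometheus-style label name; discovery replaces each one with "_".
var invalidLabelChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// annotationMetaLabel returns the discovery label under which a Pod annotation is exposed.
func annotationMetaLabel(annotation string) string {
	return "__meta_kubernetes_pod_annotation_" + invalidLabelChars.ReplaceAllString(annotation, "_")
}

func main() {
	for _, a := range []string{
		"profiles.grafana.com/memory.scrape",
		"profiles.grafana.com/memory.port_name",
	} {
		fmt.Printf("%s -> %s\n", a, annotationMetaLabel(a))
	}
}
```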

Among the many possible approaches, I chose to explain this one for two reasons:

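- This style appears in the Helm templates and similar resources maintained by Grafana Labs.
- Prometheus, on which Grafana Agent's scraping is modeled, commonly uses a similar approach, and I expect that if this Grafana Agent feature becomes widely used, most setups will be operated with a similar configuration.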

So I expect the common practice to be treating Pods or Services that carry these annotations as scrape targets and collecting their pprof data.
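The annotation-based selection boils down to filtering label sets. With `role = "pod"`, discovery produces one target per declared container port, and the `kubernetes_pods_memory_custom_name` component in the configuration below keeps a target only if the scrape annotation is `"true"`, the port_name annotation is non-empty, and the container port name equals that annotation (`keep`, `drop`, `keepequal`). The sketch below models those three rules in Go, treating a missing label as the empty string as relabeling does. It is a simplified model for reasoning about the rules, not the agent's implementation, and the Pod values are hypothetical.

```go
package main

import "fmt"

// Discovery labels for the memory annotations and the container port name.
const (
	scrapeLabel            = "__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_scrape"
	portNameLabel          = "__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_port_name"
	containerPortNameLabel = "__meta_kubernetes_pod_container_port_name"
)

// keepCustomNameTarget models the three filtering rules of
// kubernetes_pods_memory_custom_name. Relabel regexes are fully anchored, so
// regex = "true" and regex = "" behave like exact string comparisons here.
func keepCustomNameTarget(labels map[string]string) bool {
	// action = "keep", regex = "true"
	if labels[scrapeLabel] != "true" {
		return false
	}
	// action = "drop", regex = "": no port_name annotation means drop
	if labels[portNameLabel] == "" {
		return false
	}
	// action = "keepequal": the container port name must equal the annotation value
	return labels[containerPortNameLabel] == labels[portNameLabel]
}

func main() {
	// Hypothetical Pod with two container ports, annotated with
	// memory.scrape: "true" and memory.port_name: "pprof".
	for _, port := range []string{"http", "pprof"} {
		target := map[string]string{
			scrapeLabel:            "true",
			portNameLabel:          "pprof",
			containerPortNameLabel: port,
		}
		fmt.Printf("port %s kept: %v\n", port, keepCustomNameTarget(target))
	}
}
```

Only the target whose port is named `pprof` survives; a Pod that sets `memory.scrape` but no `memory.port_name` is dropped by these particular rules.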

Grafana Agent configuration file

// Configure the Grafana Pyroscope endpoint to which the collected pprof data is written
pyroscope.write "pyroscope_write" {
  endpoint {
    url = "http://pyroscope.observability.svc.cluster.local.:4040"
  }
}

// Discover Pods as targets
discovery.kubernetes "pyroscope_kubernetes" {
  role = "pod"
}

// Turn Kubernetes metadata such as the Pod name into labels on the collected pprof data.
// Without the Pod and container names, the collected pprof data is hard to use for analysis.
discovery.relabel "kubernetes_pods" {
  targets = concat(discovery.kubernetes.pyroscope_kubernetes.targets)

  rule {
    action        = "drop"
    source_labels = ["__meta_kubernetes_pod_phase"]
    regex         = "Pending|Succeeded|Failed|Completed"
  }

  rule {
    action = "labelmap"
    regex  = "__meta_kubernetes_pod_label_(.+)"
  }

  rule {
    action        = "replace"
    source_labels = ["__meta_kubernetes_namespace"]
    target_label  = "namespace"
  }

  rule {
    action        = "replace"
    source_labels = ["__meta_kubernetes_pod_name"]
    target_label  = "pod"
  }

  rule {
    action        = "replace"
    source_labels = ["__meta_kubernetes_pod_container_name"]
    target_label  = "container"
  }
}

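// Select Pods according to their annotations.
// This example handles `profiles.grafana.com/memory.scrape` for Pods that also set `profiles.grafana.com/memory.port_name`.
// This part is poorly documented and hard for newcomers: Grafana Agent exposes Pod annotations as `__meta_kubernetes_pod_annotation_<name>` labels, with characters such as `.` and `/` replaced by `_`, and you match on them as shown below.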
discovery.relabel "kubernetes_pods_memory_custom_name" {
  targets = concat(discovery.relabel.kubernetes_pods.output)

  rule {
    source_labels = ["__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_scrape"]
    action        = "keep"
    regex         = "true"
  }

  rule {
    source_labels = ["__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_port_name"]
    action        = "drop"
    regex         = ""
  }

  rule {
    source_labels = ["__meta_kubernetes_pod_container_port_name"]
    target_label  = "__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_port_name"
    action        = "keepequal"
  }

  rule {
    source_labels = ["__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_scheme"]
    action        = "replace"
    regex         = "(https?)"
    target_label  = "__scheme__"
    replacement   = "$1"
  }

  rule {
    source_labels = ["__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_path"]
    action        = "replace"
    regex         = "(.+)"
    target_label  = "__profile_path__"
    replacement   = "$1"
  }

  rule {
    source_labels = ["__address__", "__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_port"]
    action        = "replace"
    regex         = "(.+?)(?::\\d+)?;(\\d+)"
    target_label  = "__address__"
    replacement   = "$1:$2"
  }
}

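// Pods with `profiles.grafana.com/memory.scrape: "true"` but no port_name annotation
// are scraped on the container port named `http-metrics`.
discovery.relabel "kubernetes_pods_memory_default_name" {
  targets = concat(discovery.relabel.kubernetes_pods.output)

  rule {
    source_labels = ["__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_scrape"]
    action        = "keep"
    regex         = "true"
  }

  rule {
    source_labels = ["__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_port_name"]
    action        = "keep"
    regex         = ""
  }

  rule {
    source_labels = ["__meta_kubernetes_pod_container_port_name"]
    action        = "keep"
    regex         = "http-metrics"
  }

  rule {
    source_labels = ["__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_scheme"]
    action        = "replace"
    regex         = "(https?)"
    target_label  = "__scheme__"
    replacement   = "$1"
  }

  rule {
    source_labels = ["__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_path"]
    action        = "replace"
    regex         = "(.+)"
    target_label  = "__profile_path__"
    replacement   = "$1"
  }

  rule {
    source_labels = ["__address__", "__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_port"]
    action        = "replace"
    regex         = "(.+?)(?::\\d+)?;(\\d+)"
    target_label  = "__address__"
    replacement   = "$1:$2"
  }
}

// Finally, connect the Pod selection above to the Grafana Pyroscope write destination.
// This is also where you choose which pprof profiles to scrape.
// Only memory is set to `true` here, so only the memory profile is collected from Pods annotated with profiles.grafana.com/memory.scrape: "true".
pyroscope.scrape "pyroscope_scrape_memory" {
  clustering {
    enabled = true
  }

  targets    = concat(discovery.relabel.kubernetes_pods_memory_default_name.output, discovery.relabel.kubernetes_pods_memory_custom_name.output)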
  forward_to = [pyroscope.write.pyroscope_write.receiver]

  profiling_config {
    profile.memory {
      enabled = true
    }

    profile.process_cpu {
      enabled = false
    }

    profile.goroutine {
      enabled = false
    }

    profile.block {
      enabled = false
    }

    profile.mutex {
      enabled = false
    }

    profile.fgprof {
      enabled = false
    }
  }
}
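The last rule of `kubernetes_pods_memory_custom_name` is the hardest to read. Its two source labels are joined with `;` (the default separator), and if the result matches the regex, `__address__` is rewritten to `$1:$2`: the host of the discovered address combined with the port from the `profiles.grafana.com/memory.port` annotation. Relabel regexes are fully anchored, so the sketch below wraps the original pattern in `^(?:...)$` and evaluates it with Go's `regexp` (RE2, which supports the lazy `+?` used here). It models only this one rule, and the addresses are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// addressRule is the regex of the __address__ rule, anchored the way relabeling anchors it.
var addressRule = regexp.MustCompile(`^(?:(.+?)(?::\d+)?;(\d+))$`)

// rewriteAddress joins the source labels with ";" and, on a match, returns "$1:$2".
// When the regex does not match, the target label keeps its current value.
func rewriteAddress(address, portAnnotation string) string {
	joined := strings.Join([]string{address, portAnnotation}, ";")
	if !addressRule.MatchString(joined) {
		return address
	}
	return addressRule.ReplaceAllString(joined, "$1:$2")
}

func main() {
	// Port annotation "6060" replaces the discovered port 8080.
	fmt.Println(rewriteAddress("10.0.0.5:8080", "6060")) // prints 10.0.0.5:6060
	// Without the annotation the joined string ends in ";", so nothing matches.
	fmt.Println(rewriteAddress("10.0.0.5:8080", "")) // prints 10.0.0.5:8080
}
```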

Kubernetesのマニフェスト

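Grafana Agent only needs to reach the target application's pprof endpoint to collect data, so this example runs it as a Deployment. The ServiceAccount `pyroscope-pprof-grafana-agent` needs RBAC permissions to list and watch Pods for `discovery.kubernetes`, and the ConfigMap must provide `config.river`; those manifests are not shown here.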

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: grafana-agent-pprof
  name: pyroscope-pprof-grafana-agent
  namespace: observability
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: grafana-agent-pprof
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: grafana-agent-pprof
    spec:
      automountServiceAccountToken: true
      containers:
      - args:
        - run
        - /etc/agent/config.river
        - --storage.path=/tmp/agent
        - --server.http.listen-addr=0.0.0.0:80
        env:
        - name: AGENT_MODE
          value: flow
        image: docker.io/grafana/agent:v0.38.0
        imagePullPolicy: IfNotPresent
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 80
          initialDelaySeconds: 10
          timeoutSeconds: 1
        name: grafana-agent
        ports:
        - containerPort: 80
          name: http-metrics
        volumeMounts:
        - mountPath: /etc/agent
          name: config
      serviceAccountName: pyroscope-pprof-grafana-agent
      volumes:
      - configMap:
          name: pyroscope-pprof-grafana-agent
        name: config

Viewing CPU profiles in Grafana Pyroscope

[Figure: Grafana Pyroscope "Single" view of process_cpu:cpu:nanoseconds:cpu:nanoseconds{service_name="observability/pyroscope"}]

Grafana Pyroscope has its own web UI, so unlike Grafana Loki and similar products, you can browse the data as shown above without going through Grafana.

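You select the target Pod and time range and inspect the data as a Flame Graph. Once Grafana Agent and Grafana Pyroscope are set up, you can go back in time and check past pprof data, which is quite handy in a development environment.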
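When narrowing down by Pod, the data is addressed by a profile type plus label matchers. The profile in the screenshot above is identified by a profile type ID of the form `name:sample_type:sample_unit:period_type:period_unit` (for example `process_cpu:cpu:nanoseconds:cpu:nanoseconds`) followed by `{label="value"}` matchers. Because the relabel rules in the configuration add `namespace`, `pod`, and `container` labels, those can serve as matchers. The sketch below only builds such a selector string; the profile type `memory:inuse_space:bytes:space:bytes` and the Pod name are illustrative assumptions, and how you submit the string (UI or API) depends on your setup.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// buildSelector renders profileType{label="value",...}. Keys are sorted so the
// output is deterministic; values are quoted with strconv.Quote.
func buildSelector(profileType string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	matchers := make([]string, 0, len(keys))
	for _, k := range keys {
		matchers = append(matchers, k+"="+strconv.Quote(labels[k]))
	}
	return profileType + "{" + strings.Join(matchers, ",") + "}"
}

func main() {
	// Hypothetical Pod name; namespace and pod are labels added by the relabel rules.
	fmt.Println(buildSelector("memory:inuse_space:bytes:space:bytes", map[string]string{
		"namespace": "observability",
		"pod":       "my-app-7d4b9c-abcde",
	}))
}
```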

The current Grafana Pyroscope cannot display the tree-like Graph view that the standard pprof tooling offers, which is a little inconvenient, but it is still a useful product to have.

Thoughts

If your Go applications already use pprof, you can collect and store profiles without adding any code to them, so this is a great tool for trying things out in a development environment.

The scrape configuration is idiosyncratic and very hard to understand at first sight, so in real operation, handing this configuration over as team members change could be quite difficult.

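Prometheus has the same operational challenge, but Grafana Agent has even less information available and much of it is missing from the official documentation, so writing the configuration file is currently quite hard unless you already have Prometheus experience.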

Personally, I'm quite a fan of this tool, so please give it a try.
