Systemi（株式会社システムアイ）Advent Calendar 2024

[GitHub Actions] Visualizing Actions Runner Controller Metrics

Last updated at 2024-12-20. Posted at 2024-12-20.

What we will do

We enable metrics for Actions Runner Controller (hereafter ARC) and visualize them in Grafana.
ARC is a Kubernetes operator that automatically scales GitHub Actions self-hosted runners.

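Prerequisites

A Kubernetes cluster (Linux) is already set up
- Running on managed Kubernetes services other than Azure, or on OpenShift, does not appear to be supported (although it can be made to work with some effort)
ARC is already installed

I ran this verification on a Kubernetes cluster built on my local machine.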

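Overview

The OpenTelemetry Collector scrapes the metrics endpoints exposed by the Pods in arc-systems and sends the results to Prometheus. The metrics stored in Prometheus are then visualized in Grafana.
Adjust the Namespace names to match your environment.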

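Enabling ARC metrics

Enabling ARC metrics makes data such as the number of runners and workflow execution times available in Prometheus format.
As described in the official documentation, prepare a values.yml for gha-runner-scale-set-controller and upgrade the Helm chart.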
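
As a minimal sketch of such a values.yml, the fragment below follows the metrics settings described in the official ARC documentation. The port 8080 and the /metrics path are chosen here to match the Services and scrape configuration used later in this article; check the key names against the chart version you have installed before relying on them.

```yaml
# values.yml for the gha-runner-scale-set-controller chart (sketch)
# Assumption: key names follow the official ARC docs; verify for your chart version.
metrics:
  # Address where the controller manager serves its metrics
  controllerManagerAddr: ":8080"
  # Address where each listener Pod serves its metrics
  listenerAddr: ":8080"
  # HTTP path of the listener metrics endpoint
  listenerEndpoint: "/metrics"
```

After upgrading the Helm release with this file, the controller and listener Pods should expose metrics on port 8080.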

To make the Pods reachable, deploy Services.

apiVersion: v1
kind: Service
metadata:
  name: controller-service
  namespace: arc-systems
spec:
  selector:
    app.kubernetes.io/instance: <installation name of the gha-runner-scale-set-controller chart>
    app.kubernetes.io/name: gha-rs-controller
    app.kubernetes.io/namespace: arc-systems
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: listener-service
  namespace: arc-systems
spec:
  selector:
    actions.github.com/scale-set-name: <installation name of the gha-runner-scale-set chart>
    actions.github.com/scale-set-namespace: arc-runners
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080

Storing ARC metrics

Deploy the OpenTelemetry Collector to the cluster and send the collected ARC metrics to Prometheus.

Building the OpenTelemetry Collector image

First, build a container image for the OpenTelemetry Collector. Create a Dockerfile and builder-config.yaml, following the official documentation.

Dockerfile

FROM golang:1.23-bullseye AS build
WORKDIR /app
RUN go install go.opentelemetry.io/collector/cmd/builder@v0.115.0
COPY builder-config.yaml .

# Build a statically linked binary by default (ARG requires the name=value form)
ARG CGO_ENABLED=0
RUN CGO_ENABLED=${CGO_ENABLED} builder --config=builder-config.yaml

FROM gcr.io/distroless/static-debian12
WORKDIR /app
COPY --from=build --chown=nonroot:nonroot /app/otelcol-custom /app

USER nonroot

ENTRYPOINT ["/app/otelcol-custom"]
CMD ["--config", "/etc/otelcol/config.yml"]

Install the Prometheus receiver and exporter. If you want to use other components, search the OpenTelemetry Registry or look through the GitHub repositories.

builder-config.yaml

dist:
  name: otelcol-custom
  description: Custom OTel Collector distribution
  output_path: .

receivers:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.115.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusremotewritereceiver v0.115.0
processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.115.0
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.115.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor v0.115.0
exporters:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter v0.115.0

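Push the built image to an image registry that is reachable from the cluster. Specifically, you can push it to a public registry on the internet such as Docker Hub or GitHub Packages, or, if you want to keep everything local, use the registry image provided by Docker.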
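
For the local option, one possible setup is to run the registry image with Docker Compose. The following is only a sketch: the service name, the host port 5000, and the volume name registry-data are assumptions for illustration, not part of the original setup.

```yaml
# compose.yaml (sketch): a local image registry based on Docker's registry image
services:
  registry:
    image: registry:2
    restart: always
    ports:
      # Assumption: expose the registry on host port 5000
      - "5000:5000"
    volumes:
      # Persist pushed images across container restarts
      - registry-data:/var/lib/registry

volumes:
  registry-data:
```

With this running, the image could be tagged as <host>:5000/opentelemetry-collector:1.0.0 and pushed. Note that a registry served over plain HTTP usually has to be registered as an insecure registry in both Docker and the cluster's container runtime.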

Deploying the OpenTelemetry Collector

Deploy the OpenTelemetry Collector to the Kubernetes cluster.

Example manifest

apiVersion: v1
kind: ServiceAccount
metadata:
  name: collector
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector
rules:
  - apiGroups:
      - ''
    resources:
      - events
      - namespaces
      - namespaces/status
      - nodes
      - nodes/spec
      - nodes/stats
      - pods
      - pods/status
      - replicationcontrollers
      - replicationcontrollers/status
      - resourcequotas
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
    resources:
      - daemonsets
      - deployments
      - replicasets
      - statefulsets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - 'extensions'
    resources:
      - daemonsets
      - deployments
      - replicasets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - batch
    resources:
      - jobs
      - cronjobs
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - autoscaling
    resources:
      - horizontalpodautoscalers
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector
subjects:
  - kind: ServiceAccount
    name: collector
    namespace: monitoring
roleRef:
  kind: ClusterRole
  name: otel-collector
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-cm
  namespace: monitoring
data:
  config.yml: |
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: arc-controller
              scrape_interval: 60s
              metrics_path: /metrics
              static_configs:
                - targets:
                    - controller-service.arc-systems.svc:8080
            - job_name: arc-listener
              scrape_interval: 60s
              metrics_path: /metrics
              static_configs:
                - targets:
                    - listener-service.arc-systems.svc:8080
    processors:
      batch:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 80
        spike_limit_percentage: 50
    exporters:
      prometheusremotewrite:
        endpoint: http://prometheus-server.monitoring.svc/api/v1/write
        tls:
          insecure: false
        resource_to_telemetry_conversion:
          enabled: true
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [memory_limiter, batch]
          exporters: [prometheusremotewrite]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-deployment
  namespace: monitoring
  labels:
    app: otel
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel
  template:
    metadata:
      labels:
        app: otel
    spec:
      serviceAccountName: collector
      containers:
      - name: otel-collector
        image: xxx/opentelemetry-collector:1.0.0
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        volumeMounts:
        - name: otel-config
          mountPath: "/etc/otelcol"
          readOnly: true
      volumes:
      - name: otel-config
        configMap:
          name: otel-cm
          items:
          - key: "config.yml"
            path: "config.yml"
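
builder-config.yaml also bundles the filter processor, although the config.yml in the manifest above does not use it. As a hedged sketch, assuming you only want to forward metrics whose names start with gha_ (which drops Go runtime metrics and scrape metadata such as up), the relevant parts of config.yml could be extended as follows. The processor name filter/arc is a hypothetical label chosen for this example.

```yaml
# Fragment of config.yml (sketch): forward only ARC metrics
processors:
  filter/arc:
    # Log and skip conditions that fail to evaluate instead of failing the batch
    error_mode: ignore
    metrics:
      metric:
        # A metric is dropped when a condition matches,
        # so this drops everything that does NOT start with gha_
        - 'not IsMatch(name, "^gha_.*")'

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      # memory_limiter stays first; filter before batching
      processors: [memory_limiter, filter/arc, batch]
      exporters: [prometheusremotewrite]
```

Placing the filter before batch keeps unneeded series out of the batches sent to Prometheus.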

Installing Prometheus

Install the Prometheus Helm chart.
You need to set a startup flag that enables the write endpoint. Add the flags to values.yaml.

prometheus-values.yaml

server:
  extraFlags:
    - web.enable-lifecycle
    - web.enable-remote-write-receiver

Installing Grafana

Install the Grafana Helm chart. Because I want to access Grafana from outside the cluster, values.yaml specifies NodePort. For other settings, refer to the official values.yaml.

grafana-values.yaml

service:
  type: NodePort
  nodePort: 32300
image:
  repository: grafana/grafana
  tag: 11.4.0
persistence:
  type: pvc
  enabled: true
  storageClassName: <name of the default StorageClass>
  accessModes:
    - ReadWriteOnce
  size: 1Gi

# Override config via environment variables
# https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#override-configuration-with-environment-variables
env:
  GF_AUTH_ANONYMOUS_ORG_ROLE: "Viewer"
  GF_AUTH_ANONYMOUS_ENABLED: true
  GF_SECURITY_ALLOW_EMBEDDING: true

datasources:
 datasources.yaml:
   apiVersion: 1
   datasources:
    - name: "Prometheus"
      type: prometheus
      access: proxy
      url: http://prometheus-server.monitoring.svc.cluster.local

Running a GitHub Actions workflow

Run any workflow that targets the ARC runners. We will check the results in Grafana.

Example workflow

.github/workflows/linter.yml

name: Lint workflows

on:
  pull_request_target:
    branches:
      - main
    paths:
      - ".github/workflows/*.yml"

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

defaults:
  run:
    shell: bash

permissions:
  contents: read

jobs:
  lint:
    name: Lint workflows
    runs-on: <installation name of the gha-runner-scale-set chart>
    timeout-minutes: 10

    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          persist-credentials: false

      - name: Check workflow files
        uses: docker://rhysd/actionlint:latest
        with:
          args: -color

.github/actionlint.yml

self-hosted-runner:
  labels:
    - <installation name of the gha-runner-scale-set chart>

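Creating a Grafana dashboard

Selecting gha_job_execution_duration_seconds displays a Time column and a value in seconds. The seconds column carries various pieces of information in its labels. You can also see that many similar records are displayed.

From the Transformations tab, select Join by labels and set Value to __name__; the data held in the labels then appears in the table.

Add Group by and set Group by on the following fields.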

job_name
job_result
runner_id
gha_job_execution_duration_seconds_sum

Rows are grouped by runner ID, so duplicate rows are no longer displayed.

Add Rename fields by regex and rename gha_job_execution_duration_seconds_sum to duration.

Switching to Bar chart and tidying it up shows how long each executed workflow took to complete.
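
As an alternative to doing the aggregation with Grafana transformations, the same kind of summary could be precomputed in Prometheus. The rule file below is only a sketch: the group name and the recorded series name are hypothetical, and it assumes the histogram also exposes the usual _count series next to the _sum series used above.

```yaml
# Prometheus recording rule file (sketch)
groups:
  - name: arc-job-duration   # hypothetical group name
    rules:
      # Average job execution time in seconds per job_name
      - record: arc:job_execution_duration_seconds:avg
        expr: |
          sum by (job_name) (gha_job_execution_duration_seconds_sum)
            /
          sum by (job_name) (gha_job_execution_duration_seconds_count)
```

A panel could then query the recorded series directly instead of joining and grouping labels in the dashboard.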

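Summary

I collected ARC metrics with the OpenTelemetry Collector and visualized them on a Grafana dashboard.
ARC exposes many metrics that are likely to be useful when a platform provider introduces observability, so I recommend storing the metrics as well when you adopt ARC.

Notes

According to ARC's GitHub issues, some metrics apparently cannot be collected.

Also, importing this JSON into Grafana made it easy to create the basic panels.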
