Learning how to monitor AKS with Azure Monitor (Kusto) from the built-in Workbooks

Posted at 2020-02-20

Intro

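Azure Monitor for Containers, the monitoring offering for containers, ships with the following four Workbooks out of the box.

  • Disk Capacity
  • Disk IO
  • Kubelet
  • Network

This article digs through these ready-made Workbooks purely as a source of Kusto queries to learn from. The default Workbooks can be opened at any time on any cluster, which makes them very handy as a cheat sheet. There is no need at all to memorize every detail of the Kusto queries introduced below.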

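What Azure Monitor Workbooks are

Workbooks combine multiple log query results and charts into a single report. A finished report can be shared within the team, so even when a new member does not know how to query the logs, the Workbook also passes on the knowledge needed for monitoring ("so this is the telemetry to look at to find out X").

For example, the Network Workbook screen works like this:
1. Define a query and show its result as a chart.
2. Assemble the charts into a report.
3. Parameterize the query conditions, as the select boxes at the top of the screen do.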

Minimum Azure Monitor basics

Query basics

Contents

  • Basic queries
  • Schema overview
  • Filtering, e.g. | where hogehoge == "hugahuga"
  • Sorting, e.g. | sort by TimeGenerated desc
  • Grouping and aggregation, e.g. | summarize
  • Charts
  • Saving and loading queries
  • Selecting and computing columns, e.g. | project column1, column2, column3
  • Defining additional columns, e.g. | extend NewColumn1=substring(OriginalColumn1, 0, 5)
  • Grouping by a time column (binning), e.g. | summarize avg(CounterValue) by bin(TimeGenerated, 1h)
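
The operators in the list above chain together with pipes. The following is a minimal sketch that runs without a workspace: the table SampleTable, its rows, and the column names are made up for illustration and are not part of any Workbook.

```kusto
// Hypothetical in-memory rows; replace SampleTable with a real table such as Perf.
let SampleTable = datatable(TimeGenerated: datetime, Computer: string, CounterName: string, CounterValue: real)
[
    datetime(2020-02-20 00:05:00), "node-0", "counterA", 120.0,
    datetime(2020-02-20 00:35:00), "node-0", "counterA", 180.0,
    datetime(2020-02-20 01:10:00), "node-0", "counterA", 90.0,
    datetime(2020-02-20 00:20:00), "node-1", "counterB", 512.0
];
SampleTable
| where CounterName == "counterA"                                              // filter rows
| extend ShortName = substring(Computer, 0, 4)                                 // add a computed column
| summarize AvgValue = avg(CounterValue) by bin(TimeGenerated, 1h), ShortName  // group into 1-hour bins
| project TimeGenerated, ShortName, AvgValue                                   // choose the output columns
| sort by TimeGenerated asc                                                    // order the result
```

Each step receives the table produced by the previous step, which is why the Workbook queries later in this article can be read one line at a time.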

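About the collected data

Telemetry such as disk capacity, disk IO, and network is collected by the InfluxData Telegraf agent and can be queried as custom metrics in the InsightsMetrics table.
(Reference: how the InfluxData Telegraf agent sends the data: https://docs.microsoft.com/ja-jp/azure/azure-monitor/platform/collect-custom-metrics-linux-telegraf)

In each InsightsMetrics record the metric value is in Val and the details such as host name and device are in the Tags property; filtering on Name and Namespace selects the individual metric. This information was buried deep in GitHub, so it is quoted here. Each entry links to the InfluxData Telegraf documentation.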

  • Disk metrics

| Name | Namespace | Description |
| --- | --- | --- |
| used | container.azm.ms/disk | more info |
| free | container.azm.ms/disk | more info |
| used_percent | container.azm.ms/disk | more info |

  • Disk IO metrics

| Name | Namespace | Description |
| --- | --- | --- |
| reads | container.azm.ms/diskio | more info |
| read_bytes | container.azm.ms/diskio | more info |
| read_time | container.azm.ms/diskio | more info |
| writes | container.azm.ms/diskio | more info |
| write_bytes | container.azm.ms/diskio | more info |
| write_time | container.azm.ms/diskio | more info |
| io_time | container.azm.ms/diskio | more info |
| iops_in_progress | container.azm.ms/diskio | more info |

  • Host network metrics

| Name | Namespace | Description |
| --- | --- | --- |
| bytes_sent | container.azm.ms/net | more info |
| bytes_recv | container.azm.ms/net | more info |
| err_in | container.azm.ms/net | more info |
| err_out | container.azm.ms/net | more info |

  • Kubelet metrics

| Name | Namespace | Description |
| --- | --- | --- |
| kubelet_docker_operations | container.azm.ms/prometheus | Cumulative number of Docker operations by operation type |
| kubelet_docker_operations_errors | container.azm.ms/prometheus | Cumulative number of Docker operation errors by operation type |

(Source: https://github.com/microsoft/OMS-docker/blob/vishwa/june19agentrel/docs/InsightsMetrics.md)
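
To make the Name / Namespace / Tags layout concrete, here is a small sketch that picks one metric and pulls the host name and device out of Tags. The rows are invented sample data in a hypothetical SampleInsightsMetrics table; against a real workspace the same pipeline would start from InsightsMetrics.

```kusto
// Hypothetical rows shaped like InsightsMetrics records.
let SampleInsightsMetrics = datatable(TimeGenerated: datetime, Origin: string, Namespace: string, Name: string, Val: real, Tags: string)
[
    datetime(2020-02-20 00:00:00), "container.azm.ms/telegraf", "container.azm.ms/disk", "used_percent", 41.5, '{"hostName":"node-0","device":"sda1"}',
    datetime(2020-02-20 00:00:00), "container.azm.ms/telegraf", "container.azm.ms/disk", "free", 53687091200.0, '{"hostName":"node-0","device":"sda1"}',
    datetime(2020-02-20 00:00:00), "container.azm.ms/telegraf", "container.azm.ms/net", "bytes_sent", 1024.0, '{"hostName":"node-0","interface":"eth0"}'
];
SampleInsightsMetrics
| where Namespace == 'container.azm.ms/disk' and Name == 'used_percent'   // pick one metric
| extend Tags = todynamic(Tags)                                           // parse the JSON string
| project TimeGenerated, HostName = tostring(Tags.hostName), Device = tostring(Tags.device), Val
```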

The other container records are described in detail in the following document.
https://docs.microsoft.com/ja-jp/azure/azure-monitor/insights/container-insights-log-search

  • Host and container performance: Perf
  • Container inventory: ContainerInventory
  • Container logs: ContainerLog
  • Container node inventory: ContainerNodeInventory
  • Inventory of pods in a Kubernetes cluster: KubePodInventory
  • Inventory of nodes in a Kubernetes cluster: KubeNodeInventory
  • Kubernetes events: KubeEvents
  • Services in a Kubernetes cluster: KubeServices
  • Performance metrics for the node part of a Kubernetes cluster: Perf | where ObjectName == "K8SNode"
  • Performance metrics for the container part of a Kubernetes cluster: Perf | where ObjectName == "K8SContainer"
  • Custom metrics: InsightsMetrics
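
As a rough illustration of the two Perf filters in the list above, the sketch below averages node-level counters per hour. SamplePerf and the counter name sampleCounter are placeholders chosen for this example; only the ObjectName values come from the list.

```kusto
// Hypothetical rows standing in for the Perf table.
let SamplePerf = datatable(TimeGenerated: datetime, Computer: string, ObjectName: string, CounterName: string, CounterValue: real)
[
    datetime(2020-02-20 00:05:00), "node-0", "K8SNode", "sampleCounter", 10.0,
    datetime(2020-02-20 00:35:00), "node-0", "K8SNode", "sampleCounter", 30.0,
    datetime(2020-02-20 00:10:00), "node-0", "K8SContainer", "sampleCounter", 5.0
];
SamplePerf
| where ObjectName == "K8SNode"   // node metrics only; use "K8SContainer" for containers
| summarize AvgValue = avg(CounterValue) by bin(TimeGenerated, 1h), Computer, CounterName
```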

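Prometheus support

Azure Monitor for containers can collect Prometheus metrics without a Prometheus server. Unfortunately, the Workbooks contain no queries for Prometheus metrics, so this section only points to the documentation on configuration and querying.

Configuration
https://docs.microsoft.com/ja-jp/azure/azure-monitor/insights/container-insights-prometheus-integration#query-prometheus-metrics-data
All it takes is adding the metric collection settings to a ConfigMap.

Querying
See "Query Prometheus metrics data" in the document above.
Filtering InsightsMetrics on the namespace prometheus returns the metrics; as with the other metrics, the details are stored as JSON in the Tags property.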

InsightsMetrics 
| where Namespace == "prometheus"
| extend tags=parse_json(Tags)
| summarize count() by Name

Opening the Workbooks

Select the Azure Kubernetes Service resource, then go to the left menu > Insights > the View Workbooks select box at the upper right.

Now let's go through each Workbook in turn.

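Disk Capacity Workbook

Treat the following six lines as boilerplate (they parse Tags and apply the node and device selections made in the Workbook) and feel free to skip over them. They are common to every chart in the Disk Capacity Workbook.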

| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.device))
| extend NodeDisk = strcat(HostName, Device)
| where "*" in ('*') or HostName in ('*')
| where "*" in ('*') or Device in ('*')
| where NodeDisk in (selectedStateDisks) or '*' in (selectedStateDisks);
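
The odd-looking `"*" in ('*')` conditions appear to be what is left after the Workbook substitutes its select-box parameters into the query text: with everything selected the list is `'*'`, and with specific nodes selected the list holds their names (the Kubelet queries later in this article show such a substituted host list). A minimal sketch of the effect, using invented rows and host names:

```kusto
// Hypothetical rows; the host names are placeholders.
let SampleDisks = datatable(HostName: string, Device: string)
[
    "node-0", "/dev/sda1",
    "node-1", "/dev/sda1"
];
SampleDisks
// Parameter substituted as '*': the left side is always true, so every row passes.
| extend KeptWhenAllSelected = ('*' in ('*') or HostName in ('*'))
// Parameter substituted as 'node-1': only rows whose HostName matches pass.
| extend KeptWhenNode1Selected = ('*' in ('node-1') or HostName in ('node-1'))
```

In the Workbook the same expression sits in a where clause, so rows for which it evaluates to false are dropped.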

Key points

  • where Origin == 'container.azm.ms/telegraf'
  • Disk capacity metrics: where Namespace == 'disk' or Namespace =~ 'container.azm.ms/disk'

Top 3 Disks by Used Disk %

let selectedStateDisks = dynamic(["*"]);
let data = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'disk' or Namespace =~ 'container.azm.ms/disk'
| where Name == 'used_percent'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.device))
| extend NodeDisk = strcat(HostName, Device)
| where "*" in ('*') or HostName in ('*')
| where "*" in ('*') or Device in ('*')
| where NodeDisk in (selectedStateDisks) or '*' in (selectedStateDisks);
let mostUsedDisks = data
| top-nested 3 of NodeDisk by MaxVal = max(Val);
data
| where NodeDisk in (mostUsedDisks)
| make-series ['Used Disk %'] = max(Val) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by NodeDisk
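
The query above first finds the busiest disks with top-nested and then uses that result as a filter. The sketch below reproduces just that two-step pattern on invented rows (top 2 instead of top 3). It relies on the `in` operator taking the first column of a tabular expression as its value list.

```kusto
// Hypothetical used_percent samples per disk.
let data = datatable(NodeDisk: string, Val: real)
[
    "node-0/dev/sda1", 41.5,
    "node-0/dev/sda1", 44.0,
    "node-1/dev/sda1", 73.2,
    "node-2/dev/sda1", 12.0
];
// Step 1: keep the two NodeDisk values with the highest max(Val).
let mostUsedDisks = data
| top-nested 2 of NodeDisk by MaxVal = max(Val);
// Step 2: filter the raw rows down to those disks.
data
| where NodeDisk in (mostUsedDisks)
| summarize MaxVal = max(Val) by NodeDisk
```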

Disk Capacity Overview

(Omitted because this view is built from many queries.)

Used Disk %

let selectedStateDisks = dynamic(["*"]);
let usedPercent = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'disk' or Namespace =~ 'container.azm.ms/disk'
| where Name == 'used_percent'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.device))
| extend NodeDisk = strcat(HostName, Device)
| where "*" in ('*') or HostName in ('*')
| where "*" in ('*') or Device in ('*')
| where NodeDisk in (selectedStateDisks) or '*' in (selectedStateDisks);
let row = dynamic({"Kind":"Unselected"});
let worstDiskAcrossNodes = usedPercent
| summarize UsedPercent = max(Val) by NodeDisk
| top 1 by UsedPercent desc;
usedPercent
| where (row.Kind == 'Unselected') or (row.Kind == 'Node' and row.Id == HostName) or (row.Kind == 'Device' and row.Id == NodeDisk)
| make-series ['Used Disk %'] = max(Val) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by NodeDisk
| where NodeDisk contains iff(row.Kind == 'Unselected', toscalar(worstDiskAcrossNodes | project NodeDisk), '')

Free Disk Space (GiB)

let selectedStateDisks = dynamic(["*"]);
let data = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'disk' or Namespace =~ 'container.azm.ms/disk'
| where Name == 'used_percent' or Name == 'free'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.device))
| extend NodeDisk = strcat(HostName, Device)
| where "*" in ('*') or HostName in ('*')
| where "*" in ('*') or Device in ('*')
| where NodeDisk in (selectedStateDisks) or '*' in (selectedStateDisks);
let usedPercent = data
| where Name == 'used_percent';
let free = data
| where Name == 'free'
| extend Val = Val / 1073741824;
let row = dynamic({"Kind":"Unselected"});
let worstDiskAcrossNodes = usedPercent
| summarize UsedPercent = max(Val) by NodeDisk
| top 1 by UsedPercent desc;
free
| where (row.Kind == 'Unselected') or (row.Kind == 'Node' and row.Id == HostName) or (row.Kind == 'Device' and row.Id == NodeDisk)
| make-series ['Free Disk Space'] = min(Val) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by NodeDisk
| where NodeDisk contains iff(row.Kind == 'Unselected', toscalar(worstDiskAcrossNodes | project NodeDisk), '')
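
The constant 1073741824 in the query above is 1024 to the third power, that is, the number of bytes in one GiB. A quick check with an invented byte count:

```kusto
// 53687091200 bytes is a made-up sample value.
print FreeBytes = 53687091200.0
| extend BytesPerGiB = pow(1024, 3)          // 1073741824
| extend FreeGiB = FreeBytes / 1073741824    // 50
```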

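Disk IO Workbook

Treat the following seven lines as boilerplate (they parse Tags, apply the node and device selections made in the Workbook, and sort the rows for the prev() calls that follow) and feel free to skip over them. They are common to every chart in the Disk IO Workbook.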

| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize

Key points

  • where Origin == 'container.azm.ms/telegraf'
  • Disk IO metrics: where Namespace == 'container.azm.ms/diskio'

Disk IO Overview

(Omitted because this view is built from many queries.)

Read Bytes/sec

let bytesReadPerSec = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'read_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1), iif(PrevVal == Val, 0.0, (Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1)))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
bytesReadPerSec
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device
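
The core of the query above is turning a cumulative counter into a per-second rate: sort the rows per disk, look at the previous row with prev(), and divide the difference by the elapsed seconds. When the counter is smaller than the previous value it is treated as a reset and the current value is used as the difference. The sketch below isolates that logic on invented samples. The Seconds column is introduced here only for readability.

```kusto
// Hypothetical cumulative read_bytes samples.
let SampleCounter = datatable(TimeGenerated: datetime, NodeDisk: string, Val: real)
[
    datetime(2020-02-20 00:00:00), "node-0/dev/sda", 1000.0,
    datetime(2020-02-20 00:01:00), "node-0/dev/sda", 1600.0,   // +600 bytes in 60 s
    datetime(2020-02-20 00:02:00), "node-0/dev/sda", 200.0,    // smaller than before: counter reset
    datetime(2020-02-20 00:00:00), "node-1/dev/sda", 5000.0,
    datetime(2020-02-20 00:01:00), "node-1/dev/sda", 5000.0    // unchanged: rate 0
];
SampleCounter
| order by NodeDisk asc, TimeGenerated asc
| serialize                                                    // prev() needs a fixed row order
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)),
         PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated)                           // drop the first row of each disk
| extend Seconds = datetime_diff('second', TimeGenerated, PrevTimeGenerated)
| extend Rate = iif(PrevVal > Val, Val / Seconds, (Val - PrevVal) / Seconds)
| project TimeGenerated, NodeDisk, Rate
```

The same prev()-based pattern recurs in the Disk IO, Kubelet, and Network queries; only the partition key and the divisor change.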

Write Bytes/sec

let bytesWritePerSec = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'write_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1), iif(PrevVal == Val, 0.0, (Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1)))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
bytesWritePerSec
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device
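
The three indexof lines look cryptic, but they only test which aggregation name was placed in the first argument (here the literal "Average", presumably substituted from a Workbook select box). indexof returns -1 when the lookup string is not found, so exactly one of the three flags differs from -1:

```kusto
print maxOn = indexof("Average", 'Max'),       // -1: not found
      avgOn = indexof("Average", 'Average'),   //  0: found at position 0
      minOn = indexof("Average", 'Min')        // -1: not found
```

The make-series step then picks avg, max, or min depending on which flag is not -1.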

Total Bytes Read (10m intervals)

let bytesReadTotal = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'read_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / 1, iif(PrevVal == Val, 0.0, (Val - PrevVal) / 1))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let sum = bytesReadTotal
| make-series Val = sum(Rate) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device;
sum

Total Bytes Written (10m intervals)

let bytesWrittenTotal = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'write_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / 1, iif(PrevVal == Val, 0.0, (Val - PrevVal) / 1))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let sum = bytesWrittenTotal
| make-series Val = sum(Rate) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device;
sum

Milliseconds Per Bytes Read

let msPerByteRead = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'read_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, pow(Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000 * 1), -1), pow((Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000 * 1), -1))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
msPerByteRead
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device

Milliseconds Per Bytes Written

let msPerByteWritten = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'write_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(TimeGenerated == PrevTimeGenerated or (Val - PrevVal) == 0, 0.0, iif(PrevVal > Val, pow(Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000 * 1), -1), pow((Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000 * 1), -1)))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
msPerByteWritten
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device
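
In the two "Milliseconds Per Bytes" queries, `pow(x, -1)` is simply a reciprocal: the inner expression is bytes per millisecond, and raising it to the power -1 turns it into milliseconds per byte. A worked example with invented numbers:

```kusto
// 4096 bytes transferred over 2 seconds (made-up sample values).
print BytesDelta = 4096.0, Seconds = 2
| extend BytesPerMs = BytesDelta / (Seconds * 1000)   // 2.048 bytes per millisecond
| extend MsPerByte = pow(BytesPerMs, -1)              // 1 / 2.048, about 0.488 ms per byte
```

A reciprocal is undefined when no bytes were transferred, which is presumably why the Written variant above returns 0.0 first when `(Val - PrevVal) == 0`.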

IOPS In Progress

let iops = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'iops_in_progress'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| project TimeGenerated, HostName, Device, Val;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
iops
| make-series Val = iif(avgOn != -1, avg(Val), iif(maxOn != -1, max(Val), min(Val))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device

% Disk Busy

let ioTime = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'io_time'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000), (Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000)) * 100
| where isnotnull(Rate)
| project TimeGenerated, NodeDisk, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
ioTime
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by NodeDisk
| extend Name = NodeDisk
| project-away NodeDisk
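
The Rate expression above divides the increase of io_time by the elapsed time in milliseconds (seconds times 1000) and multiplies by 100, which reads as "percentage of the interval the disk spent doing IO", assuming io_time is reported in milliseconds as the factor 1000 in the query implies. With invented numbers:

```kusto
// The disk was busy for 1500 ms during a 60 s interval (made-up sample values).
print IoTimeDeltaMs = 1500.0, Seconds = 60
| extend BusyPercent = IoTimeDeltaMs / (Seconds * 1000) * 100   // 2.5
```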

Kubelet Workbook

Key points

  • where Origin == 'container.azm.ms/telegraf'
  • Kubelet metrics: where Namespace == 'container.azm.ms/prometheus'

Overview By Node

let data = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/prometheus'
| where Name == 'kubelet_docker_operations' or Name == 'kubelet_docker_operations_errors'
| extend Tags = todynamic(Tags)
| extend OperationType = tostring(Tags['operation_type']), HostName = tostring(Tags.hostName)
| where '*' in ('aks-agentpool-14531005-0','aks-agentpool-14531005-1','aks-agentpool-14531005-2') or HostName in ('aks-agentpool-14531005-0','aks-agentpool-14531005-1','aks-agentpool-14531005-2')
| where '*' in ('*') or OperationType in ('*')
| extend partitionKey = strcat(HostName, '/' , Name, '/', OperationType)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val, Val - PrevVal)
| where isnotnull(Rate)
| project TimeGenerated, Name, HostName, Rate;
let operationData = data
| where Name == 'kubelet_docker_operations';
let totalOperationsByNode = operationData
| summarize Rate = sum(Rate) by HostName
| project HostName, TotalOperations = Rate;
let totalOperationsByNodeSeries = operationData
| make-series TotalOperationsSeries = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by HostName
| project-away TimeGenerated;
let errorData = data
| where Name == 'kubelet_docker_operations_errors';
let totalErrorsByNode = errorData
| summarize Rate = sum(Rate) by HostName
| project HostName, TotalErrors = Rate;
let totalErrorsByNodeSeries = errorData
| make-series TotalErrorsSeries = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by HostName
| project-away TimeGenerated;
totalOperationsByNode
| join kind=inner
(
    totalErrorsByNode
)
on HostName
| join kind = inner
(
    totalOperationsByNodeSeries
)
on HostName
| join kind = inner
(
    totalErrorsByNodeSeries
)
on HostName
| project-away HostName1, HostName2, HostName3
| extend TotalSuccessfulOperationsSeries = series_subtract(TotalOperationsSeries, TotalErrorsSeries)
| extend SuccessPercentage = round(iif(TotalOperations == 0, 1.0, 1 - (TotalErrors / TotalOperations)), 4), SuccessPercentageSeries = series_divide(TotalSuccessfulOperationsSeries, TotalOperationsSeries)
| extend SeriesOfEqualLength = range(1, array_length(TotalOperationsSeries), 1)
| extend SeriesOfOneHundo = series_multiply(series_divide(SeriesOfEqualLength, SeriesOfEqualLength), 100)
| extend SuccessfulOperationsEqualsTotalOperationsSeries = series_equals(TotalSuccessfulOperationsSeries, TotalOperationsSeries)
| extend SuccessPercentageSeries = array_iff(SuccessfulOperationsEqualsTotalOperationsSeries, SeriesOfOneHundo, SuccessPercentageSeries)
| project HostName, TotalOperations, TotalErrors, SuccessPercentage, SuccessPercentageSeries
| order by SuccessPercentage asc, HostName asc
| project-rename Node = HostName, ['Total Operations'] = TotalOperations, ['Total Errors'] = TotalErrors, ['Success %'] = SuccessPercentage, ['Success % Trend'] = SuccessPercentageSeries
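
The tail of the query above works on whole arrays at once using the series_* functions. The following sketch walks through the same steps on two tiny invented series. It uses repeat() to build the array of 100s, which is a substitution for the Workbook's range/series_divide trick, not what the Workbook itself does.

```kusto
// Hypothetical per-bin totals for one node.
print TotalOperationsSeries = dynamic([10, 0, 4]), TotalErrorsSeries = dynamic([1, 0, 0])
// Element-wise: successful = total - errors.
| extend TotalSuccessfulOperationsSeries = series_subtract(TotalOperationsSeries, TotalErrorsSeries)
// Element-wise ratio; a bin with 0 operations divides 0 by 0 and has no usable value.
| extend SuccessPercentageSeries = series_divide(TotalSuccessfulOperationsSeries, TotalOperationsSeries)
// Mark the bins where nothing failed (including the empty bin).
| extend AllSucceeded = series_equals(TotalSuccessfulOperationsSeries, TotalOperationsSeries)
// Overwrite those bins with 100, as the Workbook does with SeriesOfOneHundo.
| extend SuccessPercentageSeries = array_iff(AllSucceeded, repeat(100, array_length(TotalOperationsSeries)), SuccessPercentageSeries)
```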

Overview By Operation Type

let data = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/prometheus'
| where Name == 'kubelet_docker_operations' or Name == 'kubelet_docker_operations_errors'
| extend Tags = todynamic(Tags)
| extend OperationType = tostring(Tags['operation_type']), HostName = tostring(Tags.hostName)
| where '*' in ('aks-agentpool-14531005-0','aks-agentpool-14531005-1','aks-agentpool-14531005-2') or HostName in ('aks-agentpool-14531005-0','aks-agentpool-14531005-1','aks-agentpool-14531005-2')
| where '*' in ('*') or OperationType in ('*')
| extend partitionKey = strcat(HostName, '/' , Name, '/', OperationType)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val, Val - PrevVal)
| where isnotnull(Rate)
| project TimeGenerated, Name, OperationType, Rate;
let operationData = data
| where Name == 'kubelet_docker_operations';
let totalOperationsByType = operationData
| summarize Rate = sum(Rate) by OperationType
| project OperationType, TotalOperations = Rate;
let totalOperationsByTypeSeries = operationData
| make-series TotalOperationsByTypeSeries = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by OperationType
| project-away TimeGenerated;
let errorsData = data
| where Name == 'kubelet_docker_operations_errors';
let totalErrorsByType = errorsData
| summarize Rate = sum(Rate) by OperationType
| project OperationType, TotalErrors = Rate;
let totalErrorsByTypeSeries = errorsData
| make-series TotalErrorsByTypeSeries = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by OperationType
| project-away TimeGenerated;
let seriesLength = toscalar(   totalErrorsByTypeSeries
| extend ArrayLength = array_length(TotalErrorsByTypeSeries)
| summarize Array_Length = max(ArrayLength)  );
totalOperationsByType
| join kind=leftouter
(
    totalErrorsByType
)
on OperationType
| project-away OperationType1
| extend TotalErrors = iif(isempty(TotalErrors), 0.0, TotalErrors)
| join kind=leftouter
(
    totalErrorsByTypeSeries
)
on OperationType
| project-away OperationType1
| extend SeriesOfEqualLength = range(1, seriesLength, 1)
| extend SeriesOfZeroes = series_subtract(SeriesOfEqualLength, SeriesOfEqualLength)
| extend SeriesOfOneHundo = series_multiply(series_divide(SeriesOfEqualLength, SeriesOfEqualLength), 100)
| extend TotalErrorsByTypeSeries = iif(isempty(TotalErrorsByTypeSeries), SeriesOfZeroes, TotalErrorsByTypeSeries)
| join kind=leftouter
(
    totalOperationsByTypeSeries
)
on OperationType
| project-away OperationType1
| extend TotalSuccessfulOperationsByTypeSeries = series_subtract(TotalOperationsByTypeSeries, TotalErrorsByTypeSeries)
| extend SuccessPercentage = round(iif(TotalOperations == 0, 1.0, 1 - (TotalErrors / TotalOperations)), 4), SuccessPercentageSeries = series_divide(TotalSuccessfulOperationsByTypeSeries, TotalOperationsByTypeSeries)
| extend SuccessfulOperationsEqualsTotalOperationsSeries = series_equals(TotalSuccessfulOperationsByTypeSeries, TotalOperationsByTypeSeries)
| extend SuccessPercentageSeries = array_iff(SuccessfulOperationsEqualsTotalOperationsSeries, SeriesOfOneHundo, SuccessPercentageSeries)
| project OperationType, TotalOperations, TotalErrors, SuccessPercentage, SuccessPercentageSeries
| order by SuccessPercentage asc, OperationType asc
| project-rename ['Operation Type'] = OperationType, ['Total Operations'] = TotalOperations, ['Total Errors'] = TotalErrors, ['Success %'] = SuccessPercentage, ['Success % Trend'] = SuccessPercentageSeries
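
Unlike the per-node query, this one uses leftouter joins, because an operation type that never failed has no row on the error side. The sketch below shows that step in isolation with invented operation types and counts: the missing TotalErrors comes back empty after the join and is replaced with 0.0 before the success ratio is computed.

```kusto
// Hypothetical totals; "list_images" has no error row at all.
let totalOperationsByType = datatable(OperationType: string, TotalOperations: real)
[
    "create_container", 40.0,
    "list_images", 10.0
];
let totalErrorsByType = datatable(OperationType: string, TotalErrors: real)
[
    "create_container", 2.0
];
totalOperationsByType
| join kind=leftouter (totalErrorsByType) on OperationType          // keep every operation type
| project-away OperationType1                                       // drop the duplicated join key
| extend TotalErrors = iif(isempty(TotalErrors), 0.0, TotalErrors)  // no error row means 0 errors
| extend SuccessPercentage = round(iif(TotalOperations == 0, 1.0, 1 - (TotalErrors / TotalOperations)), 4)
```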

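Network Workbook

Treat the following seven lines as boilerplate (they parse Tags, apply the node and interface selections made in the Workbook, and sort the rows for the prev() calls that follow) and feel free to skip over them. They are common to every chart in the Network Workbook.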

| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize

Key points

  • where Origin == 'container.azm.ms/telegraf'
  • Network metrics: where Namespace == 'container.azm.ms/net'

Network Overview

(Omitted because this view is built from many queries.)

Sent Bytes/sec

let bytesSentPerSecond = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'bytes_sent'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1), (Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
bytesSentPerSecond
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, Interface)
| project-away HostName, Interface

Received Bytes/sec

let bytesReceivedPerSecond = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'bytes_recv'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1), (Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
bytesReceivedPerSecond
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, Interface)
| project-away HostName, Interface

Total Bytes Sent (by 10m intervals)

let bytesSentTotal = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'bytes_sent'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / 1, (Val - PrevVal) / 1)
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
bytesSentTotal
| make-series Val = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, '/', Interface)
| project-away HostName, Interface

Total Bytes Received (by 10m intervals)

let bytesReceivedTotal = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'bytes_recv'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / 1, (Val - PrevVal) / 1)
| where isnotnull(Rate)
| project TimeGenerated, HostName, Rate;
let sum = bytesReceivedTotal
| make-series Val = sum(Rate) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName
| extend Name = strcat(HostName, ':', 'Sum')
| project-away HostName;
sum
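
Every chart query in this article ends with make-series, which buckets rows into fixed time steps and fills empty buckets with a default value, so the chart gets a continuous line. A minimal sketch with invented rows and a fixed 30-minute window (the Workbooks use ago(21600s) to now() instead):

```kusto
// Hypothetical per-sample byte deltas for one node.
let SampleDeltas = datatable(TimeGenerated: datetime, HostName: string, Rate: real)
[
    datetime(2020-02-20 00:01:00), "node-0", 100.0,
    datetime(2020-02-20 00:06:00), "node-0", 50.0,
    datetime(2020-02-20 00:25:00), "node-0", 30.0
];
SampleDeltas
// Three 10-minute bins; the 00:10 bin has no rows and is filled with the default 0.
| make-series Val = sum(Rate) default = 0 on TimeGenerated from datetime(2020-02-20 00:00:00) to datetime(2020-02-20 00:30:00) step 10m by HostName
```

The result has one row per HostName, with Val and TimeGenerated as arrays of equal length.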

Errors Out/sec

Note: no screenshot is shown because there were zero records at the time of capture.

let errorsOutPerSecond = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'err_out'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / datetime_diff('Second', TimeGenerated, PrevTimeGenerated), (Val - PrevVal) / datetime_diff('Second', TimeGenerated, PrevTimeGenerated))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
errorsOutPerSecond
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, Interface)
| project-away HostName, Interface
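
Two things differ from the "Total" queries here: the delta is divided by the elapsed seconds, and the aggregate is chosen through the `indexof(...)` checks. The literal `"Average"` is presumably the value of a Workbooks parameter substituted into the query text, though that is an assumption on my part. The sketch below models both steps in Python. The names `per_second_rate` and `pick_aggregate` are hypothetical, and `str.find` stands in for Kusto's `indexof`, since both return -1 when the substring is absent.

```python
def per_second_rate(prev_val, val, prev_ts, ts):
    """Per-second rate between two counter samples (timestamps in seconds)."""
    elapsed = ts - prev_ts
    # Same reset handling as the query: a counter that went backwards
    # contributes its new value instead of a negative difference.
    delta = val if prev_val > val else val - prev_val
    return delta / elapsed


def pick_aggregate(selected, rates):
    """Choose avg, max or min depending on the selected parameter value."""
    if selected.find("Average") != -1:
        return sum(rates) / len(rates)
    if selected.find("Max") != -1:
        return max(rates)
    return min(rates)


rates = [
    per_second_rate(100, 160, 0, 60),    # normal increase
    per_second_rate(160, 30, 60, 120),   # counter reset
]
print(pick_aggregate("Average", rates))
print(pick_aggregate("Max", rates))
```

With the parameter fixed to `"Average"`, only the `avg(Rate)` branch of the nested `iif` is ever taken in the query above.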

Errors In/sec

Note: the screenshot is omitted because the query returned no records when I captured it.

let errorsInPerSecond = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'err_in'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / datetime_diff('Second', TimeGenerated, PrevTimeGenerated), (Val - PrevVal) / datetime_diff('Second', TimeGenerated, PrevTimeGenerated))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
errorsInPerSecond
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, Interface)
| project-away HostName, Interface

Total Errors Out (by 10m intervals)

Note: the screenshot is omitted because the query returned no records when I captured it.

let totalErrorsOut = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'err_out'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val, Val - PrevVal)
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
totalErrorsOut
| make-series Val = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, '/', Interface)
| project-away HostName, Interface
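
The `make-series ... default = 0 ... step 10m` line is what produces the "by 10m intervals" shape: every 10-minute bin in the last 6 hours (21600 seconds) gets a value, and bins without samples are filled with 0. Below is a rough Python model of that binning for a single `HostName`/`Interface` pair. The function name `make_series_sum` is hypothetical, and timestamps are simplified to integer seconds.

```python
def make_series_sum(deltas, start, end, step):
    """Sum values into fixed-width bins covering [start, end).

    deltas: iterable of (timestamp_seconds, value).
    Bins without any sample keep the default value 0.
    """
    n_bins = -(-(end - start) // step)  # ceiling division
    series = [0] * n_bins
    for ts, value in deltas:
        if start <= ts < end:
            series[(ts - start) // step] += value
    return series


# Four 10-minute bins; the third one has no samples and stays at 0.
print(make_series_sum([(0, 5), (599, 1), (600, 2), (1900, 7)], 0, 2400, 600))
```

Compared with `summarize ... by bin(TimeGenerated, 10m)`, which simply omits empty bins, filling them with a default gives a continuous line in the chart.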

Total Errors In (by 10m intervals)

let totalErrorsIn = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'err_in'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val, Val - PrevVal)
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
totalErrorsIn
| make-series Val = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, '/', Interface)
| project-away HostName, Interface

Summary of references
