Microsoft Fabric Advent Calendar 2024

Storing and analyzing data pipeline results with Eventhouse

Posted at 2024-12-17, last updated at 2024-12-17

Introduction

This article describes how to record the outputs of pipeline activities in an Eventhouse.

Preparation

Create a pipeline that ingests data by following "Ingest data with a pipeline in Microsoft Fabric".

At the end of the notebook, return the result through the notebook exit call.

pyspark


import json

# df is the DataFrame loaded earlier in the notebook
count = df.count()

# Return the notebook's execution result as a JSON string
result = {
    "df_count": count,
}
result_str = json.dumps(result)
mssparkutils.notebook.exit(result_str)

Run the pipeline and check the output values. These are the values we want to ingest into the KQL database.
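
Since the exit value is later placed inside a JSON document for ingestion, it can help to confirm that it is a single line of valid JSON. The following is a minimal sketch that runs outside Fabric; build_exit_value is a hypothetical helper name, and 42 is a stand-in for df.count().

```python
import json


def build_exit_value(df_count: int) -> str:
    """Serialize the notebook result as single-line JSON (hypothetical helper)."""
    result = {"df_count": df_count}
    # json.dumps emits no newlines unless indent= is passed
    return json.dumps(result)


exit_value = build_exit_value(42)  # 42 stands in for df.count()

# Round-trip check: the string parses back to the same dictionary
assert json.loads(exit_value) == {"df_count": 42}
# The value must stay on one line to fit in a single JSON record
assert "\n" not in exit_value
print(exit_value)
```

Running the same check on a real exit value before wiring it into the pipeline catches quoting mistakes early.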

Steps

Create an Eventhouse.

In the KQL database, create a table.

KQL


// Create table command
////////////////////////////////////////////////////////////
.create table ['pipeline']  (['result']:dynamic)

Return to the pipeline and add a KQL script activity.

To capture information when an activity fails, there seems to be no alternative to placing a separate KQL activity on each failure branch.

In Settings, select the destination KQL database and open the dynamic content editor.

Enter the following as the dynamic content.

pipeline expression


@concat('
.ingest inline into table pipeline with (format = "json") <|
{"result":{'
,' "workspaceId":"',pipeline().DataFactory,'"'
,',"pipelineId":"',pipeline().Pipeline,'"'
,',"pipelineRunId":"',pipeline().RunId,'"'
,',"pipelineTriggerTime":"',pipeline().TriggerTime,'"'
,',"pipelineEndTime":"',utcNow(),'"'
,',"status":"success"'
,',"activities":['
,' {"name":"<enter the activity name>"'
,',"result":',,'}'
,']}}'
)

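Insert the output of the activity you want to record into the empty position after "result":.

When recording several activities, watch the commas as you add them: only the first object in activities has no leading comma, and every later object is preceded by one.

Other values, such as parameters, can be added to the expression in the same way.
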
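
The comma rule for the activities array can be checked offline before editing the expression. The sketch below roughly mimics the concat() expression in Python and verifies that the resulting line parses as JSON. The function build_ingest_command, the activity names, the IDs, and the output values are all hypothetical placeholders, not real pipeline output.

```python
import json


def build_ingest_command(run: dict, activities: list) -> str:
    """Mimic the concat() expression: a command line, then one JSON line."""
    parts = [
        '\n.ingest inline into table pipeline with (format = "json") <|\n',
        '{"result":{',
        ' "workspaceId":"', run["workspaceId"], '"',
        ',"pipelineId":"', run["pipelineId"], '"',
        ',"pipelineRunId":"', run["pipelineRunId"], '"',
        ',"pipelineTriggerTime":"', run["pipelineTriggerTime"], '"',
        ',"pipelineEndTime":"', run["pipelineEndTime"], '"',
        ',"status":"success"',
        ',"activities":[',
    ]
    for index, (name, output_json) in enumerate(activities):
        # Only objects after the first one get a leading comma
        prefix = "," if index > 0 else ""
        parts += [prefix, '{"name":"', name, '","result":', output_json, "}"]
    parts.append("]}}")
    return "".join(parts)


# Hypothetical sample values
run = {
    "workspaceId": "ws-0000",
    "pipelineId": "pl-0000",
    "pipelineRunId": "run-0000",
    "pipelineTriggerTime": "2024-12-17T01:00:00+00:00",
    "pipelineEndTime": "2024-12-17T01:01:30+00:00",
}
activities = [
    ("delete", '{"rows": 0}'),
    ("copy", '{"rowsCopied": 10}'),
    ("notebook", '{"df_count": 10}'),
]

command = build_ingest_command(run, activities)
payload = command.split("<|\n", 1)[1]  # the JSON line after the command
record = json.loads(payload)  # raises ValueError if a comma is misplaced
assert [a["name"] for a in record["result"]["activities"]] == ["delete", "copy", "notebook"]
```

A misplaced or missing comma makes json.loads raise, which is easier to diagnose than a failed ingestion.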
Run the pipeline. If it succeeds, the result is stored in the table in the KQL database.

kql
```
// Use 'take' to view a sample number of records in the table and check the data.
pipeline
| take 100
```

Note that if the KQL command itself fails, this is only visible through HasErrors.

KQL example

kql



pipeline
| project
    pipelineTriggerTime = todatetime(result.pipelineTriggerTime)
    ,pipelineRunId = tostring(result.pipelineRunId)
    // Run duration, rounded down to whole seconds
    ,duration = bin(todatetime(result.pipelineEndTime) - todatetime(result.pipelineTriggerTime), 1s)
    ,status = tostring(result.status)
    // Indexes follow the order of the objects in the activities array
    ,delete_result = result.activities[0].result
    ,copy_result = result.activities[1].result
    ,notebook_result = result.activities[2].result
    // Link to this run in the Monitoring hub
    ,url = strcat('https://app.powerbi.com/workloads/data-pipeline/monitoring/workspaces/',result.workspaceId,'/pipelines/',result.pipelineId,'/',result.pipelineRunId)
| order by ingestion_time() desc

[Figures: contents of delete_result, copy_result, and notebook_result]

A dashboard can also be built from this KQL. Because this example constructs the URL, you can jump directly to the Monitoring hub.
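
For readers who want to trace what the query computes, the duration and URL columns can be reproduced in plain Python. This is only an illustrative sketch: summarize is a hypothetical helper, the IDs are placeholders, and the timestamps are simplified ISO 8601 strings rather than real pipeline values.

```python
from datetime import datetime

MONITORING_BASE = (
    "https://app.powerbi.com/workloads/data-pipeline/monitoring/workspaces/"
)


def summarize(result: dict) -> dict:
    """Derive the run duration in whole seconds and the monitoring URL."""
    start = datetime.fromisoformat(result["pipelineTriggerTime"])
    end = datetime.fromisoformat(result["pipelineEndTime"])
    # Truncate to whole seconds, similar to bin(..., 1s) in the query
    duration_s = int((end - start).total_seconds())
    url = (
        MONITORING_BASE + result["workspaceId"]
        + "/pipelines/" + result["pipelineId"]
        + "/" + result["pipelineRunId"]
    )
    return {"duration_s": duration_s, "url": url}


# Hypothetical record shaped like the 'result' column
sample = {
    "workspaceId": "ws-0000",
    "pipelineId": "pl-0000",
    "pipelineRunId": "run-0000",
    "pipelineTriggerTime": "2024-12-17T01:00:00+00:00",
    "pipelineEndTime": "2024-12-17T01:01:30+00:00",
}

summary = summarize(sample)
assert summary["duration_s"] == 90
print(summary["url"])
```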
