
Power Query workout - Web.Contents

Posted at 2022-05-24, last updated at 2022-05-24

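The Web.Contents function (Power Query) is not especially complex, and what it can do is limited. What matters more is knowing the data source well, such as the specification of the endpoint being accessed.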

Web.Contents function

Web.Contents(url as text, optional options as nullable record) as binary

The HTTP request is made as either a GET (when no Content is specified) or a POST (when there is Content). POST requests may only be made anonymously.
The headers of the HTTP response are available as metadata on the binary result. Outside of a custom data connector context, only a subset of response headers is available (for security reasons).
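As a minimal sketch of the "headers as metadata" point, the response headers can be read with Value.Metadata. The URL below is a placeholder and not from the original article. Which headers actually appear depends on the context, as the documentation quoted above says.

```powerquery
let
    // Placeholder URL, used only for illustration
    Response = Web.Contents(
        "https://example.com/api/items",
        [Headers = [Accept = "application/json"]]
    ),
    // The metadata on the binary result is a record; its Headers field
    // holds the response headers that are exposed in this context
    ResponseHeaders = Value.Metadata(Response)[Headers]?
in
    ResponseHeaders
```

The optional field access ([Headers]?) returns null instead of raising an error if the field is not present.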

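url parameter
url is a string pointing to the web resource to access. It has a second important role: the string with the URL query parameters removed identifies the credentials tied to the data source.
options parameter
The options record has several fields, but three are especially important.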

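Field	Description
Headers	Specifying this value as a record supplies additional headers to the HTTP request.
RelativePath	Specifying this value as text appends it to the base URL before the request is made.
Query	Programmatically adds query parameters to the URL, with no need to think about escaping.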

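The only other field likely to be used is ManualStatusHandling.
Without understanding these, you can end up unable to refresh a Power BI dataset or define a Power BI dataflow.

A typical case is using data obtained from a data source to reference a data source. When the data source can only be identified once the query (Power Query) has been evaluated, it is judged to be a "dynamic data source" and evaluation of the query fails. The desktop environment behaves leniently and lets it run, but the service enforces this strictly.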
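The difference can be sketched roughly as follows. BaseUrl, ItemId, and the example.com address are hypothetical names used only for illustration. With literal values like these nothing is truly dynamic; the point is the shape of the two calls when ItemId comes from another query's result.

```powerquery
let
    // Hypothetical values; in a real query ItemId would come from another step or query
    BaseUrl = "https://example.com/api",
    ItemId = "42",

    // The full URL is assembled at evaluation time, so the service may not be
    // able to identify the data source before the query runs
    Concatenated = Web.Contents(BaseUrl & "/items/" & ItemId),

    // The first argument stays a fixed base URL; the variable part is
    // passed through RelativePath and Query instead
    Fixed = Web.Contents(
        BaseUrl,
        [
            RelativePath = "items/" & ItemId,
            Query = [format = "json"]
        ]
    )
in
    Fixed
```

Only Fixed is returned, so the Concatenated step is never evaluated here. It is kept just for comparison.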

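Understanding it through the SharePoint REST service

The SharePoint.Tables function would be enough, but to understand what is going on, this section replaces what the connector does with Web.Contents and tries it out.

Endpoint

The url parameter of Web.Contents is set to the SharePoint REST service endpoint. This url string identifies the data source and is used to obtain the credentials.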

Power Query

// EndpointUrl
"https://{site_url}/_api/web"
    meta [IsParameterQuery=true, Type="Text", IsParameterQueryRequired=true]

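Getting the SharePoint list's column information

Title (the display column name), InternalName, and EntityPropertyName are retrieved because they are needed.
Defining RelativePath / Query is not mandatory, but the request can be written using them.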

Power Query

// SPListTitle
"{list_title}"
     meta [IsParameterQuery=true, Type="Text", IsParameterQueryRequired=true]

Power Query

let
    Source = Web.Contents(
        EndpointUrl,
        [
            Headers = [
                Accept = "application/json;odata=verbose"
            ],
            RelativePath = "lists/getbytitle(@SPListTitle)/fields",
            Query = [
                #"@SPListTitle" = "'" & SPListTitle & "'"
            ]
        ]
    ),
    Step = Table.FromRecords(
        Json.Document(Source, TextEncoding.Utf8)[d][results],
        type table [
            Title = text, InternalName = text, EntityPropertyName = text, 
            TypeDisplayName = text, Required = logical, Indexed = logical, 
            AutoIndexed = logical, Hidden = logical, Sortable = logical
        ]
    )
in
    Step

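Getting SharePoint list items

Retrieving SharePoint list items requires repeated requests, one page at a time (default page size: 100 items; up to 5,000 with the $top parameter). The SharePoint.Tables function takes care of this, so it can normally be left to the connector. The only goal here is to make the dynamic data source issue visible.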

GET https://{site_url}/_api/web/lists/GetByTitle('Test')/items
Authorization: "Bearer " + accessToken
Accept: "application/json;odata=verbose"
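As a rough sketch, the same request with a larger page size could be written like this in Power Query. It assumes the EndpointUrl and SPListTitle parameters defined earlier; the value 5000 is just the maximum mentioned above, and the Authorization header is left to the credentials configured for the data source.

```powerquery
let
    Source = Web.Contents(
        EndpointUrl,
        [
            Headers = [Accept = "application/json;odata=verbose"],
            RelativePath = "lists/getbytitle(@listname)/items",
            Query = [
                #"@listname" = "'" & SPListTitle & "'",
                // Page size; quoted identifier because the name starts with $
                #"$top" = "5000"
            ]
        ]
    ),
    // First page only; further pages still need the __next handling
    Items = Json.Document(Source, TextEncoding.Utf8)[d][results]
in
    Items
```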

When a SharePoint list has more items than fit in one page (100 by default), the response includes a __next field holding the URL for the next page. The last page has no __next field.

{
    "d": {
        "results":[
            ...
        ],
        "__next": "https://{site_url}/_api/web/lists/getbytitle(%27List1%27)/items?%24skiptoken=Paged%3dTRUE%26p_ID%3d1&%24top=100"
    }
}

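The value of __next only becomes available once Web.Contents has been evaluated. Passing this URL string to Web.Contents is judged to be a dynamic data source. That does not change even though the value comes from a SharePoint REST service response. What matters is that the data source can be identified before any function or query is evaluated.

The following query (Power Query) is judged to use a dynamic data source, but it should work as expected in desktop environments (Power BI Desktop and Excel). It cannot, however, be used for Power BI dataset refresh on the service or in Power BI dataflows.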

Power Query

// SPListTitle
"{list_title}"
     meta [IsParameterQuery=true, Type="Text", IsParameterQueryRequired=true]

Power Query

let
    Source = List.Generate(
        ()=> Json.Document(
            Web.Contents(
                EndpointUrl,
                [
                    Headers = [
                        Accept = "application/json;odata=verbose"
                    ],
                    RelativePath = "lists/getbytitle(@listname)/items",
                    Query = [
                        #"@listname" = "'" & SPListTitle & "'"
                    ]
                ]
            ),
            TextEncoding.Utf8
        )[d],
        each [results]? <> null,
        each 
            if [__next]? <> null
            then
                Json.Document(
                    Web.Contents(
                        [__next],
                        [
                            Headers = [
                                Accept = "application/json;odata=verbose"
                            ]
                        ]
                    ),
                    TextEncoding.Utf8
                )[d]
            else [],
        each [results]
    ),
    TableFromRecords = Table.FromRecords(List.Combine(Source))
in
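    TableFromRecords

"Some data sources may not be listed because of hand-authored queries."
The wording is hard to follow, but what it means is that the data source cannot be identified before the query is evaluated.

So what should be done?
Keep using a data source that can be identified before evaluation, even for the repeated requests. The query below evaluates as expected even on the service.
The key point is not to pass the __next value to Web.Contents as-is, but to extract the URL query parameter portion with the Uri.Parts function and use that.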

Power Query

let
    Source = List.Generate(
        ()=> Json.Document(
            Web.Contents(
                EndpointUrl,
                [
                    Headers = [
                        Accept = "application/json;odata=verbose"
                    ],
                    RelativePath = "lists/getbytitle(@listname)/items",
                    Query = [
                        #"@listname" = "'" & SPListTitle & "'"
                    ]
                ]
            ),
            TextEncoding.Utf8
        )[d],
        each [results]? <> null,
        each 
            if [__next]? <> null
            then
                Json.Document(
                    Web.Contents(
                        EndpointUrl,
                        [
                            Headers = [
                                Accept = "application/json;odata=verbose"
                            ],
                            RelativePath = "lists/getbytitle(@listname)/items",
                            Query = Uri.Parts([__next])[Query]
                        ]
                    ),
                    TextEncoding.Utf8
                )[d]
            else [],
        each [results]
    ),
    TableFromRecords = Table.FromRecords(List.Combine(Source))
in
    TableFromRecords
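To see what gets passed as Query in the query above, Uri.Parts can be tried on its own against the sample __next value shown earlier. This is only a small sketch; contoso.sharepoint.com is a placeholder host standing in for {site_url}.

```powerquery
let
    // Sample __next value with a placeholder host
    NextUrl = "https://contoso.sharepoint.com/_api/web/lists/getbytitle(%27List1%27)/items?%24skiptoken=Paged%3dTRUE%26p_ID%3d1&%24top=100",
    Parts = Uri.Parts(NextUrl),
    // Query is a record built from the URL's query parameters
    NextQuery = Parts[Query],
    // List the parameter names that would be sent with the next request
    ParameterNames = Record.FieldNames(NextQuery)
in
    ParameterNames
```

Because only this record is handed to the Query option, the first argument of Web.Contents stays the fixed EndpointUrl, which is what keeps the data source identifiable in advance.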

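Thoughts 🙄

Web.Contents gives access to all kinds of web resources. Because the data sources vary so much, each one has to be understood well in its own right. I think it is only natural that studying the data source takes more time than studying Power Query.
It may be worth using Fiddler to watch closely what exchanges actually take place 🤪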

Other
