Power Query workout - Item access

Last updated at 2022-05-19, posted at 2022-05-14

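Structured values such as tables and lists let you access their items with the {} operator. That much is straightforward, so I will not go over it here.
So what about this passage from the language specification?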

Item access does not force the evaluation of list or table items other than the one being accessed.
No items in x other than that at position y is evaluated during the process of item selection. (For streaming lists or tables, the items or rows preceding that at position y are skipped over, which may cause their evaluation, depending on the source of the list or table.)

How should this be understood?

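Put plainly: evaluating x{y} forces only the item at position y. No other item of x is evaluated by the selection itself.
The exception is a streaming list or table: the items or rows before position y have to be skipped over, and depending on the source of the list or table, skipping them may evaluate them.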

Behavior

Power Query

{ value_1, value_2, value_3, value_4, ... value_n }{2} = value_3    // true

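When accessing the third list item from the start:

value_1 and value_2 are not evaluated. It is enough that they exist as list items.
So unless value_3 itself is a value that cannot be evaluated (an error), the result of the item access is not an error.
List items from the fourth onward are ignored.
If the third list item does not exist, the result of the item access is an Expression.Error.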
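
The points above can be checked with a small sketch. The list below is a made-up example, not from the specification: every item except the third raises an error when evaluated, so the access only succeeds if the other items are left alone. The last two steps show the missing-item case, first caught with try and then with the optional item access operator {}?.

```powerquery
let
    // Hypothetical list: only the third item can be evaluated without error.
    Items = { error "first", error "second", 3, error "fourth" },
    // Item access forces only the item at position 2.
    Third = Items{2},
    // Position 2 does not exist in a two-item list, so this raises an error.
    Missing = try {1, 2}{2},
    // The optional form returns null instead of raising an error.
    MissingOrNull = {1, 2}{2}?
in
    [
        Third = Third,                        // 3
        MissingHasError = Missing[HasError],  // true
        MissingOrNull = MissingOrNull         // null
    ]
```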

Power Query

{ 1, 2, 3, 4, 5 }{2} = 3    // true

When accessing the third list item from the start:

No expression other than the one for the third list item is evaluated.

Power Query

List.Generate(()=>1, each _ <=5, each _ + 1){2} = 3    // true

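When accessing the third list item from the start:

The expressions that become the first and second list items are evaluated in order to produce the list (streaming).
The expressions for the first and second list items are not evaluated by the item access itself.
List items from the fourth onward are ignored.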
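
One way to observe the streaming behavior is to make the generator itself fail partway. The following is only a sketch under that assumption: the failing step is a hypothetical construction for illustration. Reaching position 2 requires stepping from state 1 to state 2, so I would expect ThirdHasError to be true. Whether position 0 can be read without running the step function is left for you to confirm in your environment.

```powerquery
let
    // Hypothetical generator: the step from state 1 to the next state fails.
    Failing = List.Generate(
        () => 1,
        each _ <= 5,
        each if _ = 1 then error "cannot step past 1" else _ + 1
    ),
    // Position 0 only needs the initial state.
    First = try Failing{0},
    // Position 2 can only be reached by producing the earlier positions first,
    // so the failing step has to run.
    Third = try Failing{2}
in
    [FirstHasError = First[HasError], ThirdHasError = Third[HasError]]
```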

Verification

Query1: streaming (Power Query)

List.Numbers(1, 1000000, 0)

Query1: non-streaming (Power Query)

{1, 1, 1, 1, 1, 1, 1, 1 ..........}    // 1,000,000 items

TestA (Power Query)

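let
    // Rows is not defined in this snippet; it is assumed to be a parameter
    // (or another query) holding the number of rows to generate.
    // Every row holds position 0, so each lookup hits the first item of Query1.
    Source = Table.FromColumns(
        {List.Repeat({0}, Rows)},
        type table [Column1 = Int64.Type]
    ),
    // Look up Query1 at the position stored in Column1, once per row.
    AddedColumn2 = Table.AddColumn(
        Source,
        "Column2",
        each Query1{[Column1]},
        type number
    )
in
    AddedColumn2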

TestB (Power Query)

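let
    // Rows is not defined in this snippet; it is assumed to be a parameter
    // (or another query) holding the number of rows to generate.
    // Every row holds position 999999, so each lookup hits the last item of Query1.
    Source = Table.FromColumns(
        {List.Repeat({999999}, Rows)},
        type table [Column1 = Int64.Type]
    ),
    // Look up Query1 at the position stored in Column1, once per row.
    AddedColumn2 = Table.AddColumn(
        Source,
        "Column2",
        each Query1{[Column1]},
        type number
    )
in
    AddedColumn2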

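With a streaming list, performance depends on the position of the accessed item: the closer to the start, the faster, and the further back, the slower. With a non-streaming list, the position of the item makes almost no difference to performance.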

Thoughts 🙄

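Item access should end up being sequential access in many cases. As a result, it can trigger more access to the data source than necessary, such as excessive file reads. In short, query evaluation may become slow, so think about an appropriate approach case by case.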
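
One possible approach, shown here only as a sketch and not something measured in this article, is to buffer the list once with List.Buffer and do the repeated lookups against the buffered copy. The step names and the value of Rows are placeholders chosen for illustration, and whether this actually helps depends on the data source and on available memory.

```powerquery
let
    // Placeholder row count for the test table.
    Rows = 1000,
    // Read the streaming list once and keep it in memory.
    Buffered = List.Buffer(List.Numbers(1, 1000000, 0)),
    Source = Table.FromColumns(
        {List.Repeat({999999}, Rows)},
        type table [Column1 = Int64.Type]
    ),
    // Each row now looks up the in-memory list instead of the streaming one.
    AddedColumn2 = Table.AddColumn(
        Source,
        "Column2",
        each Buffered{[Column1]},
        type number
    )
in
    AddedColumn2
```

The buffering is done inside the same query as the lookups so that the buffered list is clearly the one being indexed.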

Miscellaneous
