目的
以下のように固定長の Array (target_array
) を持つテーブルに対して、id
単位で同じ offset にある要素同士の平均を取りたい。
Row | id | target_array |
---|---|---|
1 | 1 | [1, 2, 3] |
2 | 1 | [2, 3, 4] |
3 | 1 | [3, 4, 5] |
4 | 2 | [4, 5, 6] |
5 | 2 | [5, 6, 7] |
SELECT 1 AS id, [1, 2, 3] AS target_array
UNION ALL
SELECT 1 AS id, [2, 3, 4] AS target_array
UNION ALL
SELECT 1 AS id, [3, 4, 5] AS target_array
UNION ALL
SELECT 2 AS id, [4, 5, 6] AS target_array
UNION ALL
SELECT 2 AS id, [5, 6, 7] AS target_array
望まれる結果
Row | id | averaged_array |
---|---|---|
1 | 1 | [2.0, 3.0, 4.0] |
2 | 2 | [4.5, 5.5, 6.5] |
方法
以下のように、一度 array を offset つきで展開し、id と offset を元に GROUP BY し平均を計算する。そして、再度 array を生成する。
この時、ARRAY_AGG は順序を考慮しないため、offset を元に ORDER BY する必要がある。
WITH unnested_array AS (
SELECT
id,
value,
offset
FROM `PROJECT_ID.DATASET.TABLE_NAME`,
UNNEST(target_array) AS value WITH OFFSET AS offset
)
, averaged_unnested_array AS (
SELECT
id,
AVG(value) AS averaged_value,
offset
FROM unnested_array
GROUP BY id, offset
)
SELECT
id,
ARRAY_AGG(averaged_value ORDER BY offset ASC) AS averaged_array
FROM averaged_unnested_array
GROUP BY id
結果
WITH input_table AS (
SELECT 1 AS id, [1, 2, 3] AS target_array
UNION ALL
SELECT 1 AS id, [2, 3, 4] AS target_array
UNION ALL
SELECT 1 AS id, [3, 4, 5] AS target_array
UNION ALL
SELECT 2 AS id, [4, 5, 6] AS target_array
UNION ALL
SELECT 2 AS id, [5, 6, 7] AS target_array
)
, unnested_array AS (
SELECT
id,
value,
offset
FROM input_table,
UNNEST(target_array) AS value WITH OFFSET AS offset
)
, averaged_unnested_array AS (
SELECT
id,
AVG(value) AS averaged_value,
offset
FROM unnested_array
GROUP BY id, offset
)
SELECT
id,
ARRAY_AGG(averaged_value ORDER BY offset ASC) AS averaged_array
FROM averaged_unnested_array
GROUP BY id
Row | id | averaged_array |
---|---|---|
1 | 1 | [2.0, 3.0, 4.0] |
2 | 2 | [4.5, 5.5, 6.5] |
終わりに
BigQuery で潜在表現の average pooling を行いたい時などに使えるように思う。上記クエリの AVG()
を MAX()
に変更すれば max pooling を行うことができる。
参考