目的
以下のように固定長の Array (target_array) を持つテーブルに対して、id 単位で同じ offset にある要素同士の平均を取りたい。
| Row | id | target_array |
|---|---|---|
| 1 | 1 | [1, 2, 3] |
| 2 | 1 | [2, 3, 4] |
| 3 | 1 | [3, 4, 5] |
| 4 | 2 | [4, 5, 6] |
| 5 | 2 | [5, 6, 7] |
SELECT 1 AS id, [1, 2, 3] AS target_array
UNION ALL
SELECT 1 AS id, [2, 3, 4] AS target_array
UNION ALL
SELECT 1 AS id, [3, 4, 5] AS target_array
UNION ALL
SELECT 2 AS id, [4, 5, 6] AS target_array
UNION ALL
SELECT 2 AS id, [5, 6, 7] AS target_array
望まれる結果
| Row | id | averaged_array |
|---|---|---|
| 1 | 1 | [2.0, 3.0, 4.0] |
| 2 | 2 | [4.5, 5.5, 6.5] |
方法
以下のように、一度 array を offset つきで展開し、id と offset を元に GROUP BY し平均を計算する。そして、再度 array を生成する。
この時、ARRAY_AGG は順序を考慮しないため、offset を元に ORDER BY する必要がある。
WITH unnested_array AS (
SELECT
id,
value,
offset
FROM `PROJECT_ID.DATASET.TABLE_NAME`,
UNNEST(target_array) AS value WITH OFFSET AS offset
)
, averaged_unnested_array AS (
SELECT
id,
AVG(value) AS averaged_value,
offset
FROM unnested_array
GROUP BY id, offset
)
SELECT
id,
ARRAY_AGG(averaged_value ORDER BY offset ASC) AS averaged_array
FROM averaged_unnested_array
GROUP BY id
結果
WITH input_table AS (
SELECT 1 AS id, [1, 2, 3] AS target_array
UNION ALL
SELECT 1 AS id, [2, 3, 4] AS target_array
UNION ALL
SELECT 1 AS id, [3, 4, 5] AS target_array
UNION ALL
SELECT 2 AS id, [4, 5, 6] AS target_array
UNION ALL
SELECT 2 AS id, [5, 6, 7] AS target_array
)
, unnested_array AS (
SELECT
id,
value,
offset
FROM input_table,
UNNEST(target_array) AS value WITH OFFSET AS offset
)
, averaged_unnested_array AS (
SELECT
id,
AVG(value) AS averaged_value,
offset
FROM unnested_array
GROUP BY id, offset
)
SELECT
id,
ARRAY_AGG(averaged_value ORDER BY offset ASC) AS averaged_array
FROM averaged_unnested_array
GROUP BY id
| Row | id | averaged_array |
|---|---|---|
| 1 | 1 | [2.0, 3.0, 4.0] |
| 2 | 2 | [4.5, 5.5, 6.5] |
終わりに
BigQuery で潜在表現の average pooling を行いたい時などに使えるように思う。上記クエリの AVG() を MAX() に変更すれば max pooling を行うことができる。
参考