概要
count distinct集計にwindowを使うことはできないが、それに対する代替手段を備忘のためまとめる。
やり方
count(distinct X) over (order by Y)
したい時、
-
row_number() over (partition by X order by Y) as rn
で変数群X毎にY順で出現順序をつける -
sum(sum(if(rn = 1, 1, 0))) over (order by y)
で初出の箇所のみで和を集計する
windowでcountdistinct
with
sample_data as (
select
*
from unnest([
struct(date '2019-01-01' as dt, 'user1' as user_id),
(date '2019-01-02', 'user1'),
(date '2019-01-02', 'user2'),
(date '2019-01-03', 'user3'),
(date '2019-01-04', 'user3'),
(date '2019-01-05', 'user3'),
(date '2019-01-02', 'user4'),
(date '2019-01-04', 'user4'),
(date '2019-01-04', 'user5')
])
)
select distinct
dt,
sum(case when rn = 1 then 1 else 0 end) over (order by dt) as accumulated_uu
from (
select
*,
row_number() over (partition by user_id order by dt) as rn
from sample_data
)