2
1

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 3 years have passed since last update.

SQLで連続する日付の両端の日を取得する

Last updated at Posted at 2021-08-09

これは何

SQLテーブルにてユニークな日付の入ったカラムから、連続した日を抽出しその開始と終了の両端の日を取得したい。そのための作業メモ。

方針

  1. 自己結合
  2. Window関数

先に結論

SQL文のわかりやすさで自己結合一択。データ量が多い場合はWindow関数の方がスキャンが1回で済むがそもそも日付連続するデータは有史以来多くないためその恩恵に意味がない。

以後 bigquery 上で実行できる(ユニーク、ソート済みのデータから)サンプルを書く。

インプット

	_date	
1	2021-01-01
2	2021-01-02
3	2021-01-04
4	2021-01-07
5	2021-01-08
6	2021-01-09
7	2021-01-15
8	2021-01-16
9	2021-01-20

方針1

  • pros: 自己結合し d1._date - COUNT(d2._date) で連続する対象を同一グループをするところがテクニカル。そこ以外は素直なSQL文でわかりやすい
  • cons: 自己結合なのでスキャンが2回実行される
WITH d AS (
  SELECT * FROM
    UNNEST(ARRAY<DATE>[
        "2021-01-01", "2021-01-02",
        "2021-01-04",
        "2021-01-07", "2021-01-08", "2021-01-09",
        "2021-01-15", "2021-01-16",
        "2021-01-20"
    ] ) AS _date ),
o1 AS (
  SELECT
    d1._date, d1._date - COUNT(d2._date) AS grp
  FROM d AS d1 INNER JOIN d AS d2 ON
    d1._date >= d2._date
  GROUP BY
    d1._date ),
out1 AS (
  SELECT
    MIN(_date) AS _from, MAX(_date) AS _to
  FROM
    o1
  GROUP BY
    grp )

SELECT * FROM out1

結果

	low	high	
1	2021-01-01	2021-01-02
2	2021-01-04	2021-01-04
3	2021-01-07	2021-01-09
4	2021-01-15	2021-01-16
5	2021-01-20	2021-01-20

方針2

pros: あるのか・・・?SQLパズル力が高められるぐらい(実践には不要)
cons: SQL文が複雑で解読しずらい

WITH d AS (
  SELECT * FROM
    UNNEST(ARRAY<DATE>[
        "2021-01-01", "2021-01-02",
        "2021-01-04",
        "2021-01-07", "2021-01-08", "2021-01-09",
        "2021-01-15", "2021-01-16",
        "2021-01-20"
    ] ) AS _date ),
o1 AS (
  SELECT
    _date,
    LAG(_date) OVER(ORDER BY _date) AS prev_diff,
    LEAD(_date) OVER(ORDER BY _date) AS next_diff,
  FROM d
  ),
o2 AS (
  SELECT
    _date,
    CASE WHEN COALESCE(EXTRACT(DAY FROM _date - prev_diff), 0) <> 1 THEN _date ELSE NULL END AS low,
    CASE WHEN COALESCE(EXTRACT(DAY FROM next_diff - _date), 0) <> 1 THEN _date ELSE NULL END AS high
  FROM o1
),
o3 AS (
    SELECT 
        low,
        CASE 
          WHEN high IS NULL THEN min(high) OVER(ORDER BY _date ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) 
          ELSE high
        END AS high
    FROM o2
),
out2 as (
    SELECT * FROM o3 WHERE low IS NOT NULL
)
SELECT * FROM out2

結果: 同上のため略

参考文献

SQL実践入門 第8章

2
1
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
2
1

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?