0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 1 year has passed since last update.

BigQuery Hive Partitioning のフィルタリングについて

Posted at

やりたいこと

gs://my-bucket/my_logs/year=2021/month=05/day=08/hour=10/minute=35/hogehoge.parquet
のような BigQuery HivePartitioning を使った external table を作ってクエリをかけるときに、

SELECT 
    * 
FROM 
    `my-project.my_dataset.my_table`
WHERE 
    TIMESTAMP(FORMAT("%04d-%02d-%02dT%02d:%02d:00`, year, month, day, hour, minute)) BETWEEN
        TIMESTAMP(_from) AND TIMESTAMP(_to)
LIMIT 1000

みたいな絞り込みをかけたい

結論

SELECT 
    * 
FROM 
    `my-project.my_dataset.my_table`
WHERE 
    year = 2021
    AND month = 10
    AND day = 3
        -- AND TIMESTAMP(FORMAT("%04d-%02d-%02dT%02d:%02d:00`, year, month, day, hour, minute)) BETWEEN
    --    TIMESTAMP(_from) AND TIMESTAMP(_to)
LIMIT 1000

だと読み込みファイルに制限がかかるけど、コメントアウトの行を外すだけで全走査になって、クエリ読み込みサイズがキーーってなる

解決方法

EXTRACT() を使って、 _from_to の年・月・日の大小で場合分けをして、やろうとしたけど、クエリの量が爆発したので無理でした

0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?