Collecting BigQuery Metadata

Last updated at 2025-03-14 / Posted at 2022-04-27

This is a memo of the queries I use most often to collect BigQuery metadata.

Table creation time and row counts

SELECT
  project_id
  , dataset_id
  , table_id
  , FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP_MILLIS(creation_time), 'Asia/Tokyo') as creation_time
  , FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP_MILLIS(last_modified_time), 'Asia/Tokyo') as last_modified_time
  , row_count
  , size_bytes
FROM
  <dataset_id>.__TABLES__
ORDER BY
  table_id


--  +-------------+-------------+----------+---------------------+---------------------+-----------+------------+
--  | project_id  | dataset_id  | table_id | creation_time       | last_modified_time  | row_count | size_bytes |
--  +-------------+-------------+----------+---------------------+---------------------+-----------+------------+
--  | example_pj  | example_ds  | table_a  | 2022-01-01 10:00:00 | 2022-01-01 10:00:00 | 100       | 2000       |
--  +-------------+-------------+----------+---------------------+---------------------+-----------+------------+
--  | example_pj  | example_ds  | table_b  | 2022-01-01 11:00:00 | 2022-01-01 11:00:00 | 500       | 30000      |
--  +-------------+-------------+----------+---------------------+---------------------+-----------+------------+
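
As a variation on the query above, the following sketch lists the ten largest tables in a dataset, with the size converted to GiB. `<dataset_id>` is a placeholder, and the `size_gib` alias and the `LIMIT 10` cutoff are choices made for illustration only.

```sql
-- Sketch: the ten largest tables in a dataset, largest first.
SELECT
  table_id
  , row_count
  -- size_bytes is in bytes; divide by 1024^3 to get GiB
  , ROUND(size_bytes / POW(1024, 3), 2) AS size_gib
  , FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP_MILLIS(last_modified_time), 'Asia/Tokyo') AS last_modified_time
FROM
  <dataset_id>.__TABLES__
ORDER BY
  size_bytes DESC
LIMIT 10
```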

Getting a table's column information

SELECT
  *
FROM
  <dataset_id>.INFORMATION_SCHEMA.COLUMNS
WHERE
  table_name = '<table_name>'
ORDER BY
  ordinal_position


--  +---------------+--------------+------------+-------------+------------------+-----------
--  | table_catalog | table_schema | table_name | column_name | ordinal_position | ...
--  +---------------+--------------+------------+-------------+------------------+-----------
--  | example_pj    | example_ds   | table_a    | column_a    | 1                | ...
--  +---------------+--------------+------------+-------------+------------------+-----------
--  | example_pj    | example_ds   | table_a    | column_b    | 2                | ...
--  +---------------+--------------+------------+-------------+------------------+-----------
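
`SELECT *` returns many columns, so when only the type and the partitioning or clustering layout are of interest, a narrower projection may be easier to read. The sketch below picks a handful of columns from the same view; which columns to keep is an illustrative choice, and `<dataset_id>` and `<table_name>` are placeholders.

```sql
-- Sketch: type, nullability, and partitioning/clustering info per column.
SELECT
  column_name
  , data_type
  , is_nullable
  , is_partitioning_column
  , clustering_ordinal_position  -- NULL when the column is not a clustering column
FROM
  <dataset_id>.INFORMATION_SCHEMA.COLUMNS
WHERE
  table_name = '<table_name>'
ORDER BY
  ordinal_position
```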

Getting a table's column names as a comma-separated list (useful when listing column names in a query)

SELECT
  STRING_AGG(column_name, ", " ORDER BY ordinal_position) as columns
  , CONCAT("'", STRING_AGG(column_name, "', '" ORDER BY ordinal_position), "'") as columns_with_singlequote
FROM
  <dataset_id>.INFORMATION_SCHEMA.COLUMNS
WHERE
  table_name = '<table_name>'


--  +-----------------------------------+------------------------------------+
--  | columns                           | columns_with_singlequote           |
--  +-----------------------------------+------------------------------------+
--  | column_a, column_b, column_c      | 'column_a', 'column_b', 'column_c' | 
--  +-----------------------------------+------------------------------------+
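
The same aggregation can be taken one step further to generate a full `SELECT` statement for every table in a dataset. This is only a sketch: the `select_statement` alias is made up for illustration, and column names that need backtick quoting (reserved words, for example) are not handled.

```sql
-- Sketch: build "SELECT col1, col2, ... FROM `dataset.table`" for each table.
SELECT
  table_name
  , CONCAT(
      'SELECT ', STRING_AGG(column_name, ', ' ORDER BY ordinal_position)
      , ' FROM `', table_schema, '.', table_name, '`'
    ) AS select_statement
FROM
  <dataset_id>.INFORMATION_SCHEMA.COLUMNS
GROUP BY
  table_schema
  , table_name
ORDER BY
  table_name
```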

Getting detailed column information for a table (nested fields, descriptions, etc.)

SELECT
  *
FROM
  <dataset_id>.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE
  table_name = '<table_name>'


--  +---------------+--------------+------------+-------------+----------------+-----------
--  | table_catalog | table_schema | table_name | column_name | field_path     | ...
--  +---------------+--------------+------------+-------------+----------------+-----------
--  | example_pj    | example_ds   | table_a    | column_a    | column_a       | ...
--  +---------------+--------------+------------+-------------+----------------+-----------
--  | example_pj    | example_ds   | table_a    | column_b    | column_b.id    | ...
--  +---------------+--------------+------------+-------------+----------------+-----------
--  | example_pj    | example_ds   | table_a    | column_b    | column_b.name  | ...
--  +---------------+--------------+------------+-------------+----------------+-----------
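
One possible use of this view is finding fields that have no description yet. The sketch below assumes that a missing description shows up as either NULL or an empty string, so it checks for both; `<dataset_id>` and `<table_name>` are placeholders.

```sql
-- Sketch: fields (including nested ones) that have no description.
SELECT
  field_path
  , data_type
  , description
FROM
  <dataset_id>.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE
  table_name = '<table_name>'
  -- Assumption: an unset description may be NULL or ''
  AND (description IS NULL OR description = '')
ORDER BY
  field_path
```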

Getting a table's CREATE TABLE statement

SELECT
  table_name
  , ddl
FROM
  <dataset_id>.INFORMATION_SCHEMA.TABLES
WHERE
  table_name = '<table_name>'
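
To export the DDL of every regular table in a dataset at once, the `table_name` filter can be swapped for a `table_type` filter. This is a sketch: restricting to `'BASE TABLE'` (which leaves out views and external tables) is an illustrative choice, not something the query above requires.

```sql
-- Sketch: DDL for all regular tables in the dataset.
SELECT
  table_name
  , table_type
  , ddl
FROM
  <dataset_id>.INFORMATION_SCHEMA.TABLES
WHERE
  table_type = 'BASE TABLE'
ORDER BY
  table_name
```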

Listing datasets

-- For datasets in the Tokyo region
SELECT
  schema_name
FROM
  `<project_id>`.`region-asia-northeast1`.INFORMATION_SCHEMA.SCHEMATA
ORDER BY 1;

-- For datasets in the US region
SELECT
  schema_name
FROM
  `<project_id>`.INFORMATION_SCHEMA.SCHEMATA
ORDER BY 1;
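
`SCHEMATA` also carries the location and timestamps of each dataset, which can help when checking which datasets changed recently. A minimal sketch for the Tokyo region follows; `<project_id>` is a placeholder, and ordering by `last_modified_time` is an illustrative choice.

```sql
-- Sketch: datasets in the Tokyo region, most recently modified first.
SELECT
  schema_name
  , location
  , creation_time
  , last_modified_time
FROM
  `<project_id>`.`region-asia-northeast1`.INFORMATION_SCHEMA.SCHEMATA
ORDER BY
  last_modified_time DESC
```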

Checking whether tables under a specific dataset are being referenced

SELECT
  *
FROM
  `region-asia-northeast1`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
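  -- Compare dates in JST on both sides of the range
  DATE(creation_time, 'Asia/Tokyo') BETWEEN DATE_ADD(CURRENT_DATE('Asia/Tokyo'), INTERVAL -60 DAY)
  AND CURRENT_DATE('Asia/Tokyo')
  -- Dots are escaped so they match literally; only backtick-quoted references are matched
  AND REGEXP_CONTAINS(query, r'`<project_id>\.<dataset_id>\.[^`]+`')
ORDER BY creation_time DESC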
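
Matching on the query text misses references that are not written in the expected quoted form. As an alternative sketch, the `referenced_tables` column of the same view can be unnested and filtered by dataset, which does not depend on how the query was written. `<project_id>` and `<dataset_id>` are placeholders, and the 60-day window and the aliases `job_count` and `last_referenced_at` are chosen for illustration.

```sql
-- Sketch: who referenced which table in the dataset, and how often.
SELECT
  referenced_table.table_id
  , user_email
  , COUNT(*) AS job_count
  , MAX(creation_time) AS last_referenced_at
FROM
  `region-asia-northeast1`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  , UNNEST(referenced_tables) AS referenced_table
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 60 DAY)
  AND referenced_table.project_id = '<project_id>'
  AND referenced_table.dataset_id = '<dataset_id>'
GROUP BY 1, 2
ORDER BY
  job_count DESC
```

Tables that never appear in the result were not referenced by any job in this project during the window. Jobs run from other projects are not covered by `JOBS_BY_PROJECT`.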

Getting statistics on scanned data volume, slot time, and execution time for query jobs that referenced a specific table

select
  -- Bytes billed (GiB)
  min(total_bytes_billed / 1024 / 1024 / 1024) as min_total_gib_billed
  , max(total_bytes_billed / 1024 / 1024 / 1024) as max_total_gib_billed
  , avg(total_bytes_billed / 1024 / 1024 / 1024) as avg_total_gib_billed
  , approx_quantiles(total_bytes_billed / 1024 / 1024 / 1024, 100)[offset(90)] as percentile_90_total_gib_billed

  -- Slot time (milliseconds)
  , min(total_slot_ms) as min_total_slot_ms
  , max(total_slot_ms) as max_total_slot_ms
  , avg(total_slot_ms) as avg_total_slot_ms
  , approx_quantiles(total_slot_ms, 100)[offset(90)] as percentile_90_total_slot_ms

  -- Job duration (milliseconds)
  , min(timestamp_diff(end_time, start_time, millisecond)) as min_job_duration_ms
  , max(timestamp_diff(end_time, start_time, millisecond)) as max_job_duration_ms
  , avg(timestamp_diff(end_time, start_time, millisecond)) as avg_job_duration_ms
  , approx_quantiles(timestamp_diff(end_time, start_time, millisecond), 100)[offset(90)] as percentile_90_job_duration_ms

from
  `region-asia-northeast1`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  , unnest(referenced_tables) as referenced_table
where
  datetime(creation_time, 'Asia/Tokyo') >= date_add(current_date('Asia/Tokyo'), interval -1 day)
  and referenced_table.table_id = 'sample_table'
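
The aggregate above collapses everything into a single row. To see how usage changes over time, the same source can be grouped by day, as in the sketch below. The 7-day window, the `job_type` and `state` filters, and the output aliases are assumptions added for illustration; `sample_table` is the same example table name as above.

```sql
-- Sketch: daily job count, GiB billed, and slot time for one table.
SELECT
  DATE(creation_time, 'Asia/Tokyo') AS job_date
  , COUNT(*) AS job_count
  , SUM(total_bytes_billed) / POW(1024, 3) AS total_gib_billed
  , SUM(total_slot_ms) AS total_slot_ms
FROM
  `region-asia-northeast1`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  , UNNEST(referenced_tables) AS referenced_table
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'   -- exclude load, copy, and extract jobs
  AND state = 'DONE'       -- only finished jobs have final statistics
  AND referenced_table.table_id = 'sample_table'
GROUP BY
  job_date
ORDER BY
  job_date
```

Filtering on `table_id` alone also matches same-named tables in other datasets. Adding a condition on `referenced_table.dataset_id` narrows the result if that matters.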

This article was written with reference to the following sources.
https://cloud.google.com/bigquery/docs/information-schema-tables?hl=ja
https://www.niandc.co.jp/tech/20200923_1893/
