More than 3 years have passed since last update.

日付付きcsvファイルを取り込み時間パーティションテーブルとしてBigQueryにインポートするpythonスクリプト

Posted at 2020-10-22

背景

xxxx_20200930.csv のように、ファイル名に日付が入っているcsvファイルをパーティションタイム指定してBigQueryにインポートするpythonスクリプトが欲しい。
今回は、大量のcsvファイルがディレクトリ以下に入っているという前提で作成。

サンプルスクリプト

main.py


from google.cloud import bigquery
import json
import glob

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    allow_quoted_newlines=True,
    time_partitioning=bigquery.TimePartitioning()
)

path = "../some/dir/*"
files = glob.glob(path + '*')

for file_name in files:
    date = file_name.split('_')[-1][0:8]
    table_id = 'dataset.table_name$' + date # パーティション指定

    with open(file_name, "rb") as source_file:
        job = client.load_table_from_file(
            source_file,
            table_id,
            job_config=job_config
    )

    job.result()  # Waits for the job to complete.

    table = client.get_table(table_id)  # Make an API request.
    print(
        "Loaded {} rows and {} columns to {}".format(
            table.num_rows, len(table.schema), table_id
        )
    )

参考

データをローカルデータソースから読み込む

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up