
400 Error from load_table_from_json in google-cloud-bigquery

Posted at 2021-11-22

This had me stuck for the better part of an hour.

Environment

  • google-cloud-bigquery 2.30.1 (Python Client for Google BigQuery)
  • Python 3.9.0

Symptom

When I tried to insert rows into BigQuery with code like the following,

foo.py
import json

from google.cloud import bigquery
from google.cloud.bigquery.job import LoadJob
from google.cloud.bigquery.table import Table

if __name__ == '__main__':


    a = '{"foo":"123","bar": "123"}'
    row = json.loads(a)  # a single dict

    project_id = 'foo_project'
    dataset_id = 'bar_dataset'
    table_id = 'test'

    client = bigquery.Client(project=project_id)
    dataset = client.dataset(dataset_id)
    table = dataset.table(table_id)

    table_result: Table = client.get_table(table)

    job_config = bigquery.LoadJobConfig()
    job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
    job_config.schema = table_result.schema
    table_id_constructor = ".".join([project_id, dataset_id, table_id])

    job: LoadJob = client.load_table_from_json(json_rows=row, destination=table_id_constructor, job_config=job_config)

    print(job.result())

it fails with the following error (which isn't very helpful on its own...)

google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.

Cause

I looked at this and that, but the error itself didn't tell me much. In the end it turned out that what you pass to json_rows in load_table_from_json() has to be a list (an array of row dicts), not a single dict.

So, with the following change, the insert succeeded.

(It presumably expects newline-delimited JSON, so in hindsight that makes sense.)

foo2.py
import json
from typing import List

from google.cloud import bigquery
from google.cloud.bigquery.job import LoadJob
from google.cloud.bigquery.table import Table

if __name__ == '__main__':


    a = '{"foo":"123","bar": "123"}'
    row = json.loads(a)
    rows: List[dict] = []
    rows.append(row)  # changed: append the row dict to a list

    project_id = 'foo_project'
    dataset_id = 'bar_dataset'
    table_id = 'test'

    client = bigquery.Client(project=project_id)
    dataset = client.dataset(dataset_id)
    table = dataset.table(table_id)

    table_result: Table = client.get_table(table)

    job_config = bigquery.LoadJobConfig()
    job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
    job_config.schema = table_result.schema
    table_id_constructor = ".".join([project_id, dataset_id, table_id])

    # changed: pass rows instead of row
    job: LoadJob = client.load_table_from_json(json_rows=rows, destination=table_id_constructor, job_config=job_config)

    print(job.result())