BigQuery → TensorFlow Dataset

Posted at 2023-12-07

This article is the day 8 entry in series 7 of the ZOZO Advent Calendar 2023.

Overview

This post introduces two ways to convert a BigQuery table into a TensorFlow Dataset.

1. Using the BigQuery reader

This is the method described in the official TensorFlow I/O BigQuery tutorial.

import tensorflow as tf
from tensorflow_io.bigquery import BigQueryClient
from google.cloud import bigquery

PROJECT_ID = "project_id"
DATASET_ID = "dataset_id"
CSV_SCHEMA = [
    bigquery.SchemaField("user_id", "INTEGER"),
    bigquery.SchemaField("item_id", "INTEGER"),
    bigquery.SchemaField("score", "INTEGER"),
]

def read_bigquery(table_name):
    tensorflow_io_bigquery_client = BigQueryClient()
    read_session = tensorflow_io_bigquery_client.read_session(
        "projects/" + PROJECT_ID,
        PROJECT_ID,
        table_name,
        DATASET_ID,
        # Column names to read, plus the TensorFlow dtype of each column
        # (all INTEGER in this schema, hence tf.int64).
        list(field.name for field in CSV_SCHEMA),
        list(tf.int64 for field in CSV_SCHEMA),
        requested_streams=2,
    )

    # Read the BigQuery Storage API streams in parallel; each element is a
    # dict mapping column names to scalar tensors, one element per row.
    dataset = read_session.parallel_read_rows()
    return dataset

train = (
    read_bigquery("table_name")
    .batch(256)
)
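
Each element of the resulting dataset is a dict keyed by column name. As a minimal sketch (assuming score is the column you want to predict; to_features_and_label is a hypothetical helper, not part of the tutorial), you can split each batch into a (features, label) pair before passing it to model.fit:

# Hypothetical helper: split each batched row dict into (features, label),
# assuming "score" is the label column.
def to_features_and_label(row):
    label = row.pop("score")
    return row, label

train_xy = train.map(to_features_and_label)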

2. Going through Cloud Storage

The following approach is useful when you also want to keep the intermediate data in Cloud Storage.

1. BigQuery → Cloud Storage

EXPORT DATA
  OPTIONS (
    uri = "gs://bucket_name/dataset/*.csv",
    format = "CSV",
    overwrite = true,
    header = true
  )
AS 
SELECT
  user_id,
  item_id,
  score,
FROM `project_id.dataset_id.table_name`
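
You can run this EXPORT DATA statement in the BigQuery console, or from Python via the google-cloud-bigquery client. A minimal sketch (the project, dataset, table, and bucket names are the placeholders used above):

from google.cloud import bigquery

client = bigquery.Client(project="project_id")

export_sql = """
EXPORT DATA
  OPTIONS (
    uri = "gs://bucket_name/dataset/*.csv",
    format = "CSV",
    overwrite = true,
    header = true
  )
AS
SELECT user_id, item_id, score
FROM `project_id.dataset_id.table_name`
"""

# EXPORT DATA runs as a regular query job; result() blocks until it finishes.
client.query(export_sql).result()

The * wildcard in the URI lets BigQuery shard the output into multiple files, which is required for exports larger than 1 GB.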

2. Cloud Storage → TensorFlow Dataset

import tensorflow as tf
train_data = tf.data.experimental.make_csv_dataset(
    file_pattern="gs://bucket_name/dataset/*.csv",
    batch_size=256,
)
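
make_csv_dataset can also separate the label column for you via its label_name argument. A minimal sketch (again assuming score is the label; note that by default the dataset repeats indefinitely, so num_epochs=1 is set here):

import tensorflow as tf

train_data = tf.data.experimental.make_csv_dataset(
    file_pattern="gs://bucket_name/dataset/*.csv",
    batch_size=256,
    label_name="score",  # yields (features_dict, label) pairs ready for model.fit
    num_epochs=1,        # read the files once per epoch instead of repeating forever
)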