
Creating a BigQuery dataset and table with Terraform

Posted at 2022-01-09 (last updated at 2022-01-09)

Introduction

Being able to manage resources with Terraform is convenient.
In this post, I use Terraform to create a BigQuery dataset and table.

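Prerequisites

Terraform runs in an environment built with docker-compose and a Makefile.

I also assume that a GCP service account allowed to operate on BigQuery has already been created and that its credential (key file) has already been issued.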

Walkthrough

Layout

The directory layout is as follows.
The .tf files go under ./terraform.

.
├── .env
├── Makefile
├── docker-compose.yaml
└── terraform
    ├── bigquery.tf
    ├── credentials
    │   └── credentials.json
    └── main.tf

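Setup

The credential is placed at ./terraform/credentials/credentials.json, and .env stores its path in an environment variable. The path is relative to the container's working directory (/terraform).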

.env

GOOGLE_APPLICATION_CREDENTIALS="credentials/credentials.json"

docker-compose.yaml specifies the official Terraform image and loads the environment variables from .env.

docker-compose.yaml

version: '3'

services:
    terraform:
        image: hashicorp/terraform:1.0.0
        container_name: terraform
        volumes:
            - ./terraform:/terraform
        env_file: .env
        working_dir: /terraform

The Makefile is below. It wraps the basic Terraform commands so that they can be run through make.

.PHONY: init plan apply destroy
ARG="default"

init:
	@docker-compose run --rm terraform init

plan:
	@docker-compose run --rm terraform plan

apply:
	@docker-compose run --rm terraform apply

destroy:
	@docker-compose run --rm terraform destroy

terraform

Next come the .tf files.
First, main.tf.
It holds the settings needed to create GCP resources.

main.tf

terraform {
  required_providers {
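    google = {
      # Name the provider source explicitly (the init log below resolves to hashicorp/google)
      source  = "hashicorp/google"
      version = "~> 4.0.0"
    }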
  }
}

provider "google" {
  project     = "<your project id>"
  region      = "<your region>"
}

Note: replace <your project id> and <your region> with your own values.
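
Rather than editing the placeholders in place, one option is to pass the project and region in as input variables. The following is a minimal sketch of that idea, not part of the original setup: the variable names `project_id` and `region` are assumptions introduced here for illustration, and the provider block would replace the hard-coded one above rather than sit alongside it.

```hcl
# Hypothetical input variables (names are illustrative, not from the original setup).
variable "project_id" {
  type        = string
  description = "ID of the GCP project that will hold the BigQuery resources"
}

variable "region" {
  type        = string
  description = "Default region for the google provider"
}

# Use this in place of the hard-coded provider block, not in addition to it.
provider "google" {
  project = var.project_id
  region  = var.region
}
```

With this layout the values could be supplied through a terraform.tfvars file under ./terraform, or as `TF_VAR_project_id` and `TF_VAR_region` entries in .env, since docker-compose already passes that file into the container.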

Next, bigquery.tf.

bigquery.tf

resource "google_bigquery_dataset" "dataset" {
  dataset_id                  = "example_dataset"
  friendly_name               = "test"
  description                 = "This is a test description"
  location                    = "<your location>"
}

resource "google_bigquery_table" "users" {
  dataset_id          = google_bigquery_dataset.dataset.dataset_id
  table_id            = "users"
  deletion_protection = false
  clustering          = ["user_id"]

  time_partitioning {
    field                    = "dateday"
    type                     = "DAY"
    require_partition_filter = true
  }

  schema = <<EOF
[
  {
    "name": "user_id",
    "type": "INT64",
    "mode": "REQUIRED",
    "description": "user id"
  },
  {
    "name": "name",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "user name"
  },
  {
    "name": "dateday",
    "type": "DATE",
    "mode": "REQUIRED",
    "description": "created date"
  }
]
EOF

}

Note: replace <your location> with your own value.

The dataset and table are defined roughly as follows.

dataset

This follows the sample in the official Terraform documentation and creates a dataset named example_dataset.
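
The dataset resource accepts a few optional arguments beyond the ones used here. As a rough sketch, assuming you want labels and want `make destroy` to succeed even when the dataset still holds tables created outside Terraform, the same resource could look like this. The label key and value are illustrative assumptions; `delete_contents_on_destroy` is the attribute that appears as `false` in the plan output.

```hcl
# Sketch only: the same dataset with two optional arguments added.
# The label key/value below are illustrative assumptions.
resource "google_bigquery_dataset" "dataset" {
  dataset_id    = "example_dataset"
  friendly_name = "test"
  description   = "This is a test description"
  location      = "<your location>"

  # Let `terraform destroy` remove the dataset even if it still contains tables.
  # The default is false, in which case destroying a non-empty dataset fails.
  delete_contents_on_destroy = true

  labels = {
    env = "test"
  }
}
```

Setting `delete_contents_on_destroy = true` is convenient for throwaway environments like this one, but it is probably not something to enable on a dataset holding data you care about.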

table

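As the schema states, this creates a table named users with the columns user_id, name, and dateday.

The table is partitioned by day on the dateday column (the creation date) and clustered on the user_id column.
In addition, time_partitioning sets require_partition_filter = true so that queries without a partition filter are rejected.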
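
The schema above is written as a JSON heredoc. An alternative, sketched below under the assumption that you prefer to keep the columns as an HCL value, is to build the same JSON with Terraform's built-in `jsonencode` function. The local name `users_schema` is hypothetical, and this resource block would replace the heredoc version rather than be added next to it.

```hcl
# Sketch: the users schema as an HCL list (the local name is hypothetical).
locals {
  users_schema = [
    { name = "user_id", type = "INT64", mode = "REQUIRED", description = "user id" },
    { name = "name", type = "STRING", mode = "NULLABLE", description = "user name" },
    { name = "dateday", type = "DATE", mode = "REQUIRED", description = "created date" },
  ]
}

resource "google_bigquery_table" "users" {
  dataset_id          = google_bigquery_dataset.dataset.dataset_id
  table_id            = "users"
  deletion_protection = false
  clustering          = ["user_id"]

  time_partitioning {
    field                    = "dateday"
    type                     = "DAY"
    require_partition_filter = true
  }

  # jsonencode turns the HCL list into the JSON string that `schema` expects.
  schema = jsonencode(local.users_schema)
}
```

Either form should produce the same table; the HCL version mainly lets `terraform fmt` and `terraform validate` catch syntax mistakes that would otherwise hide inside a string.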

For more on partitioning and clustering, someone has tested both in the following article, which may be a useful reference.

init, plan, apply

The resources are now ready to be created, so let's apply the configuration.

Start with init.

$ make init

Initializing the backend...

Initializing provider plugins...
- Reusing previous version of hashicorp/google from the dependency lock file
- Using previously-installed hashicorp/google v4.0.0

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Next, check the diff with plan.

$ make plan

Creating terraform_terraform_run ... done
google_bigquery_dataset.dataset: Refreshing state... [id=projects/<your project id>/datasets/example_dataset]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # google_bigquery_dataset.dataset will be created
  + resource "google_bigquery_dataset" "dataset" {
      + creation_time               = (known after apply)
      + dataset_id                  = "example_dataset"
      + delete_contents_on_destroy  = false
      + description                 = "This is a test description"
      + etag                        = (known after apply)
      + friendly_name               = "test"
      + id                          = (known after apply)
      + last_modified_time          = (known after apply)
      + location                    = "<your location>"
      + project                     = (known after apply)
      + self_link                   = (known after apply)

      + access {
          + domain         = (known after apply)
          + group_by_email = (known after apply)
          + role           = (known after apply)
          + special_group  = (known after apply)
          + user_by_email  = (known after apply)

          + view {
              + dataset_id = (known after apply)
              + project_id = (known after apply)
              + table_id   = (known after apply)
            }
        }
    }

  # google_bigquery_table.users will be created
  + resource "google_bigquery_table" "users" {
      + clustering          = [
          + "user_id",
        ]
      + creation_time       = (known after apply)
      + dataset_id          = "example_dataset"
      + deletion_protection = false
      + etag                = (known after apply)
      + expiration_time     = (known after apply)
      + id                  = (known after apply)
      + last_modified_time  = (known after apply)
      + location            = (known after apply)
      + num_bytes           = (known after apply)
      + num_long_term_bytes = (known after apply)
      + num_rows            = (known after apply)
      + project             = (known after apply)
      + schema              = jsonencode(
            [
              + {
                  + description = "user id"
                  + mode        = "REQUIRED"
                  + name        = "user_id"
                  + type        = "INT64"
                },
              + {
                  + description = "user name"
                  + mode        = "NULLABLE"
                  + name        = "name"
                  + type        = "STRING"
                },
              + {
                  + description = "created date"
                  + mode        = "REQUIRED"
                  + name        = "dateday"
                  + type        = "DATE"
                },
            ]
        )
      + self_link           = (known after apply)
      + table_id            = "users"
      + type                = (known after apply)

      + time_partitioning {
          + expiration_ms            = (known after apply)
          + field                    = "dateday"
          + require_partition_filter = true
          + type                     = "DAY"
        }
    }

Plan: 2 to add, 0 to change, 0 to destroy.

The plan shows that two resources, the dataset and the table, will be created.

Now let's apply it.

$ make apply

.
.
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:
.
.
.
Plan: 2 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

google_bigquery_dataset.dataset: Creating...
google_bigquery_dataset.dataset: Creation complete after 2s [id=projects/<your project id>/datasets/example_dataset]
google_bigquery_table.users: Creating...
google_bigquery_table.users: Creation complete after 0s [id=projects/<your project id>/datasets/example_dataset/tables/users]

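The apply seems to have succeeded. Let's check in the console that the resources were actually created.

Below is a screenshot of the BigQuery console in GCP; example_dataset and users have been created in the specified project.

The schema of users also matches what was specified.

That completes creating BigQuery resources with Terraform.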
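
Besides the console, the created resources can also be inspected from the command line by declaring outputs. This is a small sketch, not something the original configuration includes; the output names are hypothetical, while the referenced attributes belong to the two resources defined in bigquery.tf.

```hcl
# Hypothetical output names; the referenced attributes come from bigquery.tf.
output "dataset_id" {
  value = google_bigquery_dataset.dataset.dataset_id
}

output "users_table_id" {
  # Fully qualified ID in the form shown in the apply log:
  # projects/<project>/datasets/<dataset>/tables/<table>
  value = google_bigquery_table.users.id
}
```

After an apply, `docker-compose run --rm terraform output` should print both values; the Makefile in this post has no `output` target, so one would need to be added to call it through make.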

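Closing

Creating BigQuery resources with Terraform turned out to be very easy.
This time I kept it simple and only created a dataset and a table, but I would like to move on to managing permissions as well.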
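
As a pointer toward that permissions work, the google provider has a `google_bigquery_dataset_iam_member` resource that grants a single role to a single principal on a dataset. The sketch below is only an assumption about how it might be wired to the dataset from this post; the resource name `viewer` and the member placeholder are illustrative, not values from the original setup.

```hcl
# Sketch: grant one principal read access to the dataset created above.
# The member address is a placeholder to be replaced with a real principal.
resource "google_bigquery_dataset_iam_member" "viewer" {
  dataset_id = google_bigquery_dataset.dataset.dataset_id
  role       = "roles/bigquery.dataViewer"
  member     = "user:<member email>"
}
```

The `_iam_member` variant is additive, so it leaves other grants on the dataset alone, which makes it a cautious starting point before moving to the resources that manage a role or policy as a whole.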

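By the way, is there a better way to manage the credential than creating a key and placing it in the project?
I would like to stop creating and placing keys...
I plan to look into that separately and write it up in another post.

That's all for this time.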
