Changing the output dataset in dbt + BigQuery

Posted at 2021-12-14

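Background

When you define a model in dbt, it is normally written to the default dataset specified in the connection settings. These are notes on how to write a model to a different dataset instead.

Implementation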

Suppose we want to write the models to these datasets:

first_table -> first_dataset
second_table -> second_dataset

and that the project has the following directory structure:

├── README.md
├── dbt_project.yml
├── macros
│   └── get_custom_schema.sql
├── models
│   ├── first_dir
│   │   └── first_table.sql
│   └── second_dir
│       └── second_table.sql
(rest omitted)

Define the custom schemas in dbt_project.yml as follows.

dbt_project.yml

name: 'my_new_project'
version: '1.0.0'
config-version: 2

profile: 'my-profile'

source-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
data-paths: ["data"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"  
clean-targets:         
  - "target"
  - "dbt_modules"

models:
  my_new_project:
    first_dir:
      +schema: first_dataset
      +materialized: table
    second_dir:
      +schema: second_dataset
      +materialized: table
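To make the folder-to-dataset mapping concrete, here is a simplified Python sketch of how a folder-level schema setting applies to the models under that folder. It is not dbt's real config resolution (it ignores config() blocks inside model files, among other things); MODELS_CONFIG and custom_schema_for are hypothetical names introduced only for this illustration.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical nested dict mirroring the `models:` block of dbt_project.yml
# (the "+" prefix is omitted here for simplicity).
MODELS_CONFIG = {
    "my_new_project": {
        "first_dir": {"schema": "first_dataset", "materialized": "table"},
        "second_dir": {"schema": "second_dataset", "materialized": "table"},
    }
}


def custom_schema_for(model_path: str, project: str = "my_new_project") -> Optional[str]:
    """Return the custom schema configured for the folders containing a model.

    Simplified illustration only, not dbt's actual resolution logic.
    """
    parts = PurePosixPath(model_path).parts
    # Skip the leading "models" directory and the .sql file name itself.
    folders = parts[1:-1] if parts[:1] == ("models",) else parts[:-1]
    node = MODELS_CONFIG.get(project, {})
    schema = node.get("schema")
    for folder in folders:
        node = node.get(folder)
        if not isinstance(node, dict):
            break  # No config for this folder or anything below it.
        # A setting on a deeper folder overrides one on a shallower folder.
        schema = node.get("schema", schema)
    return schema


for path in ("models/first_dir/first_table.sql", "models/second_dir/second_table.sql"):
    print(path, "->", custom_schema_for(path))
```

A model outside the configured folders gets None, which is the "no custom schema" case that the schema-name macro has to handle.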

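With only the definition above, the models are written to a dataset named <target_schema>_<custom_schema>, because dbt's built-in generate_schema_name macro prefixes the custom schema with the target schema. To change this logic, define your own generate_schema_name macro in macros/get_custom_schema.sql; a project macro with that name overrides the built-in one.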
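As a rough illustration of why the prefix appears, the following Python function mirrors the logic of dbt's built-in generate_schema_name macro. It is a sketch written for this note, not dbt code; default_generate_schema_name and the target dataset dbt_dev are hypothetical names.

```python
from typing import Optional


def default_generate_schema_name(custom_schema_name: Optional[str], target_schema: str) -> str:
    """Mirror the logic of dbt's built-in generate_schema_name macro."""
    if custom_schema_name is None:
        # No custom schema: use the dataset from the connection profile.
        return target_schema
    # Custom schema set: dbt prefixes it with the target schema.
    return f"{target_schema}_{custom_schema_name.strip()}"


TARGET_SCHEMA = "dbt_dev"  # hypothetical dataset from profiles.yml
print(default_generate_schema_name("first_dataset", TARGET_SCHEMA))  # dbt_dev_first_dataset
print(default_generate_schema_name(None, TARGET_SCHEMA))  # dbt_dev
```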

macros/get_custom_schema.sql

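{% macro generate_schema_name(custom_schema_name, node) -%}
  {#- Fall back to the dataset from the connection profile when no custom schema is set -#}
  {%- set default_schema = target.schema -%}
  {%- if custom_schema_name is none -%}
    {{ default_schema }}
  {%- else -%}
    {#- Use the custom schema as-is, without the <target_schema>_ prefix -#}
    {{ custom_schema_name | trim }}
  {%- endif -%}
{%- endmacro %}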
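The overridden macro's behavior can be sketched in Python the same way. This is only an illustration of the branching, not something dbt runs; generate_schema_name here is a plain Python function, and dbt_dev is a hypothetical target dataset.

```python
from typing import Optional


def generate_schema_name(custom_schema_name: Optional[str], target_schema: str) -> str:
    """Mirror the overridden macro: use the custom schema without any prefix."""
    if custom_schema_name is None:
        # No custom schema configured: fall back to the profile's dataset.
        return target_schema
    # Jinja's `| trim` strips surrounding whitespace, like str.strip().
    return custom_schema_name.strip()


TARGET_SCHEMA = "dbt_dev"  # hypothetical dataset from profiles.yml
for custom in ("first_dataset", "second_dataset", None):
    print(repr(custom), "->", generate_schema_name(custom, TARGET_SCHEMA))
```

Models without a schema setting still land in the profile's dataset, so only the configured folders are redirected.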

With this in place, the SQL files under first_dir are written to first_dataset, and those under second_dir are written to second_dataset.

References

I learned this approach from @takimo on Twitter (thank you!).
dbt docs: Using custom schemas
DevelopersIO: [dbt] Creating data models under a schema other than the usual one with custom schemas
