I recently tried out dbt-osmosis and looked into how it behaves, so I'm writing up the results.
- Part 1: Standard features
- Part 2: Command options of the Refactor feature ← you are here
This article summarizes how the main command options of the Refactor feature behave.
The results were obtained in the following environment:
dbt-bigquery==1.6
dbt-osmosis==0.12.9
Command options of the Refactor feature
❯ dbt-osmosis yaml refactor --help
Usage: dbt-osmosis yaml refactor [OPTIONS] [MODELS]...
Executes organize which syncs yaml files with database schema and organizes the dbt models directory, reparses the project, then executes document passing down inheritable documentation
Options:
--project-dir DIRECTORY Which directory to look in for the dbt_project.yml file. Default is the current working directory and its parents.
--profiles-dir DIRECTORY Which directory to look in for the profiles.yml file. Defaults to ~/.dbt
-t, --target TEXT Which target to load. Overrides default target in the profiles.yml.
-f, --fqn TEXT Specify models based on dbt's FQN. Looks like folder.folder, folder.folder.model, or folder.folder.source.table. Use list command to see the scope of an FQN filter. This may be deprecated in the future. Please use model positional selectors instead.
-F, --force-inheritance If specified, forces documentation to be inherited overriding existing column level documentation where applicable.
-d, --dry-run If specified, no changes are committed to disk.
-C, --check If specified, will return a non-zero exit code if any files are changed.
--catalog-file PATH If specified, will read the list of columns from the catalog.json file instead of querying the warehouse.
--skip-add-columns If specified, we will skip adding columns to the models. This is useful if you want to document your models without adding columns present in the database.
--skip-add-tags If specified, we will skip adding tags to the models.
--skip-add-data-types If specified, we will skip adding data types to the models.
--skip-merge-meta If specified, we will skip merging meta to the models.
--add-progenitor-to-meta If specified, progenitor information will be added to the meta information of a column. This is useful if you want to know which model is the progenitor of a specific model's column.
--profile TEXT Which profile to load. Overrides setting in dbt_project.yml.
--vars TEXT Supply variables to the project. This argument overrides variables defined in your dbt_project.yml file. This argument should be a YAML string, eg. '{my_variable: my_value}'
--use-unrendered-descriptions If specified, will use unrendered column descriptions in the documentation.This is useful for propogating docs blocks
--add-inheritance-for-specified-keys TEXT
If specified, will add inheritance for the specified keys.
--help Show this message and exit.
--force-inheritance
Overview
Syntax
-F, --force-inheritance
What it does
When specified, inherited documentation overwrites existing column-level documentation.
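Trying it out
Preparation
We will build on the following sample repository. Run dbt run once beforehand so that the tables exist in BigQuery.
With that in place, the preparation steps are as follows.
Make the following changes to the repository above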
- Change the contents of _source.yml as follows:
Changes
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: STRING
- name: last_name
description: ''
data_type: STRING
- name: raw_orders
columns:
- name: id
description: ''
data_type: INT64
- name: user_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: DATE
- name: status
description: ''
data_type: STRING
- Change the contents of _schema.yml as follows:
Changes
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: 'hogehoge'
data_type: INT64
- name: first_name
description: ''
data_type: STRING
- name: last_name
description: ''
data_type: STRING
- name: stg_orders
columns:
- name: order_id
description: ''
data_type: INT64
- name: customer_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: DATE
- name: status
description: ''
data_type: STRING
- name: customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: STRING
- name: last_name
description: ''
data_type: STRING
- name: first_order_date
description: ''
data_type: DATE
- name: most_recent_order_date
description: ''
data_type: DATE
- name: number_of_orders
description: ''
data_type: INT64
Run dbt-osmosis yaml refactor to propagate stg_customers.customer_id → customers.customer_id
_schema.yml after the run:
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: "hogehoge"
data_type: INT64
... (omitted) ...
- name: customers
columns:
- name: customer_id
- description: ''
+ description: "hogehoge"
data_type: INT64
... (omitted) ...
Then make the following change to _schema.yml
- name: stg_customers
columns:
- name: customer_id
- description: "hogehoge"
+ description: 'barbar' # edited
data_type: INT64
Trial 1 | Running dbt-osmosis yaml refactor as is
Result:
- The description: of customers.customer_id is still "hogehoge"
  - → the change was not propagated
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: "barbar" # edited
data_type: INT64
... (omitted) ...
- name: customers
columns:
- name: customer_id
description: "hogehoge"
data_type: INT64
... (omitted) ...
Trial 2 | Running with the --force-inheritance option
Command:
dbt-osmosis yaml refactor --force-inheritance
Result:
- The description: of customers.customer_id has changed to "barbar"
  - → the change was propagated
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: "barbar" # edited
data_type: INT64
... (omitted) ...
- name: customers
columns:
- name: customer_id
description: "barbar"
data_type: INT64
... (omitted) ...
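The two trials boil down to a simple rule, sketched below as a toy Python model. It reproduces only the behavior observed here and is not dbt-osmosis's implementation. `Column` and `inherit_description` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Column:
    # Hypothetical stand-in for one column entry in a schema YAML file
    description: str = ""


def inherit_description(parent: Column, child: Column, force: bool = False) -> None:
    """Toy model of the behavior observed above (not dbt-osmosis's code).

    Without force, only an empty child description is filled from the parent.
    With force (like --force-inheritance), the parent always overwrites it.
    """
    if force or not child.description:
        child.description = parent.description


# Preparation: customers.customer_id starts empty, so it inherits "hogehoge"
stg = Column("hogehoge")
customers = Column("")
inherit_description(stg, customers)
print(customers.description)  # hogehoge

# After editing stg_customers.customer_id to "barbar"
stg.description = "barbar"
inherit_description(stg, customers)              # trial 1: unchanged
print(customers.description)  # hogehoge
inherit_description(stg, customers, force=True)  # trial 2: overwritten
print(customers.description)  # barbar
```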
osmosis_keep_description:
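What it does
This setting lets you control, column by column, the overwriting performed by the --force-inheritance option.
If you set osmosis_keep_description: true in a column's meta: as shown below, that column is exempt from being overwritten.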
meta:
osmosis_keep_description: true
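Trying it out
Preparation
We will build on the following sample repository. Run dbt run once beforehand so that the tables exist in BigQuery.
With that in place, the preparation steps are as follows.
Make the following changes to the repository above
- Change the contents of _source.yml as follows:
Changes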
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ""
data_type: INT64
- name: first_name
description: "名前" # edited
data_type: INT64
- name: last_name
description: ""
data_type: INT64
- name: raw_orders
columns:
- name: id
description: ""
data_type: INT64
- name: user_id
description: ""
data_type: INT64
- name: order_date
description: ""
data_type: INT64
- name: status
description: ""
data_type: INT64
- Change the contents of _schema.yml as follows:
Changes
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: STRING
- name: last_name
description: ''
data_type: STRING
- name: stg_orders
columns:
- name: order_id
description: ''
data_type: INT64
- name: customer_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: DATE
- name: status
description: ''
data_type: STRING
- name: customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: STRING
- name: last_name
description: ''
data_type: STRING
- name: first_order_date
description: ''
data_type: DATE
- name: most_recent_order_date
description: ''
data_type: DATE
- name: number_of_orders
description: ''
data_type: INT64
Run dbt-osmosis yaml refactor --force-inheritance to propagate the column information
The description of raw_customers.first_name ("名前", i.e. "name") has been propagated to stg_customers.first_name and customers.first_name
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "名前"
data_type: INT64
... (omitted) ...
- name: customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "名前"
data_type: INT64
... (omitted) ...
Then make the following change to _source.yml
- name: first_name
- description: "名前" # edited
+ description: "名前じゃなかった" # edited
Trial 1 | Running dbt-osmosis yaml refactor --force-inheritance as is
Result:
- The new description of raw_customers.first_name ("名前じゃなかった", i.e. "not the name after all") has been propagated to stg_customers.first_name and customers.first_name
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "名前じゃなかった"
data_type: INT64
... (omitted) ...
- name: customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "名前じゃなかった"
data_type: INT64
... (omitted) ...
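Trial 2 | Adding osmosis_keep_description: true
Starting from the preparation state, edit the description of stg_customers.first_name by hand ("ここだけ手動変更", i.e. "manually changed here only") and add osmosis_keep_description: true to it: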
- name: stg_customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
- description: "名前"
+ description: "ここだけ手動変更"
+ meta:
+ osmosis_keep_description: true
Run dbt-osmosis yaml refactor --force-inheritance
Result:
- raw_customers.first_name → stg_customers.first_name was not overwritten
  - → as specified by osmosis_keep_description: true
- stg_customers.first_name → customers.first_name was overwritten
  - osmosis_keep_description: true was propagated along with it
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "ここだけ手動変更"
meta:
osmosis_keep_description: true
data_type: INT64
... (omitted) ...
- name: customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "ここだけ手動変更"
data_type: INT64
meta:
osmosis_keep_description: true
... (omitted) ...
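Make the following additional change to _schema.yml (the new description "ここだけさらに手動変更" means "manually changed here only, again"):
... (omitted) ...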
- name: stg_customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
- description: "ここだけ手動変更"
+ description: "ここだけさらに手動変更"
meta:
osmosis_keep_description: true
... (omitted) ...
Run dbt-osmosis yaml refactor --force-inheritance
Result:
- stg_customers.first_name → customers.first_name is no longer overwritten either, because customers.first_name now carries osmosis_keep_description: true itself
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "ここだけさらに手動変更"
meta:
osmosis_keep_description: true
data_type: INT64
... (omitted) ...
- name: customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "ここだけ手動変更"
data_type: INT64
meta:
osmosis_keep_description: true
... (omitted) ...
Remove osmosis_keep_description: true from customers.first_name
... (omitted) ...
- name: customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "ここだけ手動変更"
data_type: INT64
- meta:
- osmosis_keep_description: true
... (omitted) ...
Run dbt-osmosis yaml refactor --force-inheritance
Result:
- stg_customers.first_name → customers.first_name was overwritten
  - osmosis_keep_description: true was propagated again as well
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "ここだけさらに手動変更"
meta:
osmosis_keep_description: true
data_type: INT64
... (omitted) ...
- name: customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "ここだけさらに手動変更"
data_type: INT64
meta:
osmosis_keep_description: true
... (omitted) ...
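The four runs above can be reproduced with the toy Python model below. It is not dbt-osmosis's implementation. Two parts are assumptions inferred from these results: the ordering (a column's keep flag is read from its own meta before upstream meta is merged in) and the conflict rule. `Column` and `propagate` are hypothetical names, and the English description strings stand in for the Japanese values used above.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    # Hypothetical stand-in for a column entry: description plus meta
    description: str = ""
    meta: dict = field(default_factory=dict)


def propagate(chain: list, force: bool = False, merge_meta: bool = True) -> None:
    """Toy model of one refactor pass over a lineage chain (upstream first).

    Assumptions inferred from the trials above, not taken from dbt-osmosis:
    - the keep flag is read from the child's own meta *before* meta is merged
    - on a meta key conflict, the child's own value wins
    """
    for parent, child in zip(chain, chain[1:]):
        keep = child.meta.get("osmosis_keep_description", False)
        if not keep and (force or not child.description):
            child.description = parent.description
        if merge_meta:
            child.meta = {**parent.meta, **child.meta}


# raw_customers -> stg_customers -> customers, for the first_name column
raw = Column("name")
stg = Column("manual edit", {"osmosis_keep_description": True})
customers = Column("name")
chain = [raw, stg, customers]

propagate(chain, force=True)  # stg kept; customers overwritten and flagged
stg.description = "manual edit 2"
propagate(chain, force=True)  # customers now protected by its own flag
customers.meta = {}
propagate(chain, force=True)  # flag removed: overwritten and flagged again
print(customers.description, customers.meta)
```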
--skip-merge-meta
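Overview
Syntax
--skip-merge-meta
What it does
When specified, items under meta: are not propagated.
Trying it out
Preparation
We will build on the following sample repository. Run dbt run once beforehand so that the tables exist in BigQuery.
With that in place, the preparation steps are as follows.
Make the following changes to the repository above
- Change the contents of _source.yml as follows
  - Add the following items to meta: of raw_customers.first_name: hoge: bar and osmosis_keep_description: true
Changes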
version: 2
sources:
- name: tmp_dbt_osmosis_test
database: datascience-product
tables:
- name: raw_customers
columns:
- name: id
description: ""
data_type: INT64
- name: first_name
description: "名前じゃなかった"
meta: # added
hoge: bar
osmosis_keep_description: true
data_type: INT64
- name: last_name
description: ""
data_type: INT64
- name: raw_orders
columns:
- name: id
description: ""
data_type: INT64
- name: user_id
description: ""
data_type: INT64
- name: order_date
description: ""
data_type: INT64
- name: status
description: ""
data_type: INT64
- Change the contents of _schema.yml as follows
  - Add hoge: bar to meta: of stg_customers.first_name and customers.first_name
Changes
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "名前じゃなかった"
data_type: INT64
meta: # added
hoge: bar
- name: last_name
description: ""
data_type: INT64
- name: customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "名前じゃなかった"
data_type: INT64
meta: # added
hoge: bar
- name: last_name
description: ""
data_type: INT64
- name: first_order_date
description: ""
data_type: INT64
- name: most_recent_order_date
description: ""
data_type: INT64
- name: number_of_orders
description: ""
data_type: INT64
- name: stg_orders
columns:
- name: order_id
description: ""
data_type: INT64
- name: customer_id
description: ""
data_type: INT64
- name: order_date
description: ""
data_type: INT64
- name: status
description: ""
data_type: INT64
Trial 1 | Running dbt-osmosis yaml refactor --force-inheritance as is
Result:
- osmosis_keep_description: true has been propagated into meta: of stg_customers.first_name and customers.first_name
version: 2
models:
- name: stg_customers
columns:
... (omitted) ...
- name: first_name
description: "名前じゃなかった"
data_type: INT64
meta: # added
hoge: bar
+ osmosis_keep_description: true
... (omitted) ...
- name: customers
columns:
... (omitted) ...
- name: first_name
description: "名前じゃなかった"
data_type: INT64
meta: # added
hoge: bar
+ osmosis_keep_description: true
... (omitted) ...
Trial 2 | Running with --skip-merge-meta
Command:
dbt-osmosis yaml refactor --force-inheritance --skip-merge-meta
Result:
- osmosis_keep_description: true is not propagated into meta: of stg_customers.first_name and customers.first_name
As the documentation says, merging meta: into downstream tables is skipped.
version: 2
models:
- name: stg_customers
columns:
... (omitted) ...
- name: first_name
description: "名前じゃなかった"
data_type: INT64
meta: # added
hoge: bar
... (omitted) ...
- name: customers
columns:
... (omitted) ...
- name: first_name
description: "名前じゃなかった"
data_type: INT64
meta: # added
hoge: bar
... (omitted) ...
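Using --force-inheritance, osmosis_keep_description:, and --skip-merge-meta together looks like a good way to control propagation flexibly. For example, --skip-merge-meta keeps an osmosis_keep_description: true flag on the column where you set it, so it does not spread downstream.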
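The meta: behavior in the two trials can be sketched as a merge that is simply skipped when the option is set. `merge_meta` is a hypothetical helper, not part of dbt-osmosis. Which side wins on a key conflict is an assumption, because the trials only used identical values for hoge.

```python
def merge_meta(parent_meta: dict, child_meta: dict, skip_merge_meta: bool = False) -> dict:
    """Toy model of meta propagation (hypothetical, not dbt-osmosis's code).

    With skip_merge_meta (like --skip-merge-meta) the child's meta is left as is.
    Otherwise upstream keys are merged in; on a conflict the child's value is
    kept (an assumption).
    """
    if skip_merge_meta:
        return dict(child_meta)
    return {**parent_meta, **child_meta}


source_meta = {"hoge": "bar", "osmosis_keep_description": True}
model_meta = {"hoge": "bar"}

print(merge_meta(source_meta, model_meta))
# {'hoge': 'bar', 'osmosis_keep_description': True}
print(merge_meta(source_meta, model_meta, skip_merge_meta=True))
# {'hoge': 'bar'}
```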
--skip-add-data-types
Overview
Syntax
--skip-add-data-types
What it does
When specified, data_type: is not written.
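The option description can be pictured as a switch in the step that turns warehouse columns into YAML entries. The following is a sketch under that assumption: `build_column_entries` and the `(column_name, data_type)` catalog format are hypothetical, not dbt-osmosis internals.

```python
def build_column_entries(catalog: list, skip_add_data_types: bool = False) -> list:
    """Toy model of writing warehouse columns into YAML column entries.

    `catalog` is a hypothetical list of (column_name, data_type) pairs.
    With skip_add_data_types (like --skip-add-data-types), data_type is omitted.
    """
    entries = []
    for name, data_type in catalog:
        entry = {"name": name, "description": ""}
        if not skip_add_data_types:
            entry["data_type"] = data_type
        entries.append(entry)
    return entries


catalog = [("customer_id", "INT64"), ("first_name", "STRING")]
print(build_column_entries(catalog))
print(build_column_entries(catalog, skip_add_data_types=True))
```

This only reflects what the option description promises. In trial 1 below, columns newly added to _source.yml still received data_type:, so the real behavior for sources appears to differ.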
Trying it out
Preparation
We will use the following sample repository as is. Run dbt run once beforehand so that the tables exist in BigQuery.
Trial 1 | Running without column information in _source.yml
Leave _source.yml in its preparation state:
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
- name: raw_orders
Run dbt-osmosis yaml refactor --skip-add-data-types
Result:
- In _source.yml, column information was added with data_type:
- In _schema.yml, column information was added without data_type:
Actual _source.yml and _schema.yml
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: STRING
- name: last_name
description: ''
data_type: STRING
- name: raw_orders
columns:
- name: id
description: ''
data_type: INT64
- name: user_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: DATE
- name: status
description: ''
data_type: STRING
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
- name: first_name
description: ''
- name: last_name
description: ''
- name: customers
columns:
- name: customer_id
description: ''
- name: first_name
description: ''
- name: last_name
description: ''
- name: first_order_date
description: ''
- name: most_recent_order_date
description: ''
- name: number_of_orders
description: ''
- name: stg_orders
columns:
- name: order_id
description: ''
- name: customer_id
description: ''
- name: order_date
description: ''
- name: status
description: ''
Trial 2 | Running with column information in _source.yml
Starting from the preparation state, change _source.yml as follows:
Changes
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ""
- name: first_name
description: ""
- name: last_name
description: ""
- name: raw_orders
columns:
- name: id
description: ""
- name: user_id
description: ""
- name: order_date
description: ""
- name: status
description: ""
Run dbt-osmosis yaml refactor --skip-add-data-types
Result:
- _source.yml is unchanged (i.e., no data_type: was added)
- In _schema.yml, column information was added without data_type:
Actual _source.yml and _schema.yml
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ""
- name: first_name
description: ""
- name: last_name
description: ""
- name: raw_orders
columns:
- name: id
description: ""
- name: user_id
description: ""
- name: order_date
description: ""
- name: status
description: ""
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
- name: first_name
description: ''
- name: last_name
description: ''
- name: stg_orders
columns:
- name: order_id
description: ''
- name: customer_id
description: ''
- name: order_date
description: ''
- name: status
description: ''
- name: customers
columns:
- name: customer_id
description: ''
- name: first_name
description: ''
- name: last_name
description: ''
- name: first_order_date
description: ''
- name: most_recent_order_date
description: ''
- name: number_of_orders
description: ''
--add-progenitor-to-meta
Overview
Syntax
--add-progenitor-to-meta
What it does
When specified, the model that a column originally comes from (its progenitor) is recorded. An osmosis_progenitor: item is added to meta: as follows:
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
Trying it out
Preparation
We will use the following sample repository as is. Run dbt run once beforehand so that the tables exist in BigQuery.
Trial 1 | Running with the --add-progenitor-to-meta option
Run dbt-osmosis yaml refactor --add-progenitor-to-meta
Result:
- Only customers has the upstream origin of its columns written in meta:
Actual _source.yml and _schema.yml
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: STRING
- name: last_name
description: ''
data_type: STRING
- name: raw_orders
columns:
- name: id
description: ''
data_type: INT64
- name: user_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: DATE
- name: status
description: ''
data_type: STRING
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: STRING
- name: last_name
description: ''
data_type: STRING
- name: stg_orders
columns:
- name: order_id
description: ''
data_type: INT64
- name: customer_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: DATE
- name: status
description: ''
data_type: STRING
- name: customers
columns:
- name: customer_id
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: INT64
- name: first_name
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: STRING
- name: last_name
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: STRING
- name: first_order_date
description: ''
data_type: DATE
- name: most_recent_order_date
description: ''
data_type: DATE
- name: number_of_orders
description: ''
data_type: INT64
Run dbt-osmosis yaml refactor --add-progenitor-to-meta once more
Result:
- All models now have the upstream origin recorded in meta:
It looks like the output only converges after several runs...?
Actual _schema.yml
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: STRING
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
- name: last_name
description: ''
data_type: STRING
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
- name: customers
columns:
- name: customer_id
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: INT64
- name: first_name
description: ''
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
data_type: STRING
- name: last_name
description: ''
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
data_type: STRING
- name: first_order_date
description: ''
data_type: DATE
- name: most_recent_order_date
description: ''
data_type: DATE
- name: number_of_orders
description: ''
data_type: INT64
- name: stg_orders
columns:
- name: order_id
description: ''
data_type: INT64
- name: customer_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: DATE
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_orders
- name: status
description: ''
data_type: STRING
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_orders
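The converged output above is consistent with a simple rule: follow a column upstream while some parent has a column with the same name, and record the furthest node reached unless it is the column's own model. Below is a hypothetical sketch of that rule. It is not dbt-osmosis's code, the node IDs are shortened, the lineage is a hand-written subset of the sample project, and it does not reproduce the need for a second run.

```python
# Hand-written subset of the sample project's lineage (hypothetical, shortened IDs):
# node -> (parent nodes, column names)
LINEAGE = {
    "source.raw_customers": ([], ["id", "first_name", "last_name"]),
    "model.stg_customers": (
        ["source.raw_customers"],
        ["customer_id", "first_name", "last_name"],
    ),
    "model.customers": (
        ["model.stg_customers"],
        ["customer_id", "first_name", "last_name", "first_order_date"],
    ),
}


def find_progenitor(node: str, column: str, lineage: dict) -> str:
    """Follow a column upstream by name; stop when no parent has that column."""
    parents, _ = lineage[node]
    for parent in parents:
        if column in lineage[parent][1]:
            return find_progenitor(parent, column, lineage)
    return node


def progenitor_meta(node: str, lineage: dict) -> dict:
    """Map each column to its progenitor, skipping columns that originate in `node`."""
    result = {}
    for column in lineage[node][1]:
        origin = find_progenitor(node, column, lineage)
        if origin != node:
            result[column] = origin
    return result


print(progenitor_meta("model.customers", LINEAGE))
# {'customer_id': 'model.stg_customers', 'first_name': 'source.raw_customers', 'last_name': 'source.raw_customers'}
```

Under this rule, customer_id stops at stg_customers because raw_customers has no column of that name (presumably stg_customers renames it from id). That matches osmosis_progenitor: model.my_dbt_project.stg_customers in the output above.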
Trial 2 | When columns are inherited across three or more levels
Starting from the preparation state, add the following models:
-- customers_depth_2
select
first_name
from
{{ ref('customers') }}
-- customers_depth_3
select
first_name
from
{{ ref('customers_depth_2') }}
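After the addition, the data lineage for first_name is: raw_customers → stg_customers → customers → customers_depth_2 → customers_depth_3
Run dbt-osmosis yaml refactor --add-progenitor-to-meta (first run)
Result:
- meta: has been added from customers onward
Actual _schema.yml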
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: INT64
- name: last_name
description: ''
data_type: INT64
- name: customers_depth_3
columns:
- name: first_name
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: INT64
- name: customers_depth_2
columns:
- name: first_name
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: INT64
- name: customers
columns:
- name: customer_id
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: INT64
- name: first_name
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: INT64
- name: last_name
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: INT64
- name: first_order_date
description: ''
data_type: INT64
- name: most_recent_order_date
description: ''
data_type: INT64
- name: number_of_orders
description: ''
data_type: INT64
- name: stg_orders
columns:
- name: order_id
description: ''
data_type: INT64
- name: customer_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: INT64
- name: status
description: ''
data_type: INT64
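Run dbt-osmosis yaml refactor --add-progenitor-to-meta once more
Result:
- meta: has been added to all models
Perhaps only the second layer counting from the source (the stg_* models) is processed a little differently...? (needs verification)
Actual _schema.yml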
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: INT64
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
- name: last_name
description: ''
data_type: INT64
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
- name: customers_depth_3
columns:
- name: first_name
description: ''
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
data_type: INT64
- name: customers_depth_2
columns:
- name: first_name
description: ''
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
data_type: INT64
- name: customers
columns:
- name: customer_id
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: INT64
- name: first_name
description: ''
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
data_type: INT64
- name: last_name
description: ''
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
data_type: INT64
- name: first_order_date
description: ''
data_type: INT64
- name: most_recent_order_date
description: ''
data_type: INT64
- name: number_of_orders
description: ''
data_type: INT64
- name: stg_orders
columns:
- name: order_id
description: ''
data_type: INT64
- name: customer_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: INT64
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_orders
- name: status
description: ''
data_type: INT64
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_orders
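If several runs really are needed before the output converges, one workaround is to repeat the command until nothing changes. This sketch relies on the -C/--check option from the help output. It assumes that with --check dbt-osmosis still writes its changes and only signals them with a non-zero exit code. `run_until_stable` and `REFACTOR_CMD` are hypothetical names.

```python
import subprocess
import sys
from typing import Sequence

# Hypothetical command; assumes dbt-osmosis is on PATH and that --check
# still writes changes, only signalling them with a non-zero exit code.
REFACTOR_CMD = [
    "dbt-osmosis", "yaml", "refactor", "--add-progenitor-to-meta", "--check",
]


def run_until_stable(cmd: Sequence[str], max_runs: int = 5) -> int:
    """Re-run `cmd` until it exits with 0 and return how many runs it took."""
    for run in range(1, max_runs + 1):
        code = subprocess.run(list(cmd)).returncode
        print(f"run {run}: exit code {code}", file=sys.stderr)
        if code == 0:
            return run
    raise RuntimeError(f"output still changing after {max_runs} runs")


# Usage (not executed here):
# runs = run_until_stable(REFACTOR_CMD)
```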
--use-unrendered-descriptions
Overview
Syntax
--use-unrendered-descriptions
What it does
By default, the contents of description: are rendered.
description: "{{ doc(\"first_name\") }}"
↓
description: "### about\n\nfirst_name"
With this option, the contents are reused as is, before rendering.
description: "{{ doc(\"first_name\") }}"
↓
description: "{{ doc(\"first_name\") }}"
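The difference between the two modes can be illustrated with a regex stand-in for dbt's rendering of doc(). This is only a sketch. `render_description` and `inherited_description` are hypothetical, and real dbt rendering handles arbitrary Jinja, not just this single pattern.

```python
import re

# Matches {{ doc("name") }} or {{ doc('name') }}; a tiny stand-in for dbt's
# Jinja rendering, which supports far more than this single pattern.
DOC_CALL = re.compile(r"""\{\{\s*doc\(\s*["']([^"']+)["']\s*\)\s*\}\}""")


def render_description(text: str, docs: dict) -> str:
    """Replace every doc() call with the matching docs block body."""
    return DOC_CALL.sub(lambda m: docs[m.group(1)], text)


def inherited_description(upstream: str, docs: dict, use_unrendered: bool = False) -> str:
    """What a downstream column would receive, with or without the option."""
    return upstream if use_unrendered else render_description(upstream, docs)


docs = {"first_name": "### about\n\nfirst_name"}
upstream = '{{ doc("first_name") }}'
print(repr(inherited_description(upstream, docs)))
# '### about\n\nfirst_name'
print(repr(inherited_description(upstream, docs, use_unrendered=True)))
# '{{ doc("first_name") }}'
```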
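Preparation
We will build on the following sample repository. Run dbt run once beforehand so that the tables exist in BigQuery.
With that in place, the preparation steps are as follows.
Make the following changes to the repository above
- Change _source.yml to the following: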
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: first_name
description: |
{{ doc("first_name") }}
- name: raw_orders
- Create models/_doc.md with the following contents:
{% docs first_name %}
### about
first_name
{% enddocs %}
Trial 1 | Running dbt-osmosis yaml refactor as is
Result:
- In _source.yml, the contents of _doc.md are not expanded
- In _schema.yml, the contents of _doc.md are expanded
- name: first_name
description: '### about
first_name'
Actual _source.yml and _schema.yml
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ''
data_type: INT64
- name: first_name
description: |
{{ doc("first_name") }}
data_type: INT64
- name: last_name
description: ''
data_type: INT64
- name: raw_orders
columns:
- name: id
description: ''
data_type: INT64
- name: user_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: INT64
- name: status
description: ''
data_type: INT64
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: '### about
first_name'
data_type: INT64
- name: last_name
description: ''
data_type: INT64
- name: customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: '### about
first_name'
data_type: INT64
- name: last_name
description: ''
data_type: INT64
- name: first_order_date
description: ''
data_type: INT64
- name: most_recent_order_date
description: ''
data_type: INT64
- name: number_of_orders
description: ''
data_type: INT64
- name: stg_orders
columns:
- name: order_id
description: ''
data_type: INT64
- name: customer_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: INT64
- name: status
description: ''
data_type: INT64
Trial 2 | Running with the --use-unrendered-descriptions option
Command:
dbt-osmosis yaml refactor --use-unrendered-descriptions
Result:
- The contents of _doc.md are not expanded
- name: first_name
description: '{{ doc("first_name") }}
'
Actual _source.yml and _schema.yml
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ''
data_type: INT64
- name: first_name
description: |
{{ doc("first_name") }}
data_type: INT64
- name: last_name
description: ''
data_type: INT64
- name: raw_orders
columns:
- name: id
description: ''
data_type: INT64
- name: user_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: INT64
- name: status
description: ''
data_type: INT64
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: '{{ doc("first_name") }}
'
data_type: INT64
- name: last_name
description: ''
data_type: INT64
- name: stg_orders
columns:
- name: order_id
description: ''
data_type: INT64
- name: customer_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: INT64
- name: status
description: ''
data_type: INT64
- name: customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: '{{ doc("first_name") }}
'
data_type: INT64
- name: last_name
description: ''
data_type: INT64
- name: first_order_date
description: ''
data_type: INT64
- name: most_recent_order_date
description: ''
data_type: INT64
- name: number_of_orders
description: ''
data_type: INT64
Conclusion
In this article, I summarized how several options of the dbt-osmosis Refactor feature behave. If you find any mistakes, I would be grateful if you let me know.
Thank you for reading this far.