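Update (2025/04/07)
A major update was released in January 2025, and some behavior has changed from what is described in this article.
See the linked page for details.
I recently tried out dbt-osmosis and looked into how it behaves, so I am writing up the results in this article series.
- Part 1: Standard features
- Part 2: Command options of the Refactor feature ← you are here
This article summarizes the behavior of the main command options of the Refactor feature.
The results were obtained in the following environment: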
dbt-bigquery==1.6
dbt-osmosis==0.12.9
List of command options for the Refactor feature
❯ dbt-osmosis yaml refactor --help
Usage: dbt-osmosis yaml refactor [OPTIONS] [MODELS]...
Executes organize which syncs yaml files with database schema and organizes the dbt models directory, reparses the project, then executes document passing down inheritable documentation
Options:
--project-dir DIRECTORY Which directory to look in for the dbt_project.yml file. Default is the current working directory and its parents.
--profiles-dir DIRECTORY Which directory to look in for the profiles.yml file. Defaults to ~/.dbt
-t, --target TEXT Which target to load. Overrides default target in the profiles.yml.
-f, --fqn TEXT Specify models based on dbt's FQN. Looks like folder.folder, folder.folder.model, or folder.folder.source.table. Use list command to see the scope of an FQN filter. This may be deprecated in the future. Please use model positional selectors instead.
-F, --force-inheritance If specified, forces documentation to be inherited overriding existing column level documentation where applicable.
-d, --dry-run If specified, no changes are committed to disk.
-C, --check If specified, will return a non-zero exit code if any files are changed.
--catalog-file PATH If specified, will read the list of columns from the catalog.json file instead of querying the warehouse.
--skip-add-columns If specified, we will skip adding columns to the models. This is useful if you want to document your models without adding columns present in the database.
--skip-add-tags If specified, we will skip adding tags to the models.
--skip-add-data-types If specified, we will skip adding data types to the models.
--skip-merge-meta If specified, we will skip merging meta to the models.
--add-progenitor-to-meta If specified, progenitor information will be added to the meta information of a column. This is useful if you want to know which model is the progenitor of a specific model's column.
--profile TEXT Which profile to load. Overrides setting in dbt_project.yml.
--vars TEXT Supply variables to the project. This argument overrides variables defined in your dbt_project.yml file. This argument should be a YAML string, eg. '{my_variable: my_value}'
--use-unrendered-descriptions If specified, will use unrendered column descriptions in the documentation.This is useful for propogating docs blocks
--add-inheritance-for-specified-keys TEXT
If specified, will add inheritance for the specified keys.
--help Show this message and exit.
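As a rough illustration of how these flags combine, here is a minimal Python sketch that only assembles an argument list for `dbt-osmosis yaml refactor`; it does not execute anything. The helper name `build_refactor_command` and its keyword arguments are hypothetical, while the flag names themselves are taken from the help text above.

```python
import shlex


def build_refactor_command(models=(), *, force_inheritance=False,
                           dry_run=False, check=False, skip_merge_meta=False):
    """Assemble argv for `dbt-osmosis yaml refactor` (hypothetical helper).

    Only flags that appear in the help output above are used.
    """
    argv = ["dbt-osmosis", "yaml", "refactor"]
    if force_inheritance:
        argv.append("--force-inheritance")
    if dry_run:
        argv.append("--dry-run")
    if check:
        argv.append("--check")
    if skip_merge_meta:
        argv.append("--skip-merge-meta")
    # Positional model selectors go last, as in `[OPTIONS] [MODELS]...`.
    argv.extend(models)
    return argv


cmd = build_refactor_command(["stg_customers"], dry_run=True, check=True)
print(shlex.join(cmd))
# dbt-osmosis yaml refactor --dry-run --check stg_customers
```

According to the help text, `--dry-run` writes nothing to disk and `--check` returns a non-zero exit code when files would change, so this pair may be a reasonable starting point for a CI check.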
--force-inheritance
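Overview
Syntax
-F, --force-inheritance
Function
When specified, inherited documentation overwrites existing column-level documentation.
Trying it out
Preparation
We build on the following sample repository. Run dbt run once beforehand so that the tables exist in BigQuery.
Based on that premise, the preparation steps are as follows.
Make the following changes to the repository above
- Change the contents of _source.yml as follows:
Changes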
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: STRING
- name: last_name
description: ''
data_type: STRING
- name: raw_orders
columns:
- name: id
description: ''
data_type: INT64
- name: user_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: DATE
- name: status
description: ''
data_type: STRING
- Change the contents of _schema.yml as follows:
Changes
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: 'hogehoge'
data_type: INT64
- name: first_name
description: ''
data_type: STRING
- name: last_name
description: ''
data_type: STRING
- name: stg_orders
columns:
- name: order_id
description: ''
data_type: INT64
- name: customer_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: DATE
- name: status
description: ''
data_type: STRING
- name: customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: STRING
- name: last_name
description: ''
data_type: STRING
- name: first_order_date
description: ''
data_type: DATE
- name: most_recent_order_date
description: ''
data_type: DATE
- name: number_of_orders
description: ''
data_type: INT64
Run dbt-osmosis yaml refactor to propagate stg_customers.customer_id → customers.customer_id
_schema.yml after the run:
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: "hogehoge"
data_type: INT64
# ... (omitted) ...
- name: customers
columns:
- name: customer_id
- description: ''
+ description: "hogehoge"
data_type: INT64
# ... (omitted) ...
Then make the following additional change to _schema.yml
- name: stg_customers
columns:
- name: customer_id
- description: "hogehoge"
+ description: 'barbar' # edited
data_type: INT64
Trial 1 | Running dbt-osmosis yaml refactor as is
Result:
- The description: of customers.customer_id is still "hogehoge" → not propagated
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: "barbar" # edited
data_type: INT64
# ... (omitted) ...
- name: customers
columns:
- name: customer_id
description: "hogehoge"
data_type: INT64
# ... (omitted) ...
Trial 2 | Running with the --force-inheritance option
Command:
dbt-osmosis yaml refactor --force-inheritance
Result:
- The description: of customers.customer_id has changed to "barbar" → propagated
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: "barbar" # edited
data_type: INT64
# ... (omitted) ...
- name: customers
columns:
- name: customer_id
description: "barbar"
data_type: INT64
# ... (omitted) ...
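The two trials above can be summarized with a small toy model in Python. This is only a sketch of the observed behavior, not dbt-osmosis's actual implementation; the function name `inherit_description` is hypothetical, and treating an empty upstream description as "nothing to inherit" is an assumption.

```python
def inherit_description(upstream: str, downstream: str, force: bool = False) -> str:
    """Toy model of description inheritance as observed in the trials above."""
    # Assumption: an empty upstream description never overwrites anything.
    if not upstream:
        return downstream
    # Without --force-inheritance, only an empty downstream description is filled.
    if force or not downstream:
        return upstream
    return downstream


# First run: customers.customer_id is empty, so it inherits "hogehoge".
print(inherit_description("hogehoge", ""))                      # hogehoge
# Trial 1: upstream edited to "barbar", no force -> downstream is kept.
print(inherit_description("barbar", "hogehoge"))                # hogehoge
# Trial 2: with force -> downstream is overwritten.
print(inherit_description("barbar", "hogehoge", force=True))    # barbar
```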
osmosis_keep_description:
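Function
This setting controls, at the column level, whether the --force-inheritance option overwrites a description.
Setting osmosis_keep_description: true in a column's meta: as shown below excludes that specific column from being overwritten.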
meta:
osmosis_keep_description: true
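As a hedged sketch of how this flag interacts with forced inheritance, the following Python toy model checks the column's own `meta` before deciding whether to take the upstream description. The function name `resolve_description` and the dictionary layout are hypothetical, and the logic reflects only the behavior described in this article, not the tool's source code.

```python
def resolve_description(upstream_desc: str, column: dict, force: bool = False) -> str:
    """Return the description a column ends up with (toy model)."""
    current = column.get("description", "")
    keep = column.get("meta", {}).get("osmosis_keep_description", False)
    if keep:
        # The column opted out: its own description is never overwritten.
        return current
    if upstream_desc and (force or not current):
        return upstream_desc
    return current


protected = {
    "name": "first_name",
    "description": "manually edited",
    "meta": {"osmosis_keep_description": True},
}
plain = {"name": "first_name", "description": "old text"}

print(resolve_description("new upstream text", protected, force=True))  # manually edited
print(resolve_description("new upstream text", plain, force=True))      # new upstream text
```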
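Trying it out
Preparation
We build on the following sample repository. Run dbt run once beforehand so that the tables exist in BigQuery.
Based on that premise, the preparation steps are as follows.
Make the following changes to the repository above
- Change the contents of _source.yml as follows:
Changes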
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ""
data_type: INT64
- name: first_name
description: "名前" # edited
data_type: INT64
- name: last_name
description: ""
data_type: INT64
- name: raw_orders
columns:
- name: id
description: ""
data_type: INT64
- name: user_id
description: ""
data_type: INT64
- name: order_date
description: ""
data_type: INT64
- name: status
description: ""
data_type: INT64
- Change the contents of _schema.yml as follows:
Changes
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: STRING
- name: last_name
description: ''
data_type: STRING
- name: stg_orders
columns:
- name: order_id
description: ''
data_type: INT64
- name: customer_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: DATE
- name: status
description: ''
data_type: STRING
- name: customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: STRING
- name: last_name
description: ''
data_type: STRING
- name: first_order_date
description: ''
data_type: DATE
- name: most_recent_order_date
description: ''
data_type: DATE
- name: number_of_orders
description: ''
data_type: INT64
Run dbt-osmosis yaml refactor --force-inheritance to propagate the column information
The description of raw_customers.first_name has been propagated to stg_customers.first_name and customers.first_name:
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "名前"
data_type: INT64
# ... (omitted) ...
- name: customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "名前"
data_type: INT64
# ... (omitted) ...
Then make the following additional change to _source.yml
- name: first_name
- description: "名前" # edited
+ description: "名前じゃなかった" # edited
Trial 1 | Running dbt-osmosis yaml refactor --force-inheritance as is
Result:
- The description of raw_customers.first_name has been propagated to stg_customers.first_name and customers.first_name
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "名前じゃなかった"
data_type: INT64
# ... (omitted) ...
- name: customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "名前じゃなかった"
data_type: INT64
# ... (omitted) ...
Trial 2 | Adding osmosis_keep_description: true
Starting from the prepared state, add osmosis_keep_description: true to stg_customers.first_name:
- name: stg_customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
- description: "名前"
+ description: "ここだけ手動変更"
+ meta:
+ osmosis_keep_description: true
Run dbt-osmosis yaml refactor --force-inheritance.
Result:
- raw_customers.first_name → stg_customers.first_name is not overwritten
  - → as specified by osmosis_keep_description: true
- stg_customers.first_name → customers.first_name is overwritten
  - osmosis_keep_description: true is propagated along with it
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "ここだけ手動変更"
meta:
osmosis_keep_description: true
data_type: INT64
# ... (omitted) ...
- name: customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "ここだけ手動変更"
data_type: INT64
meta:
osmosis_keep_description: true
# ... (omitted) ...
Make one more change to _schema.yml:
# ... (omitted) ...
- name: stg_customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
- description: "ここだけ手動変更"
+ description: "ここだけさらに手動変更"
meta:
osmosis_keep_description: true
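# ... (omitted) ...
Run dbt-osmosis yaml refactor --force-inheritance.
Result:
- stg_customers.first_name → customers.first_name is now no longer overwritten either, because customers.first_name itself now carries osmosis_keep_description: true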
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "ここだけさらに手動変更"
meta:
osmosis_keep_description: true
data_type: INT64
# ... (omitted) ...
- name: customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "ここだけ手動変更"
data_type: INT64
meta:
osmosis_keep_description: true
# ... (omitted) ...
Remove osmosis_keep_description: true from customers.first_name:
# ... (omitted) ...
- name: customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "ここだけ手動変更"
data_type: INT64
- meta:
- osmosis_keep_description: true
# ... (omitted) ...
Run dbt-osmosis yaml refactor --force-inheritance.
Result:
- stg_customers.first_name → customers.first_name is overwritten
  - osmosis_keep_description: true is propagated again as well
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "ここだけさらに手動変更"
meta:
osmosis_keep_description: true
data_type: INT64
# ... (omitted) ...
- name: customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "ここだけさらに手動変更"
data_type: INT64
meta:
osmosis_keep_description: true
# ... (omitted) ...
--skip-merge-meta
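Overview
Syntax
--skip-merge-meta
Function
When specified, the items under meta: are not propagated.
Trying it out
Preparation
We build on the following sample repository. Run dbt run once beforehand so that the tables exist in BigQuery.
Based on that premise, the preparation steps are as follows.
Make the following changes to the repository above
- Change the contents of _source.yml as follows:
  - Add the following items to meta: of raw_customers.first_name:
    - hoge: bar
    - osmosis_keep_description: true
Changes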
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ""
data_type: INT64
- name: first_name
description: "名前じゃなかった"
meta: # added
hoge: bar
osmosis_keep_description: true
data_type: INT64
- name: last_name
description: ""
data_type: INT64
- name: raw_orders
columns:
- name: id
description: ""
data_type: INT64
- name: user_id
description: ""
data_type: INT64
- name: order_date
description: ""
data_type: INT64
- name: status
description: ""
data_type: INT64
- Change the contents of _schema.yml as follows:
  - Add the following item to meta: of stg_customers.first_name and customers.first_name:
    - hoge: bar
Changes
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "名前じゃなかった"
data_type: INT64
meta: # added
hoge: bar
- name: last_name
description: ""
data_type: INT64
- name: customers
columns:
- name: customer_id
description: ""
data_type: INT64
- name: first_name
description: "名前じゃなかった"
data_type: INT64
meta: # added
hoge: bar
- name: last_name
description: ""
data_type: INT64
- name: first_order_date
description: ""
data_type: INT64
- name: most_recent_order_date
description: ""
data_type: INT64
- name: number_of_orders
description: ""
data_type: INT64
- name: stg_orders
columns:
- name: order_id
description: ""
data_type: INT64
- name: customer_id
description: ""
data_type: INT64
- name: order_date
description: ""
data_type: INT64
- name: status
description: ""
data_type: INT64
Trial 1 | Running dbt-osmosis yaml refactor --force-inheritance as is
Result:
- osmosis_keep_description: true has been propagated to meta: of stg_customers.first_name and customers.first_name
version: 2
models:
- name: stg_customers
columns:
# ... (omitted) ...
- name: first_name
description: "名前じゃなかった"
data_type: INT64
meta: # added
hoge: bar
+ osmosis_keep_description: true
# ... (omitted) ...
- name: customers
columns:
# ... (omitted) ...
- name: first_name
description: "名前じゃなかった"
data_type: INT64
meta: # added
hoge: bar
+ osmosis_keep_description: true
# ... (omitted) ...
Trial 2 | Running with --skip-merge-meta
Command:
dbt-osmosis yaml refactor --force-inheritance --skip-merge-meta
Result:
- osmosis_keep_description: true has not been propagated to meta: of stg_customers.first_name and customers.first_name
As the documentation describes, merging meta: into downstream tables is skipped.
version: 2
models:
- name: stg_customers
columns:
# ... (omitted) ...
- name: first_name
description: "名前じゃなかった"
data_type: INT64
meta: # added
hoge: bar
# ... (omitted) ...
- name: customers
columns:
# ... (omitted) ...
- name: first_name
description: "名前じゃなかった"
data_type: INT64
meta: # added
hoge: bar
# ... (omitted) ...
Combining --force-inheritance, osmosis_keep_description:, and --skip-merge-meta looks like a good way to control propagation flexibly.
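The difference between the two trials in this section can be sketched as a toy meta merge in Python. This models only what was observed above and is not taken from the dbt-osmosis source; the function name `merge_meta` is hypothetical, and letting the downstream value win on a key conflict is an assumption that the trials here do not exercise.

```python
def merge_meta(upstream_meta: dict, downstream_meta: dict,
               skip_merge_meta: bool = False) -> dict:
    """Toy model of how meta: flows downstream, with and without --skip-merge-meta."""
    if skip_merge_meta:
        # Downstream meta is left exactly as it is.
        return dict(downstream_meta)
    # Assumption: on a key conflict the downstream value is kept.
    return {**upstream_meta, **downstream_meta}


source_meta = {"hoge": "bar", "osmosis_keep_description": True}
model_meta = {"hoge": "bar"}

print(merge_meta(source_meta, model_meta))
# {'hoge': 'bar', 'osmosis_keep_description': True}
print(merge_meta(source_meta, model_meta, skip_merge_meta=True))
# {'hoge': 'bar'}
```

In this model, skipping the merge keeps osmosis_keep_description: true from spreading to downstream columns, which matches the reason for combining the three settings.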
--skip-add-data-types
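Overview
Syntax
--skip-add-data-types
Function
When specified, data_type: is not written to the models.
Trying it out
Preparation
We use the following sample repository as is. Run dbt run once beforehand so that the tables exist in BigQuery.
Trial 1 | Running without column information in _source.yml
Leave _source.yml as it was in the preparation step: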
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
- name: raw_orders
Run dbt-osmosis yaml refactor --skip-add-data-types.
Result:
- In _source.yml, column information was added with data_type:
- In _schema.yml, column information was added without data_type:
Actual _source.yml and _schema.yml
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: STRING
- name: last_name
description: ''
data_type: STRING
- name: raw_orders
columns:
- name: id
description: ''
data_type: INT64
- name: user_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: DATE
- name: status
description: ''
data_type: STRING
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
- name: first_name
description: ''
- name: last_name
description: ''
- name: customers
columns:
- name: customer_id
description: ''
- name: first_name
description: ''
- name: last_name
description: ''
- name: first_order_date
description: ''
- name: most_recent_order_date
description: ''
- name: number_of_orders
description: ''
- name: stg_orders
columns:
- name: order_id
description: ''
- name: customer_id
description: ''
- name: order_date
description: ''
- name: status
description: ''
Trial 2 | Running with column information written in _source.yml
Starting from the prepared state, change _source.yml as follows:
Changes
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ""
- name: first_name
description: ""
- name: last_name
description: ""
- name: raw_orders
columns:
- name: id
description: ""
- name: user_id
description: ""
- name: order_date
description: ""
- name: status
description: ""
Run dbt-osmosis yaml refactor --skip-add-data-types.
Result:
- _source.yml is unchanged (that is, data_type: was not added)
- In _schema.yml, column information was added without data_type:
Actual _source.yml and _schema.yml
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ""
- name: first_name
description: ""
- name: last_name
description: ""
- name: raw_orders
columns:
- name: id
description: ""
- name: user_id
description: ""
- name: order_date
description: ""
- name: status
description: ""
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
- name: first_name
description: ''
- name: last_name
description: ''
- name: stg_orders
columns:
- name: order_id
description: ''
- name: customer_id
description: ''
- name: order_date
description: ''
- name: status
description: ''
- name: customers
columns:
- name: customer_id
description: ''
- name: first_name
description: ''
- name: last_name
description: ''
- name: first_order_date
description: ''
- name: most_recent_order_date
description: ''
- name: number_of_orders
description: ''
--add-progenitor-to-meta
Overview
Syntax
--add-progenitor-to-meta
Function
When specified, records which model is the progenitor (origin) of each column. An osmosis_progenitor: entry is added to meta: as follows:
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
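To make the idea of a progenitor concrete, here is a minimal Python sketch that walks a lineage upstream for as long as a parent still has a column with the same name. It is an illustration under stated assumptions, not the tool's algorithm: the function name `find_progenitor`, the `PARENTS` and `COLUMNS` tables, and the tie-break of taking the first matching parent are all hypothetical.

```python
# Hypothetical lineage and column tables for the sample project.
SRC = "source.my_dbt_project.YOUR_GC_DATASET.raw_customers"
STG = "model.my_dbt_project.stg_customers"
CUS = "model.my_dbt_project.customers"

PARENTS = {STG: [SRC], CUS: [STG]}
COLUMNS = {
    SRC: {"id", "first_name", "last_name"},
    STG: {"customer_id", "first_name", "last_name"},
    CUS: {"customer_id", "first_name", "last_name"},
}


def find_progenitor(node, column, parents=PARENTS, columns=COLUMNS):
    """Return the furthest upstream node that has `column`, or None if there is none."""
    current = node
    while True:
        matches = [p for p in parents.get(current, []) if column in columns.get(p, set())]
        if not matches:
            break
        # Assumption: when several parents match, take the first one listed.
        current = matches[0]
    return None if current == node else current


print(find_progenitor(CUS, "first_name"))   # source.my_dbt_project.YOUR_GC_DATASET.raw_customers
print(find_progenitor(CUS, "customer_id"))  # model.my_dbt_project.stg_customers
print(find_progenitor(STG, "customer_id"))  # None
```

The renamed column (id in the source, customer_id in stg_customers) stops the walk at stg_customers, which is one way to read why customer_id and first_name can end up with different progenitors.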
Trying it out
Preparation
We use the following sample repository as is. Run dbt run once beforehand so that the tables exist in BigQuery.
Trial 1 | Running with the --add-progenitor-to-meta option as is
Run dbt-osmosis yaml refactor --add-progenitor-to-meta
Result:
- Only customers has the propagation source of its columns written in meta:
Actual _source.yml and _schema.yml
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: STRING
- name: last_name
description: ''
data_type: STRING
- name: raw_orders
columns:
- name: id
description: ''
data_type: INT64
- name: user_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: DATE
- name: status
description: ''
data_type: STRING
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: STRING
- name: last_name
description: ''
data_type: STRING
- name: stg_orders
columns:
- name: order_id
description: ''
data_type: INT64
- name: customer_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: DATE
- name: status
description: ''
data_type: STRING
- name: customers
columns:
- name: customer_id
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: INT64
- name: first_name
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: STRING
- name: last_name
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: STRING
- name: first_order_date
description: ''
data_type: DATE
- name: most_recent_order_date
description: ''
data_type: DATE
- name: number_of_orders
description: ''
data_type: INT64
Run dbt-osmosis yaml refactor --add-progenitor-to-meta once more
Result:
- The propagation source is now recorded in meta: for all models
Does it take several runs to converge?
Actual _schema.yml
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: STRING
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
- name: last_name
description: ''
data_type: STRING
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
- name: customers
columns:
- name: customer_id
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: INT64
- name: first_name
description: ''
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
data_type: STRING
- name: last_name
description: ''
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
data_type: STRING
- name: first_order_date
description: ''
data_type: DATE
- name: most_recent_order_date
description: ''
data_type: DATE
- name: number_of_orders
description: ''
data_type: INT64
- name: stg_orders
columns:
- name: order_id
description: ''
data_type: INT64
- name: customer_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: DATE
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_orders
- name: status
description: ''
data_type: STRING
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_orders
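Trial 2 | When columns are inherited across three or more levels
Starting from the prepared state, add the following two models (they appear as customers_depth_2 and customers_depth_3 in the results below):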
select
first_name
from
{{ ref('customers') }}
select
first_name
from
{{ ref('customers_depth_2') }}
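After adding them, the data lineage extends downstream as customers → customers_depth_2 → customers_depth_3.
Run dbt-osmosis yaml refactor --add-progenitor-to-meta (first run).
Result:
- meta: has been added to customers and everything downstream of it
Actual _schema.yml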
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: INT64
- name: last_name
description: ''
data_type: INT64
- name: customers_depth_3
columns:
- name: first_name
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: INT64
- name: customers_depth_2
columns:
- name: first_name
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: INT64
- name: customers
columns:
- name: customer_id
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: INT64
- name: first_name
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: INT64
- name: last_name
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: INT64
- name: first_order_date
description: ''
data_type: INT64
- name: most_recent_order_date
description: ''
data_type: INT64
- name: number_of_orders
description: ''
data_type: INT64
- name: stg_orders
columns:
- name: order_id
description: ''
data_type: INT64
- name: customer_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: INT64
- name: status
description: ''
data_type: INT64
Run dbt-osmosis yaml refactor --add-progenitor-to-meta once more.
Result:
- meta: has been added to all models
Perhaps only the second layer from the source is processed a little differently? (needs verification)
Actual _schema.yml
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: ''
data_type: INT64
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
- name: last_name
description: ''
data_type: INT64
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
- name: customers_depth_3
columns:
- name: first_name
description: ''
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
data_type: INT64
- name: customers_depth_2
columns:
- name: first_name
description: ''
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
data_type: INT64
- name: customers
columns:
- name: customer_id
description: ''
meta:
osmosis_progenitor: model.my_dbt_project.stg_customers
data_type: INT64
- name: first_name
description: ''
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
data_type: INT64
- name: last_name
description: ''
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
data_type: INT64
- name: first_order_date
description: ''
data_type: INT64
- name: most_recent_order_date
description: ''
data_type: INT64
- name: number_of_orders
description: ''
data_type: INT64
- name: stg_orders
columns:
- name: order_id
description: ''
data_type: INT64
- name: customer_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: INT64
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_orders
- name: status
description: ''
data_type: INT64
meta:
osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_orders
--use-unrendered-descriptions
Overview
Syntax
--use-unrendered-descriptions
Function
By default, the contents of description: are rendered when they are propagated.
description: "{{ doc(\"first_name\") }}"
↓
description: "### about\n\nfirst_name"
With this option, the unrendered contents are carried over as is.
description: "{{ doc(\"first_name\") }}"
↓
description: "{{ doc(\"first_name\") }}"
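The difference between the two modes can be sketched in Python as follows. Note that dbt renders descriptions with Jinja; the regular expression here is only a stand-in to keep the example self-contained, and the names `render_description` and `propagate_description` are hypothetical.

```python
import re

# Matches {{ doc("name") }} or {{ doc('name') }} (simplified stand-in for Jinja rendering).
DOC_CALL = re.compile(r"""\{\{\s*doc\(\s*["']([^"']+)["']\s*\)\s*\}\}""")


def render_description(text: str, docs: dict) -> str:
    """Replace each doc() call with the body of the matching docs block."""
    return DOC_CALL.sub(lambda m: docs[m.group(1)], text)


def propagate_description(upstream: str, docs: dict, use_unrendered: bool = False) -> str:
    """Return the text a downstream column would receive (toy model)."""
    return upstream if use_unrendered else render_description(upstream, docs)


docs = {"first_name": "### about\n\nfirst_name"}
raw = '{{ doc("first_name") }}'

print(repr(propagate_description(raw, docs)))
# '### about\n\nfirst_name'
print(repr(propagate_description(raw, docs, use_unrendered=True)))
# '{{ doc("first_name") }}'
```

Keeping the unrendered form downstream means that a later edit to the docs block is picked up everywhere, which is the use case the help text mentions for propagating docs blocks.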
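Preparation
We build on the following sample repository. Run dbt run once beforehand so that the tables exist in BigQuery.
Based on that premise, the preparation steps are as follows.
Make the following changes to the repository above
- Change _source.yml to the following contents: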
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: first_name
description: |
{{ doc("first_name") }}
- name: raw_orders
- Create models/_doc.md with the following contents:
{% docs first_name %}
### about
first_name
{% enddocs %}
Trial 1 | Running dbt-osmosis yaml refactor as is
Result:
- In _source.yml, the contents of _doc.md are not expanded
- In _schema.yml, the contents of _doc.md are expanded
- name: first_name
description: '### about
first_name'
Actual _source.yml and _schema.yml
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ''
data_type: INT64
- name: first_name
description: |
{{ doc("first_name") }}
data_type: INT64
- name: last_name
description: ''
data_type: INT64
- name: raw_orders
columns:
- name: id
description: ''
data_type: INT64
- name: user_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: INT64
- name: status
description: ''
data_type: INT64
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: '### about
first_name'
data_type: INT64
- name: last_name
description: ''
data_type: INT64
- name: customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: '### about
first_name'
data_type: INT64
- name: last_name
description: ''
data_type: INT64
- name: first_order_date
description: ''
data_type: INT64
- name: most_recent_order_date
description: ''
data_type: INT64
- name: number_of_orders
description: ''
data_type: INT64
- name: stg_orders
columns:
- name: order_id
description: ''
data_type: INT64
- name: customer_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: INT64
- name: status
description: ''
data_type: INT64
Trial 2 | Running with the --use-unrendered-descriptions option
Command:
dbt-osmosis yaml refactor --use-unrendered-descriptions
Result:
- The contents of _doc.md are not expanded
- name: first_name
description: '{{ doc("first_name") }}
'
Actual _source.yml and _schema.yml
version: 2
sources:
- name: YOUR_GC_DATASET
database: YOUR_GC_PROJECT_ID
tables:
- name: raw_customers
columns:
- name: id
description: ''
data_type: INT64
- name: first_name
description: |
{{ doc("first_name") }}
data_type: INT64
- name: last_name
description: ''
data_type: INT64
- name: raw_orders
columns:
- name: id
description: ''
data_type: INT64
- name: user_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: INT64
- name: status
description: ''
data_type: INT64
version: 2
models:
- name: stg_customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: '{{ doc("first_name") }}
'
data_type: INT64
- name: last_name
description: ''
data_type: INT64
- name: stg_orders
columns:
- name: order_id
description: ''
data_type: INT64
- name: customer_id
description: ''
data_type: INT64
- name: order_date
description: ''
data_type: INT64
- name: status
description: ''
data_type: INT64
- name: customers
columns:
- name: customer_id
description: ''
data_type: INT64
- name: first_name
description: '{{ doc("first_name") }}
'
data_type: INT64
- name: last_name
description: ''
data_type: INT64
- name: first_order_date
description: ''
data_type: INT64
- name: most_recent_order_date
description: ''
data_type: INT64
- name: number_of_orders
description: ''
data_type: INT64
Conclusion
This article summarized the behavior of several options of the dbt-osmosis Refactor feature. If you find any mistakes in it, I would appreciate you pointing them out.
Thank you for reading.