
Investigating how dbt-osmosis behaves, Part 2: command options of the Refactor feature

Posted at 2024-07-14

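I recently spent some time with dbt-osmosis and looked into how it behaves, so this article summarizes what I found.

This article covers the behavior of the main command options of the Refactor feature.

The results below were obtained in the following environment: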

requirements.txt
dbt-bigquery==1.6
dbt-osmosis==0.12.9

Command options of the Refactor feature

❯ dbt-osmosis yaml refactor --help
Usage: dbt-osmosis yaml refactor [OPTIONS] [MODELS]...

  Executes organize which syncs yaml files with database schema and organizes the dbt models directory, reparses the project, then executes document passing down inheritable documentation

Options:
  --project-dir DIRECTORY         Which directory to look in for the dbt_project.yml file. Default is the current working directory and its parents.
  --profiles-dir DIRECTORY        Which directory to look in for the profiles.yml file. Defaults to ~/.dbt
  -t, --target TEXT               Which target to load. Overrides default target in the profiles.yml.
  -f, --fqn TEXT                  Specify models based on dbt's FQN. Looks like folder.folder, folder.folder.model, or folder.folder.source.table. Use list command to see the scope of an FQN filter. This may be deprecated in the future. Please use model positional selectors instead.
  -F, --force-inheritance         If specified, forces documentation to be inherited overriding existing column level documentation where applicable.
  -d, --dry-run                   If specified, no changes are committed to disk.
  -C, --check                     If specified, will return a non-zero exit code if any files are changed.
  --catalog-file PATH             If specified, will read the list of columns from the catalog.json file instead of querying the warehouse.
  --skip-add-columns              If specified, we will skip adding columns to the models. This is useful if you want to document your models without adding columns present in the database.
  --skip-add-tags                 If specified, we will skip adding tags to the models.
  --skip-add-data-types           If specified, we will skip adding data types to the models.
  --skip-merge-meta               If specified, we will skip merging meta to the models.
  --add-progenitor-to-meta        If specified, progenitor information will be added to the meta information of a column. This is useful if you want to know which model is the progenitor of a specific model's column.
  --profile TEXT                  Which profile to load. Overrides setting in dbt_project.yml.
  --vars TEXT                     Supply variables to the project. This argument overrides variables defined in your dbt_project.yml file. This argument should be a YAML string, eg. '{my_variable: my_value}'
  --use-unrendered-descriptions   If specified, will use unrendered column descriptions in the documentation.This is useful for propogating docs blocks
  --add-inheritance-for-specified-keys TEXT
                                  If specified, will add inheritance for the specified keys.
  --help                          Show this message and exit.
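
The flags listed above can be combined on a single command line. As a minimal sketch, a small helper can assemble the argument list for the options examined in this article. `build_refactor_command` is a hypothetical name, not part of dbt-osmosis, and it only uses flag names that appear in the help output above.

```python
from typing import List, Optional


def build_refactor_command(
    models: Optional[List[str]] = None,
    force_inheritance: bool = False,
    skip_merge_meta: bool = False,
    skip_add_data_types: bool = False,
    add_progenitor_to_meta: bool = False,
    use_unrendered_descriptions: bool = False,
    dry_run: bool = False,
) -> List[str]:
    """Assemble an argv list for `dbt-osmosis yaml refactor` (hypothetical helper)."""
    command = ["dbt-osmosis", "yaml", "refactor"]
    # Flag names are copied verbatim from the --help output.
    flags = [
        (force_inheritance, "--force-inheritance"),
        (skip_merge_meta, "--skip-merge-meta"),
        (skip_add_data_types, "--skip-add-data-types"),
        (add_progenitor_to_meta, "--add-progenitor-to-meta"),
        (use_unrendered_descriptions, "--use-unrendered-descriptions"),
        (dry_run, "--dry-run"),
    ]
    command += [flag for enabled, flag in flags if enabled]
    # Positional model selectors come last, as in the usage line.
    command += list(models or [])
    return command
```

Passing the resulting list to `subprocess.run(..., check=True)` would execute the command; that step is left out so the sketch has no side effects.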

--force-inheritance

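Overview

Syntax

-F, --force-inheritance

Function

When specified, documentation is inherited from upstream even where a column already has its own description, overwriting the existing column-level documentation.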

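Trying it out

Preparation

We will build on the following sample repository. Run dbt run once beforehand so that the tables exist in BigQuery.

With that in place, the preparation steps are as follows.

:one: Make the following changes to the repository above

  • Change the contents of _source.yml as follows:
:pencil: Changes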
_source.yml
version: 2

sources:
  - name: YOUR_GC_DATASET
    database: YOUR_GC_PROJECT_ID
    tables:
      - name: raw_customers
        columns:
          - name: id
            description: ''
            data_type: INT64
          - name: first_name
            description: ''
            data_type: STRING
          - name: last_name
            description: ''
            data_type: STRING
      - name: raw_orders
        columns:
          - name: id
            description: ''
            data_type: INT64
          - name: user_id
            description: ''
            data_type: INT64
          - name: order_date
            description: ''
            data_type: DATE
          - name: status
            description: ''
            data_type: STRING
  • Change the contents of _schema.yml as follows:
:pencil: Changes
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: 'hogehoge'
        data_type: INT64
      - name: first_name
        description: ''
        data_type: STRING
      - name: last_name
        description: ''
        data_type: STRING
  - name: stg_orders
    columns:
      - name: order_id
        description: ''
        data_type: INT64
      - name: customer_id
        description: ''
        data_type: INT64
      - name: order_date
        description: ''
        data_type: DATE
      - name: status
        description: ''
        data_type: STRING
  - name: customers
    columns:
      - name: customer_id
        description: ''
        data_type: INT64
      - name: first_name
        description: ''
        data_type: STRING
      - name: last_name
        description: ''
        data_type: STRING
      - name: first_order_date
        description: ''
        data_type: DATE
      - name: most_recent_order_date
        description: ''
        data_type: DATE
      - name: number_of_orders
        description: ''
        data_type: INT64

:two: Run dbt-osmosis yaml refactor to propagate stg_customers.customer_id → customers.customer_id

_schema.yml after the run:

_schema.yml
    version: 2
    models:
      - name: stg_customers
        columns:
          - name: customer_id
            description: "hogehoge"
            data_type: INT64

... (omitted) ...

      - name: customers
        columns:
          - name: customer_id
-           description: ''
+           description: "hogehoge"
            data_type: INT64
... (omitted) ...

:three: Make the following further change to _schema.yml

_schema.yml
      - name: stg_customers
        columns:
          - name: customer_id
-           description: "hogehoge"
+           description: 'barbar' # edited
            data_type: INT64

Trial 1 | Running dbt-osmosis yaml refactor as is

Result:

  • The description: of customers.customer_id is still "hogehoge"
    • → Not propagated
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: "barbar" # edited
        data_type: INT64

... (omitted) ...

  - name: customers
    columns:
      - name: customer_id
        description: "hogehoge"
        data_type: INT64

... (omitted) ...

Trial 2 | Running with the --force-inheritance option

Command:

dbt-osmosis yaml refactor --force-inheritance

Result:

  • The description: of customers.customer_id has changed to "barbar"
    • → Propagated
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: "barbar" # edited
        data_type: INT64

... (omitted) ...

  - name: customers
    columns:
      - name: customer_id
        description: "barbar"
        data_type: INT64

... (omitted) ...
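
The two trials can be condensed into a toy model. The sketch below is not dbt-osmosis code; `inherit_description` is a hypothetical function that only reproduces the outcomes observed here. How an empty upstream description is handled was not tested, so treating it as "never overwrite" is an assumption.

```python
def inherit_description(upstream: str, current: str, force: bool = False) -> str:
    """Return the description a downstream column ends up with (toy model)."""
    if not upstream:
        # Assumption: an empty upstream description never replaces anything.
        return current
    if current and not force:
        # Trial 1: an existing description is left alone without the flag.
        return current
    # Either the downstream description was empty (the initial propagation),
    # or --force-inheritance was given (Trial 2).
    return upstream
```

With the values from this section, `inherit_description("barbar", "hogehoge")` keeps `"hogehoge"`, while adding `force=True` yields `"barbar"`.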

osmosis_keep_description:

Function

A setting that controls, per column, whether --force-inheritance overwrites the description.

Setting osmosis_keep_description: true in a column's meta, as shown below, exempts that specific column from being overwritten.

meta:
  osmosis_keep_description: true

Reference: dbt-osmosisの運用問題について考える - yasuhisa's blog

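Trying it out

Preparation

We will build on the following sample repository. Run dbt run once beforehand so that the tables exist in BigQuery.

With that in place, the preparation steps are as follows.

:one: Make the following changes to the repository above

  • Change the contents of _source.yml as follows:
:pencil: Changes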
_source.yml
version: 2

sources:
  - name: YOUR_GC_DATASET
    database: YOUR_GC_PROJECT_ID
    tables:
      - name: raw_customers
        columns:
          - name: id
            description: ""
            data_type: INT64
          - name: first_name
            description: "名前" # edited
            data_type: INT64
          - name: last_name
            description: ""
            data_type: INT64
      - name: raw_orders
        columns:
          - name: id
            description: ""
            data_type: INT64
          - name: user_id
            description: ""
            data_type: INT64
          - name: order_date
            description: ""
            data_type: INT64
          - name: status
            description: ""
            data_type: INT64
  • Change the contents of _schema.yml as follows:
:pencil: Changes
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: ''
        data_type: INT64
      - name: first_name
        description: ''
        data_type: STRING
      - name: last_name
        description: ''
        data_type: STRING
  - name: stg_orders
    columns:
      - name: order_id
        description: ''
        data_type: INT64
      - name: customer_id
        description: ''
        data_type: INT64
      - name: order_date
        description: ''
        data_type: DATE
      - name: status
        description: ''
        data_type: STRING
  - name: customers
    columns:
      - name: customer_id
        description: ''
        data_type: INT64
      - name: first_name
        description: ''
        data_type: STRING
      - name: last_name
        description: ''
        data_type: STRING
      - name: first_order_date
        description: ''
        data_type: DATE
      - name: most_recent_order_date
        description: ''
        data_type: DATE
      - name: number_of_orders
        description: ''
        data_type: INT64

:two: Run dbt-osmosis yaml refactor --force-inheritance to propagate the column information

The description of raw_customers.first_name has been propagated along stg_customers.first_name → customers.first_name

_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: ""
        data_type: INT64
      - name: first_name
        description: "名前"
        data_type: INT64

... (omitted) ...

  - name: customers
    columns:
      - name: customer_id
        description: ""
        data_type: INT64
      - name: first_name
        description: "名前"
        data_type: INT64

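... (omitted) ...

:three: Make the following further change to _source.yml (the description changes from "名前", meaning "name", to "名前じゃなかった", meaning "it was not a name")

_source.yml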
            - name: first_name
-             description: "名前" # edited
+             description: "名前じゃなかった" # edited

Trial 1 | Running dbt-osmosis yaml refactor --force-inheritance as is

Result:

  • The description of raw_customers.first_name has been propagated along stg_customers.first_name → customers.first_name
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: ""
        data_type: INT64
      - name: first_name
        description: "名前じゃなかった"
        data_type: INT64

... (omitted) ...

  - name: customers
    columns:
      - name: customer_id
        description: ""
        data_type: INT64
      - name: first_name
        description: "名前じゃなかった"
        data_type: INT64

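... (omitted) ...

Trial 2 | Adding osmosis_keep_description: true

:one: Starting from the prepared state, add osmosis_keep_description: true to stg_customers.first_name (the new description "ここだけ手動変更" means "edited manually here only"):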

_schema.yml
      - name: stg_customers
        columns:
          - name: customer_id
            description: ""
            data_type: INT64
          - name: first_name
-           description: "名前"
+           description: "ここだけ手動変更"
+           meta:
+             osmosis_keep_description: true

:two: Run dbt-osmosis yaml refactor --force-inheritance

Result:

  • raw_customers.first_name → stg_customers.first_name: not overwritten
    • As specified by osmosis_keep_description: true
  • stg_customers.first_name → customers.first_name: overwritten
    • osmosis_keep_description: true was propagated along with it
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: ""
        data_type: INT64
      - name: first_name
        description: "ここだけ手動変更"
        meta:
          osmosis_keep_description: true
        data_type: INT64

... (omitted) ...

  - name: customers
    columns:
      - name: customer_id
        description: ""
        data_type: INT64
      - name: first_name
        description: "ここだけ手動変更"
        data_type: INT64
        meta:
          osmosis_keep_description: true

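... (omitted) ...

:three: Make the following further change to _schema.yml (the new description "ここだけさらに手動変更" means "edited manually here once more"):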

_schema.yml
... (omitted) ...
      - name: stg_customers
        columns:
          - name: customer_id
            description: ""
            data_type: INT64
          - name: first_name
-           description: "ここだけ手動変更"
+           description: "ここだけさらに手動変更"
            meta:
              osmosis_keep_description: true
... (omitted) ...

:four: Run dbt-osmosis yaml refactor --force-inheritance

Result:

  • stg_customers.first_name → customers.first_name is now no longer overwritten either
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: ""
        data_type: INT64
      - name: first_name
        description: "ここだけさらに手動変更"
        meta:
          osmosis_keep_description: true
        data_type: INT64

... (omitted) ...

  - name: customers
    columns:
      - name: customer_id
        description: ""
        data_type: INT64
      - name: first_name
        description: "ここだけ手動変更"
        data_type: INT64
        meta:
          osmosis_keep_description: true

... (omitted) ...

:five: Remove osmosis_keep_description: true from customers.first_name

_schema.yml
... (omitted) ...

      - name: customers
        columns:
          - name: customer_id
            description: ""
            data_type: INT64
          - name: first_name
            description: "ここだけ手動変更"
            data_type: INT64
-           meta:
-             osmosis_keep_description: true

... (omitted) ...

:six: Run dbt-osmosis yaml refactor --force-inheritance

Result:

  • stg_customers.first_name → customers.first_name: overwritten
    • osmosis_keep_description: true was propagated once again as well
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: ""
        data_type: INT64
      - name: first_name
        description: "ここだけさらに手動変更"
        meta:
          osmosis_keep_description: true
        data_type: INT64

... (omitted) ...

  - name: customers
    columns:
      - name: customer_id
        description: ""
        data_type: INT64
      - name: first_name
        description: "ここだけさらに手動変更"
        data_type: INT64
        meta:
          osmosis_keep_description: true

... (omitted) ...
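
Because osmosis_keep_description: true travels downstream together with the rest of meta, it can be useful to list which columns are currently pinned. The following is a minimal sketch, assuming the contents of _schema.yml have already been loaded into a plain dict (for example with a YAML parser); `pinned_columns` is a hypothetical helper name.

```python
from typing import Dict, List


def pinned_columns(schema: Dict) -> List[str]:
    """List `model.column` names whose meta sets osmosis_keep_description: true."""
    pinned = []
    for model in schema.get("models", []):
        for column in model.get("columns", []):
            meta = column.get("meta") or {}
            if meta.get("osmosis_keep_description") is True:
                pinned.append(f"{model['name']}.{column['name']}")
    return pinned


# The first_name columns as they looked after step :two: of Trial 2.
schema_after_step_two = {
    "models": [
        {"name": "stg_customers", "columns": [
            {"name": "customer_id", "description": ""},
            {"name": "first_name", "description": "ここだけ手動変更",
             "meta": {"osmosis_keep_description": True}},
        ]},
        {"name": "customers", "columns": [
            {"name": "customer_id", "description": ""},
            {"name": "first_name", "description": "ここだけ手動変更",
             "meta": {"osmosis_keep_description": True}},
        ]},
    ]
}
```

Calling `pinned_columns(schema_after_step_two)` reports both `stg_customers.first_name` and `customers.first_name`, which makes the unintended downstream pin easy to spot.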

--skip-merge-meta

Overview

Syntax

--skip-merge-meta

Function

When specified, items under meta: are not propagated.

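Trying it out

Preparation

We will build on the following sample repository. Run dbt run once beforehand so that the tables exist in BigQuery.

With that in place, the preparation steps are as follows.

:one: Make the following changes to the repository above

  • Change the contents of _source.yml as follows
    • Add the following items to the meta: of raw_customers.first_name
      • hoge: bar
      • osmosis_keep_description: true
:pencil: Changes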
_source.yml
version: 2

sources:
  - name: YOUR_GC_DATASET
    database: YOUR_GC_PROJECT_ID
    tables:
      - name: raw_customers
        columns:
          - name: id
            description: ""
            data_type: INT64
          - name: first_name
            description: "名前じゃなかった"
            meta: # added
              hoge: bar
              osmosis_keep_description: true
            data_type: INT64
          - name: last_name
            description: ""
            data_type: INT64
      - name: raw_orders
        columns:
          - name: id
            description: ""
            data_type: INT64
          - name: user_id
            description: ""
            data_type: INT64
          - name: order_date
            description: ""
            data_type: INT64
          - name: status
            description: ""
            data_type: INT64
  • Change the contents of _schema.yml as follows
    • Add the following item to the meta: of stg_customers.first_name and customers.first_name
      • hoge: bar
:pencil: Changes
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: ""
        data_type: INT64
      - name: first_name
        description: "名前じゃなかった"
        data_type: INT64
        meta: # added
          hoge: bar
      - name: last_name
        description: ""
        data_type: INT64
  - name: customers
    columns:
      - name: customer_id
        description: ""
        data_type: INT64
      - name: first_name
        description: "名前じゃなかった"
        data_type: INT64
        meta: # added
          hoge: bar
      - name: last_name
        description: ""
        data_type: INT64
      - name: first_order_date
        description: ""
        data_type: INT64
      - name: most_recent_order_date
        description: ""
        data_type: INT64
      - name: number_of_orders
        description: ""
        data_type: INT64
  - name: stg_orders
    columns:
      - name: order_id
        description: ""
        data_type: INT64
      - name: customer_id
        description: ""
        data_type: INT64
      - name: order_date
        description: ""
        data_type: INT64
      - name: status
        description: ""
        data_type: INT64

Trial 1 | Running dbt-osmosis yaml refactor --force-inheritance as is

Result:

  • osmosis_keep_description: true has been propagated to the meta: of stg_customers.first_name and customers.first_name
_schema.yml
    version: 2
    models:
      - name: stg_customers
        columns:
    
    ... (omitted) ...
    
          - name: first_name
            description: "名前じゃなかった"
            data_type: INT64
            meta: # added
              hoge: bar
+             osmosis_keep_description: true
    
    ... (omitted) ...
    
      - name: customers
        columns:
    
    ... (omitted) ...
    
          - name: first_name
            description: "名前じゃなかった"
            data_type: INT64
            meta: # added
              hoge: bar
+             osmosis_keep_description: true
    
    ... (omitted) ...

Trial 2 | Running with --skip-merge-meta

Command:

dbt-osmosis yaml refactor --force-inheritance --skip-merge-meta

Result:

  • osmosis_keep_description: true has not been propagated to the meta: of stg_customers.first_name and customers.first_name

As the documentation says, merging meta: into downstream tables is skipped.

_schema.yml
    version: 2
    models:
      - name: stg_customers
        columns:
    
    ... (omitted) ...
    
          - name: first_name
            description: "名前じゃなかった"
            data_type: INT64
            meta: # added
              hoge: bar
    
    ... (omitted) ...
    
      - name: customers
        columns:
    
    ... (omitted) ...
    
          - name: first_name
            description: "名前じゃなかった"
            data_type: INT64
            meta: # added
              hoge: bar
    
    ... (omitted) ...

Combining --force-inheritance, osmosis_keep_description:, and --skip-merge-meta looks like a good way to control propagation flexibly.

Reference: https://x.com/syou6162/status/1813078272470884677
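
How the three settings interact can be sketched as a toy model of a single upstream-to-downstream step. This is not dbt-osmosis code: `inherit_column` is a hypothetical function written to match the outcomes observed in the trials above. Two points are assumptions, because the trials do not cover them: the column's own meta keys win over upstream keys on conflict, and an empty upstream description never overwrites anything.

```python
from typing import Dict


def inherit_column(upstream: Dict, column: Dict, force: bool = False,
                   skip_merge_meta: bool = False) -> Dict:
    """Toy model of one inheritance step for a single column."""
    result = dict(column)
    meta = dict(column.get("meta") or {})
    # The pin is read from the column's own meta, before any merge happens.
    keep = meta.get("osmosis_keep_description") is True
    upstream_description = upstream.get("description", "")
    if upstream_description and not keep and (force or not column.get("description")):
        result["description"] = upstream_description
    if not skip_merge_meta:
        # Assumption: on key conflicts the column's own value wins.
        meta = {**(upstream.get("meta") or {}), **meta}
    if meta:
        result["meta"] = meta
    return result


source_first_name = {
    "description": "名前じゃなかった",
    "meta": {"hoge": "bar", "osmosis_keep_description": True},
}
stg_first_name = {"description": "名前じゃなかった", "meta": {"hoge": "bar"}}
```

With `force=True` alone, the model adds `osmosis_keep_description: True` to the downstream meta, as in Trial 1; adding `skip_merge_meta=True` leaves the meta as `{"hoge": "bar"}`, as in Trial 2.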

--skip-add-data-types

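Overview

Syntax

--skip-add-data-types

Function

When specified, data_type: is not written.

Trying it out

Preparation

We use the following sample repository as is. Run dbt run once beforehand so that the tables exist in BigQuery.

Trial 1 | Running without column information in _source.yml

:one: Leave _source.yml in its prepared state: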

_source.yml
version: 2

sources:
  - name: YOUR_GC_DATASET
    database: YOUR_GC_PROJECT_ID
    tables:
      - name: raw_customers
      - name: raw_orders

:two: Run dbt-osmosis yaml refactor --skip-add-data-types

Result:

  • In _source.yml, column information was added with data_type:
  • In _schema.yml, column information was added without data_type:
:pencil: Actual _source.yml and _schema.yml
_source.yml
version: 2

sources:
  - name: YOUR_GC_DATASET
    database: YOUR_GC_PROJECT_ID
    tables:
      - name: raw_customers
        columns:
          - name: id
            description: ''
            data_type: INT64
          - name: first_name
            description: ''
            data_type: STRING
          - name: last_name
            description: ''
            data_type: STRING
      - name: raw_orders
        columns:
          - name: id
            description: ''
            data_type: INT64
          - name: user_id
            description: ''
            data_type: INT64
          - name: order_date
            description: ''
            data_type: DATE
          - name: status
            description: ''
            data_type: STRING
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: ''
      - name: first_name
        description: ''
      - name: last_name
        description: ''
  - name: customers
    columns:
      - name: customer_id
        description: ''
      - name: first_name
        description: ''
      - name: last_name
        description: ''
      - name: first_order_date
        description: ''
      - name: most_recent_order_date
        description: ''
      - name: number_of_orders
        description: ''
  - name: stg_orders
    columns:
      - name: order_id
        description: ''
      - name: customer_id
        description: ''
      - name: order_date
        description: ''
      - name: status
        description: ''

Trial 2 | Running with column information already in _source.yml

:one: Starting from the prepared state, change _source.yml as follows:

:pencil: Changes
_source.yml
version: 2

sources:
  - name: YOUR_GC_DATASET
    database: YOUR_GC_PROJECT_ID
    tables:
      - name: raw_customers
        columns:
          - name: id
            description: ""
          - name: first_name
            description: ""
          - name: last_name
            description: ""
      - name: raw_orders
        columns:
          - name: id
            description: ""
          - name: user_id
            description: ""
          - name: order_date
            description: ""
          - name: status
            description: ""

:two: Run dbt-osmosis yaml refactor --skip-add-data-types

Result:

  • _source.yml is unchanged (= data_type: was not added)
  • In _schema.yml, column information was added without data_type:
:pencil: Actual _source.yml and _schema.yml
_source.yml
version: 2

sources:
  - name: YOUR_GC_DATASET
    database: YOUR_GC_PROJECT_ID
    tables:
      - name: raw_customers
        columns:
          - name: id
            description: ""
          - name: first_name
            description: ""
          - name: last_name
            description: ""
      - name: raw_orders
        columns:
          - name: id
            description: ""
          - name: user_id
            description: ""
          - name: order_date
            description: ""
          - name: status
            description: ""
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: ''
      - name: first_name
        description: ''
      - name: last_name
        description: ''
  - name: stg_orders
    columns:
      - name: order_id
        description: ''
      - name: customer_id
        description: ''
      - name: order_date
        description: ''
      - name: status
        description: ''
  - name: customers
    columns:
      - name: customer_id
        description: ''
      - name: first_name
        description: ''
      - name: last_name
        description: ''
      - name: first_order_date
        description: ''
      - name: most_recent_order_date
        description: ''
      - name: number_of_orders
        description: ''

--add-progenitor-to-meta

Overview

Syntax

--add-progenitor-to-meta

Function

When specified, records which model is the progenitor of each column. An osmosis_progenitor: entry is added under meta:, like this:

meta:
  osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers

Trying it out

Preparation

We use the following sample repository as is. Run dbt run once beforehand so that the tables exist in BigQuery.

Trial 1 | Running with the --add-progenitor-to-meta option as is

:one: Run dbt-osmosis yaml refactor --add-progenitor-to-meta

Result:

  • Only customers has the origin of its columns written in meta:
:pencil: Actual _source.yml and _schema.yml
_source.yml
version: 2

sources:
  - name: YOUR_GC_DATASET
    database: YOUR_GC_PROJECT_ID
    tables:
      - name: raw_customers
        columns:
          - name: id
            description: ''
            data_type: INT64
          - name: first_name
            description: ''
            data_type: STRING
          - name: last_name
            description: ''
            data_type: STRING
      - name: raw_orders
        columns:
          - name: id
            description: ''
            data_type: INT64
          - name: user_id
            description: ''
            data_type: INT64
          - name: order_date
            description: ''
            data_type: DATE
          - name: status
            description: ''
            data_type: STRING
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: ''
        data_type: INT64
      - name: first_name
        description: ''
        data_type: STRING
      - name: last_name
        description: ''
        data_type: STRING
  - name: stg_orders
    columns:
      - name: order_id
        description: ''
        data_type: INT64
      - name: customer_id
        description: ''
        data_type: INT64
      - name: order_date
        description: ''
        data_type: DATE
      - name: status
        description: ''
        data_type: STRING
  - name: customers
    columns:
      - name: customer_id
        description: ''
        meta:
          osmosis_progenitor: model.my_dbt_project.stg_customers
        data_type: INT64
      - name: first_name
        description: ''
        meta:
          osmosis_progenitor: model.my_dbt_project.stg_customers
        data_type: STRING
      - name: last_name
        description: ''
        meta:
          osmosis_progenitor: model.my_dbt_project.stg_customers
        data_type: STRING
      - name: first_order_date
        description: ''
        data_type: DATE
      - name: most_recent_order_date
        description: ''
        data_type: DATE
      - name: number_of_orders
        description: ''
        data_type: INT64

:two: Run dbt-osmosis yaml refactor --add-progenitor-to-meta once more

Result:

  • The origin is now recorded in meta: for all models

Does it take several runs before the result converges...?

:pencil: Actual _schema.yml
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: ''
        data_type: INT64
      - name: first_name
        description: ''
        data_type: STRING
        meta:
          osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
      - name: last_name
        description: ''
        data_type: STRING
        meta:
          osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
  - name: customers
    columns:
      - name: customer_id
        description: ''
        meta:
          osmosis_progenitor: model.my_dbt_project.stg_customers
        data_type: INT64
      - name: first_name
        description: ''
        meta:
          osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
        data_type: STRING
      - name: last_name
        description: ''
        meta:
          osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
        data_type: STRING
      - name: first_order_date
        description: ''
        data_type: DATE
      - name: most_recent_order_date
        description: ''
        data_type: DATE
      - name: number_of_orders
        description: ''
        data_type: INT64
  - name: stg_orders
    columns:
      - name: order_id
        description: ''
        data_type: INT64
      - name: customer_id
        description: ''
        data_type: INT64
      - name: order_date
        description: ''
        data_type: DATE
        meta:
          osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_orders
      - name: status
        description: ''
        data_type: STRING
        meta:
          osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_orders

Trial 2 | When columns are inherited through three or more levels

:one: Starting from the prepared state, add the following models:

models/customers_depth_2.sql
select
    first_name
from
    {{ ref('customers') }}
models/customers_depth_3.sql
select
    first_name
from
    {{ ref('customers_depth_2') }}

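After adding them, the data lineage extends to raw_customers → stg_customers → customers → customers_depth_2 → customers_depth_3.

:two: Run dbt-osmosis yaml refactor --add-progenitor-to-meta (first run)

Result:

  • meta: has been added for customers and the models downstream of it
:pencil: Actual _schema.yml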
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: ''
        data_type: INT64
      - name: first_name
        description: ''
        data_type: INT64
      - name: last_name
        description: ''
        data_type: INT64
  - name: customers_depth_3
    columns:
      - name: first_name
        description: ''
        meta:
          osmosis_progenitor: model.my_dbt_project.stg_customers
        data_type: INT64
  - name: customers_depth_2
    columns:
      - name: first_name
        description: ''
        meta:
          osmosis_progenitor: model.my_dbt_project.stg_customers
        data_type: INT64
  - name: customers
    columns:
      - name: customer_id
        description: ''
        meta:
          osmosis_progenitor: model.my_dbt_project.stg_customers
        data_type: INT64
      - name: first_name
        description: ''
        meta:
          osmosis_progenitor: model.my_dbt_project.stg_customers
        data_type: INT64
      - name: last_name
        description: ''
        meta:
          osmosis_progenitor: model.my_dbt_project.stg_customers
        data_type: INT64
      - name: first_order_date
        description: ''
        data_type: INT64
      - name: most_recent_order_date
        description: ''
        data_type: INT64
      - name: number_of_orders
        description: ''
        data_type: INT64
  - name: stg_orders
    columns:
      - name: order_id
        description: ''
        data_type: INT64
      - name: customer_id
        description: ''
        data_type: INT64
      - name: order_date
        description: ''
        data_type: INT64
      - name: status
        description: ''
        data_type: INT64

:three: Run dbt-osmosis yaml refactor --add-progenitor-to-meta once more

Result:

  • meta: has been added for all models

Perhaps only the second layer from the source is handled slightly differently...? (needs verification)

:pencil: Actual _schema.yml
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: ''
        data_type: INT64
      - name: first_name
        description: ''
        data_type: INT64
        meta:
          osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
      - name: last_name
        description: ''
        data_type: INT64
        meta:
          osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
  - name: customers_depth_3
    columns:
      - name: first_name
        description: ''
        meta:
          osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
        data_type: INT64
  - name: customers_depth_2
    columns:
      - name: first_name
        description: ''
        meta:
          osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
        data_type: INT64
  - name: customers
    columns:
      - name: customer_id
        description: ''
        meta:
          osmosis_progenitor: model.my_dbt_project.stg_customers
        data_type: INT64
      - name: first_name
        description: ''
        meta:
          osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
        data_type: INT64
      - name: last_name
        description: ''
        meta:
          osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_customers
        data_type: INT64
      - name: first_order_date
        description: ''
        data_type: INT64
      - name: most_recent_order_date
        description: ''
        data_type: INT64
      - name: number_of_orders
        description: ''
        data_type: INT64
  - name: stg_orders
    columns:
      - name: order_id
        description: ''
        data_type: INT64
      - name: customer_id
        description: ''
        data_type: INT64
      - name: order_date
        description: ''
        data_type: INT64
        meta:
          osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_orders
      - name: status
        description: ''
        data_type: INT64
        meta:
          osmosis_progenitor: source.my_dbt_project.YOUR_GC_DATASET.raw_orders
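
The converged result is consistent with reading "progenitor" as the furthest upstream node that still has a column of the same name. The toy model below illustrates that reading only; it is not how dbt-osmosis is implemented. `find_progenitor` is a hypothetical function, the lineage is reduced to the customers chain (stg_orders is left out), and plain node names stand in for the full unique IDs.

```python
from typing import Dict, List, Optional, Set

# Simplified lineage of this trial: child -> list of parents.
PARENTS: Dict[str, List[str]] = {
    "raw_customers": [],
    "stg_customers": ["raw_customers"],
    "customers": ["stg_customers"],
    "customers_depth_2": ["customers"],
    "customers_depth_3": ["customers_depth_2"],
}
COLUMNS: Dict[str, Set[str]] = {
    "raw_customers": {"id", "first_name", "last_name"},
    "stg_customers": {"customer_id", "first_name", "last_name"},
    "customers": {"customer_id", "first_name", "last_name", "first_order_date",
                  "most_recent_order_date", "number_of_orders"},
    "customers_depth_2": {"first_name"},
    "customers_depth_3": {"first_name"},
}


def find_progenitor(node: str, column: str, parents: Dict[str, List[str]],
                    columns: Dict[str, Set[str]]) -> Optional[str]:
    """Return the furthest upstream ancestor of `node` that has `column`, or None."""
    progenitor = None
    frontier = list(parents.get(node, []))
    while frontier:
        current = frontier.pop(0)  # breadth-first, so deeper ancestors come later
        if column in columns.get(current, set()):
            progenitor = current
        frontier.extend(parents.get(current, []))
    return progenitor
```

Under this model `customers_depth_3.first_name` resolves to `raw_customers`, `customers.customer_id` resolves to `stg_customers` because the source column is named `id`, and `stg_customers.customer_id` has no progenitor, which lines up with the second-run output above.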

--use-unrendered-descriptions

概要

形式

--use-unrendered-descriptions

機能

デフォルト挙動では、 description: の中身はレンダリングされます。

description: "{{ doc(\"first_name\") }}"

description: "### about\n\nfirst_name"

With this option, the unrendered contents are carried over as-is, so the downstream value is identical to the upstream one:

description: "{{ doc(\"first_name\") }}"

description: "{{ doc(\"first_name\") }}"
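
To make the difference concrete, here is a minimal Python sketch of what "rendering" means in this context. It is not how dbt or dbt-osmosis implements it (dbt resolves doc() through Jinja); the DOCS dictionary and the render_description function are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical stand-in for the docs blocks that dbt collects from .md files.
DOCS = {"first_name": "### about\n\nfirst_name"}

# Matches {{ doc("name") }} or {{ doc('name') }}, allowing optional whitespace.
DOC_CALL = re.compile(r"""\{\{\s*doc\(\s*["']([^"']+)["']\s*\)\s*\}\}""")


def render_description(description: str, docs: dict[str, str]) -> str:
    """Replace each doc() reference with the text of the named docs block."""
    return DOC_CALL.sub(lambda m: docs[m.group(1)], description)


unrendered = '{{ doc("first_name") }}'
rendered = render_description(unrendered, DOCS)

print(repr(unrendered))  # what --use-unrendered-descriptions keeps
print(repr(rendered))    # what the default behavior writes
```

The unrendered form stays in sync with _doc.md when the docs block is edited later, whereas the rendered form is a copy taken at the time of the run.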

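Preparation

We will work from the following sample repository. Run the dbt run command once beforehand so that the tables exist in BigQuery.

With that in place, the preparation steps are as follows.

:one: Make the following changes to the repository above

  • Change _source.yml to the following: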
_source.yml
version: 2

sources:
  - name: YOUR_GC_DATASET
    database: YOUR_GC_PROJECT_ID
    tables:
      - name: raw_customers
        columns:
          - name: first_name
            description: |
              {{ doc("first_name") }}
      - name: raw_orders
  • Create models/_doc.md with the following content:
_doc.md
{% docs first_name %}

### about

first_name

{% enddocs %}

Trial 1 | Run dbt-osmosis yaml refactor as-is

Result:

  • In _source.yml, the contents of _doc.md are not expanded
  • In _schema.yml, the contents of _doc.md are expanded
      - name: first_name
        description: '### about


          first_name'
:pencil: Actual _source.yml and _schema.yml
_source.yml
version: 2

sources:
  - name: YOUR_GC_DATASET
    database: YOUR_GC_PROJECT_ID
    tables:
      - name: raw_customers
        columns:
          - name: id
            description: ''
            data_type: INT64
          - name: first_name
            description: |
              {{ doc("first_name") }}
            data_type: INT64
          - name: last_name
            description: ''
            data_type: INT64
      - name: raw_orders
        columns:
          - name: id
            description: ''
            data_type: INT64
          - name: user_id
            description: ''
            data_type: INT64
          - name: order_date
            description: ''
            data_type: INT64
          - name: status
            description: ''
            data_type: INT64
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: ''
        data_type: INT64
      - name: first_name
        description: '### about


          first_name'
        data_type: INT64
      - name: last_name
        description: ''
        data_type: INT64
  - name: customers
    columns:
      - name: customer_id
        description: ''
        data_type: INT64
      - name: first_name
        description: '### about


          first_name'
        data_type: INT64
      - name: last_name
        description: ''
        data_type: INT64
      - name: first_order_date
        description: ''
        data_type: INT64
      - name: most_recent_order_date
        description: ''
        data_type: INT64
      - name: number_of_orders
        description: ''
        data_type: INT64
  - name: stg_orders
    columns:
      - name: order_id
        description: ''
        data_type: INT64
      - name: customer_id
        description: ''
        data_type: INT64
      - name: order_date
        description: ''
        data_type: INT64
      - name: status
        description: ''
        data_type: INT64

Trial 2 | Run with the --use-unrendered-descriptions option

Command:

dbt-osmosis yaml refactor --use-unrendered-descriptions

Result:

  • The contents of _doc.md are not expanded
      - name: first_name
        description: '{{ doc("first_name") }}

          '
:pencil: Actual _source.yml and _schema.yml
_source.yml
version: 2

sources:
  - name: YOUR_GC_DATASET
    database: YOUR_GC_PROJECT_ID
    tables:
      - name: raw_customers
        columns:
          - name: id
            description: ''
            data_type: INT64
          - name: first_name
            description: |
              {{ doc("first_name") }}
            data_type: INT64
          - name: last_name
            description: ''
            data_type: INT64
      - name: raw_orders
        columns:
          - name: id
            description: ''
            data_type: INT64
          - name: user_id
            description: ''
            data_type: INT64
          - name: order_date
            description: ''
            data_type: INT64
          - name: status
            description: ''
            data_type: INT64
_schema.yml
version: 2
models:
  - name: stg_customers
    columns:
      - name: customer_id
        description: ''
        data_type: INT64
      - name: first_name
        description: '{{ doc("first_name") }}

          '
        data_type: INT64
      - name: last_name
        description: ''
        data_type: INT64
  - name: stg_orders
    columns:
      - name: order_id
        description: ''
        data_type: INT64
      - name: customer_id
        description: ''
        data_type: INT64
      - name: order_date
        description: ''
        data_type: INT64
      - name: status
        description: ''
        data_type: INT64
  - name: customers
    columns:
      - name: customer_id
        description: ''
        data_type: INT64
      - name: first_name
        description: '{{ doc("first_name") }}

          '
        data_type: INT64
      - name: last_name
        description: ''
        data_type: INT64
      - name: first_order_date
        description: ''
        data_type: INT64
      - name: most_recent_order_date
        description: ''
        data_type: INT64
      - name: number_of_orders
        description: ''
        data_type: INT64
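
One quick way to confirm which of the two behaviors you got is to scan the generated YAML for leftover doc() references. The following is a minimal sketch using only the Python standard library. The find_unrendered_docs helper is a hypothetical name, and the line-by-line scan is a simplification rather than real YAML parsing; the two sample strings are the first_name excerpts from the trials above.

```python
import re

# Matches {{ doc("name") }} or {{ doc('name') }}, allowing optional whitespace.
DOC_CALL = re.compile(r"""\{\{\s*doc\(\s*["']([^"']+)["']\s*\)\s*\}\}""")


def find_unrendered_docs(yaml_text: str) -> list[tuple[int, str]]:
    """Return (line_number, docs_block_name) for every doc() reference found."""
    hits = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        for match in DOC_CALL.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Excerpt of _schema.yml from trial 1 (default behavior).
trial_1 = """\
      - name: first_name
        description: '### about


          first_name'
"""

# Excerpt of _schema.yml from trial 2 (--use-unrendered-descriptions).
trial_2 = """\
      - name: first_name
        description: '{{ doc("first_name") }}

          '
"""

print(find_unrendered_docs(trial_1))  # []
print(find_unrendered_docs(trial_2))  # [(2, 'first_name')]
```

An empty result means every description was rendered; any hit points at a line that still references a docs block.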

Conclusion

In this article, I summarized the behavior of several options of the dbt-osmosis Refactor feature. If you notice any mistakes, I would appreciate it if you pointed them out :bow:

Thank you for reading.
