本文記載時点はPipeRider 0.4.1を参考にしています。
はじめに
PipeRiderはdbtプロジェクトのトランスフォーメーションされたデータモデルをプロファイリングし、dbtのアサーションとPipeRiderのアサーション実行し、結果によってリポートを生成する
リクワイアメント
範例SQLite Databaseの用意する
curl -o sp500.db https://piperider-data.s3.ap-northeast-1.amazonaws.com/getting-started/sp500_20220401.db
pip install dbt-sqlite
dbt-sqlite
用のcrypo library fileを用意する
crypo library fileをダウンロードする
Mac Intelの場合はcrypto.dylib
Mac AppleSiliconの場合はcrypto.arm64.dylib
dbtプロジェクトを初期化する
dbt init
Running with dbt=1.1.1
Enter a name for your project (letters, digits, underscore): sp500_index
Which database would you like to use?
[1] sqlite
(Don't see the one you want? https://docs.getdbt.com/docs/available-adapters)
Enter a number: 1
~/.dbt/profiles.yml
を編集する
extentionsのパス
crypto.arm64.dylib
crypto.arm64
を入力してください
sp500_index:
outputs:
dev:
type: sqlite
threads: 1
database: sp500
schema: 'main'
schemas_and_paths:
main: '/example/path/to/qiita/sp500.db'
schema_directory: '/example/path/to/qiita/'
extensions:
- '/example/path/to/qiita/crypto.arm64'
target: dev
dbtプロジェクトの設定を確認する
cd sp500_index
dbt debug --profile sp500_index
Running with dbt=1.1.1
dbt version: 1.1.1
python version: 3.9.12
python path: /example/path/to/qiita/pyenv/bin/python3.9
os info: macOS-12.3-arm64-arm-64bit
Using profiles.yml file at /example/path/to/.dbt/profiles.yml
Using dbt_project.yml file at /example/path/to/qiita/sp500_index/dbt_project.yml
Configuration:
profiles.yml file [OK found and valid]
dbt_project.yml file [OK found and valid]
Required dependencies:
- git [OK found]
Connection:
database: sp500
schema: main
schemas_and_paths: {'main': '/example/path/to/qiita/sp500.db'}
schema_directory: /example/path/to/qiita/
Connection test: [OK connection ok]
All checks passed!
dbtでトランスフォーメーション
データモデルにデータをトランスフォーメーションする
第一モデルmodels/example/my_first_dbt_model.sql
を編集する
with source_data as (
SELECT * FROM PRICE
LEFT JOIN SYMBOL ON SYMBOL.SYMBOL = PRICE.SYMBOL
WHERE PRICE.DATE = '2022-01-04'
)
SELECT *
FROM source_data
第一モデルにより、第二モデルmodels/example/my_second_dbt_model.sql
を編集する
SELECT SYMBOL, DATE, OPEN, HIGH, LOW, CLOSE, VOLUME, ADJCLOSE, MA5, MA20,MA60, NAME, COUNTRY, EXCHANGE_CODE, SECTOR, INDUSTRY
FROM {{ ref('my_first_dbt_model') }}
WHERE COUNTRY = 'United States'
データ審査のルールmodels/example/schema.yml
を編集する
version: 2
models:
- name: my_first_dbt_model
description: "Record Date on 2022-01-04"
columns:
- name: SYMBOL
description: "Stock symbol"
tests:
- unique
- not_null
- name: my_second_dbt_model
description: "Stocks in the U.S."
columns:
- name: OPEN
description: "Opening price"
tests:
- not_null
データモデルを生成する
dbt build
Output 範例
14:45:58 Running with dbt=1.1.1
14:45:58 Found 2 models, 3 tests, 0 snapshots, 0 analyses, 173 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
14:45:58
14:45:58 Concurrency: 1 threads (target='dev')
14:45:58
14:45:58 1 of 5 START view model main.my_first_dbt_model ................................ [RUN]
14:45:58 1 of 5 OK created view model main.my_first_dbt_model ........................... [OK in 0.07s]
14:45:58 2 of 5 START test not_null_my_first_dbt_model_SYMBOL ........................... [RUN]
14:45:58 2 of 5 PASS not_null_my_first_dbt_model_SYMBOL ................................. [PASS in 0.07s]
14:45:58 3 of 5 START test unique_my_first_dbt_model_SYMBOL ............................. [RUN]
14:45:58 3 of 5 PASS unique_my_first_dbt_model_SYMBOL ................................... [PASS in 0.04s]
14:45:58 4 of 5 START view model main.my_second_dbt_model ............................... [RUN]
14:45:58 4 of 5 OK created view model main.my_second_dbt_model .......................... [OK in 0.02s]
14:45:58 5 of 5 START test not_null_my_second_dbt_model_OPEN ............................ [RUN]
14:45:58 5 of 5 PASS not_null_my_second_dbt_model_OPEN .................................. [PASS in 0.04s]
14:45:58
14:45:58 Finished running 2 view models, 3 tests in 0.40s.
14:45:58
14:45:58 Completed successfully
14:45:58
14:45:58 Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5
データモデルをテストする
dbt test
Output 範例
14:46:53 Running with dbt=1.1.1
14:46:53 Found 2 models, 3 tests, 0 snapshots, 0 analyses, 173 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
14:46:53
14:46:53 Concurrency: 1 threads (target='dev')
14:46:53
14:46:53 1 of 3 START test not_null_my_first_dbt_model_SYMBOL ........................... [RUN]
14:46:53 1 of 3 PASS not_null_my_first_dbt_model_SYMBOL ................................. [PASS in 0.07s]
14:46:53 2 of 3 START test not_null_my_second_dbt_model_OPEN ............................ [RUN]
14:46:53 2 of 3 PASS not_null_my_second_dbt_model_OPEN .................................. [PASS in 0.07s]
14:46:53 3 of 3 START test unique_my_first_dbt_model_SYMBOL ............................. [RUN]
14:46:53 3 of 3 PASS unique_my_first_dbt_model_SYMBOL ................................... [PASS in 0.04s]
14:46:53
14:46:53 Finished running 3 tests in 0.28s.
14:46:53
14:46:53 Completed successfully
14:46:53
14:46:53 Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3
PipeRiderの参入
pip install piperider
PipeRiderは自動的にdbtのdbt_project.yml
を読み込む
piperider init
Initialize piperider to path /example/path/to/qiita/sp500_index/.piperider
Start to search dbt project ...
Use dbt project file: /example/path/to/qiita/sp500_index/dbt_project.yml
─────────────────────────────────────────────────────────────────────────────────────────── .piperider/config.yml ────────────────────────────────────────────────────────────────────────────────────────────
1 dataSources:
2 - name: sp500_index
3 type: sqlite
4 dbt:
5 profile: sp500_index
6 target: dev
7 projectDir: .
8 telemetry:
9 id: 181f0172e54546c99ea971d0a7b913e2
Next step:
Please execute command 'piperider diagnose' to verify configuration
PipeRiderの設定を確認する
piperider diagnose
Diagnosing...
PipeRider Version: 0.4.1
Check config files:
/example/path/to/qiita/sp500_index/.piperider/config.yml: [OK]
✅ PASS
Check format of data sources:
sp500_index: [OK]
✅ PASS
Check connections:
DBT: sqlite > sp500_index > dev
Name: sp500_index
Type: sqlite
Available Tables: ['ACTION', 'PRICE', 'SYMBOL']
Connection: [OK]
✅ PASS
Check assertion files:
/example/path/to/qiita/sp500_index/.piperider/assertions/my_second_dbt_model.yml: [OK]
/example/path/to/qiita/sp500_index/.piperider/assertions/my_first_dbt_model.yml: [OK]
✅ PASS
🎉 You are all set!
Next step:
Please execute command 'piperider run' to generate your second report
dbtの生成されたデータモデルをプロファイングし、、dbtのアサーションとPipeRiderのアサーション実行し、結果によってリポートを生成す
piperider run --dbt-test
Output 範例
DataSource: sp500_index
──────────────────────────────────────────────────────────────────────────────────────────────────── Validating ────────────────────────────────────────────────────────────────────────────────────────────────────
everything is OK.
─────────────────────────────────────────────────────────────────────────────────────────────────── Running dbt ────────────────────────────────────────────────────────────────────────────────────────────────────
dbt working dir: .
──────────────────────────────────────────────────────────────────────────────────────────────────── Profiling ─────────────────────────────────────────────────────────────────────────────────────────────────────
fetching metadata for table 'my_first_dbt_model'
fetching metadata for table 'my_second_dbt_model'
profiling [my_first_dbt_model.SYMBOL] type=VARCHAR(16777216)
profiling [my_first_dbt_model.DATE] type=DATE
profiling [my_first_dbt_model.OPEN] type=NUMERIC(10, 2)
profiling [my_first_dbt_model.HIGH] type=NUMERIC(10, 2)
profiling [my_first_dbt_model.LOW] type=NUMERIC(10, 2)
profiling [my_first_dbt_model.CLOSE] type=NUMERIC(10, 2)
profiling [my_first_dbt_model.VOLUME] type=NUMERIC(38, 0)
profiling [my_first_dbt_model.ADJCLOSE] type=NUMERIC(10, 2)
profiling [my_first_dbt_model.MA5] type=NUMERIC(10, 2)
profiling [my_first_dbt_model.MA20] type=NUMERIC(10, 2)
profiling [my_first_dbt_model.MA60] type=NUMERIC(10, 2)
profiling [my_first_dbt_model.SYMBOL:1] type=VARCHAR(16777216)
profiling [my_first_dbt_model.NAME] type=VARCHAR(16777216)
profiling [my_first_dbt_model.START_DATE] type=DATE
profiling [my_first_dbt_model.END_DATE] type=DATE
profiling [my_first_dbt_model.DESCRIPTION] type=VARCHAR(16777216)
profiling [my_first_dbt_model.EXCHANGE_CODE] type=VARCHAR(16777216)
profiling [my_first_dbt_model.MARKET] type=VARCHAR(16777216)
profiling [my_first_dbt_model.COUNTRY] type=VARCHAR(16777216)
profiling [my_first_dbt_model.SECTOR] type=VARCHAR(16777216)
profiling [my_first_dbt_model.INDUSTRY] type=VARCHAR(16777216)
profiling [my_first_dbt_model.RECOMMENDATION_KEY] type=VARCHAR(16777216)
profiling [my_second_dbt_model.SYMBOL] type=VARCHAR(16777216)
profiling [my_second_dbt_model.DATE] type=DATE
profiling [my_second_dbt_model.OPEN] type=NUMERIC(10, 2)
profiling [my_second_dbt_model.HIGH] type=NUMERIC(10, 2)
profiling [my_second_dbt_model.LOW] type=NUMERIC(10, 2)
profiling [my_second_dbt_model.CLOSE] type=NUMERIC(10, 2)
profiling [my_second_dbt_model.VOLUME] type=NUMERIC(38, 0)
profiling [my_second_dbt_model.ADJCLOSE] type=NUMERIC(10, 2)
profiling [my_second_dbt_model.MA5] type=NUMERIC(10, 2)
profiling [my_second_dbt_model.MA20] type=NUMERIC(10, 2)
profiling [my_second_dbt_model.MA60] type=NUMERIC(10, 2)
profiling [my_second_dbt_model.NAME] type=VARCHAR(16777216)
profiling [my_second_dbt_model.COUNTRY] type=VARCHAR(16777216)
profiling [my_second_dbt_model.EXCHANGE_CODE] type=VARCHAR(16777216)
profiling [my_second_dbt_model.SECTOR] type=VARCHAR(16777216)
profiling [my_second_dbt_model.INDUSTRY] type=VARCHAR(16777216)
PipeRiderから自動生成のアサーションを作成するか?
noを入力してください、もし試したいからyesを入力してください
No assertion found
Do you want to auto generate recommended assertions for this datasource [Yes/no]? no
Output 範例
────────────────────────────────────────────────────────────────────────────────────────── Generating Assertion Templates ──────────────────────────────────────────────────────────────────────────────────────────
Template Assertion: /example/path/to/qiita/sp500_index/.piperider/assertions/my_second_dbt_model.yml
Template Assertion: /example/path/to/qiita/sp500_index/.piperider/assertions/my_first_dbt_model.yml
──────────────────────────────────────────────────────────────────────────────────────────────── Assertion Results ─────────────────────────────────────────────────────────────────────────────────────────────────
─────────────────────────────────────────────────────────────────────────────────────────────────────── dbt ────────────────────────────────────────────────────────────────────────────────────────────────────────
[ OK ] my_first_dbt_model.SYMBOL test.sp500_index.not_null_my_first_dbt_model_SYMBOL.06ca293b81 Message: None
[ OK ] my_first_dbt_model.SYMBOL test.sp500_index.unique_my_first_dbt_model_SYMBOL.0463684cd6 Message: None
[ OK ] my_second_dbt_model.OPEN test.sp500_index.not_null_my_second_dbt_model_OPEN.fa199efe22 Message: None
───────────────────────────────────────────────────────────────────────────────────────────────────── Summary ──────────────────────────────────────────────────────────────────────────────────────────────────────
Table 'my_second_dbt_model'
16 columns profiled
1 dbt test executed
Table 'my_first_dbt_model'
22 columns profiled
2 dbt test executed
Generating reports from: /example/path/to/qiita/sp500_index/.piperider/outputs/latest/run.json
Report generated in /example/path/to/qiita/sp500_index/.piperider/outputs/latest/index.html
Next step:
Please execute command 'piperider run' to generate your second report
Output 範例 (ブラウザで/example/path/to/qiita/sp500_index/.piperider/outputs/latest/index.html
リポートを見る)
以上です
他の参考