こちらの記事の続きになります。
検証の流れ
- 環境準備
- dbtプロジェクトの作成
- dbtのモデル作成と実行
- より複雑なモデルの作成と実行 ←この記事はここから
- dbtのテスト実行
4. より複雑なモデル作成と実行
1. Databricks SQLでのテーブル準備
まずは、Databricks SQLでテーブルとデータを準備します。
DROP TABLE IF EXISTS zzz_game_opponents;
DROP TABLE IF EXISTS zzz_game_scores;
DROP TABLE IF EXISTS zzz_games;
DROP TABLE IF EXISTS zzz_teams;
CREATE TABLE zzz_game_opponents (
game_id INT,
home_team_id INT,
visitor_team_id INT
) USING DELTA
;
INSERT INTO zzz_game_opponents
VALUES
(1, 1, 2),
(2, 1, 3),
(3, 2, 1),
(4, 2, 3),
(5, 3, 1),
(6, 3, 2)
;
CREATE TABLE zzz_game_scores (
game_id INT,
home_team_score INT,
visitor_team_score INT
) USING DELTA
;
INSERT INTO zzz_game_scores
VALUES
(1, 4, 2),
(2, 0, 1),
(3, 1, 2),
(4, 3, 2),
(5, 3, 0),
(6, 3, 1)
;
CREATE TABLE zzz_games (
game_id INT,
game_date DATE
) USING DELTA
;
INSERT INTO zzz_games
VALUES
(1, '2020-12-12'),
(2, '2021-01-09'),
(3, '2020-12-19'),
(4, '2021-01-16'),
(5, '2021-01-23'),
(6, '2021-02-06')
;
CREATE TABLE zzz_teams (
team_id INT,
team_city VARCHAR(15)
) USING DELTA
;
INSERT INTO zzz_teams
VALUES
(1, "San Francisco"),
(2, "Seattle"),
(3, "Amsterdam")
;
2. dbtのモデル作成
modelsディレクトリ配下に以下のようなzzz_game_details.sqlを作成します。
-- 各試合詳細情報を提供するテーブルを作成する
-- スキーマは以下の通り
-- game_id int -- ゲームID
-- home varchar(15) -- ホームチーム
-- visitor varchar(15) -- アウェイチーム
-- home_score int -- ホームチームの点数
-- visitor_score int -- アウェイチームの点数
-- winner varchar(15) -- 勝利チーム
-- date date -- 試合日
{{ config(
materialized='table',
file_format='delta'
) }}
with all_ids as (
-- Step 1 of 4: Combine data from various tables (for example, game and team IDs, scores, dates).
select
go.home_team_id,
g.game_id,
gs.home_team_score as home_score,
go.visitor_team_id,
gs.visitor_team_score as visitor_score,
g.game_date
from
zzz_games as g,
zzz_game_opponents as go,
zzz_game_scores as gs
where
g.game_id = go.game_id and
g.game_id = gs.game_id
), visitor_ids as (
-- Step 2 of 4: Replace the home team IDs with their actual city names.
select
game_id,
t.team_city as home,
home_score,
visitor_team_id,
visitor_score,
game_date
from
all_ids,
zzz_teams as t
)
select
game_id,
home,
t.team_city as visitor, -- Step 4 of 4: Replace the visitor team IDs with their city names.
home_score,
visitor_score,
-- Step 3 of 4: Display the city name for each game's winner.
case
when
home_score > visitor_score
then
home
when
visitor_score > home_score
then
t.team_city
end as winner,
game_date as date
from
visitor_ids,
zzz_teams as t
where
visitor_ids.visitor_team_id = t.team_id
order by game_date desc
;
同様に、zzz_win_loss_records.sqlも作成します。
-- チームごとの勝敗数サマリのビューを作成します。
with sub as (
-- Step 1 of 2: Determine the winner and loser for each game.
select
game_id,
winner,
case
when
home = winner
then
visitor
else
home
end as loser
from {{ ref('zzz_game_details') }}
)
-- Step 2 of 2: Calculate the number of wins and losses for each team.
select
winner as team,
count(winner) as wins,
-- Each team played in 4 games.
(4 - count(winner)) as losses
from
sub
group by winner
order by wins desc
以下dbtコマンドを実行してテーブルとビューのマテリアライズします。
dbt run --model models/zzz_game_details.sql models/zzz_win_loss_records.sql
00:00:00 Running with dbt=1.0.0
00:00:00 Found 7 models, 4 tests, 0 snapshots, 0 analyses, 190 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0
metrics
00:00:00
00:00:00 Concurrency: 1 threads (target='dev')
00:00:00
00:00:00 1 of 2 START table model dbt_schema.zzz_game_details............................ [RUN]
00:00:00 1 of 2 OK created table model dbt_schema.zzz_game_details....................... [OK in 5.81s]
00:00:00 2 of 2 START view model dbt_schema.zzz_win_loss_records......................... [RUN]
00:00:00 2 of 2 OK created view model dbt_schema.zzz_win_loss_records.................... [OK in 1.47s]
00:00:00
00:00:00 Finished running 1 table model, 1 view model in 14.66s.
00:00:00
00:00:00 Completed successfully
00:00:00
00:00:00 Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
5. dbtのテスト実行
dbtのテストは、singular testsという定型の(パラメータを含まない)SELECT文によるワンオフなテストと、
generic testsと呼ばれるパラメータ化されたクエリによるテストの2種類があります。
(generic testsの詳細例)
singular testsは、testsディレクトリ配下にSQL文を配置して定義します。
generic testsは、test
ブロックでパラメータされたSQL文を、tests/generic
かmacros
ディレクトリに配置して
さらに、modelsディレクトリ等にYAMLファイルで呼び出して利用します。
いずれのテストでもSELECT文の結果が0件であれば合格(pass)と判定します。
generic testsの定義例
{% test not_null(model, column_name) %}
select *
from {{ model }}
where {{ column_name }} is null
{% endtest %}
また、generic testsには、以下4つの組み込みテストがあり、すぐ利用できるようになっています。
- unique: 重複する値が存在しない
- not_null: NULL値が存在しない
- accepted_value: リストで提供される値のみ取る
- relationships: 参照先テーブルの参照カラムに値が存在する(参照整合性)
1. generic tests
以下schema.ymlをmodelsディレクトリ配下に配置します。
testsの項目にgeneric testsの名前を指定することで呼び出しを行えます。
version: 2
models:
- name: zzz_game_details
columns:
- name: game_id
tests:
- unique
- not_null
- name: home
tests:
- not_null
- accepted_values:
values: ['Amsterdam', 'San Francisco', 'Seattle']
- name: visitor
tests:
- not_null
- accepted_values:
values: ['Amsterdam', 'San Francisco', 'Seattle']
- name: home_score
tests:
- not_null
- name: visitor_score
tests:
- not_null
- name: winner
tests:
- not_null
- accepted_values:
values: ['Amsterdam', 'San Francisco', 'Seattle']
- name: date
tests:
- not_null
- name: zzz_win_loss_records
columns:
- name: team
tests:
- unique
- not_null
- relationships:
to: ref('zzz_game_details')
field: home
- name: wins
tests:
- not_null
- name: losses
tests:
- not_null
2. singular tests
testsディレクトリにテスト用のSQLファイルを配置します。
zzz_game_details_check_dates.sql
日付の範囲が正しいかチェックする
-- This season's games happened between 2020-12-12 and 2021-02-06.
-- For this test to pass, this query must return no results.
select date
from {{ ref('zzz_game_details') }}
where date < '2020-12-12'
or date > '2021-02-06'
zzz_game_details_check_scores.sql
点数が負値を取ることがないことをチェックする
-- This sport allows no negative scores or tie games.
-- For this test to pass, this query must return no results.
select home_score, visitor_score
from {{ ref('zzz_game_details') }}
where home_score < 0
or visitor_score < 0
or home_score = visitor_score
zzz_game_details_check_records.sql
勝敗数は0-4の範囲であることをチェックする
-- Each team participated in 4 games this season.
-- For this test to pass, this query must return no results.
select wins, losses
from {{ ref('zzz_win_loss_records') }}
where wins < 0 or wins > 4
or losses < 0 or losses > 4
or (wins + losses) > 4
3. dbt test
コマンド
dbt test
コマンドでテストを実行します。
何もオプションを指定しない場合は、すべてのテストを実行しますが、--select
オプションなどでテスト対象をフィルタすることも可能です。
モデルに「zzz_win_loss_records」を指定、テストタイプはsingular testsのみを指定
(zzz_game_details_check_records.sqlのみ実行される)
dbt test --select zzz_win_loss_records,test_type:singular
00:00:00 Running with dbt=1.0.0
00:00:00 Found 7 models, 23 tests, 0 snapshots, 0 analyses, 191 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
00:00:00
00:00:00 Concurrency: 1 threads (target='dev')
00:00:00
00:00:00 1 of 1 START test zzz_game_detals_check_records................................. [RUN]
00:00:00 1 of 1 FAIL 2 zzz_game_detals_check_records..................................... [FAIL 2 in 1.72s]
00:00:00
00:00:00 Finished running 1 test in 5.84s.
00:00:00
00:00:00 Completed with 1 error and 0 warnings:
00:00:00
00:00:00 Failure in test zzz_game_detals_check_records (tests\zzz_game_detals_check_records.sql)
00:00:00 Got 2 results, configured to fail if != 0
00:00:00
00:00:00 compiled SQL at target\compiled\dbt_demo_project\tests\zzz_game_detals_check_records.sql
00:00:00
00:00:00 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
zzz_win_loss_recordsビューは以下のような結果を返すので、lossesが0未満の値を含むので失敗します。
次にzzz_game_detailsテーブルのgeneric testsを実行してみます。
以下コマンドでzzz_win_loss_recordsモデルの参照整合性テストを含む12のgeneric testsが実行されます。
dbt test --select zzz_game_details,test_type:generic
00:00:00 Running with dbt=1.0.0
00:00:00 Found 7 models, 23 tests, 0 snapshots, 0 analyses, 191 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
00:00:00
00:00:00 Concurrency: 1 threads (target='dev')
00:00:00
00:00:00 1 of 12 START test accepted_values_zzz_game_details_home__Amsterdam__San_Francisco__Seattle [RUN]
00:00:00 1 of 12 PASS accepted_values_zzz_game_details_home__Amsterdam__San_Francisco__Seattle [PASS in 1.15s]
00:00:00 2 of 12 START test accepted_values_zzz_game_details_visitor__Amsterdam__San_Francisco__Seattle [RUN]
00:00:00 2 of 12 PASS accepted_values_zzz_game_details_visitor__Amsterdam__San_Francisco__Seattle [PASS in 1.15s]
00:00:00 3 of 12 START test accepted_values_zzz_game_details_winner__Amsterdam__San_Francisco__Seattle [RUN]
00:00:00 3 of 12 PASS accepted_values_zzz_game_details_winner__Amsterdam__San_Francisco__Seattle [PASS in 1.03s]
00:00:00 4 of 12 START test not_null_zzz_game_details_date............................... [RUN]
00:00:00 4 of 12 PASS not_null_zzz_game_details_date..................................... [PASS in 1.03s]
00:00:00 5 of 12 START test not_null_zzz_game_details_game_id............................ [RUN]
00:00:00 5 of 12 PASS not_null_zzz_game_details_game_id.................................. [PASS in 0.99s]
00:00:00 6 of 12 START test not_null_zzz_game_details_home............................... [RUN]
00:00:00 6 of 12 PASS not_null_zzz_game_details_home..................................... [PASS in 1.01s]
00:00:00 7 of 12 START test not_null_zzz_game_details_home_score......................... [RUN]
00:00:00 7 of 12 PASS not_null_zzz_game_details_home_score............................... [PASS in 1.01s]
00:00:00 8 of 12 START test not_null_zzz_game_details_visitor............................ [RUN]
00:00:00 8 of 12 PASS not_null_zzz_game_details_visitor.................................. [PASS in 0.99s]
00:00:00 9 of 12 START test not_null_zzz_game_details_visitor_score...................... [RUN]
00:00:00 9 of 12 PASS not_null_zzz_game_details_visitor_score............................ [PASS in 1.07s]
00:00:00 10 of 12 START test not_null_zzz_game_details_winner............................ [RUN]
00:00:00 10 of 12 PASS not_null_zzz_game_details_winner.................................. [PASS in 2.20s]
00:00:00 11 of 12 START test relationships_zzz_win_loss_records_team__home__ref_zzz_game_details_ [RUN]
00:00:00 11 of 12 PASS relationships_zzz_win_loss_records_team__home__ref_zzz_game_details_ [PASS in 1.02s]
00:00:00 12 of 12 START test unique_zzz_game_details_game_id............................. [RUN]
00:00:00 12 of 12 FAIL 6 unique_zzz_game_details_game_id................................. [FAIL 6 in 1.02s]
00:00:00
00:00:00 Finished running 12 tests in 17.01s.
00:00:00
00:00:00 Completed with 1 error and 0 warnings:
00:00:00
00:00:00 Failure in test unique_zzz_game_details_game_id (models\schema.yml)
00:00:00 Got 6 results, configured to fail if != 0
00:00:00
00:00:00 compiled SQL at target\compiled\dbt_demo_project\models\schema.yml\unique_zzz_game_details_game_id.sql
00:00:00
00:00:00 Done. PASS=11 WARN=0 ERROR=1 SKIP=0 TOTAL=12