Getting Started with dbt Cloud: From Connecting Snowflake to Running a Job

Posted at 2024-12-06. Last updated at 2024-12-06.

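Introduction

This article is for anyone who wants to try dbt Cloud.
It covers the minimum preparation needed to start using dbt right away.
By following it, you can get as far as running a job against data in Snowflake.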

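Creating an account

Select "Try dbt Cloud" at the top right of the official website.

Check the invitation email and log in.

You are taken to the project setup screen, so start with the connection settings.

Choose a connection method

Under Connection, select "Snowflake".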

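Preparation on the Snowflake side

・Create a trial account
  → Go to the Snowflake UI and complete the initial setup.
・Prepare the required resources
  → Follow the quickstart through loading the sample data.
  Following Snowflake's design philosophy, create three databases: RAW, OPEN, and MART.
  RAW holds the raw data, OPEN holds data reshaped to be easy to use, and MART holds data specialized for a specific output such as BI or ML. The dbt models build OPEN and MART.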

.sql

-- Initial setup
create warehouse transforming;
create database raw;
create database open;
create database mart;
create schema raw.jaffle_shop;
create schema raw.stripe;

-- Create the sample data tables and load the data
create table raw.jaffle_shop.customers 
( id integer,
  first_name varchar,
  last_name varchar
);

copy into raw.jaffle_shop.customers (id, first_name, last_name)
from 's3://dbt-tutorial-public/jaffle_shop_customers.csv'
file_format = (
    type = 'CSV'
    field_delimiter = ','
    skip_header = 1
    ); 

create table raw.jaffle_shop.orders
( id integer,
  user_id integer,
  order_date date,
  status varchar,
  _etl_loaded_at timestamp default current_timestamp
);

copy into raw.jaffle_shop.orders (id, user_id, order_date, status)
from 's3://dbt-tutorial-public/jaffle_shop_orders.csv'
file_format = (
    type = 'CSV'
    field_delimiter = ','
    skip_header = 1
    );

create table raw.stripe.payment 
( id integer,
  orderid integer,
  paymentmethod varchar,
  status varchar,
  amount integer,
  created date,
  _batched_at timestamp default current_timestamp
);

copy into raw.stripe.payment (id, orderid, paymentmethod, status, amount, created)
from 's3://dbt-tutorial-public/stripe_payments.csv'
file_format = (
    type = 'CSV'
    field_delimiter = ','
    skip_header = 1
    );

-- Check the loaded data
select * from raw.jaffle_shop.customers;
select * from raw.jaffle_shop.orders;
select * from raw.stripe.payment;

-- Run the following only if you need it
-- Change the account time zone (requires the ACCOUNTADMIN role)
alter account set timezone = 'Asia/Tokyo';
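
As a lighter alternative to the `select *` checks above, the following sketch counts the rows in each loaded table. It assumes the three tables were created exactly as in the script above. No expected counts are given here, so compare the results against the source files yourself.

```sql
-- Row counts per loaded table (a sanity check, not part of the quickstart itself)
select 'raw.jaffle_shop.customers' as table_name, count(*) as row_count
from raw.jaffle_shop.customers
union all
select 'raw.jaffle_shop.orders', count(*)
from raw.jaffle_shop.orders
union all
select 'raw.stripe.payment', count(*)
from raw.stripe.payment;
```

A row count of 0 for any table suggests that the corresponding `copy into` statement did not load anything.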

Return to dbt and continue the connection setup

About the account identifier
  Select the link (chain) icon to copy the account URL, then take the XXXXXXXX part.
  https://XXXXXXXX.snowflakecomputing.com
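
If copying from the UI is inconvenient, the identifier can probably also be derived in a worksheet. The sketch below assumes the account uses the organization-and-account-name URL format. Accounts addressed by a legacy account locator will have a different XXXXXXXX part, so treat the copied URL as the source of truth.

```sql
-- Assumption: the account URL has the form <organization>-<account>.snowflakecomputing.com
select current_organization_name() || '-' || current_account_name() as account_identifier;
```

If the result does not match the XXXXXXXX part of the copied URL, use the value from the URL.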

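Go to the project setup page that is open in another tab

For Connection, select the Snowflake connection you just created.
Enter your Snowflake login credentials.

dbt models are created in the schema you specify.
By default, the schema is named "dbt_" followed by your first initial and last name.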

Testing the Snowflake connection

Once the settings are in place, run the connection test as well.

Repository setup

For evaluation and similar purposes, "managed" is sufficient.

Initial setup is complete

Creating models

From "Develop", go to the development screen and select "Click here to initialize your project".
Create SQL models and configuration files under the "models" folder.
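
As a minimal sketch of what a model file under "models" can look like, the hypothetical file below builds a view in the OPEN database from the raw customers table loaded earlier. The file name, the choice of a view, and the per-model `database` override are assumptions for illustration, not settings taken from this article's project.

```sql
-- models/example_open_customers.sql (hypothetical file name)
-- The database config overrides the connection's default database for this model only.
{{
    config(
        materialized='view',
        database='open'
    )
}}

select
    id as customer_id,
    first_name,
    last_name
from raw.jaffle_shop.customers
```

Hard-coding the table name keeps the sketch self-contained. In a real project a `source()` reference is usually preferable so that dbt can track lineage.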

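[Supplement] For those who want to restore the state after the quickstart

Note: recorded here because it seems useful when you need to reproduce the state after the quickstart.
Note: includes additions about materialized and freshness.
Note: delete the example folder under models.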

sources.yml

version: 2

sources:
    - name: jaffle_shop
      description: This is a replica of the Postgres database used by our app
      database: raw
      schema: jaffle_shop
      tables:
          - name: customers
            description: One record per customer.
          - name: orders
            loaded_at_field: _etl_loaded_at
            freshness: 
              error_after: {count: 5, period: minute}
            description: One record per order. Includes cancelled and deleted orders.
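
The freshness setting above compares the most recent `_etl_loaded_at` value with the current time when you run `dbt source freshness`. The query below is a rough sketch of that comparison, not the exact SQL dbt generates, and the `minutes_since_load` column is added here only for readability. Because the sample data is loaded once, the 5-minute `error_after` threshold will most likely be exceeded shortly after loading, so an error from the freshness check is expected in this setup.

```sql
-- Approximation of what a source freshness check looks at for raw.jaffle_shop.orders
select
    max(_etl_loaded_at) as max_loaded_at,
    current_timestamp() as snapshotted_at,
    datediff(minute, max(_etl_loaded_at), current_timestamp()) as minutes_since_load
from raw.jaffle_shop.orders;
```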

stg_customers.sql

{{
    config(
        materialized='table'
    )
}}
select
    id as customer_id,
    first_name,
    last_name

from {{ source('jaffle_shop', 'customers') }}

stg_orders.sql

{{
    config(
        materialized='table'
    )
}}
select
    id as order_id,
    user_id as customer_id,
    order_date,
    status

from {{ source('jaffle_shop', 'orders') }}

customers.sql

{{
    config(
        materialized='table'
    )
}}

with customers as (

    select * from {{ ref('stg_customers') }}

),

orders as (

    select * from {{ ref('stg_orders') }}

),

customer_orders as (

    select
        customer_id,

        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date,
        count(order_id) as number_of_orders

    from orders

    group by 1

),

final as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order_date,
        customer_orders.most_recent_order_date,
        coalesce(customer_orders.number_of_orders, 0) as number_of_orders

    from customers

    left join customer_orders using (customer_id)

)

select * from final
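
Once the customers model builds, a singular test is one way to check its output. The sketch below assumes a hypothetical file `tests/assert_customers_unique.sql`. dbt treats any rows returned by such a file as failures, so the test passes when each `customer_id` appears exactly once.

```sql
-- tests/assert_customers_unique.sql (hypothetical file name)
-- Returns one row per customer_id that appears more than once in the customers model.
select
    customer_id,
    count(*) as occurrences
from {{ ref('customers') }}
group by customer_id
having count(*) > 1
```

Running `dbt test` executes this test alongside any others defined in the project.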