About Snowflake SnowPro Advanced: Data Engineer

Snowflake

Posted at 2025-01-01

What is this?

A summary of information for anyone preparing to take Snowflake's certification exam "Snowflake SnowPro Advanced: Data Engineer".

Official information

Reference: https://learn.snowflake.com/en/certifications/snowpro-advanced-dataengineer/

Overview

EXAM FORMAT

Exam Version: DEA-C01
Total Number of Questions: 65
Question Types: Multiple Select, Multiple Choice
Time Limit: 115 minutes
Language: English
Registration fee: $375 USD
Passing Score: 750 + Scaled Scoring from 0 - 1000
Unscored Content: Exams may include unscored items to gather statistical information for future use. These items are not identified on the form and do not impact your score, and additional time is factored into account for this content.
Prerequisites: SnowPro Core Certified

EXAM DOMAIN BREAKDOWN

This exam guide includes test domains, weightings, and objectives. It is not a comprehensive listing of all the content that will be presented on this examination. The table below lists the main content domains and their weightings.

Domain Weightings on Exams
1.0 Data Movement	25-30%
2.0 Performance Optimization	20-25%
3.0 Storage and Data Protection	10-15%
4.0 Security	10-15%
5.0 Data Transformation	25-30%

EXAM TOPICS

Outlined below are the Domains & Objectives measured on the exam. To view subtopics, download the exam study guide.

Breakdown

Domain 1.0: Data Movement

1.1 Given a data set, load data into Snowflake.
1.2 Ingest data of various formats through the mechanics of Snowflake.
1.3 Troubleshoot data ingestion.
1.4 Design, build, and troubleshoot continuous data pipelines.
1.5 Analyze and differentiate types of data pipelines.
1.6 Install, configure, and use connectors to connect to Snowflake.
1.7 Design and build data sharing solutions.
1.8 Outline when to use external tables and define how they work.

Domain 2.0: Performance Optimization

2.1 Troubleshoot underperforming queries.
2.2 Given a scenario, configure a solution for the best performance.
2.3 Outline and use caching features.
2.4 Monitor continuous data pipelines.

Domain 3.0: Storage and Data Protection

3.1 Implement data recovery features in Snowflake.
3.2 Outline the impact of streams on Time Travel.
3.3 Use system functions to analyze micro-partitions.
3.4 Use Time Travel and cloning to create new development environments.

Domain 4.0: Security

4.1 Outline Snowflake security principles.
4.2 Outline the system defined roles and when they should be applied.
4.3 Manage data governance.

Domain 5.0: Data Transformation

5.1 Define User-Defined Functions (UDFs) and outline how to use them.
5.2 Define and create external functions.
5.3 Design, build, and leverage stored procedures.
5.4 Handle and transform semi-structured data.
5.5 Use Snowpark for data transformation.

Details

Reference: https://training.snowflake.com/lmt/clmsCatalogDetails.prMain?site=sf&in_offeringId=98873814&in_language_identifier=en-us&in_region=us&in_from_module=CLMSSHARE.PRMAIN

 Exam Study Guide Enrollment & Access
   1. Log in to your Snowflake Community account using your corporate email address
   2. Navigate to: training.snowflake.com & sign in using your Snowflake Community account credentials
   3. Go to the 'Certification' category and click into the desired certification tile
   4. Click Enroll - this will enroll you into the study guide
   5. Click Play - this will launch the study guide

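So, to view this content you need to register with Snowflake Community.

That said, the content does not seem to have kept up with product changes: parts of it are outdated and some links are broken.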

1.0 Data Movement 25-30%

1.1 Given a data set, load data into Snowflake.
● Outline considerations for data loading
● Define data loading features and potential impact

1.2 Ingest data of various formats through the mechanics of Snowflake.
● Required data formats
● Outline stages

1.3 Troubleshoot data ingestion.
● Identify causes of ingestion errors
● Determine resolutions for ingestion errors

1.4 Design, build and troubleshoot continuous data pipelines.
● Stages
● Tasks
● Streams
● Snowpipe (for example, Auto ingest as compared to REST API)

1.5 Analyze and differentiate types of data pipelines.
● Create User-Defined Functions (UDFs) and stored procedures including Snowpark
● Design and use the Snowflake SQL API

1.6 Install, configure, and use connectors to connect to Snowflake.

1.7 Design and build data sharing solutions.
● Implement a data share
● Create a secure view
● Implement row level filtering

1.8 Outline when to use external tables and define how they work.
● Partitioning external tables
● Materialized views
● Partitioned data unloading

Domain 1.0: Data Movement Study Resources

2.0 Performance Optimization 20-25%

2.1 Troubleshoot underperforming queries.
● Identify underperforming queries
● Outline telemetry around the operation
● Increase efficiency
● Identify the root cause

2.2 Given a scenario, configure a solution for the best performance.
● Scale out as compared to scale up
● Virtual warehouse properties (for example, size, multi-cluster)
● Query complexity
● Micro-partitions and the impact of clustering
● Materialized views
● Search optimization service
● Query acceleration service

2.3 Outline and use caching features.

2.4 Monitor continuous data pipelines. 
● Snowpipe
● Tasks
● Streams

Domain 2.0: Performance Optimization Study Resources

Lab Guides

Resource Optimization: Performance
- https://quickstarts.snowflake.com/guide/resource_optimization_performance_optimization/index.html?index=..%2F..index#0
Resource Optimization: Usage Monitoring
- https://quickstarts.snowflake.com/guide/resource_optimization_usage_monitoring/index.html?index=..%2F..index#0
Building a Data Application
- https://quickstarts.snowflake.com/guide/data_app/index.html?index=..%2F..index#0

Additional Assets

Performance Impact from Local and Remote Disk Spilling (blog)
- https://community.snowflake.com/s/article/Performance-impact-from-local-and-remote-disk-spilling
Snowflake: Visualizing Warehouse Performance (blog)
- https://community.snowflake.com/s/article/Snowflake-Visualizing-Warehouse-Performance
Caching in Snowflake Data Warehouse (blog)
- (broken link) https://community.snowflake.com/s/article/Caching-in-Snowflake-Data-Warehouse
- (possibly related article) https://community.snowflake.com/s/article/Caching-in-the-Snowflake-Cloud-Data-Platform
- (possibly related article) https://articles.analytics.today/snowflake-cache-how-it-works-and-why-it-matters
- (possibly related article) https://thinketl.com/caching-in-snowflake/

Snowflake Documentation Links

Account Usage
- https://docs.snowflake.com/en/sql-reference/account-usage
Analyzing Queries Using Query Profile
- https://docs.snowflake.com/user-guide/ui-snowsight-activity#exploding-joins
COPY_HISTORY
- https://docs.snowflake.com/en/sql-reference/functions/copy_history
COPY_HISTORY View
- https://docs.snowflake.com/en/sql-reference/account-usage/copy_history
Databases, Tables & Views
- https://docs.snowflake.com/en/user-guide/databases
LOAD_HISTORY View
- https://docs.snowflake.com/en/sql-reference/account-usage/load_history
PIPE_USAGE_HISTORY View
- https://docs.snowflake.com/en/sql-reference/account-usage/pipe_usage_history
Queries
- https://docs.snowflake.com/guides-overview-queries
QUERY_HISTORY, QUERY_HISTORY_BY_*
- https://docs.snowflake.com/en/sql-reference/functions/query_history
SHOW STREAMS
- https://docs.snowflake.com/en/sql-reference/sql/show-streams
System Functions
- https://docs.snowflake.com/en/sql-reference/functions-system
TASK_HISTORY
- https://docs.snowflake.com/en/sql-reference/functions/task_history
Virtual Warehouses
- https://docs.snowflake.com/en/user-guide/warehouses

3.0 Storage and Data Protection 10-15%

3.1 Implement data recovery features in Snowflake. 
● Time Travel
● Fail-safe

3.2 Outline the impact of streams on Time Travel.

3.3 Use system functions to analyze micro-partitions.
● Clustering depth
● Cluster keys

3.4 Use Time Travel and cloning to create new development environments.
● Clone objects
● Validate changes before promoting
● Rollback changes

Domain 3.0: Storage & Data Protection Study Resources

Lab Guides

Getting Started with Time Travel
- https://quickstarts.snowflake.com/guide/getting_started_with_time_travel/index.html?index=..%2F..index#0

Snowflake Documentation Links

Snowflake Time Travel & Fail-safe
- https://docs.snowflake.com/en/user-guide/data-availability
Databases, Tables & Views
- https://docs.snowflake.com/en/user-guide/databases
Parameter Hierarchy and Types
- https://docs.snowflake.com/en/sql-reference/parameters#parameter-hierarchy-and-types
Database Replication and Failover/Failback
- https://docs.snowflake.com/user-guide/account-replication-intro
Continuous Data Pipelines
- https://docs.snowflake.com/user-guide/data-pipelines-intro
SYSTEM$CLUSTERING_INFORMATION
- https://docs.snowflake.com/en/sql-reference/functions/system_clustering_information
SYSTEM$CLUSTERING_DEPTH
- https://docs.snowflake.com/en/sql-reference/functions/system_clustering_depth

4.0 Security 10-15%

4.1 Outline Snowflake security principles.
● Authentication methods (Single Sign-On (SSO), key pair authentication, username/password, Multi-Factor Authentication (MFA))
● Role Based Access Control (RBAC)
● Column level security and how data masking works with RBAC to secure sensitive data

4.2 Outline the system defined roles and when they should be applied.
● The purpose of each of the system defined roles including best practices usage in each case
● The primary differences between SECURITYADMIN and USERADMIN roles
● The difference between the purpose and usage of the USERADMIN/SECURITYADMIN roles and SYSADMIN

4.3 Manage data governance.
● Explain the options available to support column level security including Dynamic Data Masking and external tokenization
● Explain the options available to support row level security using Snowflake row access policies
● Use DDL required to manage Dynamic Data Masking and row access policies
● Use methods and best practices for creating and applying masking policies on data
● Use methods and best practices for object tagging

Domain 4.0: Security Study Resources

Additional Assets

Snowflake RBAC Security Prefers Role Inheritance to Role Composition (blog)
- https://community.snowflake.com/s/article/snowflake-rbac-security-prefers-role-inheritance-to-role-composition

Snowflake Documentation Links

CREATE MATERIALIZED VIEW
- https://docs.snowflake.com/en/sql-reference/sql/create-materialized-view
GRANT ...TO ROLE
- https://docs.snowflake.com/en/sql-reference/sql/grant-privilege
Managing Governance in Snowflake
- https://docs.snowflake.com/guides-overview-govern
Managing Security in Snowflake
- https://docs.snowflake.com/guides-overview-secure
Managing Your User Preferences
- https://docs.snowflake.com/en/user-guide/ui-preferences
Stored Procedures
- https://docs.snowflake.com/developer-guide/stored-procedure/stored-procedures-overview

5.0 Data Transformation 25-30%

5.1 Define User-Defined Functions (UDFs) and outline how to use them.
● Snowpark UDFs (for example, Java, Python, Scala)
● Secure UDFs
● SQL UDFs
● JavaScript UDFs
● User-Defined Table Functions (UDTFs)

5.2 Define and create external functions.
● Secure external functions
● Integration requirements

5.3 Design, build, and leverage stored procedures.
● Snowpark stored procedures (for example, Java, Python, Scala)
● SQL Scripting stored procedures
● JavaScript stored procedures
● Transaction management

5.4 Handle and transform semi-structured data.
● Traverse and transform semi-structured data to structured data
● Transform structured data to semi-structured data
● Understand how to work with unstructured data

5.5 Use Snowpark for data transformation.
● Understand Snowpark architecture
● Query and filter data using the Snowpark library
● Perform data transformations using Snowpark (for example, aggregations)
● Manipulate Snowpark DataFrames

Domain 5.0: Data Transformation Study Resources

Additional Assets

Snowflake For Data Engineering – Easily Ingest, Transform and Deliver Data for Up-To-The Moment Insight (white paper)
- (broken link) https://resources.snowflake.com/data-engineering-2/snowflake-for-data-engineering-easily-ingest-transform-and-deliver-data-for-up-to-the-moment-insight
- (the same document is probably available here) https://www.snowflake.com/en/resources/white-paper/snowflake-for-data-engineering-easily-ingest-transform-and-deliver-data-for-up-to-the-moment-insight/
Bringing Extensibility to Data Pipelines: What’s New with Snowflake External Functions (blog)
- (broken link) https://www.snowflake.com/en/blog/bringing-extensibility-to-data-pipelines-whats-new-with-snowflake-external-functions/
Generating a JSON Dataset Using Relational Data in Snowflake (blog)
- https://community.snowflake.com/s/article/Generating-a-JSON-Dataset-using-Relational-Data-in-Snowflake
Best Practices for Managing Unstructured Data (white paper)
- (PDF, registration required) https://www.snowflake.com/resource/best-practices-for-managing-unstructured-data/

Snowflake Documentation Links

CREATE API INTEGRATION
- https://docs.snowflake.com/en/sql-reference/sql/create-api-integration
CREATE EXTERNAL FUNCTION
- https://docs.snowflake.com/en/sql-reference/sql/create-external-function
Databases, Tables & Views
- https://docs.snowflake.com/en/user-guide/databases
External Functions
- https://docs.snowflake.com/en/sql-reference/external-functions
Queries
- https://docs.snowflake.com/guides-overview-queries
Semi-Structured Data
- https://docs.snowflake.com/user-guide/semistructured-intro
Snowpark
- https://docs.snowflake.com/en/developer-guide/snowpark/index
Stored Procedures
- https://docs.snowflake.com/developer-guide/stored-procedure/stored-procedures-overview
Transactions
- https://docs.snowflake.com/en/sql-reference/transactions
TRY_PARSE_JSON
- https://docs.snowflake.com/en/sql-reference/functions/try_parse_json
UDFs (User-Defined Functions)
- https://docs.snowflake.com/developer-guide/udf/udf-overview

Study content

Notes from people who took the exam before

blog

Other

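About unscored content

Unscored content: Exams may include unscored items to gather statistical information for future use. These items are not identified on the form and do not impact your score.

That is what the official page says.
My hunch is that features that reached GA relatively recently may fall into this category.

For example, the following topics are not covered in the study guide, so my guess is that if one of them does appear on the exam, it is likely to be an unscored item.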

Dynamic tables
- https://docs.snowflake.com/en/user-guide/dynamic-tables-about
Hybrid tables
- https://docs.snowflake.com/en/user-guide/tables-hybrid
Data quality monitoring / data metric functions (DQM/DMF)
- https://docs.snowflake.com/en/user-guide/data-quality-intro
- https://docs.snowflake.com/en/user-guide/tutorials/data-quality-tutorial-start
Projection policies
- https://docs.snowflake.com/ja/user-guide/projection-policies
- https://docs.snowflake.com/ja/sql-reference/sql/create-projection-policy
Aggregation policies
- https://docs.snowflake.com/en/user-guide/aggregation-policies
- https://docs.snowflake.com/en/sql-reference/sql/create-aggregation-policy
Differential privacy
- https://docs.snowflake.com/en/user-guide/diff-privacy/differential-privacy-overview
AI/ML

These may be experimental items that are not scored yet but are being trialed for inclusion in the exam scope later, or they may actually be scored; there is no way to know for sure.

Sample questions

Let's look at some sample questions.

Sample question 1

Question

A row in a data file ends with the backslash (\) character.
What can be done to prevent this row and the next row from being loaded as a single row of data by the COPY command?

a. Set the RECORD_DELIMITER option to '\'.
b. Set the RECORD_DELIMITER option to NONE.
c. Set the ESCAPE_UNENCLOSED_FIELD option to '\'.
d. Set the ESCAPE_UNENCLOSED_FIELD option to NONE.

Explanation

Correct answer: d. Set the ESCAPE_UNENCLOSED_FIELD option to NONE.

By default ESCAPE_UNENCLOSED_FIELD is a backslash, so a \ at the end of a row escapes the record delimiter (the newline) that follows it. Setting the option to NONE stops the backslash from being interpreted as an escape character, so a row ending in \ is no longer joined with the next row and is loaded as its own row.

Reference

https://docs.snowflake.com/en/sql-reference/sql/create-file-format#format-type-options-formattypeoptions

ESCAPE_UNENCLOSED_FIELD = 'character' | NONE

Loading data
Specifies the escape character for unenclosed fields only.

Note

The default value is \. If a row in a data file ends in the backslash (\) character, this character escapes the newline or carriage return character specified for the RECORD_DELIMITER file format option. As a result, the load operation treats this row and the next row as a single row of data. To avoid this issue, set the value to NONE.
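To make the mechanism concrete, here is a toy Python model of splitting a file into records. It is only a sketch under simplifying assumptions, not Snowflake's actual parser: the function name `split_records` is hypothetical, the newline is the only record delimiter, and an escape character simply makes the character after it literal.

```python
from typing import List, Optional


def split_records(data: str, escape: Optional[str] = "\\") -> List[str]:
    """Toy record splitter (hypothetical, not Snowflake's parser).

    Records are separated by newlines. When `escape` is set, the character
    right after it is taken literally, so an escaped newline does NOT end
    the record. Passing escape=None mimics ESCAPE_UNENCLOSED_FIELD = NONE.
    """
    records: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(data):
        ch = data[i]
        if escape is not None and ch == escape and i + 1 < len(data):
            # Escape sequence: keep the next character (even a newline) in the record.
            current.append(data[i + 1])
            i += 2
            continue
        if ch == "\n":
            # Unescaped record delimiter: close the current record.
            records.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if current:
        records.append("".join(current))
    return records


# The first row ends with a backslash.
data = "a,b\\\nc,d\ne,f\n"
print(split_records(data))               # default escape character: backslash
print(split_records(data, escape=None))  # like ESCAPE_UNENCLOSED_FIELD = NONE
```

With the default escape character, the first two rows come back as a single record containing a newline; with `escape=None` the backslash stays in the data and all three rows remain separate, which is the effect the NONE setting is meant to have.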

Sample question 2

Question

A Data Engineer has inherited a database and is monitoring a table with the below query every 30 days:

SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(o_orderdate)');

The Engineer gets the first two results (e.g., Day 0 and Day 30).

-- DAY 0 -------

{
    "cluster_by_keys" : "LINEAR(o_orderdate)", 
    "total_partition_count" : 3218, 
    "total_constant_partition_count" : 0,
    "average_overlaps" : 20.4133, 
    "average_depth" : 11.4326,
    "partition_depth_histogram" : {
        "00000" : 0, 
        "00001" : 0, 
        "00002" : 0, 
        "00003" : 0, 
        "00004" : 0, 
        "00005" : 0, 
        "00006" : 0, 
        "00007" : 0, 
        "00008" : 0, 
        "00009" : 0, 
        "00010" : 993, 
        "00011" : 841, 
        "00012" : 748, 
        "00013" : 413, 
        "00014" : 121, 
        "00015" : 74, 
        "00016" : 16, 
        "00032" : 12
    }
}

-- DAY 30 -------

{
    "cluster_by_keys" : "LINEAR(o_orderdate)", 
    "total_partition_count" : 3240,
    "total_constant_partition_count" : 0, 
    "average_overlaps" : 64.1185, 
    "average_depth" : 33.4704, 
    "partition_depth_histogram" : {
        "00000" : 0, 
        "00001" : 0, 
        "00002" : 0, 
        "00003" : 0, 
        "00004" : 0, 
        "00005" : 0, 
        "00006" : 0, 
        "00007" : 0, 
        "00008" : 0, 
        "00009" : 0, 
        "00010" : 0, 
        "00011" : 0, 
        "00012" : 0, 
        "00013" : 0, 
        "00014" : 0, 
        "00015" : 0, 
        "00016" : 0, 
        "00032" : 993,
        "00064" : 2247
    } 
}

How should the Engineer interpret these results?

a. The table is well organized for queries that range over column o_orderdate. Over time, this organization is degrading.
b. The table was initially well organized for queries that range over column o_orderdate. Over time this organization has improved further.
c. The table was initially not organized for queries that range over column o_orderdate. Over time, this organization has changed.
d. The table was initially poorly organized for queries that range over column o_orderdate. Over time, this organization has improved.

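Explanation

Correct answer: a. The table is well organized for queries that range over column o_orderdate. Over time, this organization is degrading.

Between Day 0 and Day 30, average_overlaps rose from about 20.4 to 64.1 and average_depth from about 11.4 to 33.5. The depth histogram also shifted: on Day 0 almost every micro-partition sat in the 00010 to 00016 buckets (only 12 at 00032), while on Day 30 every micro-partition is in the 00032 or 00064 bucket. Rising overlap and depth indicate that clustering is degrading; inserts and updates have mixed o_orderdate values across micro-partitions, so range queries on that column become less efficient.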

Reference

https://docs.snowflake.com/ja/sql-reference/functions/system_clustering_information#examples

average_overlaps
Average number of overlapping micro-partitions for each micro-partition in the table. A high number indicates the table is not well-clustered.

average_depth
Average overlap depth of each micro-partition in the table. A high number indicates the table is not well-clustered.
This value is also returned by SYSTEM$CLUSTERING_DEPTH.
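The comparison can also be done mechanically. The Python sketch below loads abbreviated copies of the Day 0 and Day 30 results (histogram buckets with a count of 0 are left out, since they do not change any totals) and applies the rule quoted above: higher average_overlaps and average_depth mean worse clustering. The helper names `share_at_or_above` and `clustering_trend` are hypothetical, not part of any Snowflake API.

```python
import json
from typing import Any, Dict

# Abbreviated copies of the two SYSTEM$CLUSTERING_INFORMATION results above
# (zero-count histogram buckets omitted).
DAY_0: Dict[str, Any] = json.loads("""
{"total_partition_count": 3218, "average_overlaps": 20.4133, "average_depth": 11.4326,
 "partition_depth_histogram": {"00010": 993, "00011": 841, "00012": 748, "00013": 413,
                               "00014": 121, "00015": 74, "00016": 16, "00032": 12}}
""")
DAY_30: Dict[str, Any] = json.loads("""
{"total_partition_count": 3240, "average_overlaps": 64.1185, "average_depth": 33.4704,
 "partition_depth_histogram": {"00032": 993, "00064": 2247}}
""")


def share_at_or_above(info: Dict[str, Any], depth: int) -> float:
    """Fraction of micro-partitions whose histogram bucket is >= `depth`."""
    hist = info["partition_depth_histogram"]
    deep = sum(count for bucket, count in hist.items() if int(bucket) >= depth)
    return deep / info["total_partition_count"]


def clustering_trend(before: Dict[str, Any], after: Dict[str, Any]) -> str:
    """Both metrics going up means clustering got worse; both going down, better."""
    worse = (after["average_overlaps"] > before["average_overlaps"]
             and after["average_depth"] > before["average_depth"])
    better = (after["average_overlaps"] < before["average_overlaps"]
              and after["average_depth"] < before["average_depth"])
    return "degrading" if worse else "improving" if better else "mixed"


print(clustering_trend(DAY_0, DAY_30))
print(f"{share_at_or_above(DAY_0, 32):.1%} -> {share_at_or_above(DAY_30, 32):.1%}")
```

It reports `degrading`, and the share of micro-partitions at depth 32 or more goes from about 0.4% (12 of 3218) to 100% (3240 of 3240).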

Sample question 3

Question

A Data Engineer is preparing to load staged data from an external stage using a task object.
Which of the following practices will provide the MOST efficient load performance?

a. Store the files on the external stage to ensure caching is maintained
b. PUT all files in a single directory
c. Limit file names to under 30 characters
d. Organize files into logical paths that reflect a scheduling pattern

Explanation

Correct answer: d. Organize files into logical paths that reflect a scheduling pattern

If files are organized into logical paths that follow the load schedule (for example, by date), each scheduled load can point COPY at a narrow path, so it only has to deal with the files for that run. The other options (such as limiting file-name length) have no direct effect on load efficiency.

Reference

https://docs.snowflake.com/en/user-guide/data-load-considerations-stage#organizing-data-by-path

When staging regular data sets, we recommend partitioning the data into logical paths that include identifying details such as geographical location or other source identifiers, along with the date when the data was written.
...
When loading your staged data, narrow the path to the most granular level that includes your data for improved data load performance.

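As an illustration of organizing files by schedule, the sketch below assumes a hypothetical hourly layout of `<source>/<region>/YYYY/MM/DD/HH/` and picks out only the files for one scheduled run. The layout, the helper names, and the file names are all made up for the example; in an actual load, the prefix would presumably be appended to the stage reference (something like `@my_stage/sales/us-east/2025/01/01/10/`) so that COPY only looks under that path.

```python
from datetime import datetime, timedelta
from typing import List


def stage_prefix(source: str, region: str, ts: datetime) -> str:
    """Hypothetical layout: <source>/<region>/YYYY/MM/DD/HH/ (one folder per hourly run)."""
    return f"{source}/{region}/{ts:%Y/%m/%d/%H}/"


def files_for_run(all_files: List[str], source: str, region: str, run_hour: datetime) -> List[str]:
    """Keep only the files under the narrowest prefix for this run."""
    prefix = stage_prefix(source, region, run_hour)
    return [f for f in all_files if f.startswith(prefix)]


# Three hourly drops of made-up files, 09:00 to 11:00.
start = datetime(2025, 1, 1, 9)
staged = [stage_prefix("sales", "us-east", start + timedelta(hours=h)) + f"part_{h}.csv.gz"
          for h in range(3)]

run = datetime(2025, 1, 1, 10)
print(stage_prefix("sales", "us-east", run))  # path a scheduled task could append to the stage name
print(files_for_run(staged, "sales", "us-east", run))
```

Only the 10:00 file is selected; the 09:00 and 11:00 files never need to be considered by that run.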

Sample question 4

Question

A Data Engineer is working on a project that requires data to be moved directly from an internal stage to an external stage.
Which of the following is the QUICKEST way to accomplish this task?

a. COPY INTO @myExtStage from (SELECT $1, $2, ... FROM @myInternalStage);
b. Copy the data from the internal stage to a table and then unload the data to an external stage
c. COPY INTO @myExtStage from @myInternalStage;
d. Write a custom script to move the data

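Explanation

Correct answer: a. COPY INTO @myExtStage from (SELECT $1, $2, ... FROM @myInternalStage);

The deciding factor is syntax rather than speed. In the COPY INTO <location> (unload) syntax, the FROM clause accepts either a table name or a query in parentheses; a stage reference on its own is not a valid source. Option c, COPY INTO @myExtStage FROM @myInternalStage, therefore does not fit that syntax, while option a wraps the staged files in a SELECT query, which the unload syntax accepts. Option b works but adds an extra load into a table, and d requires custom tooling.

Before checking the syntax, I also asked ChatGPT to speculate, and it suggested that without explicit columns Snowflake has to analyze the column structure of the internal stage, so specifying columns would skip that analysis and speed things up. The documentation does not describe anything like this, so the syntax rule above is the more concrete explanation.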
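The syntax rule can be sketched as a toy check in Python. It encodes only the documented shape of the unload source, `FROM { [<namespace>.]<table_name> | ( <query> ) }`, using simplified regular expressions; it is not a real SQL parser, and `unload_source_kind` is a hypothetical helper name.

```python
import re

# Simplified patterns for the two allowed forms of the unload source.
TABLE_NAME = re.compile(r"[A-Za-z_][\w$]*(\.[A-Za-z_][\w$]*){0,2}")  # [db.][schema.]table
QUERY = re.compile(r"\(\s*SELECT\b.*\)", re.IGNORECASE | re.DOTALL)    # ( SELECT ... )


def unload_source_kind(source: str) -> str:
    """Classify the FROM clause of COPY INTO <location> (toy check, not a parser)."""
    s = source.strip()
    if QUERY.fullmatch(s):
        return "query"
    if TABLE_NAME.fullmatch(s):
        return "table"
    # Anything else, including a bare stage reference such as @myInternalStage.
    return "invalid"


for src in ["(SELECT $1, $2 FROM @myInternalStage)", "@myInternalStage", "mydb.public.orders"]:
    print(f"{src!r}: {unload_source_kind(src)}")
```

Option a's source is classified as a query, while option c's `@myInternalStage` matches neither allowed form.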

Sample question 5

Question

The S1 schema contains two permanent tables that were created as shown below:

CREATE TABLE table_a (c1 INT)
DATA_RETENTION_TIME_IN_DAYS = 10;

CREATE TABLE table_b (c1 INT);

What will be the impact of running the following command?

ALTER SCHEMA S1 SET DATA_RETENTION_TIME_IN_DAYS = 20;

a. The retention time on table_a does not change; table_b is set to 20 days.
b. An error will be generated; a data retention time on a schema cannot be set.
c. The retention time on both tables will be set to 20 days.
d. The retention time will not change on either table.

Explanation

Correct answer: a. The retention time on table_a does not change; table_b is set to 20 days.

A value set explicitly on a table is not overwritten by a schema-level setting. The schema-level value acts as a default that only tables without an explicit retention period inherit, so only table_b picks up the 20 days.

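Reference

https://docs.snowflake.com/ja/user-guide/data-time-travel#changing-the-data-retention-period-for-an-object

Note
Changing the retention period for your account or individual objects changes the value for all lower-level objects that do not have a retention period explicitly set.
...
If you change the retention period at the schema level, all tables in the schema that do not have an explicit retention period inherit the new retention period.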
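The inheritance rule can be modeled in a few lines of Python. This is a simplified sketch: `effective_retention` is a hypothetical helper, `None` stands for "not set explicitly", the account value is assumed to be left at its default of 1 day, and edition limits on the maximum retention period are ignored.

```python
from typing import Dict, Optional


def effective_retention(account_days: int,
                        database_days: Optional[int] = None,
                        schema_days: Optional[int] = None,
                        table_days: Optional[int] = None) -> int:
    """The most specific explicitly set value wins; None means 'inherit from the parent'."""
    for value in (table_days, schema_days, database_days):
        if value is not None:
            return value
    return account_days


ACCOUNT_DAYS = 1  # assumption: account-level default left unchanged (1 day)
tables: Dict[str, Optional[int]] = {"table_a": 10, "table_b": None}  # table-level settings from the question

for schema_days in (None, 20):  # before / after ALTER SCHEMA S1 SET DATA_RETENTION_TIME_IN_DAYS = 20
    result = {name: effective_retention(ACCOUNT_DAYS, schema_days=schema_days, table_days=t)
              for name, t in tables.items()}
    print(schema_days, result)
```

Before the ALTER, table_a resolves to 10 days and table_b to the account's 1 day; after it, table_a stays at 10 and table_b becomes 20, matching answer a.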
