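Introduction
A free 5-question practice exam for the Databricks Certified Machine Learning Associate is available as a PDF. The PDF lists only the correct answers at the end and gives no explanations. In this article, partly as study for myself, I try to explain each answer.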
Notes
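- This article covers the practice exam as of January 15, 2024
- This article is not official Databricks content, i.e. it is unofficial
- If you find any errors, I would appreciate it if you let me know via an edit request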
Question 1
Objective: Create a new branch and commit changes to an external Git provider.
A data scientist is developing a machine learning model. They made changes to their code in a text editor on their local machine, committed them to the project’s Git repository, and pushed the changes to an online Git provider. Now, they want to load those changes into Databricks. The Databricks workspace contains an out-of-date version of the Git repository.
How can the data scientist complete this task?
A. Open the Repo Git dialog and enable automatic syncing.
B. Open the Repo Git dialog and click the “Sync” button.
C. Open the Repo Git dialog and click the “Merge” button.
D. Open the Repo Git dialog and enable automatic pulling.
E. Open the Repo Git dialog and click the “Pull” button.
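Correct answer
E
The question asks how to bring the out-of-date copy of the Git repository in the Databricks workspace up to date with the changes already pushed to the online Git provider. Pulling fetches the latest commits from the remote into the workspace, so E, which performs a Git pull, is correct.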
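For reference, the same "pull the latest commit" operation can also be triggered without the UI. The sketch below only builds (and does not send) a request to the Databricks Repos REST API, assuming the endpoint `PATCH /api/2.0/repos/{repo_id}`, which updates a repo to the latest commit of the given branch. The workspace URL, repo ID, token, and the helper name `build_repo_pull_request` are hypothetical placeholders.

```python
import json
import urllib.request


def build_repo_pull_request(host: str, repo_id: int, branch: str,
                            token: str) -> urllib.request.Request:
    """Build (but do not send) a request that updates a Databricks repo
    to the latest commit on `branch`, i.e. an API counterpart of "Pull".

    Assumes the Repos API endpoint PATCH /api/2.0/repos/{repo_id};
    host, repo_id and token are placeholders supplied by the caller.
    """
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",  # personal access token
            "Content-Type": "application/json",
        },
    )


# Hypothetical values for illustration only.
req = build_repo_pull_request("https://example.cloud.databricks.com", 123, "main", "dapi-XXXX")
print(req.get_method(), req.full_url)
# Actually sending it would be: urllib.request.urlopen(req)
```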
Question 2
Objective: Write data to a feature store table.
A data scientist has computed updated rows that contain new feature values for primary keys already stored in the Feature Store table features. The updated feature values are stored in the DataFrame features_df. They want to update the rows in features if the associated primary key is in features_df. If a row’s primary key is not in features_df, they want the row to remain unchanged in features.
Which code block using the Feature Store Client fs can be used to accomplish this task?
A.
fs.write_table(
    name="features",
    df=features_df,
    mode="merge"
)
B.
fs.write_table(
    name="features",
    df=features_df,
    mode="overwrite"
)
C.
fs.write_table(
    name="features",
    df=features_df,
)
D.
fs.create_table(
    name="features",
    df=features_df,
    mode="append"
)
E.
fs.refresh_table(
    name="features",
    df=features_df,
    mode="overwrite"
)
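Correct answer
A
This is a merge (upsert) operation: update the records of the features table whose primary keys appear in the features_df DataFrame, and leave all other records of features unchanged. A, which specifies mode="merge", is correct. By contrast, mode="overwrite" (B) rewrites the whole table, so rows whose primary keys are missing from features_df would not be preserved.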
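To make the difference between the write modes concrete, here is a plain-Python sketch (no Spark or Feature Store involved) that models a feature table as a dict keyed by primary key. The function `write_table_sim` is hypothetical: its "merge" branch mimics the upsert behavior described above, and its "overwrite" branch replaces the whole table.

```python
from typing import Dict

# Hypothetical model of a feature table: primary key -> row of feature values.
Table = Dict[int, Dict[str, float]]


def write_table_sim(table: Table, df: Table, mode: str) -> Table:
    """Return the table contents after writing `df` with the given mode."""
    if mode == "merge":
        # Upsert: keys present in df are updated (or inserted); all other rows are kept.
        result = {key: dict(row) for key, row in table.items()}
        for key, row in df.items():
            result[key] = dict(row)
        return result
    if mode == "overwrite":
        # The whole table is replaced by df; rows whose keys are absent from df disappear.
        return {key: dict(row) for key, row in df.items()}
    raise ValueError(f"unsupported mode: {mode}")


features = {1: {"f": 0.1}, 2: {"f": 0.2}, 3: {"f": 0.3}}
features_df = {2: {"f": 2.0}}

merged = write_table_sim(features, features_df, "merge")
overwritten = write_table_sim(features, features_df, "overwrite")
print(merged)       # {1: {'f': 0.1}, 2: {'f': 2.0}, 3: {'f': 0.3}}
print(overwritten)  # {2: {'f': 2.0}}
```

Only the row with primary key 2 changes under "merge", which is exactly the behavior the question asks for.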
Question 3
Objective: Programmatically transition a model’s stage.
A senior machine learning engineer is developing a machine learning pipeline. They set up the pipeline to automatically transition a new version of a registered model to the Production stage in the Model Registry once it passes all tests using the MLflow Client API client.
Which operation was used to transition the model to the Production stage?
A. Client.update_model_stage
B. client.transition_model_version_stage
C. client.transition_model_version
D. client.update_model_version
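Correct answer
B
The question asks which MLflow Client API operation a machine learning pipeline uses to automatically move a new version of a registered model to the Production stage in the Model Registry once it passes all tests. The MlflowClient method that moves a specific model version to a given stage is transition_model_version_stage, so B is correct. (client.update_model_version in D only updates a version's description, not its stage.)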
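A minimal sketch of such a pipeline step, assuming the MLflow signature `MlflowClient.transition_model_version_stage(name, version, stage, archive_existing_versions=False)`. To keep it runnable without an MLflow server, the client is passed in as a parameter and exercised with a hypothetical `FakeClient` that only records calls; the helper `promote_if_tests_pass` and the model name `churn_model` are also hypothetical. In real code you would pass `mlflow.tracking.MlflowClient()` instead.

```python
from typing import Callable, List, Tuple


def promote_if_tests_pass(client, model_name: str, version: str,
                          tests: List[Callable[[], bool]]) -> bool:
    """Transition `version` of `model_name` to Production only if every test passes.

    `client` is expected to expose transition_model_version_stage(),
    as mlflow.tracking.MlflowClient does.
    """
    if not all(test() for test in tests):
        return False
    client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage="Production",
        # Move any version currently in Production to the Archived stage.
        archive_existing_versions=True,
    )
    return True


class FakeClient:
    """Stand-in for MlflowClient that records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, str, bool]] = []

    def transition_model_version_stage(self, name, version, stage,
                                       archive_existing_versions=False):
        self.calls.append((name, version, stage, archive_existing_versions))


client = FakeClient()
promoted = promote_if_tests_pass(client, "churn_model", "3", [lambda: True, lambda: True])
rejected = promote_if_tests_pass(client, "churn_model", "4", [lambda: True, lambda: False])
print(promoted, rejected, client.calls)
```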
Question 4
Objective: Make a Python library newpackage available in a given scenario.
A machine learning team wants to use the Python library newpackage on all of their projects. They share a cluster for all of their projects.
Which approach makes the Python library newpackage available to all notebooks run on a cluster?
A. Edit the cluster to use the Databricks Runtime for Machine Learning
B. Set the runtime-version variable in their Spark session to "ml"
C. Running %pip install newpackage once on any notebook attached to the cluster
D. Adding /databricks/python/bin/pip install newpackage to the cluster’s bash init script
E. There is no way to make the newpackage library available on a cluster
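Correct answer
D
The question asks how to make newpackage usable from every notebook run on the shared cluster. D, which uses the cluster's init script, is correct: the init script runs on each node when the cluster starts, so the package is installed into the cluster's Python environment for every attached notebook. By contrast, %pip install (C) installs a notebook-scoped library that is available only in the notebook where it was run.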
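For illustration, the init script in option D is just a shell script that runs when each cluster node starts. The Python sketch below only generates the text of such a script; the helper name `build_init_script` and the added `set -e` line are hypothetical choices. Uploading the script somewhere the cluster can read it and registering it in the cluster configuration are out of scope here.

```python
from typing import List


def build_init_script(packages: List[str]) -> str:
    """Return the text of a cluster init script that pip-installs `packages`
    into the cluster's Python environment (the same command as option D)."""
    lines = ["#!/bin/bash", "set -e  # stop at the first failing command"]
    for package in packages:
        lines.append(f"/databricks/python/bin/pip install {package}")
    return "\n".join(lines) + "\n"


script = build_init_script(["newpackage"])
print(script)
```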
Question 5
Objective: Identify code blocks for computing the accuracy of a model.
A data scientist has developed a two-class decision tree classifier using Spark ML and computed the predictions in a Spark DataFrame preds_df with the following schema:
prediction DOUBLE
actual DOUBLE
Which of the following code blocks can be used to compute the accuracy of the model according to the data in preds_df and assign it to the accuracy variable?
A.
accuracy = RegressionEvaluator(
    predictionCol="prediction",
    labelCol="actual",
    metricName="accuracy"
)
B.
accuracy = MulticlassClassificationEvaluator(
    predictionCol="prediction",
    labelCol="actual",
    metricName="accuracy"
)
accuracy = classification_evaluator.evaluate(preds_df)
C.
classification_evaluator = BinaryClassificationEvaluator(
    predictionCol="prediction",
    labelCol="actual",
    metricName="accuracy"
)
D.
accuracy = Summarizer(
    predictionCol="prediction",
    labelCol="actual",
    metricName="accuracy"
)
E.
classification_evaluator = BinaryClassificationEvaluator(
    predictionCol="prediction",
    labelCol="actual",
    metricName="accuracy"
)
accuracy = classification_evaluator.evaluate(preds_df)
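Correct answer
E
The answer key gives E: build a BinaryClassificationEvaluator that compares the predictions with the actual values, then assign the result of evaluate(preds_df) to the accuracy variable. Among the options, E is the only one that both constructs an evaluator and calls evaluate(preds_df) on it with consistent variable names. Note, however, that in PySpark BinaryClassificationEvaluator takes rawPredictionCol rather than predictionCol and supports only metricName="areaUnderROC" or "areaUnderPR", so E would raise an error if run as written. Accuracy is provided by MulticlassClassificationEvaluator(metricName="accuracy"), which also works for a two-class model; option B uses that evaluator, but as printed it assigns the evaluator object to accuracy and then calls an undefined classification_evaluator.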
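What the accuracy metric computes is simple: the fraction of rows where prediction equals actual. The plain-Python sketch below reproduces that on a few hypothetical rows shaped like preds_df; the helper `accuracy_from_rows` is hypothetical. In PySpark the same number is returned by `MulticlassClassificationEvaluator(predictionCol="prediction", labelCol="actual", metricName="accuracy").evaluate(preds_df)`.

```python
from typing import List, Tuple


def accuracy_from_rows(rows: List[Tuple[float, float]]) -> float:
    """Accuracy = correctly predicted rows / all rows, for (prediction, actual) pairs."""
    if not rows:
        raise ValueError("no rows to evaluate")
    correct = sum(1 for prediction, actual in rows if prediction == actual)
    return correct / len(rows)


# Hypothetical rows with the same two DOUBLE columns as preds_df: (prediction, actual).
preds = [(1.0, 1.0), (0.0, 0.0), (1.0, 0.0), (0.0, 0.0)]
accuracy = accuracy_from_rows(preds)
print(accuracy)  # 0.75
```

Three of the four predictions match the actual label, giving 3 / 4 = 0.75.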