
Unofficial explanations of the answers to the Databricks Certified Machine Learning Associate practice exam (as of January 2024)

Databricks

Posted at 2024-01-16

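Introduction

A five-question practice exam for the Databricks Certified Machine Learning Associate certification is available for free as a PDF. The end of the PDF lists only the correct answers, with no explanations. In this article, partly as my own study exercise, I try to explain each answer.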

Notes

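This article covers the practice exam as it stood on January 15, 2024
The content of this article is not official Databricks material; please keep in mind that it is unofficial
If you find any mistakes, I would appreciate it if you let me know via an edit request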

Question 1

Objective: Create a new branch and commit changes to an external Git provider.

A data scientist is developing a machine learning model. They made changes to their code in a text editor on their local machine, committed them to the project’s Git repository, and pushed the changes to an online Git provider. Now, they want to load those changes into Databricks. The Databricks workspace contains an out-of-date version of the Git repository.

How can the data scientist complete this task?

A. Open the Repo Git dialog and enable automatic syncing.
B. Open the Repo Git dialog and click the “Sync” button.
C. Open the Repo Git dialog and click the “Merge” button.
D. Open the Repo Git dialog and enable automatic pulling.
E. Open the Repo Git dialog and click the “Pull” button.

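Correct answer
E

The Databricks workspace holds an out-of-date copy of the Git repository, and the question asks how to bring it up to date. The changes are already pushed to the remote, so the answer is E: pull them with the “Pull” button.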

Question 2

Objective: Write data to a feature store table.

A data scientist has computed updated rows that contain new feature values for primary keys already stored in the Feature Store table features. The updated feature values are stored in the DataFrame features_df. They want to update the rows in features if the associated primary key is in features_df. If a row’s primary key is not in features_df, they want the row to remain unchanged in features.

Which code block using the Feature Store Client fs can be used to accomplish this task?

A.
fs.write_table(
    name="features",
    df=features_df,
    mode="merge"
)

B.
fs.write_table(
    name="features",
    df=features_df,
    mode="overwrite"
)

C.
fs.write_table(
    name="features",
    df=features_df,
)

D.
fs.create_table(
    name="features",
    df=features_df,
    mode="append"
)

E.
fs.refresh_table(
    name="features",
    df=features_df,
    mode="overwrite"
)

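Correct answer
A

This is a merge (upsert) operation: rows in the features table whose primary key appears in the features_df DataFrame should be updated, and all other rows in features should stay as they are. A, which specifies mode="merge", is the correct answer. By contrast, mode="overwrite" in B would replace the whole table and drop the rows that are missing from features_df.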
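To make the merge semantics concrete, here is a minimal pure-Python sketch. It models the table and the DataFrame as lists of dicts and does not call the Feature Store API at all; the function name merge_by_key, the customer_id key, and the avg_spend column are all made up for illustration.

```python
def merge_by_key(table_rows, update_rows, key):
    """Upsert update_rows into table_rows, matching rows on `key`."""
    # Index the existing rows by primary key (copying so the inputs stay untouched).
    merged = {row[key]: dict(row) for row in table_rows}
    for row in update_rows:
        # Matching key: the new values win. Unknown key: the row is inserted.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())


# Hypothetical stand-ins for the `features` table and `features_df`.
features = [
    {"customer_id": 1, "avg_spend": 10.0},
    {"customer_id": 2, "avg_spend": 20.0},
]
features_df = [{"customer_id": 2, "avg_spend": 25.0}]

merged_features = merge_by_key(features, features_df, key="customer_id")
```

In this sketch the row for customer_id 1 is carried over unchanged and only the row for customer_id 2 picks up the new value, which is the behavior the question describes.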

Question 3

Objective: Programmatically transition a model’s stage.

A senior machine learning engineer is developing a machine learning pipeline. They set up the pipeline to automatically transition a new version of a registered model to the Production stage in the Model Registry once it passes all tests using the MLflow Client API client.

Which operation was used to transition the model to the Production stage?

A. client.update_model_stage
B. client.transition_model_version_stage
C. client.transition_model_version
D. client.update_model_version

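Correct answer
B

The question asks which operation a machine learning pipeline uses to automatically move a new version of a registered model to the Production stage. The transition happens through the MLflow Client API after all tests pass. The operation that moves a model version to the Production stage is client.transition_model_version_stage, so B is correct.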
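A rough sketch of how such a pipeline step might look is shown below. The helper promote_to_production, the RecordingClient class, and the model name "demo_model" are hypothetical; RecordingClient only records the call so the snippet can be dry-run without an MLflow server. In a real pipeline the client would presumably be an mlflow.tracking.MlflowClient instance, whose transition_model_version_stage method takes name, version, and stage.

```python
def promote_to_production(client, model_name, version):
    """Move one registered model version to the Production stage."""
    return client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage="Production",
    )


class RecordingClient:
    """Hypothetical stand-in for MlflowClient, used only to dry-run the call."""

    def __init__(self):
        self.calls = []

    def transition_model_version_stage(self, name, version, stage):
        # Record the request instead of contacting a Model Registry.
        self.calls.append((name, version, stage))
        return {"name": name, "version": version, "current_stage": stage}


demo_client = RecordingClient()
result = promote_to_production(demo_client, "demo_model", "3")
```

Passing the client in as an argument keeps the test-gating logic separate from the registry connection, so the same function can be exercised with a stub before it is pointed at a real workspace.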

Question 4

Objective: Make a Python library newpackage available in a given scenario.

A machine learning team wants to use the Python library newpackage on all of their projects. They share a cluster for all of their projects.

Which approach makes the Python library newpackage available to all notebooks run on a cluster?

A. Edit the cluster to use the Databricks Runtime for Machine Learning
B. Set the runtime-version variable in their Spark session to "ml"
C. Running %pip install newpackage once on any notebook attached to the cluster
D. Adding /databricks/python/bin/pip install newpackage to the cluster’s bash init script
E. There is no way to make the newpackage library available on a cluster

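Correct answer
D

The question asks how to make newpackage available to every notebook that runs on the cluster. D, which uses an init script, is correct: the script runs when the cluster starts, so the library is installed before any notebook attaches. Running %pip install in a notebook, as in C, installs a notebook-scoped library that only that notebook's session sees.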
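As a small illustration of what such an init script could contain, the sketch below writes one to a local temporary directory. The file name install-newpackage.sh and the helper write_init_script are assumptions for this example; uploading the file and registering it in the cluster configuration are workspace-specific steps that are not shown here.

```python
from pathlib import Path
import tempfile

# Script body: the pip command is the one quoted in option D.
INIT_SCRIPT = """#!/bin/bash
# Runs on each node at cluster startup, before notebooks attach.
/databricks/python/bin/pip install newpackage
"""


def write_init_script(directory):
    """Write the init script into `directory` and return its path."""
    path = Path(directory) / "install-newpackage.sh"  # hypothetical file name
    path.write_text(INIT_SCRIPT)
    return path


script_path = write_init_script(tempfile.mkdtemp())
```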

Question 5

Objective: Identify code blocks for computing the accuracy of a model.

A data scientist has developed a two-class decision tree classifier using Spark ML and computed the predictions in a Spark DataFrame preds_df with the following schema:

prediction DOUBLE
actual DOUBLE

Which of the following code blocks can be used to compute the accuracy of the model according to the data in preds_df and assign it to the accuracy variable?

A.
accuracy = RegressionEvaluator(
    predictionCol="prediction",
    labelCol="actual",
    metricName="accuracy"
)

B.
accuracy = MulticlassClassificationEvaluator(
    predictionCol="prediction",
    labelCol="actual",
    metricName="accuracy"
)
accuracy = classification_evaluator.evaluate(preds_df)

C.
classification_evaluator = BinaryClassificationEvaluator(
    predictionCol="prediction",
    labelCol="actual",
    metricName="accuracy"
)

D.
accuracy = Summarizer(
    predictionCol="prediction",
    labelCol="actual",
    metricName="accuracy"
)

E.
classification_evaluator = BinaryClassificationEvaluator(
    predictionCol="prediction",
    labelCol="actual",
    metricName="accuracy"
)
accuracy = classification_evaluator.evaluate(preds_df)

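Correct answer
E

E is correct: it creates a BinaryClassificationEvaluator that compares the predictions with the actual values, calls evaluate on preds_df, and assigns the result to the accuracy variable. A, C, and D only construct an object and never call evaluate.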
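To make the metric itself concrete, here is a minimal pure-Python sketch, not Spark ML code: accuracy is the fraction of rows whose prediction equals actual. The function name accuracy_from_pairs and the sample rows are made up for illustration. As far as I know, in open-source PySpark the evaluator that accepts metricName="accuracy" is MulticlassClassificationEvaluator, so the exam snippets are best read as quiz text rather than copy-paste code.

```python
def accuracy_from_pairs(pairs):
    """Return the share of (prediction, actual) pairs that agree."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("accuracy is undefined for an empty set of rows")
    correct = sum(1 for prediction, actual in pairs if prediction == actual)
    return correct / len(pairs)


# Hypothetical rows shaped like preds_df: (prediction, actual), both doubles.
rows = [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0), (1.0, 1.0)]
accuracy = accuracy_from_pairs(rows)  # 3 of the 4 rows agree
```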
