Trying Out Model Training on Databricks

Posted at 2021-02-09

Databricks: Tracking Model Run Results with MLflow

I tried training a model in a Databricks Notebook.
This post summarizes how to view and manage the results of that training.

The reference notebook is MLflowPythonQuickstart, which is listed in the official documentation.

・Setting up automatic logging

MLflow provides mlflow.<framework>.autolog(), an API that automatically logs training runs for code written in many ML frameworks. Call this API before running your training code to log model-specific metrics, parameters, and model artifacts.

Code example

# Also autoinstruments tf.keras
import mlflow.tensorflow
mlflow.tensorflow.autolog()

Notebook

import mlflow.sklearn
mlflow.sklearn.autolog()

Create a random forest model, train it, and log its parameters, metrics, and the model itself.

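from sklearn.ensemble import RandomForestRegressor

# X_train, X_test, and y_train are assumed to have been prepared in an earlier notebook cell.

# Enable autolog()
# mlflow.sklearn.autolog() requires mlflow 1.11.0 or above.
mlflow.sklearn.autolog()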

# With autolog() enabled, all model parameters, a model score, and the fitted model are automatically logged.  
with mlflow.start_run():
  
  # Set the model parameters. 
  n_estimators = 100
  max_depth = 6
  max_features = 3
  
  # Create and train model.
  rf = RandomForestRegressor(n_estimators = n_estimators, max_depth = max_depth, max_features = max_features)
  rf.fit(X_train, y_train)
  
  # Use the model to make predictions on the test dataset.
  predictions = rf.predict(X_test)

When the with mlflow.start_run(): block is executed, an MLflow run starts and the model is trained inside it; the run ends automatically when the block exits.