
Microsoft Azure Tech Advent Calendar 2018

What Is Azure Machine Learning Compute?

Posted at 2018-12-31

This post introduces Azure Machine Learning Compute, a new feature of the Azure Machine Learning service.

Azure Machine Learning

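Azure Machine Learning, the Azure offering that supports developing and deploying machine learning (ML) models, consists of two services.

The first is Azure Machine Learning Studio. It provides a GUI-based development environment that runs in a web browser, where you can build and train machine learning models and deploy them as web services.

Azure Machine Learning Studio entered public preview in July 2014 and reached general availability (GA) in February 2015. Until the second service, the Azure Machine Learning service described below, appeared, Azure Machine Learning Studio was simply called Azure Machine Learning.

Azure Machine Learning Studio offers three kinds of workspace: a guest workspace that you can try for 8 hours without signing up, a free workspace that you can use indefinitely after signing up, and a paid standard workspace that you create within an Azure subscription. If you are interested, start by creating a guest workspace and trying it out.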

Azure Machine Learning Studio
https://azure.microsoft.com/services/machine-learning-studio/
https://studio.azureml.net/
https://docs.microsoft.com/azure/machine-learning/studio/

The second service is the Azure Machine Learning service. It is aimed at data scientists who work in Python, and it supports building, training, and deploying machine learning models.

The Azure Machine Learning service was announced at Ignite 2017 in September 2017, when its public preview began. At Ignite 2018 in September 2018, the public preview was substantially overhauled. This month (December 2018), the Azure Machine Learning service reached GA.

Azure Machine Learning service
https://azure.microsoft.com/services/machine-learning-service/
https://docs.microsoft.com/azure/machine-learning/service/

For an overview of the Azure Machine Learning service, see also the following blog post, which announced GA.

Azure Blog > Announcing the general availability of Azure Machine Learning service: A look under the hood (2018/12/04)
https://azure.microsoft.com/blog/azure-machine-learning-service-a-look-under-the-hood/

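Azure Machine Learning Compute

Although it was not announced prominently, a new feature called Azure Machine Learning Compute was added as a GA feature alongside the GA of the Azure Machine Learning service. It provides a pool of VMs for running machine learning model training. The blog post mentioned above describes it as follows.

Training

Azure Machine Learning service provides seamless distributed compute capabilities. These allow data scientists to scale out training from a local laptop or workstation to the cloud. This compute is on demand. Users are charged only for compute time and do not need to manage and maintain GPU or CPU clusters.

Azure Machine Learning Compute has a predecessor service, Azure Batch AI. It was announced as Azure Batch AI Training at Build 2017 in May 2017 and entered public preview as Azure Batch AI in October 2017.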

Azure Batch AI
https://azure.microsoft.com/services/batch-ai/
https://docs.microsoft.com/azure/batch-ai/

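With the announcement and GA of Azure Machine Learning Compute, Azure Batch AI is scheduled to be retired in March 2019 without ever reaching GA. If you use Azure Batch AI, plan your migration to Azure Machine Learning Compute as soon as possible.

The current state of Azure Batch AI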
https://docs.microsoft.com/azure/batch-ai/overview-what-happened-batch-ai

How to Use Azure Machine Learning Compute

Let's now take a brief look at how to use Azure Machine Learning Compute from the Azure Machine Learning service.

The Azure Machine Learning service mainly supports training and deploying machine learning models. Azure Machine Learning Compute is one of the compute targets on which training runs.

How the Azure Machine Learning service works: architecture and concepts
https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture

Other compute targets for training include the local machine running the Python SDK, Linux VMs on Azure, and Azure Databricks.

Azure Machine Learning Compute is currently the only managed compute (a compute target that the Azure Machine Learning service creates and manages), and it is the compute target on which many new features are implemented first. Unless you have a strong reason to use another compute target, I recommend Azure Machine Learning Compute.

How the Azure Machine Learning service works: architecture and concepts > Compute target
https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target
Set up compute targets for model training > Supported compute targets
https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#supported-compute-targets

You create Azure Machine Learning Compute with the Python SDK. You can create it for each training run, or create it in advance and reference it at training time.

The following Python code creates an Azure Machine Learning Compute cluster (AmlCompute). The minimum required options are the VM size (vm_size) and the maximum number of nodes (max_nodes).

from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Load the workspace from config.json (assumes the file already exists)
ws = Workspace.from_config()

# Choose a name for your CPU cluster
cpu_cluster_name = "cpucluster"

# Reuse the cluster if it already exists; otherwise create it
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

# Block until provisioning finishes, printing status as it goes
cpu_cluster.wait_for_completion(show_output=True)

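VM sizes include GPU sizes (N series) in addition to CPU sizes. Make sure that the GPU you want is supported in the Azure region where your Azure Machine Learning Compute runs, and that you have not reached your quota limit.

Manage and request quotas for Azure resources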
https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas
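
As a rough way to reason about the quota check described above, the sketch below estimates whether a cluster at full scale would fit within a core quota. The function name and the quota and usage figures are hypothetical placeholders, not values returned by any Azure API; the only fixed fact used is that STANDARD_D2_V2 has 2 vCPUs per node. Check the actual quota for your subscription and region before relying on numbers like these.

```python
def fits_core_quota(cores_per_node: int, max_nodes: int,
                    cores_in_use: int, quota_cores: int) -> bool:
    """Return True if a cluster scaled out to max_nodes stays within the quota."""
    if min(cores_per_node, max_nodes, cores_in_use, quota_cores) < 0:
        raise ValueError("all arguments must be non-negative")
    requested = cores_per_node * max_nodes
    return cores_in_use + requested <= quota_cores


# STANDARD_D2_V2 has 2 vCPUs; the quota (24) and usage (0 or 20) are made-up numbers.
print(fits_core_quota(cores_per_node=2, max_nodes=4, cores_in_use=0, quota_cores=24))
print(fits_core_quota(cores_per_node=2, max_nodes=4, cores_in_use=20, quota_cores=24))
```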

The minimum number of nodes (min_nodes) defaults to 0, so if you do not specify it, the cluster scales in to 0 nodes when idle and incurs no charges.
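
To make the effect of min_nodes and max_nodes concrete, here is a deliberately simplified model of the scaling bounds. It is not the service's actual autoscale algorithm: the function name and the notion of "nodes requested by queued jobs" are assumptions made for illustration. It only shows that the node count stays between min_nodes and max_nodes, so an idle cluster with min_nodes=0 runs no nodes.

```python
def target_node_count(requested_nodes: int, min_nodes: int = 0, max_nodes: int = 4) -> int:
    """Clamp the number of nodes requested by queued jobs to [min_nodes, max_nodes]."""
    if min_nodes < 0 or max_nodes < min_nodes:
        raise ValueError("require 0 <= min_nodes <= max_nodes")
    return max(min_nodes, min(requested_nodes, max_nodes))


# Idle cluster, a small job, and a request larger than the cluster allows
for requested in (0, 2, 10):
    print(requested, "->", target_node_count(requested))
```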

By changing the VM priority (vm_priority) from the default dedicated to lowpriority, you can use low-priority VMs. Low-priority VMs may be unavailable or preempted, but in exchange they are priced significantly lower than regular (dedicated) VMs.

Use low-priority VMs with Batch
https://docs.microsoft.com/azure/batch/batch-low-pri-vms

Other options include the idle time before scale-in and virtual network settings.

Set up compute targets for model training > Azure Machine Learning Compute
https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute
azureml.core.compute.AmlCompute
https://docs.microsoft.com/python/api/azureml-core/azureml.core.compute.amlcompute(class)
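
The options mentioned above can be collected in one place before creating the cluster. The sketch below is a hypothetical helper (build_amlcompute_options is not part of the SDK) that validates the values and returns keyword arguments intended for AmlCompute.provisioning_configuration. The parameter names vm_size, vm_priority, min_nodes, max_nodes, and idle_seconds_before_scaledown follow the azureml-core SDK as I understand it; verify them against the AmlCompute reference for your SDK version.

```python
from typing import Any, Dict, Optional

ALLOWED_PRIORITIES = ("dedicated", "lowpriority")


def build_amlcompute_options(vm_size: str,
                             max_nodes: int,
                             min_nodes: int = 0,
                             vm_priority: str = "dedicated",
                             idle_seconds_before_scaledown: Optional[int] = None) -> Dict[str, Any]:
    """Validate cluster options and return them as keyword arguments (hypothetical helper)."""
    if vm_priority not in ALLOWED_PRIORITIES:
        raise ValueError(f"vm_priority must be one of {ALLOWED_PRIORITIES}")
    if min_nodes < 0 or max_nodes < min_nodes:
        raise ValueError("require 0 <= min_nodes <= max_nodes")
    options: Dict[str, Any] = {
        "vm_size": vm_size,
        "vm_priority": vm_priority,
        "min_nodes": min_nodes,
        "max_nodes": max_nodes,
    }
    # Only include the scale-in delay when the caller sets it explicitly
    if idle_seconds_before_scaledown is not None:
        if idle_seconds_before_scaledown < 0:
            raise ValueError("idle_seconds_before_scaledown must be non-negative")
        options["idle_seconds_before_scaledown"] = idle_seconds_before_scaledown
    return options


options = build_amlcompute_options("STANDARD_D2_V2", max_nodes=4,
                                   vm_priority="lowpriority",
                                   idle_seconds_before_scaledown=300)
print(options)

# With the SDK installed, the options could then be passed on like this:
# from azureml.core.compute import AmlCompute
# compute_config = AmlCompute.provisioning_configuration(**options)
```

Keeping validation separate from the SDK call means a typo such as an unsupported vm_priority value fails locally, before any request is sent.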

Once you have created Azure Machine Learning Compute, prepare a Python script for training, create an Estimator, and submit a training job.

The following example trains on a single node of Azure Machine Learning Compute.

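from azureml.core import Experiment
from azureml.train.estimator import Estimator

# ws, ds, script_folder, and compute_target are assumed to be defined earlier
# (the workspace, a datastore, the folder containing train.py, and the cluster)

# Arguments passed to train.py on the command line
script_params = {
    '--data-folder': ds.as_mount(),
    '--regularization': 0.8
}

est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                entry_script='train.py',
                conda_packages=['scikit-learn'])

# Submit the training job; the experiment name is a placeholder
run = Experiment(workspace=ws, name='my-experiment').submit(est)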

How to train models with Azure Machine Learning
https://docs.microsoft.com/azure/machine-learning/service/how-to-train-ml-models
azureml.train.estimator.Estimator
https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.estimator.estimator
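
The keys of script_params are passed to the entry script as command-line arguments, so train.py needs to parse them. A minimal sketch of that parsing is shown below. The argument names mirror the example above, while the dest names, the default value, and the sample path are assumptions for illustration. With ds.as_mount(), the script is expected to receive the datastore's mount path as the --data-folder value.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the arguments that the Estimator passes to train.py."""
    parser = argparse.ArgumentParser(description="Training script arguments")
    parser.add_argument("--data-folder", type=str, dest="data_folder", required=True,
                        help="path where the training data is mounted")
    parser.add_argument("--regularization", type=float, dest="reg", default=0.01,
                        help="regularization rate (the default is an arbitrary placeholder)")
    return parser.parse_args(argv)


# Simulate the command line built from script_params; the path is a placeholder
args = parse_args(["--data-folder", "/tmp/data", "--regularization", "0.8"])
print(args.data_folder, args.reg)
```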

For PyTorch and TensorFlow, distributed training across multiple nodes of Azure Machine Learning Compute is also supported.

The following example runs MPI-based distributed training of PyTorch using the Horovod framework.

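from azureml.train.dnn import PyTorch

# compute_target is assumed to be an existing AmlCompute cluster with GPU nodes
pt_est = PyTorch(source_directory='./my-pytorch-project',
                 script_params={},
                 compute_target=compute_target,
                 entry_script='train.py',
                 node_count=2,                # run on two nodes
                 process_count_per_node=1,    # one worker process per node
                 distributed_backend='mpi',   # MPI backend, used with Horovod
                 use_gpu=True)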

How to train PyTorch models
https://docs.microsoft.com/ja-jp/azure/machine-learning/service/how-to-train-pytorch#distributed-training
Train TensorFlow models with the Azure Machine Learning service
https://docs.microsoft.com/ja-jp/azure/machine-learning/service/how-to-train-tensorflow
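
With node_count=2 and process_count_per_node=1, the job runs two worker processes in total. In data-parallel training, each worker usually handles a different slice of the data. The sketch below illustrates that idea in plain Python. It does not use Horovod or MPI, and the rank and world_size values are supplied by hand here, whereas in a real job they would come from the distributed framework (for example hvd.rank() and hvd.size() in Horovod). The helper name shard_for_worker is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_for_worker(items: Sequence[T], rank: int, world_size: int) -> List[T]:
    """Return the items assigned to one worker using a strided split."""
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("require world_size >= 1 and 0 <= rank < world_size")
    return list(items[rank::world_size])


node_count = 2
process_count_per_node = 1
world_size = node_count * process_count_per_node  # total worker processes

samples = list(range(10))
for rank in range(world_size):
    print(rank, shard_for_worker(samples, rank, world_size))
```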

Azure Machine Learning Compute also supports hyperparameter tuning and automated machine learning, which reduce the time and effort that training requires.

Tune hyperparameters for your model
https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters
Configure automated machine learning experiments
https://docs.microsoft.com/ja-jp/azure/machine-learning/service/how-to-configure-auto-train
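
Hyperparameter tuning automates the loop of choosing candidate values, training, and keeping the best result. The following sketch shows that loop as a plain random search over a single hyperparameter. It is a generic illustration and not the Azure Machine Learning tuning API: the objective function is a made-up stand-in for a real training run, and the search range and trial count are arbitrary.

```python
import random
from typing import Callable, Tuple


def random_search(objective: Callable[[float], float],
                  low: float, high: float,
                  trials: int = 20, seed: int = 0) -> Tuple[float, float]:
    """Sample values uniformly from [low, high] and return (best_value, best_score)."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    rng = random.Random(seed)  # fixed seed so repeated runs pick the same candidates
    best_value, best_score = low, float("-inf")
    for _ in range(trials):
        candidate = rng.uniform(low, high)
        score = objective(candidate)
        if score > best_score:
            best_value, best_score = candidate, score
    return best_value, best_score


def toy_objective(regularization: float) -> float:
    """Hypothetical score that peaks at regularization = 0.8."""
    return -(regularization - 0.8) ** 2


best_value, best_score = random_search(toy_objective, low=0.0, high=1.0)
print(best_value, best_score)
```

In a real setup each call to the objective would be a full training run, which is why running the trials in parallel on a cluster saves time.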

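Getting Started with the Azure Machine Learning Service

From here, let's briefly look at how to get started with the Azure Machine Learning service.

You need an Azure subscription. If you do not have one, sign up for a free account. Sign-up requires phone/SMS verification and a credit card.

To use the Azure Machine Learning service, you need a Python-based development environment. You can set one up on your local machine or use the Data Science Virtual Machine (DSVM) VM image.

Configure a development environment for Azure Machine Learning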
https://docs.microsoft.com/ja-jp/azure/machine-learning/service/how-to-configure-environment

An easy option is Azure Notebooks, a free service that provides an execution environment for Jupyter notebooks.

By following the quickstart below, you can create an Azure Machine Learning workspace in the Azure portal and clone sample notebooks, preconfigured for that workspace, into Azure Notebooks.

Quickstart: Get started with Azure Machine Learning using the Azure portal
https://docs.microsoft.com/ja-jp/azure/machine-learning/service/quickstart-get-started

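The following notebook project is the source of the clone. If you follow the quickstart steps, the connection information for the workspace you created (Azure subscription ID, resource group name, and Azure Machine Learning workspace name) is filled in automatically in the config.json file.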

Azure Notebooks > azureml > Getting Started
https://notebooks.azure.com/azureml/projects/azureml-getting-started
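
For reference, config.json is a small JSON file holding the three connection values described above. The sketch below writes and reads such a file with placeholder values. The key names subscription_id, resource_group, and workspace_name are the ones I understand the SDK to expect, so check them against a config.json generated for your own workspace. The helper load_workspace_config is hypothetical, and the file is written to a temporary directory so that nothing real is overwritten.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path: Path) -> dict:
    """Read config.json and check that the expected keys are present and non-empty."""
    with path.open(encoding="utf-8") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError(f"missing keys in {path.name}: {missing}")
    return config


# Placeholder values only; replace them with your own workspace details.
sample = {
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>",
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    config_path.write_text(json.dumps(sample, indent=2), encoding="utf-8")
    config = load_workspace_config(config_path)

print(config["workspace_name"])
```

With the SDK installed, Workspace.from_config() is the call that reads a file of this kind to connect to the workspace.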

Similar samples are also published on GitHub.

GitHub > Azure/MachineLearningNotebooks
https://github.com/Azure/MachineLearningNotebooks

By opening the samples cloned into Azure Notebooks and running them as needed, you can easily try various features of the Azure Machine Learning service, including Azure Machine Learning Compute.

The most basic sample that uses Azure Machine Learning Compute is the following. It runs scikit-learn training on a single node of Azure Machine Learning Compute.

Tutorial: Train an image classification model with Azure Machine Learning service
https://docs.microsoft.com/ja-jp/azure/machine-learning/service/tutorial-train-models-with-aml
https://notebooks.azure.com/azureml/projects/azureml-getting-started/html/tutorials/img-classification-part1-training.ipynb
https://github.com/Azure/MachineLearningNotebooks/blob/master/tutorials/img-classification-part1-training.ipynb

Samples specific to Azure Machine Learning Compute are also available.

The how-to-use-azureml directory also contains automated machine learning and deep learning samples that use Azure Machine Learning Compute. If you are interested, take a look.
