I tried building a model with the Amazon SageMaker Autopilot sample

Last updated at 2019-12-25 / Posted at 2019-12-16

What is SageMaker Autopilot?

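SageMaker Autopilot is an AWS offering that automatically handles preprocessing, algorithm selection, and hyperparameter optimization.
In other words, it is AutoML that runs on SageMaker.
Autopilot comes with a sample notebook, so this time I'll actually run it.
→ Autopilot sample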
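To make "algorithm selection and hyperparameter optimization" a bit more concrete, here is a toy sketch of the kind of search loop an AutoML tool automates. It tries several candidate models and hyperparameter values, scores each one on held-out data, and keeps the best. Everything here is my own illustrative assumption: two hand-written classifiers, a tiny made-up dataset, and plain accuracy as the score. It is not how Autopilot works internally.

```python
from itertools import product

# Made-up toy dataset: (feature, label) pairs
TRAIN = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]
VALID = [(1.5, 0), (3.5, 1), (4.5, 1)]


def majority_model(train, **_):
    """Always predicts the most common label in the training data."""
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top


def threshold_model(train, threshold):
    """Predicts 1 when the feature exceeds `threshold` (the 'hyperparameter')."""
    return lambda x: int(x > threshold)


# Candidate algorithms and their hyperparameter grids
SEARCH_SPACE = {
    "majority": (majority_model, {}),
    "threshold": (threshold_model, {"threshold": [1.0, 2.5, 4.0]}),
}


def accuracy(model, data):
    """Fraction of (x, y) pairs the model predicts correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def toy_automl(train, valid):
    """Try every algorithm/hyperparameter combination and keep the best score."""
    best = None
    for name, (factory, grid) in SEARCH_SPACE.items():
        keys = list(grid)
        for values in product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            score = accuracy(factory(train, **params), valid)
            if best is None or score > best[2]:
                best = (name, params, score)
    return best


print(toy_automl(TRAIN, VALID))  # → ('threshold', {'threshold': 2.5}, 1.0)
```

A real AutoML service also automates the preprocessing step and searches far larger spaces with smarter strategies than an exhaustive grid, but the "try, score, keep the best" shape is the same.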

Running the sample

First, import the required libraries and create a session.

jupyter

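import sagemaker
import boto3
from sagemaker import get_execution_role

# Region of the current boto3 session
region = boto3.Session().region_name

# SageMaker session, its default S3 bucket, and the key prefix used by this sample
session = sagemaker.Session()
bucket = session.default_bucket()
prefix = 'sagemaker/autopilot-dm'

# IAM role of the notebook; passed to Autopilot as RoleArn
role = get_execution_role()

# Low-level SageMaker client used to create and monitor the Autopilot job
sm = boto3.Session().client(service_name='sagemaker', region_name=region)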

Next, download the dataset.
This time we use the Bank Marketing Data Set.
It comes from a bank's direct marketing campaigns and appears to record whether each customer subscribed to a term deposit.

jupyter

!wget -N https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip
!unzip -o bank-additional.zip

local_data_path = './bank-additional/bank-additional-full.csv'
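bank-additional-full.csv uses semicolons instead of commas as its delimiter, which is why the next step passes sep=';' to pandas. Below is a stdlib-only sketch of the same parsing, plus a quick count of the target column y. The sample rows are made up and show only a few of the real columns. The helper names are my own.

```python
import csv
import io
from collections import Counter

# Made-up rows in the same format as bank-additional-full.csv
# (semicolon-delimited, quoted strings); only a few columns are shown.
SAMPLE = '''"age";"job";"marital";"y"
56;"housemaid";"married";"no"
37;"services";"married";"no"
40;"admin.";"single";"yes"
'''


def read_semicolon_csv(text):
    """Parse semicolon-delimited CSV text into a list of dicts keyed by the header."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def target_counts(rows, target="y"):
    """Count how often each value of the target column appears."""
    return Counter(row[target] for row in rows)


rows = read_semicolon_csv(SAMPLE)
print(len(rows), target_counts(rows))
```

With a binary target like y, it is worth checking the class balance this way before handing the data to any training job.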

Next, split the downloaded data into training data (80%) and test data (20%), and drop the target column "y" from the test data.

jupyter

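import pandas as pd

# The file is semicolon-delimited
data = pd.read_csv(local_data_path, sep=';')
# Randomly sample 80% of the rows as training data (fixed seed for reproducibility)
train_data = data.sample(frac=0.8, random_state=200)

# The remaining 20% becomes the test data
test_data = data.drop(train_data.index)

# Remove the target column "y" from the test data
test_data_no_target = test_data.drop(columns=['y'])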
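The split above uses pandas. sample(frac=0.8, random_state=200) picks 80% of the rows reproducibly, and drop(train_data.index) keeps the remaining 20%. Here is the same idea in plain Python, as a sketch with made-up rows. It will not pick the same rows as pandas for the same seed, and split_train_test and drop_column are hypothetical helper names.

```python
import random


def split_train_test(rows, frac=0.8, seed=200):
    """Reproducibly split rows into train/test by shuffling row indices."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_train = round(frac * len(rows))
    train_idx = set(indices[:n_train])
    # Both parts keep the original row order
    train = [rows[i] for i in sorted(train_idx)]
    test = [rows[i] for i in range(len(rows)) if i not in train_idx]
    return train, test


def drop_column(rows, column):
    """Return copies of the rows without `column` (like DataFrame.drop(columns=[...]))."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


# Made-up rows: 10 records with one feature and the target "y"
data = [{"age": 20 + i, "y": "yes" if i % 3 == 0 else "no"} for i in range(10)]
train, test = split_train_test(data)
test_no_target = drop_column(test, "y")
print(len(train), len(test), list(test_no_target[0]))
```

Splitting by index, as drop(train_data.index) does, guarantees that no row ends up in both sets.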

Then upload each of the two datasets to S3.

jupyter

# The training data keeps its header row so Autopilot can find the target column "y"
train_file = 'train_data.csv'
train_data.to_csv(train_file, index=False, header=True)
train_data_s3_path = session.upload_data(path=train_file, key_prefix=prefix + "/train")
print('Train data uploaded to: ' + train_data_s3_path)

# The test data is written without a header and without the target column
test_file = 'test_data.csv'
test_data_no_target.to_csv(test_file, index=False, header=False)
test_data_s3_path = session.upload_data(path=test_file, key_prefix=prefix + "/test")
print('Test data uploaded to: ' + test_data_s3_path)
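upload_data stores each file under key_prefix using its file name. As a result, the training file lands under s3://<bucket>/sagemaker/autopilot-dm/train/, the prefix that the Autopilot input configuration points to. Below is a pure-Python sketch of that path arithmetic. It makes no AWS calls, the bucket name is a placeholder, and the helper names are my own.

```python
import posixpath

PREFIX = "sagemaker/autopilot-dm"


def expected_upload_uri(bucket, key_prefix, filename):
    """S3 URI a file is expected to land at when uploaded under key_prefix."""
    return "s3://{}/{}".format(bucket, posixpath.join(key_prefix, filename))


def is_under_prefix(uri, prefix_uri):
    """True if uri sits inside the 'folder' prefix_uri."""
    return uri.startswith(prefix_uri.rstrip("/") + "/")


bucket = "example-bucket"  # placeholder, not a real bucket
train_uri = expected_upload_uri(bucket, PREFIX + "/train", "train_data.csv")
autopilot_prefix = "s3://{}/{}/train".format(bucket, PREFIX)
print(train_uri, is_under_prefix(train_uri, autopilot_prefix))
```

Because the data source type is S3Prefix, every object under that prefix is used as input. Keep the train/ prefix reserved for the training CSV.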

Next, configure Autopilot.
This sample uses the settings below, but many more options appear to be available.
They are described in this documentation, so do take a look.

jupyter


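# Where Autopilot reads the training data from, and which column to predict
input_data_config = [{
    'DataSource': {
        'S3DataSource': {
            'S3DataType': 'S3Prefix',  # use every object under the S3Uri prefix
            'S3Uri': 's3://{}/{}/train'.format(bucket, prefix)
        }
    },
    'TargetAttributeName': 'y'
}]

# Where Autopilot writes its outputs
output_data_config = {
    'S3OutputPath': 's3://{}/{}/output'.format(bucket, prefix)
}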
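Among the "other options," the CreateAutoMLJob API also accepts ProblemType, AutoMLJobObjective, and AutoMLJobConfig. The last one can include a CompletionCriteria with MaxCandidates. The sketch below collects the sample's settings plus those optional keys into one keyword-argument dict. The builder function is hypothetical and makes no AWS call. The example values ("BinaryClassification", "F1", 10) are my own choices, so check the API documentation for the allowed values before relying on them. Capping the number of candidates is one way to try to shorten the run.

```python
def build_auto_ml_job_request(job_name, bucket, prefix, role_arn,
                              max_candidates=None, objective=None,
                              problem_type=None):
    """Build keyword arguments for sm.create_auto_ml_job (no AWS call is made)."""
    request = {
        "AutoMLJobName": job_name,
        "InputDataConfig": [{
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://{}/{}/train".format(bucket, prefix),
                }
            },
            "TargetAttributeName": "y",
        }],
        "OutputDataConfig": {
            "S3OutputPath": "s3://{}/{}/output".format(bucket, prefix),
        },
        "RoleArn": role_arn,
    }
    # Optional settings: omitted keys are left for Autopilot to decide
    if problem_type is not None:
        request["ProblemType"] = problem_type  # e.g. "BinaryClassification"
    if objective is not None:
        request["AutoMLJobObjective"] = {"MetricName": objective}  # e.g. "F1"
    if max_candidates is not None:
        request["AutoMLJobConfig"] = {
            "CompletionCriteria": {"MaxCandidates": max_candidates}
        }
    return request


# Hypothetical usage in the notebook:
# sm.create_auto_ml_job(**build_auto_ml_job_request(
#     auto_ml_job_name, bucket, prefix, role,
#     max_candidates=10, objective="F1", problem_type="BinaryClassification"))
```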

With the configuration done, let's actually run it.

jupyter

from time import gmtime, strftime, sleep
timestamp_suffix = strftime('%d-%H-%M-%S', gmtime())

auto_ml_job_name = 'automl-banking-' + timestamp_suffix
print('AutoMLJobName: ' + auto_ml_job_name)

sm.create_auto_ml_job(AutoMLJobName=auto_ml_job_name,
                      InputDataConfig=input_data_config,
                      OutputDataConfig=output_data_config,
                      RoleArn=role)
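The job name above is 'automl-banking-' plus a day-hour-minute-second UTC timestamp. As I read the CreateAutoMLJob API reference, AutoMLJobName must be 1 to 32 characters: letters and digits, optionally separated by hyphens, with no leading or trailing hyphen. The sketch below rebuilds the name the same way and checks it against that assumed rule before submitting. The regex and helper names are my own.

```python
import re
from datetime import datetime, timezone

# Assumed naming rule (my reading of the CreateAutoMLJob API reference):
# up to 32 characters, letters/digits separated by hyphens, no leading/trailing hyphen.
JOB_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,31}")
MAX_JOB_NAME_LENGTH = 32


def make_job_name(base, now=None):
    """Append a UTC timestamp in the same '%d-%H-%M-%S' format used above."""
    if now is None:
        now = datetime.now(timezone.utc)
    return base + "-" + now.strftime("%d-%H-%M-%S")


def is_valid_job_name(name):
    """Check a candidate AutoMLJobName against the assumed length and pattern."""
    return (len(name) <= MAX_JOB_NAME_LENGTH
            and JOB_NAME_PATTERN.fullmatch(name) is not None)


name = make_job_name("automl-banking")
print(name, is_valid_job_name(name))
```

The timestamp contains only the day of the month, not the month or year. A run at the same day and time a month later would therefore reuse a name, and adding %m to the format is a cheap safeguard.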

The following code prints the job's status every 30 seconds until it finishes.

jupyter

print('JobStatus - Secondary Status')
print('------------------------------')

# Print the current status once before entering the loop
describe_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)
print(describe_response['AutoMLJobStatus'] + " - " + describe_response['AutoMLJobSecondaryStatus'])
job_run_status = describe_response['AutoMLJobStatus']

# Poll every 30 seconds until the job reaches a terminal status
while job_run_status not in ('Failed', 'Completed', 'Stopped'):
    describe_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)
    job_run_status = describe_response['AutoMLJobStatus']

    print(describe_response['AutoMLJobStatus'] + " - " + describe_response['AutoMLJobSecondaryStatus'])
    sleep(30)
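The loop above sleeps another 30 seconds after it has already printed the final status, and it never gives up if something stalls. Below is a sketch of a reusable variant that returns as soon as it sees a terminal status and supports an optional timeout. The function name and the timeout parameter are my own, not part of the SageMaker SDK. describe, sleep, and clock are passed in as arguments, so the helper can be tried without AWS.

```python
import time

TERMINAL_STATUSES = ("Failed", "Completed", "Stopped")


def wait_for_auto_ml_job(describe, poll_seconds=30, timeout_seconds=None,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll describe() until the job reaches a terminal status, then return the response.

    describe is any zero-argument callable returning a dict with
    'AutoMLJobStatus' and 'AutoMLJobSecondaryStatus', for example
    lambda: sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name).
    Raises TimeoutError if timeout_seconds elapses first.
    """
    start = clock()
    while True:
        response = describe()
        status = response["AutoMLJobStatus"]
        print(status + " - " + response["AutoMLJobSecondaryStatus"])
        if status in TERMINAL_STATUSES:
            return response  # no extra sleep once the job is done
        if timeout_seconds is not None and clock() - start >= timeout_seconds:
            raise TimeoutError(
                "job still '{}' after {} seconds".format(status, timeout_seconds))
        sleep(poll_seconds)
```

In the notebook this would be called as wait_for_auto_ml_job(lambda: sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)). Given the roughly two-hour run time, a timeout of a few hours is a reasonable starting point.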

Once the status shows "Completed", model creation is finished.
In my case it took a little over two hours.
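The sample run stops at job completion. After that, the describe_auto_ml_job response should include a BestCandidate entry containing the winning candidate's CandidateName and its FinalAutoMLJobObjectiveMetric. The field names here follow my reading of the DescribeAutoMLJob API reference. The helper below is a hypothetical sketch that pulls a short summary out of that response.

```python
def summarize_best_candidate(describe_response):
    """Return the best candidate's name and objective metric from a
    describe_auto_ml_job response, or None if no best candidate is present."""
    best = describe_response.get("BestCandidate")
    if best is None:
        return None
    metric = best["FinalAutoMLJobObjectiveMetric"]
    return {
        "name": best["CandidateName"],
        "metric": metric["MetricName"],
        "value": metric["Value"],
    }


# Hypothetical usage after the job has completed:
# describe_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)
# print(summarize_best_candidate(describe_response))
```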

Summary

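This time I used SageMaker Autopilot to build a model automatically.
All you have to do is prepare the data and you get a model, which reminded me how impressive AutoML is.
I hope tools like this lower the barrier to building models and help ML get used more widely.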
