More than 5 years have passed since last update.

データとfitだけで始めるAutoML - AutoGluon使ってみた -

Last updated at 2019-12-16Posted at 2019-12-14

本日MXNet公式でアナウンスがあったばかりのAutoGluonを早速使ってこの記事を書きました。これは普通のHyperparameter Optimizationには止まらない，Feature EngineeringやModel Selectionまで含む，いわゆる「全てお任せ」のAutoMLライブラリです。

中のコードはMXNet主体ですが，AutoGluonで全て隠蔽されているので，実際MXNetを使ったことがなくても問題なく利用できます。
（For Pytorch Userというのがある通りPythonコードを書けばPytorchなどでも利用できる様です。念のため。）

今回は速報的に公式のチュートリアルを追ってみたいと思います。
公式Tweetは以下。

AutoGluon, a new open source toolkit from @awscloud based on #MXNet, enables users to achieve state-of-the-art deep learning solutions with 3 lines Python of code. See post for more details: #AutoML #DeepLearning https://t.co/3sVPygX6Qu
— Apache MXNet (@ApacheMXNet) December 13, 2019

~追記~
@tomo_makes -sanがこちらを実行できるcolabを作成してくれましたので，掲載させて頂きます。ありがとうございます！
https://colab.research.google.com/gist/tomo-makes/d388f40632d553778e59ab5552b5cdcb/-kirikei-autogluon.ipynb

Installation

公式にある通り，MXNetとAutoGluonをpipで入れます。今回はGPUで利用するのでcudaバージョンで。私の環境はcuda10なのでcu100でインストールしました。

# Here we assume CUDA 10.0 is installed.  You should change the number
# according to your own CUDA version (e.g. mxnet-cu101 for CUDA 10.1).
pip install --upgrade mxnet-cu100
pip install autogluon

For Table Data

公式チュートリアルでは，よく使われる国税調査のデータadultデータセットを用いて分類問題と回帰問題を解いています。テーブルデータ用のクラス，TabularPredictionではPandas-LikeなDatasetクラスをを利用してデータを読み込む様ですね。チュートリアルでは簡単のため500レコードとしていますが，2000データに増やしてみました。

import autogluon as ag
from autogluon import TabularPrediction as task

train_data = task.Dataset(file_path='https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
train_data = train_data.head(2000) # subsample 2000 data points for faster demo

データの中身はこんな感じ

train_data.head()

結構カテゴリカルデータが入っており，普通だと前処理がめんどくさそうな印象です。

Classification

分類問題のタスクでは，classカラムの分類，すなわち収入が50k以上か未満かの二値分類を行います。以下に示す実際のコードは非常に簡単なものとなっています。

まずlabelとなるclassを覗いてみます。

label_column = 'class'
print("Summary of class variable: \n", train_data[label_column].describe())

Summary of class variable: 
 count       2000
unique         2
top        <=50K
freq        1551
Name: class, dtype: object

文字列データとなっています。Pandas.DataFrameの様にdescribeも普通に使える様です。

ではいきなりですが，ここから学習を行います。探索方法も色々とチューニングができるのですが，特に設定しなくとも以下のコマンドだけで実行できます。パラメータなどの設定方法は別のチュートリアルで解説しています。

result_dir = 'agModels-predictClass' # specifies folder where to store trained models
predictor = task.fit(train_data=train_data, label=label_column, output_directory=result_dir)

実行のためのコードは上記のみです。結果が格納されるディレクトリを定義して，あとは学習データとラベルとなるカラムをぶち込んでfitを行うだけです！すごく簡単！
この実行により以下の様なログが出て来ます。

Beginning AutoGluon training ...
Preprocessing data ...
Here are the first 10 unique label values in your data:  [' <=50K' ' >50K']
AutoGluon infers your prediction problem is: binary  (because only two unique label-values observed)
If this is wrong, please specify `problem_type` argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])

Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Data preprocessing and feature engineering runtime = 0.1s ...
AutoGluon will gauge predictive performance using evaluation metric: accuracy
To change this, specify the eval_metric argument of fit()
/opt/conda/lib/python3.7/imp.py:342: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  return _load(spec)
Fitting model: RandomForestClassifierGini ...
	1.23s	 = Training runtime
	0.8775	 = Validation accuracy score
Fitting model: RandomForestClassifierEntr ...
	1.3s	 = Training runtime
	0.87	 = Validation accuracy score
Fitting model: ExtraTreesClassifierGini ...
	1.0s	 = Training runtime
	0.845	 = Validation accuracy score
Fitting model: ExtraTreesClassifierEntr ...
	1.11s	 = Training runtime
	0.8425	 = Validation accuracy score
Fitting model: KNeighborsClassifierUnif ...
	0.01s	 = Training runtime
	0.78	 = Validation accuracy score
Fitting model: KNeighborsClassifierDist ...
	0.01s	 = Training runtime
	0.7325	 = Validation accuracy score
Fitting model: LightGBMClassifier ...
	1.02s	 = Training runtime
	0.8775	 = Validation accuracy score
Fitting model: CatboostClassifier ...
	18.29s	 = Training runtime
	0.8725	 = Validation accuracy score
Fitting model: NeuralNetClassifier ...
	13.75s	 = Training runtime
	0.87	 = Validation accuracy score
Fitting model: LightGBMClassifierCustom ...
	3.82s	 = Training runtime
	0.855	 = Validation accuracy score
Fitting model: weighted_ensemble_l1 ...
	0.57s	 = Training runtime
	0.88	 = Validation accuracy score
AutoGluon training complete, total runtime = 47.42s ...

Table DataではRandom ForestやLightGBMなどのモデルがデフォルトで設定されている様です。Neural Networkも利用されている様ですが，私の環境ではGPUは使用されませんでした（画像チュートリアルでやったときは自動で使ってくれました）。ログを見ると最後にweighted_ensembleまでやってくれている様ですね。
自動でマルチスレッド処理してくれるらしく，47秒程度で終わりました。

ここで学習されたモデルを利用してテストスコアを算出するためにテストデータをダウンロードします。

test_data = task.Dataset(file_path='https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
y_test = test_data[label_column]  # values to predict
test_data_nolab = test_data.drop(labels=[label_column],axis=1)

実行フォルダからモデルを読み込んで評価を行います。モデルの読み込みはtask.loadだけなので非常に簡単です。auxiliary_metricsをTrueにすることで複数の評価指標を自動で算出してくれる様です。

# unnecessary, just demonstrates how to load previously-trained predictor from file
predictor = task.load(result_dir) 

y_pred = predictor.predict(test_data_nolab)
print("Predictions:  ", y_pred)
perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=True)

結果は以下の様になります。

Evaluation: accuracy on test data: 0.850138
Predictions:   [' <=50K' ' <=50K' ' >50K' ... ' >50K' ' <=50K' ' <=50K']
Evaluations on test data:
{
    "accuracy": 0.8501381922407616,
    "accuracy_score": 0.8501381922407616,
    "balanced_accuracy_score": 0.7614815413534661,
    "matthews_corrcoef": 0.5627555165161592,
    "f1_score": 0.8501381922407616
}
Detailed (per-class) classification report:
{
    " <=50K": {
        "precision": 0.8801269841269841,
        "recall": 0.9302107099718159,
        "f1-score": 0.9044760537648442,
        "support": 7451
    },
    " >50K": {
        "precision": 0.7254487856388595,
        "recall": 0.5927523727351165,
        "f1-score": 0.6524216524216525,
        "support": 2318
    },
    "accuracy": 0.8501381922407616,
    "macro avg": {
        "precision": 0.8027878848829217,
        "recall": 0.7614815413534661,
        "f1-score": 0.7784488530932483,
        "support": 9769
    },
    "weighted avg": {
        "precision": 0.8434247562535608,
        "recall": 0.8501381922407616,
        "f1-score": 0.8446682840531522,
        "support": 9769
    }
}

様々な評価指標が算出されており，evaluate_predictionsメソッドはかなり優秀に感じました。Prediction自体も元々のクラスの表記の通り文字列に自動で直してくれるのも良いですね。

また，以下のコマンドでどの様に探索を行ったか見ることができます。

results = predictor.fit_summary()

結果は以下です。

*** Summary of fit() ***
Number of models trained: 11
Types of models trained: 
{'LGBModel', 'RFModel', 'TabularNeuralNetModel', 'WeightedEnsembleModel', 'CatboostModel', 'KNNModel'}
Validation performance of individual models: {'RandomForestClassifierGini': 0.8775, 'RandomForestClassifierEntr': 0.87, 'ExtraTreesClassifierGini': 0.845, 'ExtraTreesClassifierEntr': 0.8425, 'KNeighborsClassifierUnif': 0.78, 'KNeighborsClassifierDist': 0.7325, 'LightGBMClassifier': 0.8775, 'CatboostClassifier': 0.8725, 'NeuralNetClassifier': 0.87, 'LightGBMClassifierCustom': 0.855, 'weighted_ensemble_l1': 0.88}
Best model (based on validation performance): weighted_ensemble_l1
Hyperparameter-tuning used: False
Bagging used: False 
Stack-ensembling used: False 
User-specified hyperparameters:
{'NN': {'num_epochs': 500}, 'GBM': {'num_boost_round': 10000}, 'CAT': {'iterations': 10000}, 'RF': {'n_estimators': 300}, 'XT': {'n_estimators': 300}, 'KNN': {}, 'custom': ['GBM']}
Plot summary of models saved to file: SummaryOfModels.html
*** End of fit() summary ***

全てのモデルのValidation Scoreや最終的に選ばれたモデルが閲覧できます。さらに，ここを見る通りHPOやBaggingなども設定できる様です。さらに，ログにも出ていますが，結果自体はSummaryOfModels.htmlに以下の様なbokehを利用した動的な可視化が行われます。hoverでパラメータや結果が出るあたり，とてもかっこいいですね。

また，モデルのタスクとカラムの認識を以下のコマンドで見ることができます。

print("AutoGluon infers problem type is: ", predictor.problem_type)
print("AutoGluon categorized the features as: ", predictor.feature_types)

AutoGluon infers problem type is:  binary
AutoGluon categorized the features as:  {'nlp': [], 'vectorizers': [], 'int': ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week'], 'object': ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']}

特に何も指定していませんが，きちんと二値分類であること，数値，カテゴリカルデータを分類できています。

Regression

次は回帰問題を解いています。とはいえやることはほとんど変わりません。チュートリアルでは年齢を回帰しています。

age_column = 'age'
print("Summary of age variable: \n", train_data[age_column].describe())

Summary of age variable: 
 count    2000.000000
mean       38.375000
std        13.781195
min        17.000000
25%        27.000000
50%        37.000000
75%        47.000000
max        90.000000
Name: age, dtype: float64

また同様に実行してみます。チュートリアルでは抜けていた様ですが，problem_typeを設定しないと評価指標がaccuracyになった様なのでここだけ指定しています。problem_typeは'binary', 'multiclass', 'regression'の３種類が存在します。

（チュートリアルでは「regressionかどうかは教えなくて良いよ！」と言ってましたが，バージョンのせいか，私の環境ではlabelをintと認識しながらもclassifierで頑張ろうとしていました。）

また，先ほどはモデルを結果ディレクトリから取り出しましたが，そのままevaluateメソッドで評価を呼び出すことも可能です。

predictor_age = task.fit(train_data=train_data, 
                         output_directory="agModels-predictAge", 
                         label=age_column,
                         problem_type='regression')
performance = predictor_age.evaluate(test_data)

実行結果は以下です。

Beginning AutoGluon training ...
Preprocessing data ...
	Data preprocessing and feature engineering runtime = 0.11s ...
AutoGluon will gauge predictive performance using evaluation metric: root_mean_squared_error
To change this, specify the eval_metric argument of fit()
Fitting model: RandomForestRegressorMSE ...
	0.96s	 = Training runtime
	-11.08	 = Validation root_mean_squared_error score
Fitting model: ExtraTreesRegressorMSE ...
	0.76s	 = Training runtime
	-11.2898	 = Validation root_mean_squared_error score
Fitting model: KNeighborsRegressorUnif ...
	0.01s	 = Training runtime
	-14.5251	 = Validation root_mean_squared_error score
Fitting model: KNeighborsRegressorDist ...
	0.01s	 = Training runtime
	-15.2104	 = Validation root_mean_squared_error score
Fitting model: LightGBMRegressor ...
	1.02s	 = Training runtime
	-10.311	 = Validation root_mean_squared_error score
Fitting model: CatboostRegressor ...
	16.18s	 = Training runtime
	-10.4003	 = Validation root_mean_squared_error score
Fitting model: NeuralNetRegressor ...
	16.2s	 = Training runtime
	-10.6256	 = Validation root_mean_squared_error score
Fitting model: LightGBMRegressorCustom ...
	3.57s	 = Training runtime
	-10.5824	 = Validation root_mean_squared_error score
Fitting model: weighted_ensemble_l1 ...
	0.51s	 = Training runtime
	-10.1729	 = Validation root_mean_squared_error score
AutoGluon training complete, total runtime = 46.17s ...
Predictive performance on given dataset: root_mean_squared_error = 10.283879342369117

回帰モデルがいくつか学習されている様です。RMSEなのに何故かマイナスが付いているのは？ですが，学習は行われている様です。evaluateの値は正の値でRMSEが表示されています。おそらくValidation Scoreはマイナスを取り除いた値が正しい値なのかもしれません。

こちらも結果を覗いてみます。

results = predictor_age.fit_summary()

*** Summary of fit() ***
Number of models trained: 9
Types of models trained: 
{'LGBModel', 'RFModel', 'TabularNeuralNetModel', 'WeightedEnsembleModel', 'CatboostModel', 'KNNModel'}
Validation performance of individual models: {'RandomForestRegressorMSE': -11.07999737891064, 'ExtraTreesRegressorMSE': -11.289767107134377, 'KNeighborsRegressorUnif': -14.525081755363718, 'KNeighborsRegressorDist': -15.210358581022941, 'LightGBMRegressor': -10.311005886168017, 'CatboostRegressor': -10.40033162921078, 'NeuralNetRegressor': -10.625604390929084, 'LightGBMRegressorCustom': -10.582398535859292, 'weighted_ensemble_l1': -10.172923924954961}
Best model (based on validation performance): weighted_ensemble_l1
Hyperparameter-tuning used: False
Bagging used: False 
Stack-ensembling used: False 
User-specified hyperparameters:
{'NN': {'num_epochs': 500}, 'GBM': {'num_boost_round': 10000}, 'CAT': {'iterations': 10000}, 'RF': {'n_estimators': 300}, 'XT': {'n_estimators': 300}, 'KNN': {}, 'custom': ['GBM']}
Plot summary of models saved to file: SummaryOfModels.html
*** End of fit() summary ***

こちらも分類問題と同様のモデル群が利用されている様です。アンサンブルはやはり強い。

補足

こちらの発展的チュートリアルにもある通り，fitに渡すパラメータをユーザ側で調整することも可能となります。例えば上の章でも触れたHPOもそれぞれのパラメータ範囲を設定できますし，HPOの種類，時間など全てコントロールできます。

hp_tune = True  # whether or not to do hyperparameter optimization

nn_options = { # specifies non-default hyperparameter values for neural network models
    'num_epochs': 10, # number of training epochs (controls training time of NN models)
    'learning_rate': ag.space.Real(1e-4, 1e-2, default=5e-4, log=True), # learning rate used in training (real-valued hyperparameter searched on log-scale)
    'activation': ag.space.Categorical('relu', 'softrelu', 'tanh'), # activation function used in NN (categorical hyperparameter, default = first entry)
    'layers': ag.space.Categorical([100],[1000],[200,100],[300,200,100]),
      # Each choice for categorical hyperparameter 'layers' corresponds to list of sizes for each NN layer to use
    'dropout_prob': ag.space.Real(0.0, 0.5, default=0.1), # dropout probability (real-valued hyperparameter)
}

gbm_options = { # specifies non-default hyperparameter values for lightGBM gradient boosted trees
    'num_boost_round': 100, # number of boosting rounds (controls training time of GBM models)
    'num_leaves': ag.space.Int(lower=26, upper=66, default=36), # number of leaves in trees (integer hyperparameter)
}

hyperparameters = {'NN': nn_options, 'GBM': gbm_options}  # hyperparameters of each model type
# If one of these keys is missing from hyperparameters dict, then no models of that type are trained.

time_limits = 2*60  # train various models for ~2 min
num_trials = 5  # try at most 3 different hyperparameter configurations for each type of model
search_strategy = 'skopt'  # to tune hyperparameters using SKopt Bayesian optimization routine
output_directory = 'agModels-predictOccupation'  # folder where to store trained models

predictor = task.fit(train_data=train_data, tuning_data=val_data, label=label_column,
                     output_directory=output_directory, time_limits=time_limits, num_trials=num_trials,
                     hyperparameter_tune=hp_tune, hyperparameters=hyperparameters,
                     search_strategy=search_strategy)

For Image Data

Classification

次はFashion-MNISTで試してみます。公式に沿ってデータをダウンロードしてきます。
画像データなのでImageClassificationをImportしてきます。

import autogluon as ag
from autogluon import ImageClassification as img_task

filename = ag.download('https://autogluon.s3.amazonaws.com/datasets/shopee-iet.zip')
ag.unzip(filename)

チュートリアルでダウンロードされるのは４種類だけのようです。フォルダ構成はいつもの

data
  |
  |-- train
  |     |- BabyPants<class名>
  |                |- BabyPants_1009.png<データ>
  |                |- BabyPants_103.png
  |                |- ...
  |                ...
  |- test
      |- ...

という構成なので，他のデータでも問題なく行けそうな雰囲気ですね。

さらに，学習データとテストデータのフォルダを指定しておきます。

dataset = img_task.Dataset('data/train')
test_dataset = img_task.Dataset('data/test', train=False)

ここでからもう学習に入れるのですが，fitの引数をちょっとみてみましょう。

>>> img_task.fit?

task.fit(
    dataset,
    net=Categorical['ResNet50_v1b', 'ResNet18_v1b'],
    optimizer=AutoGluonObject -- SGD,
    lr_scheduler='cosine',
    loss=AutoGluonObject -- SoftmaxCrossEntropyLoss,
    split_ratio=0.8,
    batch_size=64,
    input_size=224,
    epochs=20,
    metric='accuracy',
    nthreads_per_trial=4,
    ngpus_per_trial=1,
    hybridize=True,
    search_strategy='random',
    plot_results=False,
    verbose=False,
    search_options={},
    time_limits=None,
    resume=False,
    checkpoint='checkpoint/exp1.ag',
    visualizer='none',
    num_trials=2,
    dist_ip_addrs=[],
    grace_period=None,
    auto_search=True,
)

net引数で有名モデルを引っ張ってこれそうですし，auto_searchや可視化のオプションもあるようですね。

ともかくデフォルトで学習を行ってみます。

classifier = img_task.fit(dataset,
                      epochs=10,
                      ngpus_per_trial=1,
                      verbose=False)

ngpus_per_trialの引数で各GPUでいくつのジョブを回すかを設定できます。私の環境はGPU２枚積みだったので２つの学習ジョブを並列に探索することができました。時間は少々かかりますが，以下のようなログが出てきます。

Starting Experiments
Num of Finished Tasks is 0
Num of Pending Tasks is 2
100%
2/2 [02:18<00:00, 69.14s/it]


Finished Task with config: {'net.choice': 1, 'optimizer.learning_rate': 0.00837399675820167, 'optimizer.wd': 0.0006137879489315822} and reward: 0.85625

Finished Task with config: {'net.choice': 0, 'optimizer.learning_rate': 0.0031622777, 'optimizer.wd': 0.0003162278} and reward: 0.69375

auto_search=Trueになっているのでちゃんと探索をしてますね。
Test Accuracyをみてみましょう。

test_acc = classifier.evaluate(test_dataset)
print('Top-1 test acc: %.3f' % test_acc)

accuracy: 0.8125: 100%
1/1 [00:03<00:00, 3.92s/it]

Top-1 test acc: 0.812

良さそう。

まとめ

今回はまず速報としてテーブルデータに関するチュートリアルを軽くまとめさせていただきました。印象としては内容がほとんど隠蔽されており，AutoMLとしても，単純なHPOとしてもかなり使いやすいと感じました。これ以外にも画像分類，Object Detection，Text Classificationにも対応しており，さらにNAS(!)も対応しているようです。

ここで触れなかった特徴量エンジニアリングですが，Githubのコードを見る限り最低限の作成はしているようですね。実際に試してはいませんが，nlp系のタスクは基本処理を行う関数は用意されているようでした。

今後新たなモデルが提案されるにつれてAWSはこちらのAutoMLモデルに組み込んで利用していくのでしょうか。さらに検証したらまた新たな記事を投稿しようと思います（MXNetユーザが増えることを祈って...）

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up