Comparing Bayesian Optimization and Grid Search

Posted at 2017-07-08

Overview

Grid search is commonly used for hyperparameter tuning in machine learning.
However, such an exhaustive approach is computationally expensive.
Bayesian optimization is an alternative that selects promising parameters automatically and efficiently.
Here, grid search and Bayesian optimization are applied to an SVM classification problem and compared.

Results

Method                  Time (s)   Trials   F1-Score
Bayesian optimization   12.902     20       0.99
Grid search             41.947     132      0.98

Conclusion

Bayesian optimization needed less time to reach comparable predictive accuracy, confirming its efficiency in practice.

Future work

Data

This experiment used the clean sample dataset (digits) that ships with scikit-learn. I would like to try real-world data as well.

Algorithm

Bayesian optimization should show its real value on algorithms with long training times, such as deep learning. I would like to try that next.

Library selection

I chose GPyOpt this time simply because it looked easy to use from Python, but there are several other candidates worth comparing (a rough skopt sketch follows this list):
skopt
BayesianOptimization
Spearmint
MOE
etc.
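
For a rough sense of what switching libraries would involve, here is a minimal sketch of tuning the same rbf-kernel SVC with skopt's gp_minimize. The search ranges mirror the GPyOpt domain used in bayes_opt.py below; the names space and objective are illustrative and not part of this article's scripts.

from skopt import gp_minimize
from skopt.space import Real
from skopt.utils import use_named_args
from sklearn import datasets
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

digits = datasets.load_digits()
X = digits.images.reshape((len(digits.images), -1))
X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, test_size=0.5, random_state=0)

# Search space mirroring the rbf-kernel domain used with GPyOpt below.
space = [
    Real(1.0, 1100.0, name='C'),
    Real(5e-5, 1.5e-3, name='gamma'),
]

@use_named_args(space)
def objective(**params):
    # gp_minimize minimizes, so return the negative cross-validated macro F1.
    model = SVC(kernel='rbf', **params)
    return -cross_val_score(model, X_train, y_train,
                            cv=5, scoring='f1_macro').mean()

result = gp_minimize(objective, space, n_calls=20, random_state=0)
print(result.x, -result.fun)  # best [C, gamma] and its F1 score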

Notes

Categorical variables

Bayesian optimization can handle discrete variables as well as continuous ones.
For categorical variables, preprocessing such as converting them to dummy variables seems a reasonable approach.
Note: in this article I simply split the problem into separate cases (one optimization per kernel).
GPyOpt: mixing different types of variables
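
For reference, a mixed-type GPyOpt domain can be declared along the following lines. This is a sketch based on the notebook linked above, not part of this article's scripts; the categorical values are indices that the objective itself would map back to, e.g., kernel names.

# Hypothetical mixed domain: C continuous, degree discrete, kernel categorical.
mixed_domain = [
    {'name': 'C',      'type': 'continuous',  'domain': (1, 1100)},
    {'name': 'degree', 'type': 'discrete',    'domain': (2, 3, 4, 5)},
    {'name': 'kernel', 'type': 'categorical', 'domain': (0, 1)},  # e.g. 0: 'rbf', 1: 'linear'
]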

Environment

After installing miniconda, the environment was set up with the following commands.

conda create -n myenv --file conda.req
source activate myenv
pip install -r pip.req
conda.req
matplotlib=2.0.2
numpy=1.12.1
python=3.5.3
scikit-learn=0.18.2
scipy=0.19.1
pip.req
GPy==1.7.7
GPyOpt==1.0.3

Run results

time python bayes_opt.py
# Tuning hyper-parameters for f1

(...omitted...)

20 experiments were performed.

Best parameters set found on development set:

{'kernel': 'rbf', 'gamma': 0.0011320544666204339, 'C': 604.8637015078541}

Detailed classification report:

The model is trained on the full development set.
The scores are computed on the full evaluation set.

             precision    recall  f1-score   support

          0       1.00      1.00      1.00        89
          1       0.97      1.00      0.98        90
          2       1.00      0.98      0.99        92
          3       1.00      1.00      1.00        93
          4       1.00      1.00      1.00        76
          5       0.99      0.98      0.99       108
          6       0.99      1.00      0.99        89
          7       0.99      1.00      0.99        78
          8       1.00      0.98      0.99        92
          9       0.99      0.99      0.99        92

avg / total       0.99      0.99      0.99       899


real    0m12.902s
user    0m0.000s
sys     0m0.031s

time python grid_search.py
# Tuning hyper-parameters for f1

(...omitted...)

132 experiments were performed.

Best parameters set found on development set:

{'kernel': 'rbf', 'gamma': 0.00016666666666666666, 'C': 100}

Detailed classification report:

The model is trained on the full development set.
The scores are computed on the full evaluation set.

             precision    recall  f1-score   support

          0       1.00      1.00      1.00        89
          1       0.95      1.00      0.97        90
          2       0.99      0.99      0.99        92
          3       0.97      0.99      0.98        93
          4       1.00      1.00      1.00        76
          5       0.96      0.97      0.97       108
          6       0.99      0.99      0.99        89
          7       0.99      1.00      0.99        78
          8       1.00      0.91      0.95        92
          9       0.97      0.96      0.96        92

avg / total       0.98      0.98      0.98       899


real    0m41.947s
user    0m0.000s
sys     0m0.031s

Code

bayes_opt.py
import GPyOpt
import numpy as np
from numpy.random import seed
from sklearn import datasets
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.metrics import classification_report
from sklearn.svm import SVC

seed(0)

digits = datasets.load_digits()

n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Search space per kernel: GPyOpt domains for the SVC hyperparameters.
kernel2domains = {
    'rbf':[
        {'name': 'C', 'type': 'continuous', 'domain': (1, 1100)},
        {'name': 'gamma', 'type': 'continuous', 'domain': (0.00005, 0.0015)}
    ],
    'linear': [
        {'name': 'C', 'type': 'continuous', 'domain': (1, 1100)},
    ],
}

def _get_model(param, kernel):
    # Build an SVC from a GPyOpt parameter vector for the given kernel.
    _param = _refine_param(param, kernel)
    model = SVC(**_param)
    return model

def _refine_param(param, kernel):
    # Map the raw parameter vector to SVC keyword arguments.
    assert kernel in ['rbf', 'linear']
    if kernel == 'rbf':
        ret = {'kernel': kernel, 'C': param[0], 'gamma': param[1]}
    else:
        ret = {'kernel': kernel, 'C': param[0]}
    return ret

def _optimize(params, kernel):
    # Objective for GPyOpt: negative cross-validated macro F1
    # (GPyOpt minimizes, so lower is better).
    scores = np.zeros((params.shape[0], 1))
    for i, param in enumerate(params):
        model = _get_model(param, kernel)
        y_pred = cross_val_predict(model, X_train, y_train, cv=5)
        scores[i] -= f1_score(y_train, y_pred, average='macro')
    return scores

print("# Tuning hyper-parameters for f1")
print()

# Run a separate Bayesian optimization per kernel (case split instead of a
# categorical variable) and keep the best result across kernels.
bests_per_kernel = []
for k, d in kernel2domains.items():
    f = lambda x: _optimize(x, k)
    opt = GPyOpt.methods.BayesianOptimization(f=f, domain=d)
    opt.run_optimization(max_iter=15)
    idx = np.argmin(opt.Y)
    x_best = opt.X[idx]
    best_score = opt.Y[idx]
    bests_per_kernel.append((x_best, best_score, opt, k))

x_best, _, optimizer, kernel = min(bests_per_kernel, key=lambda x: x[1])

print("Grid scores on development set:")
print()
for param, score in zip(optimizer.X, optimizer.Y):
    _score = -score
    _param = _refine_param(param, kernel)
    print("%0.3f for %r" % (_score, _param))
print()

print("%d experiments were performed." % len(optimizer.X))
print()

print("Best parameters set found on development set:")
print()
print(_refine_param(x_best, kernel))
print()

print("Detailed classification report:")
print()
print("The model is trained on the full development set.")
print("The scores are computed on the full evaluation set.")
print()
clf = _get_model(x_best, kernel)
clf.fit(X_train, y_train)
y_true, y_pred = y_test, clf.predict(X_test)
print(classification_report(y_true, y_pred))
print()
grid_search.py
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.svm import SVC

digits = datasets.load_digits()

n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Candidate grid: 12 values of C and 10 values of gamma (1/1000 ... 1/10000),
# i.e. 12*10 rbf combinations plus 12 linear ones = 132 experiments.
C = [1, 10, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
_gamma = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
gamma = [1/i for i in _gamma]
tuned_parameters = [{'kernel': ['rbf'], 'gamma': gamma, 'C': C},
                    {'kernel': ['linear'], 'C': C}]

print("# Tuning hyper-parameters for f1")
print()

clf = GridSearchCV(SVC(C=1), tuned_parameters, cv=5, scoring='f1_macro')
clf.fit(X_train, y_train)

print("Grid scores on development set:")
print()
means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, clf.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r"
          % (mean, std * 2, params))
print()

print("%d experiments were performed." % len(means))
print()

print("Best parameters set found on development set:")
print()
print(clf.best_params_)
print()

print("Detailed classification report:")
print()
print("The model is trained on the full development set.")
print("The scores are computed on the full evaluation set.")
print()
y_true, y_pred = y_test, clf.predict(X_test)
print(classification_report(y_true, y_pred))
print()