Overview
Grid search is a common way to tune hyperparameters in machine learning, but such an exhaustive approach takes a long time to compute. Bayesian optimization is an alternative that searches for good parameters automatically and efficiently. In this article I apply both grid search and Bayesian optimization to an SVM classification problem and compare them.
Results
| Method | Time (s) | Trials | F1-score |
|---|---|---|---|
| Bayesian optimization | 12.902 | 20 | 0.99 |
| Grid search | 41.947 | 132 | 0.98 |
Conclusion
Bayesian optimization reached comparable predictive accuracy in far less time, which confirms its efficiency in practice.
Future Work
Data
I used the clean sample dataset (digits) bundled with scikit-learn. I would like to try real-world data as well.
Algorithm
Bayesian optimization should really pay off with algorithms that need a lot of computation time, such as deep learning, and I would like to try that in practice; a rough sketch of the general pattern is given below.
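Nothing in the setup above is SVM-specific: any model whose training can be wrapped in a function that maps a parameter vector to a score can be plugged into GPyOpt. The snippet below is only a hypothetical sketch, not something measured in this article; it uses scikit-learn's MLPClassifier as a stand-in for a more expensive model, and the parameter names and ranges are assumptions made up for the example.

```python
# Hypothetical sketch: tuning a (more expensive) neural-network model with
# GPyOpt, following the same pattern used for the SVM above.
import GPyOpt
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

digits = datasets.load_digits()
X = digits.images.reshape((len(digits.images), -1))
y = digits.target

# Illustrative search space: regularization strength (log scale) and layer width.
domain = [
    {'name': 'log10_alpha', 'type': 'continuous', 'domain': (-5, -1)},
    {'name': 'hidden_units', 'type': 'discrete', 'domain': tuple(range(16, 129, 16))},
]

def objective(params):
    # GPyOpt passes a 2D array of candidate points; return one score per row.
    scores = np.zeros((params.shape[0], 1))
    for i, (log10_alpha, hidden_units) in enumerate(params):
        model = MLPClassifier(hidden_layer_sizes=(int(hidden_units),),
                              alpha=10 ** log10_alpha, max_iter=200)
        # Negate the F1 score because GPyOpt minimizes the objective.
        scores[i] -= cross_val_score(model, X, y, cv=3, scoring='f1_macro').mean()
    return scores

opt = GPyOpt.methods.BayesianOptimization(f=objective, domain=domain)
opt.run_optimization(max_iter=15)
print(opt.X[np.argmin(opt.Y)], -opt.Y.min())
```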
Library Choice
This time I chose GPyOpt simply because it looked easy to use from Python. There are many other candidates, though, so a comparison would be worthwhile (a skopt sketch follows the list):
・skopt
・BayesianOptimization
・Spearmint
・MOE
etc...
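For reference, here is what the same kind of tuning might look like with skopt (scikit-optimize). This is a hypothetical sketch I have not benchmarked in this article; it is based on skopt's documented gp_minimize API and reuses the rbf search ranges from the GPyOpt code below.

```python
# Hypothetical sketch with skopt (not benchmarked in this article).
from skopt import gp_minimize
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

digits = datasets.load_digits()
X = digits.images.reshape((len(digits.images), -1))
y = digits.target

def objective(params):
    # skopt passes a plain list of parameter values; minimize the negated F1.
    C, gamma = params
    model = SVC(kernel='rbf', C=C, gamma=gamma)
    return -cross_val_score(model, X, y, cv=5, scoring='f1_macro').mean()

# Search space as (low, high) bounds, matching the ranges used with GPyOpt.
res = gp_minimize(objective, [(1.0, 1100.0), (0.00005, 0.0015)],
                  n_calls=20, random_state=0)
print(res.x, -res.fun)
```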
Notes
Categorical variables
Bayesian optimization can handle discrete variables as well as continuous ones.
For categorical variables, preprocessing them into dummy variables seems like a reasonable approach; see the sketch after the reference below.
Note: in this article I simply split the problem into separate cases, one optimization per kernel.
GPyOpt: mixing different types of variables
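According to the GPyOpt material referenced above, the domain specification accepts discrete and categorical variables alongside continuous ones. The snippet below is only a sketch of what such a mixed domain might look like, based on that documentation rather than anything used in this article; the variable names are illustrative.

```python
# Hypothetical mixed domain for GPyOpt (not used in this article):
# a continuous, a discrete, and a categorical variable side by side.
mixed_domain = [
    {'name': 'C', 'type': 'continuous', 'domain': (1, 1100)},
    {'name': 'degree', 'type': 'discrete', 'domain': (2, 3, 4, 5)},
    # Categorical values are passed as indices; map them back to labels yourself,
    # e.g. 0 -> 'rbf', 1 -> 'linear', 2 -> 'poly'.
    {'name': 'kernel', 'type': 'categorical', 'domain': (0, 1, 2)},
]
```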
Environment
After installing miniconda, I built the environment with the following commands:
```
conda create -n myenv --file conda.req
source activate myenv
pip install -r pip.req
```
conda.req:
```
matplotlib=2.0.2
numpy=1.12.1
python=3.5.3
scikit-learn=0.18.2
scipy=0.19.1
```
pip.req:
```
GPy==1.7.7
GPyOpt==1.0.3
```
Execution Results
```
time python bayes_opt.py
# Tuning hyper-parameters for f1
(...omitted...)
20 experiments were performed.
Best parameters set found on development set:
{'kernel': 'rbf', 'gamma': 0.0011320544666204339, 'C': 604.8637015078541}
Detailed classification report:
The model is trained on the full development set.
The scores are computed on the full evaluation set.
precision recall f1-score support
0 1.00 1.00 1.00 89
1 0.97 1.00 0.98 90
2 1.00 0.98 0.99 92
3 1.00 1.00 1.00 93
4 1.00 1.00 1.00 76
5 0.99 0.98 0.99 108
6 0.99 1.00 0.99 89
7 0.99 1.00 0.99 78
8 1.00 0.98 0.99 92
9 0.99 0.99 0.99 92
avg / total 0.99 0.99 0.99 899
real 0m12.902s
user 0m0.000s
sys 0m0.031s
```
```
time python grid_search.py
# Tuning hyper-parameters for f1
(...omitted...)
132 experiments were performed.
Best parameters set found on development set:
{'kernel': 'rbf', 'gamma': 0.00016666666666666666, 'C': 100}
Detailed classification report:
The model is trained on the full development set.
The scores are computed on the full evaluation set.
precision recall f1-score support
0 1.00 1.00 1.00 89
1 0.95 1.00 0.97 90
2 0.99 0.99 0.99 92
3 0.97 0.99 0.98 93
4 1.00 1.00 1.00 76
5 0.96 0.97 0.97 108
6 0.99 0.99 0.99 89
7 0.99 1.00 0.99 78
8 1.00 0.91 0.95 92
9 0.97 0.96 0.96 92
avg / total 0.98 0.98 0.98 899
real 0m41.947s
user 0m0.000s
sys 0m0.031s
```
Code
bayes_opt.py:
```python
import GPyOpt
import numpy as np
from numpy.random import seed
from sklearn import datasets
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.metrics import classification_report
from sklearn.svm import SVC
seed(0)
digits = datasets.load_digits()
n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)
# Search space for each kernel, in GPyOpt's domain format.
kernel2domains = {
    'rbf': [
        {'name': 'C', 'type': 'continuous', 'domain': (1, 1100)},
        {'name': 'gamma', 'type': 'continuous', 'domain': (0.00005, 0.0015)}
    ],
    'linear': [
        {'name': 'C', 'type': 'continuous', 'domain': (1, 1100)},
    ],
}
def _get_model(param, kernel):
    # Build an SVC from a GPyOpt parameter vector.
    _param = _refine_param(param, kernel)
    model = SVC(**_param)
    return model
def _refine_param(param, kernel):
    # Map a raw parameter vector to SVC keyword arguments.
    assert kernel in ['rbf', 'linear']
    if kernel == 'rbf':
        ret = {'kernel': kernel, 'C': param[0], 'gamma': param[1]}
    else:
        ret = {'kernel': kernel, 'C': param[0]}
    return ret
def _optimize(params, kernel):
    # GPyOpt passes a 2D array of candidate points; return one score per row.
    # Scores are negated because GPyOpt minimizes the objective.
    scores = np.zeros((params.shape[0], 1))
    for i, param in enumerate(params):
        model = _get_model(param, kernel)
        y_pred = cross_val_predict(model, X_train, y_train, cv=5)
        scores[i] -= f1_score(y_train, y_pred, average='macro')
    return scores
print("# Tuning hyper-parameters for f1")
print()
bests_per_kernel = []
# Run a separate Bayesian optimization for each kernel.
for k, d in kernel2domains.items():
    f = lambda x: _optimize(x, k)
    opt = GPyOpt.methods.BayesianOptimization(f=f, domain=d)
    opt.run_optimization(max_iter=15)
    idx = np.argmin(opt.Y)
    x_best = opt.X[idx]
    best_score = opt.Y[idx]
    bests_per_kernel.append((x_best, best_score, opt, k))
# Scores are negated F1, so min() picks the best kernel overall.
x_best, _, optimizer, kernel = min(bests_per_kernel, key=lambda x: x[1])
print("Grid scores on development set:")
print()
for param, score in zip(optimizer.X, optimizer.Y):
    _score = -score
    _param = _refine_param(param, kernel)
    print("%0.3f for %r" % (_score, _param))
print()
print("%d experiments were performed." % len(optimizer.X))
print()
print("Best parameters set found on development set:")
print()
print(_refine_param(x_best, kernel))
print()
print("Detailed classification report:")
print()
print("The model is trained on the full development set.")
print("The scores are computed on the full evaluation set.")
print()
clf = _get_model(x_best, kernel)
clf.fit(X_train, y_train)
y_true, y_pred = y_test, clf.predict(X_test)
print(classification_report(y_true, y_pred))
print()
```
grid_search.py:
```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.svm import SVC
digits = datasets.load_digits()
n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)
# Grid: 12 values of C and 10 values of gamma (rbf) plus 12 values of C (linear),
# i.e. 120 + 12 = 132 trials in total.
C = [1, 10, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
_gamma = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
gamma = [1/i for i in _gamma]
tuned_parameters = [{'kernel': ['rbf'], 'gamma': gamma, 'C': C},
                    {'kernel': ['linear'], 'C': C}]
print("# Tuning hyper-parameters for f1")
print()
clf = GridSearchCV(SVC(C=1), tuned_parameters, cv=5, scoring='f1_macro')
clf.fit(X_train, y_train)
print("Grid scores on development set:")
print()
means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, clf.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r"
          % (mean, std * 2, params))
print()
print("%d experiments were performed." % len(means))
print()
print("Best parameters set found on development set:")
print()
print(clf.best_params_)
print()
print("Detailed classification report:")
print()
print("The model is trained on the full development set.")
print("The scores are computed on the full evaluation set.")
print()
y_true, y_pred = y_test, clf.predict(X_test)
print(classification_report(y_true, y_pred))
print()
```