Using Bayesian optimization to tune the hyperparameters of GTM (Generative Topographic Mapping)

Last updated at 2018-11-07, posted at 2018-11-05

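The GTM (Generative Topographic Mapping) implementation published by Prof. Kaneko comes with an example that tunes the hyperparameters by grid search. I wanted to tune them with Bayesian optimization instead, so I wrote some test code. This assumes GTM has already been installed from the linked page.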

The first part is the same as the original.

import matplotlib.figure as figure
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris

from gtm import gtm
from k3nerror import k3nerror

# load an iris dataset
iris = load_iris()
input_dataset = iris.data
color = iris.target

# autoscaling
input_dataset = (input_dataset - input_dataset.mean(axis=0)) / input_dataset.std(axis=0, ddof=1)
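To make the autoscaling step concrete, here is a minimal sketch that standardizes a single column using only the standard library. The sample values are made up for illustration; `statistics.stdev` uses the n - 1 denominator, which corresponds to `ddof=1` in the NumPy call above.

```python
import statistics

# hypothetical values for one feature column (not taken from the iris data)
column = [1.0, 2.0, 3.0, 4.0, 5.0]

mean = statistics.mean(column)
std = statistics.stdev(column)  # sample standard deviation (n - 1 denominator)

# autoscaling: subtract the mean, then divide by the standard deviation
scaled = [(value - mean) / std for value in column]

print(statistics.mean(scaled))   # close to 0
print(statistics.stdev(scaled))  # close to 1
```

After scaling, every feature has mean 0 and unit sample standard deviation, so no single feature dominates the distances that GTM and k3n-error rely on.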

From here on, I rewrote the code for Bayesian optimization.

For the Bayesian optimization part, I used the linked article as a reference.

import GPy, GPyOpt

k_in_k3nerror = 10

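# search ranges for the variables
# (the variance and lambda entries are exponents of 2; the single-value
#  domains keep the map size and the number of iterations fixed)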
bounds = [
    {'name': 'shape_of_map_grid', 'type': 'discrete', 'domain': np.arange(30, 31, dtype=int)},
    {'name': 'shape_of_rbf_centers_grid', 'type': 'discrete', 'domain': np.arange(2, 22, 2, dtype=int)},
    {'name': 'variance_of_rbfs_grid', 'type': 'discrete', 'domain': np.arange(-5, 4, 2, dtype=float)},  
    {'name': 'lambda_in_em_algorithm_grid', 'type': 'discrete', 'domain': np.arange(-4, 0, dtype=float)},  
    {'name': 'number_of_iterations', 'type': 'discrete', 'domain': np.arange(300, 301, dtype=float)},
]
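For a sense of scale, the following sketch counts how many combinations an exhaustive grid search would have to evaluate, assuming it enumerated exactly the candidate values listed in `bounds`. Plain `range` objects stand in for the `np.arange` calls, and the variable names are chosen here for illustration only.

```python
from itertools import product

# candidate values, mirroring the np.arange calls in `bounds`
shape_of_map_grid = range(30, 31)            # 30 only
shape_of_rbf_centers_grid = range(2, 22, 2)  # 2, 4, ..., 20
variance_exponents = range(-5, 4, 2)         # -5, -3, -1, 1, 3
lambda_exponents = range(-4, 0)              # -4, -3, -2, -1
number_of_iterations = range(300, 301)       # 300 only

# every combination an exhaustive grid search would visit
grid = list(product(shape_of_map_grid, shape_of_rbf_centers_grid,
                    variance_exponents, lambda_exponents, number_of_iterations))
n_candidates = len(grid)
print(n_candidates)  # 200
```

Bayesian optimization tries to reach a comparable optimum while fitting only a fraction of these 200 models.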

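def gtmf(x):
    # objective to minimize: the k3n-error of a GTM model built from x
    # GPyOpt passes x as a 2-D array with one candidate point per row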
    shape_of_map_grid = int(x[:,0][0])
    shape_of_rbf_centers_grid = int(x[:,1][0])
    variance_of_rbfs_grid = int(x[:,2][0])
    lambda_in_em_algorithm_grid = int(x[:,3][0])
    number_of_iterations = int(x[:,4][0])
    display_flag = 0
    model = gtm([shape_of_map_grid, shape_of_map_grid],
                            [shape_of_rbf_centers_grid, shape_of_rbf_centers_grid],
                            2 ** variance_of_rbfs_grid, 2 ** lambda_in_em_algorithm_grid, number_of_iterations, display_flag)
    model.fit(input_dataset)
    if model.success_flag:
        # calculate responsibilities
        responsibilities = model.responsibility(input_dataset)
        # calculate the mean of responsibilities
        means = responsibilities.dot(model.map_grids)
        # calculate k3n-error
        k3nerror_of_gtm = k3nerror(input_dataset, means, k_in_k3nerror)
    else:
        k3nerror_of_gtm = 10 ** 100
    return k3nerror_of_gtm
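The indexing in `gtmf` is easier to follow with a standalone example. The sketch below uses a hypothetical helper, `decode`, that is not part of GTM or GPyOpt, to show how one candidate row maps to the actual GTM arguments, including the conversion from base-2 exponents. A nested list stands in for the 2-D NumPy array; `x[0][i]` indexing works for both.

```python
def decode(x):
    """Turn one GPyOpt-style row into GTM arguments (hypothetical helper)."""
    row = x[0]  # x is 2-D with a single row
    return {
        "shape_of_map": [int(row[0]), int(row[0])],
        "shape_of_rbf_centers": [int(row[1]), int(row[1])],
        # the search space stores exponents, so convert back with 2 ** exponent
        "variance_of_rbfs": 2.0 ** int(row[2]),
        "lambda_in_em_algorithm": 2.0 ** int(row[3]),
        "number_of_iterations": int(row[4]),
    }

params = decode([[30.0, 2.0, 3.0, -4.0, 300.0]])
print(params["variance_of_rbfs"])        # 8.0
print(params["lambda_in_em_algorithm"])  # 0.0625
```

Searching over exponents rather than raw values keeps the candidates evenly spaced on a log scale, which suits parameters that span several orders of magnitude.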

myBopt = GPyOpt.methods.BayesianOptimization(f=gtmf, domain=bounds)
myBopt.run_optimization(max_iter=300)

print(myBopt.x_opt)
# fx_opt is already the minimized k3n-error, so print it without negating
print(myBopt.fx_opt)

[ 30.   2.   3.  -4. 300.]
0.6821886073347678

shape_of_map = [int(myBopt.x_opt[0]), int(myBopt.x_opt[0])]
shape_of_rbf_centers = [int(myBopt.x_opt[1]), int(myBopt.x_opt[1])]
variance_of_rbfs = 2 ** myBopt.x_opt[2]
lambda_in_em_algorithm = 2 ** myBopt.x_opt[3]
number_of_iterations = int(myBopt.x_opt[4])
display_flag = 0

From here on, the code is the same as the original again.

# construct GTM model
model = gtm(shape_of_map, shape_of_rbf_centers, variance_of_rbfs, lambda_in_em_algorithm, number_of_iterations,
            display_flag)
model.fit(input_dataset)

# calculate responsibilities
responsibilities = model.responsibility(input_dataset)

# plot the mean of responsibilities
means = responsibilities.dot(model.map_grids)
plt.figure(figsize=figure.figaspect(1))
plt.scatter(means[:, 0], means[:, 1], c=color)
plt.ylim(-1.1, 1.1)
plt.xlim(-1.1, 1.1)
plt.xlabel("z1 (mean)")
plt.ylabel("z2 (mean)")
plt.grid()
plt.show()

print("Optimized hyperparameters")
print("Optimal map size: {0}, {1}".format(shape_of_map[0], shape_of_map[1]))
print("Optimal shape of RBF centers: {0}, {1}".format(shape_of_rbf_centers[0], shape_of_rbf_centers[1]))
print("Optimal variance of RBFs: {0}".format(variance_of_rbfs))
print("Optimal lambda in EM algorithm: {0}".format(lambda_in_em_algorithm))

Optimized hyperparameters
Optimal map size: 30, 30
Optimal shape of RBF centers: 2, 2
Optimal variance of RBFs: 8.0
Optimal lambda in EM algorithm: 0.0625

Optimization history

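The optimization history of the objective function can be checked as follows. I set max_iter=300, but the run apparently finished after about 31 evaluations. One plausible cause is GPyOpt's `eps` stopping criterion, which ends the run when two consecutive suggested points nearly coincide; that can happen easily on a small discrete domain like this one.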

plt.plot(myBopt.Y)
plt.ylim([0, 2])
plt.show()
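The raw history jumps around because every evaluation is plotted, good or bad. A running minimum is often easier to read as a convergence curve. The sketch below computes one with the standard library; the objective values are made up and only stand in for the contents of `myBopt.Y`.

```python
from itertools import accumulate

# hypothetical objective values in evaluation order (stand-in for myBopt.Y)
evaluations = [1.2, 0.9, 1.5, 0.7, 0.8]

# best (smallest) value found up to and including each evaluation
best_so_far = list(accumulate(evaluations, min))
print(best_so_far)  # [1.2, 0.9, 0.9, 0.7, 0.7]
```

With the real run, flattening the history first (for example `myBopt.Y.ravel().tolist()`) should give a list that can be passed in the same way.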

Results

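The result was almost the same as with the original (grid search version).
As for computation time, the original took about 9 min 30 s, while the Bayesian optimization version took about 1 min 50 s, roughly a fivefold speedup.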
