Kernel methods are a powerful tool for nonlinear data analysis, and they appear as a core building block in algorithms such as soft-margin SVMs, Gaussian processes, and clustering. In this post, I reproduce the procedure for solving a regression problem with kernel methods in Python.
TL;DR
The full code is available in the repository linked below.
Note: the material on kernel methods in this post is based on the following book.
What is kernel regression?
Kernel functions and the Gram matrix
We approximate a function $y = f(x)$ from $n$ data samples $\mathcal{D} = \left\{ (x^{(i)}, y^{(i)}) \right\}, (i=1, \dots, n)$. The regression model is expressed with a kernel function $k(\cdot,\cdot)$ and parameters ${\alpha}_i, (i=1, \dots, n)$ as follows.
$$
\hat{y}^{(i)} = \sum_{j=1}^{n} \alpha_j k(x^{(i)}, x^{(j)}) \quad (i=1, \dots, n)
$$
We measure the distance between the estimate $\hat{y}^{(i)}$ and the true value $y^{(i)}$ as the residual. Fitting the model to the data samples, we take the sum (or mean) of the squared residuals as the model's loss function and solve for the optimal parameters ${\alpha}_i, (i=1, \dots, n)$.
$$
\begin{align}
residual(y^{(i)}, \hat{y}^{(i)}) &= y^{(i)} - \hat{y}^{(i)} \\
&= y^{(i)} - \sum_{j=1}^{n} \alpha_j k(x^{(i)}, x^{(j)})
\end{align}
$$
$$
\begin{align}
L(\alpha) &= \sum_{i=1}^{n} {residual(y^{(i)}, \hat{y}^{(i)})}^{2} \\
&= \sum_{i=1}^{n} { \left( y^{(i)} - \sum_{j=1}^{n} \alpha_j k(x^{(i)}, x^{(j)}) \right) }^{2} \\
&= {({\bf y} - {\bf K} {\bf \alpha})}^{\mathrm{T}} ({\bf y} - {\bf K} {\bf \alpha})
\end{align}
$$
(Dividing by $n$ would give the mean squared error instead; the constant factor only rescales the gradient, so it can be absorbed into the learning rate.)
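As a quick sanity check, the summation form and the matrix form of the loss give the same value. A minimal sketch with made-up toy arrays (the values here are arbitrary, for illustration only):

import numpy as np

# Toy data: n = 3 samples
K = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.4],
              [0.2, 0.4, 1.0]])  # a symmetric Gram matrix
y = np.array([0.3, -0.1, 0.8])
alpha = np.array([0.1, 0.2, -0.3])

# Summation form: sum_i (y_i - sum_j alpha_j K_ij)^2
loss_sum = sum((y[i] - K[i] @ alpha) ** 2 for i in range(3))

# Matrix form: (y - K alpha)^T (y - K alpha)
r = y - K @ alpha
loss_mat = r @ r

assert np.isclose(loss_sum, loss_mat)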
$$
\mathrm{where} \quad {\bf K} =
\begin{pmatrix}
k(x^{(1)}, x^{(1)}) & \cdots & k(x^{(1)}, x^{(n)}) \\
\vdots & \ddots & \vdots \\
k(x^{(n)}, x^{(1)}) & \cdots & k(x^{(n)}, x^{(n)})
\end{pmatrix}
$$
${\bf K}$ is the matrix computed from the data samples once a kernel function is given, and is called the Gram matrix.
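For reference, numpy can build the Gram matrix without explicit Python loops via broadcasting. A minimal sketch assuming 1-D inputs x and the RBF kernel introduced below (the function name gram_matrix_rbf is my own, not part of this post's code):

import numpy as np

def gram_matrix_rbf(x, sigma=0.5):
    # Pairwise squared distances via broadcasting: (n, 1) - (1, n) -> (n, n)
    sq_dist = (x[:, None] - x[None, :]) ** 2
    # K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))
    return np.exp(-sq_dist / (2 * sigma ** 2))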
Parameter search with gradient descent
To find the parameters ${\alpha}_i, (i=1, \dots, n)$, we use naive gradient descent, where the coefficient $\gamma$ is the learning rate.
$$
\begin{align}
\frac{\partial L(\alpha)}{\partial {\bf \alpha}} &= -2 \cdot {\bf K}^{\mathrm{T}} ({\bf y} - {\bf K} {\bf \alpha}) \\
{\alpha}_{new} &= {\alpha}_{old} - \gamma \cdot \left. \frac{\partial L(\alpha)}{\partial {\bf \alpha}} \right|_{\alpha = {\alpha}_{old}}
\end{align}
$$
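In numpy, one step of this update rule is only a couple of lines. A minimal sketch, assuming K, y, alpha, and a learning rate lr are already defined:

# One gradient-descent step for L(alpha) = (y - K alpha)^T (y - K alpha)
grad = -2 * K.T @ (y - K @ alpha)  # dL/dalpha
alpha = alpha - lr * grad          # step against the gradient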
For the kernel function, we adopt the RBF kernel (Gaussian kernel).
$$
k(x, x') := \exp \left( - \frac{{\| x - x' \|}^2}{2 {\sigma}^{2}} \right)
$$
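The bandwidth sigma controls how fast the kernel decays with distance, which in turn controls how smooth the fitted curve is. A quick check with made-up distances:

import numpy as np

d = np.array([0.0, 0.5, 1.0, 2.0])  # distances ||x - x'||
for sigma in (0.2, 1.0):
    k = np.exp(-d ** 2 / (2 * sigma ** 2))  # RBF kernel values
    print("sigma={}: {}".format(sigma, np.round(k, 4)))
# Smaller sigma -> values decay faster -> a more local, wiggly fit.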
Implementing kernel regression in Python (numpy only)
All of the code is in the repository below.
https://github.com/yumaloop/KernelRegression
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Kernel Function
class RBFkernel():
    def __init__(self, sigma=0.5):
        self.sigma = sigma

    def __call__(self, x, y):
        # RBF (Gaussian) kernel: exp(-||x - y||^2 / (2 sigma^2))
        numerator = -1 * np.sum((x - y)**2)
        denominator = 2 * (self.sigma**2)
        return np.exp(numerator / denominator)

    def get_params(self):
        return self.sigma

    def set_params(self, sigma):
        self.sigma = sigma
# Regression Model
class KernelRegression():
    def __init__(self, kernel):
        self.kernel = kernel

    def fit_kernel(self, X, y, lr=0.01, nb_epoch=1000, log_freq=50):
        self.X = X
        self.y = y
        self.n = X.shape[0]                # sample size
        self.alpha = np.full(self.n, 1.0)  # param alpha: initialize
        # Gram Matrix (fixed by the data, so computed once before the loop)
        self.gram_matrix = np.zeros((self.n, self.n))
        for i in range(self.n):
            for j in range(self.n):
                self.gram_matrix[i][j] = self.kernel(self.X[i], self.X[j])
        # Gradient Descent Algorithm to optimize alpha
        for epoch in range(nb_epoch):
            self.loss, self.loss_grad = self.mse(self.X, self.y, self.alpha, self.gram_matrix)
            self.alpha = self.alpha - lr * self.loss_grad
            if epoch % log_freq == 0:
                print("epoch: {} \t MSE of sample data: {:.4f}".format(epoch, self.loss))

    def mse(self, X, y, alpha, gram_matrix):
        # Sum of squared residuals (y - K alpha)^T (y - K alpha) and its gradient;
        # dividing by n would give the mean, which only rescales the learning rate
        loss = np.dot((y - np.dot(gram_matrix, alpha)), (y - np.dot(gram_matrix, alpha)))
        loss_grad = -2 * np.dot(gram_matrix.T, (y - np.dot(gram_matrix, alpha)))
        return loss, loss_grad

    def predict(self, X_new):
        n_new = X_new.shape[0]
        y_new = np.zeros(n_new)
        for i in range(n_new):
            for j in range(self.n):
                y_new[i] += self.alpha[j] * self.kernel(X_new[i], self.X[j])
        return y_new
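Note that predict above loops over every pair of points in Python, which gets slow for large samples. A vectorized alternative for the 1-D RBF case used in this experiment (a sketch of my own, not part of the class's API):

def predict_vectorized(model, X_new):
    # Pairwise squared distances between new points and training points
    sq_dist = (X_new[:, None] - model.X[None, :]) ** 2
    K_new = np.exp(-sq_dist / (2 * model.kernel.sigma ** 2))
    # y_new[i] = sum_j alpha_j * k(x_new_i, x_j)
    return K_new @ model.alpha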
# Experiment
def actual_function(x):
    return 1.7 * np.sin(2 * x) + np.cos(1.5 * x) + 0.5 * np.cos(0.5 * x) + 0.5 * x + 0.1

sample_size = 100  # the number of data samples
var_noise = 0.7    # noise scale: used as the standard deviation in np.random.normal

# make data sample
x_sample = np.random.rand(sample_size) * 10 - 5
y_sample = actual_function(x_sample) + np.random.normal(0, var_noise, sample_size)

# variables for plot (actual function)
x_plot = np.linspace(-5, 5, 100)
y_plot = actual_function(x_plot)

# plot figure
plt.plot(x_plot, y_plot, color="blue", linestyle="dotted", label="actual function")
plt.scatter(x_sample, y_sample, alpha=0.4, color="blue", label="data sample")
plt.title("Actual function & Data sample (N={})".format(sample_size))
plt.legend(loc="upper left")
plt.xlabel("x")
plt.ylabel("y")
plt.show()

# set kernel
sigma = 0.2
kernel = RBFkernel(sigma=sigma)

# generate model
model = KernelRegression(kernel)

# fit data sample for the model
model.fit_kernel(x_sample, y_sample, lr=0.01, nb_epoch=1000, log_freq=100)
epoch: 0 MSE of sample data: 35.2988
epoch: 100 MSE of sample data: 30.3570
epoch: 200 MSE of sample data: 29.4241
epoch: 300 MSE of sample data: 28.8972
epoch: 400 MSE of sample data: 28.5597
epoch: 500 MSE of sample data: 28.3322
epoch: 600 MSE of sample data: 28.1736
epoch: 700 MSE of sample data: 28.0598
epoch: 800 MSE of sample data: 27.9759
epoch: 900 MSE of sample data: 27.9122
# check Gram Matrix of the model
print(model.gram_matrix)
array([[1.00000000e+000, 3.24082380e-012, 1.86291046e-092, ...,
3.45570112e-030, 7.50777061e-016, 6.18611768e-129],
[3.24082380e-012, 1.00000000e+000, 5.11649994e-039, ...,
1.78993799e-078, 1.05138357e-053, 1.15421467e-063],
[1.86291046e-092, 5.11649994e-039, 1.00000000e+000, ...,
6.88153992e-226, 4.47677881e-182, 8.98951607e-004],
...,
[3.45570112e-030, 1.78993799e-078, 6.88153992e-226, ...,
1.00000000e+000, 4.28581263e-003, 2.58161981e-281],
[7.50777061e-016, 1.05138357e-053, 4.47677881e-182, ...,
4.28581263e-003, 1.00000000e+000, 3.95135557e-232],
[6.18611768e-129, 1.15421467e-063, 8.98951607e-004, ...,
2.58161981e-281, 3.95135557e-232, 1.00000000e+000]])
# check loss
print(model.loss)
12.333081499130776
# array for plot (predicted function)
x_new = np.linspace(-5, 5, 100)
y_new = model.predict(x_new)
# plot result
plt.plot(x_plot, y_plot, color="blue", linestyle="dotted", label="actual function")
plt.scatter(x_sample, y_sample, alpha=0.3, color="blue", label="data sample")
plt.plot(x_new, y_new, color="red", label="predicted function")
plt.title("Kernel Regression \n RBF kernel (sigma={}), sample size ={}".format(sigma, sample_size))
plt.legend(loc="upper left")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
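As a rough check on generalization, one could also compare the prediction with the noise-free function on the plotting grid (a sketch; this was not part of the original run):

# MSE of the prediction against the noise-free function on the grid
grid_mse = np.mean((y_new - actual_function(x_new)) ** 2)
print("MSE on grid: {:.4f}".format(grid_mse))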
That's all. Thanks for reading to the end!