
Kernel Regression with NumPy Only

Posted at 2020-09-30


Kernel methods are a powerful tool for nonlinear data analysis. They appear frequently as building blocks of algorithms such as soft-margin SVM, Gaussian processes, and clustering. In this post, I reproduce the procedure for solving a regression problem with kernel methods in Python.

TL;DR
See this repository for the full code: https://github.com/yumaloop/KernelRegression

Note: my knowledge of kernel methods is based on the following books.

カーネル多変量解析―非線形データ解析の新しい展開 (シリーズ確率と情報の科学)
カーネル法入門―正定値カーネルによるデータ解析 (シリーズ 多変量データの統計科学)


What is kernel regression?

Kernel functions and the Gram matrix

We approximate a function $y = f(x)$ from $n$ data samples $\mathcal{D} = \left\{ x^{(i)}, y^{(i)} \right\} \, (i=1, \dots, n)$. The regression model is expressed by the following equation, using a kernel function $k(\cdot,\cdot)$ and parameters ${\alpha}_i \, (i=1, \dots, n)$:

$$
\hat{y}^{(i)} = \sum_{j=1}^{n} \alpha_j k(x^{(i)}, x^{(j)}) ~~~~ (i=1 \dots n)
$$
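
For intuition, a single prediction is just a weighted sum of kernel evaluations against the training inputs. Here is a minimal sketch (the helper name `predict_one` and the generic kernel argument `k` are illustrative, not part of the implementation below):

import numpy as np

def predict_one(x, X_train, alpha, k):
    # y_hat(x) = sum_j alpha_j * k(x, x_j)
    return sum(a_j * k(x, x_j) for a_j, x_j in zip(alpha, X_train))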

The difference between the estimate $\hat{y}^{(i)}$ and the true value $y^{(i)}$ is the residual. We fit the model to the data samples by taking the sum of the squared residuals as the model's loss function and finding the optimal parameters ${\alpha}_i \, (i=1, \dots, n)$ that minimize it.

$$
\begin{align}
residual(y^{(i)}, \hat{y}^{(i)}) &= y^{(i)} - \hat{y}^{(i)} \\
&= y^{(i)} - \sum_{j=1}^{n} \alpha_j k(x^{(i)}, x^{(j)})
\end{align}
$$

$$
\begin{align}
L(\alpha) &= \sum_{i=1}^{n} {residual(y^{(i)}, \hat{y}^{(i)}) }^{2} \\
&= \sum_{i=1}^{n} { \left( y^{(i)} - \sum_{j=1}^{n} \alpha_j k(x^{(i)}, x^{(j)}) \right) }^{2} \\
&= {({\bf y} - {\bf K} {\bf \alpha})}^{\mathrm{T}} ({\bf y} - {\bf K} {\bf \alpha})
\end{align}
$$

$$
where, \,\,\, {\bf K} =
\begin{pmatrix}
k(x^{(1)}, x^{(1)}) & \cdots & k(x^{(1)}, x^{(n)}) \\
\vdots & \ddots & \vdots \\
k(x^{(n)}, x^{(1)}) & \cdots & k(x^{(n)}, x^{(n)})
\end{pmatrix}
$$

${\bf K}$ is a matrix computed from the data samples once a kernel function is given; it is called the Gram matrix.
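
For example, the Gram matrix can be built from the data samples with two loops. A minimal sketch, assuming a kernel function `k` that takes two inputs (the `gram_matrix` helper name is illustrative; the class below stores the same matrix as `self.gram_matrix`):

import numpy as np

def gram_matrix(X, k):
    # K[i, j] = k(x_i, x_j); symmetric, since k(x, x') = k(x', x)
    n = len(X)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = k(X[i], X[j])
    return K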

Finding the parameters with gradient descent

To search for the parameters ${\alpha}_i \, (i=1, \dots, n)$, we use naive gradient descent. The coefficient $\gamma$ is the learning rate.

$$
\frac{\partial L(\alpha)}{\partial {\bf \alpha}} = -2 \cdot {\bf K}^{\mathrm{T}} ({\bf y} - {\bf K} {\bf \alpha}) \\
{\alpha}_{new} = {\alpha}_{old} - \gamma \cdot \left. \frac{\partial L(\alpha)}{\partial {\bf \alpha}} \right|_{ \alpha = {\alpha}_{old} }
$$
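
In NumPy, one gradient step then looks roughly like this (a sketch; `K` is the Gram matrix, `y` the target vector, and `lr` plays the role of $\gamma$):

import numpy as np

def gradient_step(alpha, K, y, lr):
    # dL/dalpha = -2 * K^T (y - K alpha)
    grad = -2 * K.T @ (y - K @ alpha)
    # move against the gradient
    return alpha - lr * grad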

For the kernel function, we use the RBF kernel (Gaussian kernel).

$$
k(x, x') := \exp \left( {\large - \frac{{|| x - x' ||}^2}{2 {\sigma}^{2}} } \right)
$$

Implementing kernel regression in Python (NumPy only)

All of the code is in the repository below.
https://github.com/yumaloop/KernelRegression

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Kernel Function
class RBFkernel():
    def __init__(self, sigma=0.5):
        self.sigma = sigma

    def __call__(self, x, y):
        numerator = -1 * np.sum((x - y)**2)
        denominator = 2 * (self.sigma**2)
        return np.exp(numerator / denominator)

    def get_params(self):
        return self.sigma

    def set_params(self, sigma):
        self.sigma = sigma
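
As a quick check (illustrative values only), the kernel returns 1 when the two inputs coincide and decays toward 0 as they move apart:

kernel = RBFkernel(sigma=0.5)
print(kernel(np.array([0.0]), np.array([0.0])))  # 1.0
print(kernel(np.array([0.0]), np.array([1.0])))  # exp(-2) ≈ 0.135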

# Regression Model
class KernelRegression():
    def __init__(self, kernel):
        self.kernel = kernel

    def fit_kernel(self, X, y, lr=0.01, nb_epoch=1000, log_freq=50):
        self.X = X
        self.y = y
        self.n = X.shape[0] # sample size
        self.alpha = np.full(self.n, 1.0) # param alpha: initialize
        self.gram_matrix = np.zeros((self.n, self.n))

        # Gram Matrix (depends only on the data sample, so compute it once)
        for i in range(self.n):
            for j in range(self.n):
                self.gram_matrix[i][j] = self.kernel(self.X[i], self.X[j])

        # Gradient Descent Algorithm to optimize alpha
        for epoch in range(nb_epoch):
            self.loss, self.loss_grad = self.mse(self.X, self.y, self.alpha, self.gram_matrix)
            self.alpha = self.alpha - lr * self.loss_grad

            if epoch % log_freq == 0:
                print("epoch: {} \t MSE of sample data: {:.4f}".format(epoch, self.loss))


    def mse(self, X, y, alpha, gram_matrix):
        # loss: sum of squared residuals, (y - K alpha)^T (y - K alpha)
        residual = y - np.dot(gram_matrix, alpha)
        loss = np.dot(residual, residual)
        # gradient: dL/dalpha = -2 K^T (y - K alpha)
        loss_grad = -2 * np.dot(gram_matrix.T, residual)
        return loss, loss_grad

    def predict(self, X_new):
        n_new = X_new.shape[0]
        y_new = np.zeros(n_new)
        for i in range(n_new):
            for j in range(self.n):
                y_new[i] += self.alpha[j] * self.kernel(X_new[i], self.X[j])
        return y_new
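
As a side note, `predict` can also be written without the double loop by broadcasting over the new inputs. A sketch under the same RBF kernel assumption, for the 1-D inputs used in the experiment below (the name `predict_vectorized` is illustrative):

def predict_vectorized(model, X_new, sigma):
    # pairwise squared distances between new inputs and training inputs
    sq_dist = (X_new[:, None] - model.X[None, :]) ** 2
    K_new = np.exp(-sq_dist / (2 * sigma ** 2))  # shape (n_new, n)
    return K_new @ model.alpha
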
# Experiment

def actual_function(x):
    return 1.7 * np.sin(2 * x) + np.cos(1.5 * x) + 0.5 * np.cos(0.5 * x) + 0.5 * x  + 0.1

sample_size = 100 # the number of data sample
var_noise = 0.7 # variance of the gaussian noise for samples

# make data sample
x_sample = np.random.rand(sample_size) * 10 - 5
y_sample = actual_function(x_sample) + np.random.normal(0, var_noise, sample_size)

# variables for plot (actual function)
x_plot = np.linspace(-5, 5, 100)
y_plot = actual_function(x_plot)

# plot figure
plt.plot(x_plot, y_plot, color="blue", linestyle="dotted", label="actual function")
plt.scatter(x_sample, y_sample, alpha=0.4, color="blue", label="data sample")
plt.title("Actual function & Data sample (N={})".format(sample_size))
plt.legend(loc="upper left")
plt.xlabel("x")
plt.ylabel("y")
plt.show()

[Figure: Actual function & Data sample (N=100)]

# set kernel
sigma=0.2
kernel = RBFkernel(sigma=sigma)

# generate model
model = KernelRegression(kernel)

# fit data sample for the model
model.fit_kernel(x_sample, y_sample, lr=0.01, nb_epoch=1000, log_freq=100)
  epoch: 0      MSE of sample data: 35.2988
  epoch: 100     MSE of sample data: 30.3570
  epoch: 200     MSE of sample data: 29.4241
  epoch: 300     MSE of sample data: 28.8972
  epoch: 400     MSE of sample data: 28.5597
  epoch: 500     MSE of sample data: 28.3322
  epoch: 600     MSE of sample data: 28.1736
  epoch: 700     MSE of sample data: 28.0598
  epoch: 800     MSE of sample data: 27.9759
  epoch: 900     MSE of sample data: 27.9122

# check Gram Matrix of the model
model.gram_matrix
  array([[1.00000000e+000, 3.24082380e-012, 1.86291046e-092, ...,
        3.45570112e-030, 7.50777061e-016, 6.18611768e-129],
       [3.24082380e-012, 1.00000000e+000, 5.11649994e-039, ...,
        1.78993799e-078, 1.05138357e-053, 1.15421467e-063],
       [1.86291046e-092, 5.11649994e-039, 1.00000000e+000, ...,
        6.88153992e-226, 4.47677881e-182, 8.98951607e-004],
       ...,
       [3.45570112e-030, 1.78993799e-078, 6.88153992e-226, ...,
        1.00000000e+000, 4.28581263e-003, 2.58161981e-281],
       [7.50777061e-016, 1.05138357e-053, 4.47677881e-182, ...,
        4.28581263e-003, 1.00000000e+000, 3.95135557e-232],
       [6.18611768e-129, 1.15421467e-063, 8.98951607e-004, ...,
        2.58161981e-281, 3.95135557e-232, 1.00000000e+000]])

# check loss
print(model.loss)
  12.333081499130776

# array for plot (predicted function)
x_new = np.linspace(-5, 5, 100)
y_new = model.predict(x_new)

# plot result
plt.plot(x_plot, y_plot, color="blue", linestyle="dotted", label="actual function")
plt.scatter(x_sample, y_sample, alpha=0.3, color="blue", label="data sample")
plt.plot(x_new, y_new, color="red", label="predicted function")
plt.title("Kernel Regression \n RBF kernel (sigma={}), sample size ={}".format(sigma, sample_size))
plt.legend(loc="upper left")
plt.xlabel("x")
plt.ylabel("y")
plt.show()

[Figure: Kernel Regression, RBF kernel (sigma=0.2), sample size=100]



That's all.
Thank you for reading to the end!
