Testing Gaussian process regression with GPyTorch
Purpose: GPyTorch is highly rated when choosing a Gaussian process framework, so try it out and check that it actually works.
It is also a strongly modular framework, so the goal is to get a rough understanding of what each module does.
Environment setup
GPyTorch is a library specialized for Gaussian processes that runs on top of PyTorch, and its design philosophy is consistent with PyTorch's.
- Python 3.6.3 or later
- pip install matplotlib pandas gpytorch torch torchvision jupyterlab (note: the pip package for PyTorch is torch, not pytorch)
- Reference: installing PyTorch (https://qiita.com/berry-clione/items/2371faa063ce5e4f4b12)
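As a quick sanity check (my own addition, not part of the referenced article), the following should run without errors once everything is installed:

# Smoke test: import the main packages and print their versions
import torch
import gpytorch
print(torch.__version__)
print(gpytorch.__version__)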
GP regression with gpytorch
Introduction
In this notebook, we demonstrate many of the design features of GPyTorch using the simplest example, training an RBF kernel Gaussian process on a simple function. We'll be modeling the function
y = sin(2πx) + ε, where ε ∼ N(0, 0.2²), with 100 training examples, and testing on 51 test examples.
Note: this notebook is not necessarily intended to teach the mathematical background of Gaussian processes, but rather how to train a simple one and make predictions in GPyTorch. For a mathematical treatment, Chapter 2 of Gaussian Processes for Machine Learning provides a very thorough introduction to GP regression (this entire text is highly recommended): http://www.gaussianprocess.org/gpml/chapters/RW2.pdf
# gpytorch regression tutorial
import math
import torch
import gpytorch
from matplotlib import pyplot as plt
%matplotlib inline
%load_ext autoreload
%autoreload 2
Set up training data In the next cell, we set up the training data for this example. We’ll be using 100 regularly spaced points on [0,1] which we evaluate the function on and add Gaussian noise to get the training labels.
# Training data as PyTorch tensors (PyTorch's own array type)
train_x = torch.linspace(0,1,100)
train_y = torch.sin(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2
Setting up the model
The next cell demonstrates the most critical features of a user-defined Gaussian process model in GPyTorch. Building a GP model in GPyTorch is different in a number of ways.
First in contrast to many existing GP packages, we do not provide full GP models for the user. Rather, we provide the tools necessary to quickly construct one. This is because we believe, analogous to building a neural network in standard PyTorch, it is important to have the flexibility to include whatever components are necessary. As can be seen in more complicated examples, this allows the user great flexibility in designing custom models.
GPyTorch follows the same design philosophy as PyTorch: the pieces needed for Gaussian process computation are split up and packaged as separate modules.
For most GP regression models you will need to construct the following GPyTorch objects:
- GP Model (gpytorch.models.ExactGP) - This handles most of the inference.
- Likelihood (gpytorch.likelihoods.GaussianLikelihood) - This is the most common likelihood used for GP regression.
- Mean - This defines the prior mean of the GP. (If you don't know which mean to use, a gpytorch.means.ConstantMean() is a good place to start.)
- Kernel - This defines the prior covariance of the GP. (If you don't know which kernel to use, a gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()) is a good place to start.)
- MultivariateNormal Distribution (gpytorch.distributions.MultivariateNormal) - This is the object used to represent multivariate normal distributions.
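To get a feel for how these pieces fit together, here is a small standalone sketch (my own addition, not from the tutorial; it assumes the imports shown earlier) that evaluates a mean module and a kernel module directly and wraps the result in a MultivariateNormal, which is exactly what a model's forward method will do:

# Evaluate the mean and kernel modules on 5 points in 1 dimension (n x d)
x = torch.linspace(0, 1, 5).unsqueeze(-1)
mean_module = gpytorch.means.ConstantMean()
covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
mean_x = mean_module(x)    # prior mean vector of shape (5,)
covar_x = covar_module(x)  # lazily evaluated 5x5 prior covariance
prior = gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
print(prior.mean.shape, prior.covariance_matrix.shape)  # torch.Size([5]) torch.Size([5, 5])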
The GP Model
The components of a user built (Exact, i.e. non-variational) GP model in GPyTorch are, broadly speaking:
- An __init__ method that takes the training data and a likelihood, and constructs whatever objects are necessary for the model's forward method. This will most commonly include things like a mean module and a kernel module.
- A forward method that takes in some n×d data x and returns a MultivariateNormal with the prior mean and covariance evaluated at x. In other words, we return the vector μ(x) and the n×n matrix K_xx representing the prior mean and covariance matrix of the GP.

This specification leaves a large amount of flexibility when defining a model. For example, to compose two kernels via addition, you can either add the kernel modules directly:
self.covar_module = ScaleKernel(RBFKernel() + WhiteNoiseKernel())
Or you can add the outputs of the kernel in the forward method:
covar_x = self.rbf_kernel_module(x) + self.white_noise_module(x)
# we will use the simplest form of GP model: exact inference
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        
    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
    
# initialize likelihood and model
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)
Model modes
Like most PyTorch modules, the ExactGP has a .train() and .eval() mode.
.train() mode is for optimizing model hyperparameters.
.eval() mode is for computing predictions through the model posterior.
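A tiny sketch (my own addition) to make the mode switch concrete: the two methods simply flip the standard torch.nn.Module training flag, which GPyTorch uses to decide whether calling the model returns the prior (training) or the posterior predictive (eval):

model.train()
print(model.training)   # True
model.eval()
print(model.training)   # False
model.train()           # switch back before the training loop below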
Training the model
Type-II maximum likelihood estimation (Type-II MLE)
In the next cell, we handle using Type-II MLE to train the hyperparameters of the Gaussian process.
The most obvious difference here compared to many other GP implementations is that, as in standard PyTorch, the core training loop is written by the user.
The training loop is written by the user!
In GPyTorch, we make use of the standard PyTorch optimizers as from torch.optim, and all trainable parameters of the model should be of type torch.nn.Parameter. Because GP models directly extend torch.nn.Module, calls to methods like model.parameters() or model.named_parameters() function as you might expect coming from PyTorch.
Parameter estimation uses standard PyTorch functionality.
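For example (my own check, assuming the model defined above), listing the trainable hyperparameters works exactly as it does for any PyTorch module:

for name, param in model.named_parameters():
    print(name, param.shape)
# Typically this lists the likelihood's raw noise, the constant mean,
# the kernel's raw outputscale and the RBF kernel's raw lengthscale.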
In most cases, the boilerplate code below will work well. It has the same basic components as the standard PyTorch training loop:
In most cases the code follows this flow:
1. Zero all parameter gradients
2. Call the model and compute the loss
3. Call backward on the loss to fill in gradients
4. Take a step on the optimizer

That is the sequence each optimization step goes through. However, defining custom training loops allows for greater flexibility. For example, it is easy to save the parameters at each step of training, or use different learning rates for different parameters (which may be useful in deep kernel learning for example).
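As an illustration of that last point (a sketch of my own, not used in the loop below), per-parameter learning rates can be set up with ordinary optimizer parameter groups:

# Hypothetical alternative optimizer: a smaller learning rate for the
# likelihood noise than for the mean and kernel hyperparameters.
optimizer_alt = torch.optim.Adam([
    {'params': model.mean_module.parameters(), 'lr': 0.1},
    {'params': model.covar_module.parameters(), 'lr': 0.1},
    {'params': model.likelihood.parameters(), 'lr': 0.01},
])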
# this is running the notebook in our testing framework
import os
smoke_test = ('CI' in os.environ)
training_iter = 2 if smoke_test else 50
# Find optimal model hyperparameters
# model and likelihood are switched to training mode separately
model.train()
likelihood.train()
# Use the adam optimizer
optimizer = torch.optim.Adam([{'params':model.parameters()},],lr=0.1)
# "Loss" for GPs the marginal log likelihood
# mean of marginal 周辺化されたという意味で
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
for i in range(training_iter):
    #zero gradients from previous iteration
    optimizer.zero_grad()
    # output from model
    output = model(train_x)
    # calc loss and backprop gradients
    loss = -mll(output, train_y)
    # check what .backward() by itself is actually doing
    loss.backward()
    print('Iter %d/%d - Loss: %.3f   lengthscale: %.3f   noise: %.3f' % (
        i + 1, training_iter, loss.item(),
        model.covar_module.base_kernel.lengthscale.item(),
        model.likelihood.noise.item()
    ))
    optimizer.step()
Iter 1/50 - Loss: 0.921   lengthscale: 0.693   noise: 0.693
Iter 2/50 - Loss: 0.889   lengthscale: 0.644   noise: 0.644
Iter 3/50 - Loss: 0.855   lengthscale: 0.598   noise: 0.598
Iter 4/50 - Loss: 0.817   lengthscale: 0.555   noise: 0.554
Iter 5/50 - Loss: 0.775   lengthscale: 0.514   noise: 0.513
Iter 6/50 - Loss: 0.729   lengthscale: 0.475   noise: 0.474
Iter 7/50 - Loss: 0.680   lengthscale: 0.438   noise: 0.437
Iter 8/50 - Loss: 0.630   lengthscale: 0.404   noise: 0.402
Iter 9/50 - Loss: 0.583   lengthscale: 0.371   noise: 0.370
Iter 10/50 - Loss: 0.540   lengthscale: 0.342   noise: 0.339
Iter 11/50 - Loss: 0.500   lengthscale: 0.315   noise: 0.311
Iter 12/50 - Loss: 0.463   lengthscale: 0.292   noise: 0.284
Iter 13/50 - Loss: 0.428   lengthscale: 0.273   noise: 0.260
Iter 14/50 - Loss: 0.393   lengthscale: 0.258   noise: 0.237
Iter 15/50 - Loss: 0.359   lengthscale: 0.245   noise: 0.216
Iter 16/50 - Loss: 0.324   lengthscale: 0.235   noise: 0.197
Iter 17/50 - Loss: 0.290   lengthscale: 0.228   noise: 0.179
Iter 18/50 - Loss: 0.257   lengthscale: 0.223   noise: 0.163
Iter 19/50 - Loss: 0.223   lengthscale: 0.219   noise: 0.148
Iter 20/50 - Loss: 0.190   lengthscale: 0.218   noise: 0.135
Iter 21/50 - Loss: 0.158   lengthscale: 0.218   noise: 0.122
Iter 22/50 - Loss: 0.126   lengthscale: 0.219   noise: 0.111
Iter 23/50 - Loss: 0.096   lengthscale: 0.222   noise: 0.101
Iter 24/50 - Loss: 0.067   lengthscale: 0.226   noise: 0.092
Iter 25/50 - Loss: 0.040   lengthscale: 0.230   noise: 0.084
Iter 26/50 - Loss: 0.015   lengthscale: 0.236   noise: 0.076
Iter 27/50 - Loss: -0.008   lengthscale: 0.241   noise: 0.070
Iter 28/50 - Loss: -0.027   lengthscale: 0.248   noise: 0.064
Iter 29/50 - Loss: -0.044   lengthscale: 0.254   noise: 0.058
Iter 30/50 - Loss: -0.058   lengthscale: 0.260   noise: 0.053
Iter 31/50 - Loss: -0.069   lengthscale: 0.265   noise: 0.049
Iter 32/50 - Loss: -0.076   lengthscale: 0.270   noise: 0.045
Iter 33/50 - Loss: -0.081   lengthscale: 0.273   noise: 0.042
Iter 34/50 - Loss: -0.083   lengthscale: 0.275   noise: 0.039
Iter 35/50 - Loss: -0.084   lengthscale: 0.276   noise: 0.037
Iter 36/50 - Loss: -0.083   lengthscale: 0.275   noise: 0.035
Iter 37/50 - Loss: -0.081   lengthscale: 0.272   noise: 0.033
Iter 38/50 - Loss: -0.078   lengthscale: 0.268   noise: 0.031
Iter 39/50 - Loss: -0.076   lengthscale: 0.263   noise: 0.030
Iter 40/50 - Loss: -0.073   lengthscale: 0.256   noise: 0.029
Iter 41/50 - Loss: -0.072   lengthscale: 0.249   noise: 0.029
Iter 42/50 - Loss: -0.070   lengthscale: 0.241   noise: 0.028
Iter 43/50 - Loss: -0.069   lengthscale: 0.234   noise: 0.028
Iter 44/50 - Loss: -0.069   lengthscale: 0.227   noise: 0.028
Iter 45/50 - Loss: -0.069   lengthscale: 0.221   noise: 0.028
Iter 46/50 - Loss: -0.069   lengthscale: 0.216   noise: 0.028
Iter 47/50 - Loss: -0.071   lengthscale: 0.213   noise: 0.028
Iter 48/50 - Loss: -0.073   lengthscale: 0.211   noise: 0.029
Iter 49/50 - Loss: -0.075   lengthscale: 0.210   noise: 0.029
Iter 50/50 - Loss: -0.077   lengthscale: 0.210   noise: 0.030
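After training, the learned hyperparameters can be read off the model directly (a small sketch of my own; the attribute names are the same ones used in the print statement above):

# Constrained (natural-scale) values of the learned hyperparameters
print('lengthscale:', model.covar_module.base_kernel.lengthscale.item())
print('outputscale:', model.covar_module.outputscale.item())
print('noise:', model.likelihood.noise.item())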
Make predictions with the model
In the next cell, we make predictions with the model. To do this, we simply put the model and likelihood in eval mode, and call both modules on the test data.
Just as a user defined GP model returns a MultivariateNormal containing the prior mean and covariance from forward, a trained GP model in eval mode returns a MultivariateNormal containing the posterior mean and covariance. Thus, getting the predictive mean and variance, and then sampling functions from the GP at the given test points could be accomplished with calls like:
f_preds = model(test_x)
y_preds = likelihood(model(test_x))
f_mean = f_preds.mean
f_var = f_preds.variance
f_covar = f_preds.covariance_matrix
f_samples = f_preds.sample(sample_shape=torch.Size([1000]))
The gpytorch.settings.fast_pred_var context is not needed, but here we are giving a preview of using one of our cool features, getting faster predictive distributions using LOVE.
# Get into evaluation (predictive posterior) mode
model.eval()
likelihood.eval()
# Test points are regularly spaced along [0,1]
# Make predictions by feeding model through likelihood
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    test_x = torch.linspace(0, 1, 51)
    observed_pred = likelihood(model(test_x))
Plot the model fit
In the next cell, we plot the mean and confidence region of the Gaussian process model. The confidence_region method is a helper method that returns 2 standard deviations above and below the mean.
with torch.no_grad():
    # initialize plot
    f, ax = plt.subplots(1, 1, figsize=(4,3))
    # get upper and lower confidence bounds
    lower , upper = observed_pred.confidence_region()
    # plot training data as black stars
    ax.plot(train_x.numpy(), train_y.numpy(), 'k*')
    # plot predictive means as blue line
    ax.plot(test_x.numpy(), observed_pred.mean.numpy(), 'b')
    # shade between the lower and upper confidence bounds
    ax.fill_between(test_x.numpy(), lower.numpy(), upper.numpy(), alpha=0.5)
    ax.set_ylim([-3, 3])
    
    ax.legend(['Observed data', 'Mean', 'Confidence'])
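As a quick check of the statement about confidence_region (my own sketch, assuming the plotting cell above has been run), the bounds should match the mean plus/minus 2 standard deviations of the predictive distribution:

mean = observed_pred.mean
std = observed_pred.stddev
print(torch.allclose(lower, mean - 2 * std), torch.allclose(upper, mean + 2 * std))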
