What Is a Bayesian Neural Network?

Last updated at 2019-10-31, posted at 2019-10-26

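Introduction

I kept hearing the term "Bayesian neural network" without knowing what it meant, so I looked it up and wrote these notes.

These notes cover only the regression case in which both $p(y|x)$ and $p(\theta)$ are normal distributions.
For classification, a softmax function is typically used instead of a normal distribution, but that case is not covered here.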

References
・Edward - Bayesian Neural Network
・Bayesian Learning for Neural Networks, p. 15

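Summary

A Bayesian neural network is a neural network whose internal parameters $\theta$ are given a prior distribution $p(\theta)$. Training on the labeled data $\mathcal{D}$ first yields a network $f(x^*; \theta)$ that predicts an unknown output $y^*$ from a new input $x^*$ and the internal parameters. The goal is to marginalize the internal parameters $\theta$ out of the function $f(x; \theta)$ and obtain $f(x|\mathcal{D})$. This marginalization uses the posterior distribution $p(\theta|\mathcal{D})$ obtained from training.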

Notation

\begin{align}
\text{Training data}:\ & \mathcal{D} = \{(x_n, y_n)\}_{n=1}^{N} \\
& x_n \in \mathbb{R}^d,\ y_n \in \mathbb{R} \\
\text{Neural network}:\ & f(x; \theta) \\
\text{Internal parameters of the neural network}:\ & \theta \in \mathbb{R}^M
\end{align}
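To make the notation concrete, here is a minimal NumPy sketch of one possible $f(x; \theta)$ that takes all parameters as a single flat vector $\theta \in \mathbb{R}^M$. The one-hidden-layer tanh architecture, the hidden width, and the helper names `num_params` and `unpack` are assumptions made for illustration; the notes themselves do not fix an architecture.

```python
import numpy as np

def num_params(d, hidden):
    """M for a one-hidden-layer net: W1 (d*hidden), b1 (hidden), w2 (hidden), b2 (1)."""
    return d * hidden + hidden + hidden + 1

def unpack(theta, d, hidden):
    """Split the flat vector theta into the weights and biases of the network."""
    i = 0
    W1 = theta[i:i + d * hidden].reshape(d, hidden)
    i += d * hidden
    b1 = theta[i:i + hidden]
    i += hidden
    w2 = theta[i:i + hidden]
    i += hidden
    b2 = theta[i]
    return W1, b1, w2, b2

def f(x, theta, hidden=8):
    """Evaluate f(x; theta) for x of shape (N, d), or (d,) for a single input."""
    x = np.atleast_2d(x)                      # (N, d)
    d = x.shape[1]
    assert theta.size == num_params(d, hidden), "theta has the wrong length M"
    W1, b1, w2, b2 = unpack(theta, d, hidden)
    return np.tanh(x @ W1 + b1) @ w2 + b2     # (N,) real-valued outputs
```

With $d = 3$ and a hidden width of 8, this gives $M = 41$, so `f(np.ones((5, 3)), np.zeros(41))` returns five outputs, one per input row.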

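Assumptions

\begin{align}
\text{Prior over the internal parameters}:\ & p(\theta)\sim  \mathcal{N}({\bf 0}, \sigma^2{\bf I}) \\
\text{Distribution of the output given } \theta \text{ (centered on the network output)}:\ & p(y|x,\theta,\sigma)\sim \mathcal{N}(f(x; \theta), \sigma^2) \\
\text{Predictive distribution of the output (after marginalization)}:\ & p(y|x,\mathcal{D},\sigma)\sim \mathcal{N}(f(x|\mathcal{D}), \sigma^2) \\
\sigma:\ & \text{given in advance and shared by the prior and the output noise; it can also be estimated}
\end{align}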
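The two Gaussian assumptions on $\theta$ and on $y$ given $\theta$ translate directly into log-densities. The sketch below is a minimal NumPy version, assuming the network outputs $f(x_n; \theta)$ have already been computed and are passed in as an array; the function names `log_prior` and `log_likelihood` are hypothetical.

```python
import numpy as np

def log_prior(theta, sigma):
    """log N(theta | 0, sigma^2 I) for a flat parameter vector theta."""
    M = theta.size
    return -0.5 * M * np.log(2 * np.pi * sigma**2) - 0.5 * np.sum(theta**2) / sigma**2

def log_likelihood(y, f_x, sigma):
    """sum_n log N(y_n | f(x_n; theta), sigma^2); f_x holds the network outputs."""
    N = y.size
    return -0.5 * N * np.log(2 * np.pi * sigma**2) - 0.5 * np.sum((y - f_x)**2) / sigma**2
```

Working with log-densities rather than densities avoids numerical underflow when the product over $n$ contains many terms.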

What Training Yields

\begin{align}
\text{Neural network}:\ & f(x; \theta) \\
\text{Internal parameters of the neural network}:\ & \theta \in \mathbb{R}^M \\
\text{Distribution of the output}:\ & p(y|x,\theta,\sigma)\sim \mathcal{N}(f(x; \theta), \sigma^2) \\
\text{Posterior over the internal parameters (up to normalization)}:\ & p(\theta|\mathcal{D}) \propto p(\theta) \prod_n p(y_n|x_n,\theta)
\end{align}
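As a worked step, taking the logarithm of the proportionality above and substituting the Gaussian prior and output distribution from the assumptions gives the expression below, where "const" collects every term that does not depend on $\theta$ (with $\sigma$ held fixed):

\begin{align}
\log p(\theta|\mathcal{D}) &= \log p(\theta) + \sum_{n=1}^{N} \log p(y_n|x_n,\theta) + \text{const} \\
&= -\frac{1}{2\sigma^2}\|\theta\|^2 - \frac{1}{2\sigma^2}\sum_{n=1}^{N}\left(y_n - f(x_n; \theta)\right)^2 + \text{const}
\end{align}

Under these assumptions, maximizing the posterior is therefore the same as minimizing the squared error plus an L2 penalty on $\theta$. The Bayesian treatment differs in that it keeps the whole distribution rather than only its maximizer.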

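What Marginalization Yields

The usual approach is to draw Monte Carlo samples of $\theta$ using the unnormalized posterior obtained from training. Monte Carlo integration over those samples then gives the functions below. Neural networks generally tend to overfit when the data are scarce, and Bayesian learning helps avoid this overfitting because predictions are averaged over the posterior instead of relying on a single parameter estimate.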

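\begin{align}
\text{Predicted output}:\ & f(x^*|\mathcal{D}) = \int f(x^*; \theta)p(\theta|\mathcal{D})d\theta \\
\text{Predictive distribution of the output}:\ & p(y^*|x^*, \mathcal{D}, \sigma) = \int p(y^*|x^*, \theta, \sigma)p(\theta|\mathcal{D})d\theta \approx \mathcal{N}(f(x^*|\mathcal{D}), \sigma^2)
\end{align}

The mean of this predictive distribution is exactly $f(x^*|\mathcal{D})$. Treating it as a normal distribution with variance $\sigma^2$ is the assumption made earlier; the exact variance also includes the posterior variance of $f(x^*; \theta)$.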
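In practice the integral for $f(x^*|\mathcal{D})$ is approximated by an average over $S$ posterior samples, $f(x^*|\mathcal{D}) \approx \frac{1}{S}\sum_{s} f(x^*; \theta^{(s)})$ with $\theta^{(s)} \sim p(\theta|\mathcal{D})$. The sketch below is one way to do this with NumPy. It uses random-walk Metropolis as the sampler, which is my choice for illustration and not something the notes prescribe. The tiny tanh network for scalar inputs, the width `HIDDEN`, the step size, and the sample counts are all assumed values, and $\sigma$ is shared by the prior and the output noise as in the assumptions.

```python
import numpy as np

HIDDEN = 5                 # hypothetical hidden width
M = 3 * HIDDEN + 1         # number of internal parameters

def f(x, theta):
    """One-hidden-layer tanh network for scalar inputs; theta is a flat vector."""
    w1 = theta[:HIDDEN]
    b1 = theta[HIDDEN:2 * HIDDEN]
    w2 = theta[2 * HIDDEN:3 * HIDDEN]
    b2 = theta[3 * HIDDEN]
    return np.tanh(np.outer(x, w1) + b1) @ w2 + b2   # one output per input

def log_post(theta, x, y, sigma):
    """Unnormalized log posterior: log p(theta) + sum_n log p(y_n | x_n, theta)."""
    return -0.5 * (np.sum(theta**2) + np.sum((y - f(x, theta))**2)) / sigma**2

def metropolis(x, y, sigma, n_samples=2000, burn_in=2000, step=0.05, seed=0):
    """Random-walk Metropolis; returns posterior samples of shape (n_samples, M)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(M)
    lp = log_post(theta, x, y, sigma)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = theta + step * rng.standard_normal(M)
        lp_proposal = log_post(proposal, x, y, sigma)
        # Accept with probability min(1, p(proposal|D) / p(theta|D)).
        if np.log(rng.uniform()) < lp_proposal - lp:
            theta, lp = proposal, lp_proposal
        if i >= burn_in:
            samples.append(theta.copy())
    return np.array(samples)

def predict(x_star, samples):
    """Monte Carlo estimate of f(x*|D), plus the spread of f(x*; theta) over samples."""
    preds = np.array([f(x_star, theta) for theta in samples])   # (S, N*)
    return preds.mean(axis=0), preds.var(axis=0)
```

A typical call would be `samples = metropolis(x, y, sigma=0.5)` followed by `mean, var = predict(x_star, samples)`. Only the unnormalized posterior is needed because the normalizing constant cancels in the acceptance ratio. Random-walk Metropolis mixes slowly as $M$ grows, so samplers such as Hamiltonian Monte Carlo or variational approximations are common alternatives for larger networks.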
