
Understanding Deep Learning with Julia: 3.2 Shallow neural networks II

Last updated at 2024-03-11, posted at 2024-03-11

This series works through the notebooks from Understanding Deep Learning in Julia.

3.2 Shallow neural networks II

"Understanding Deep Learning with Julia: 3.1 Shallow neural networks I" examined the following shallow neural network with a 1D input.

(Image quoted from udlbook)

2D input, 1D output

Here we examine a shallow neural network with a 2D input.

(Image quoted from udlbook)
As equations, the network is:

\begin{align*}
h_1 &= a[\theta_{10} + \theta_{11}x_1 + \theta_{12}x_2] \\
h_2 &= a[\theta_{20} + \theta_{21}x_1 + \theta_{22}x_2] \\
h_3 &= a[\theta_{30} + \theta_{31}x_1 + \theta_{32}x_2]
\end{align*}

y = \phi_0 + \phi_1h_1 + \phi_2h_2 + \phi_3h_3

Let's implement this in Julia.

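function shallow(x, activation_fn, phi, theta)
    # Pre-activations: prepend 1 so the first column of theta acts as the bias.
    pre = theta * [1; x]
    # Hidden-unit outputs h₁, h₂, h₃.
    act = activation_fn.(pre)
    # Weighted terms ϕ₀, ϕ₁h₁, ϕ₂h₂, ϕ₃h₃ (phi is a row, so it is transposed).
    w_act = phi' .* [1; act]
    y = sum(w_act)
    y, pre, act, w_act
end
# Convenience method so the two inputs can be passed separately when broadcasting.
shallow(x1, x2, activation_fn, phi, theta) = shallow([x1, x2], activation_fn, phi, theta)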

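Compared with the shallow function implemented in "Understanding Deep Learning with Julia: 3.1 Shallow neural networks I", phi is now transposed so that it can be passed as a matrix. Nothing else has changed.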
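As a quick sanity check, the matrix form can be compared against the three hidden-unit equations written out term by term. The sketch below assumes the phi and theta values used in the plotting code of this article; the helper name `shallow_by_hand` and the test point (10, 10) are made up for illustration.

```julia
ReLU(z) = ifelse(z < 0, zero(z), z)

# Matrix form, as in the article: phi is a 1×4 row, theta is 3×3.
function shallow(x, activation_fn, phi, theta)
    pre = theta * [1; x]
    act = activation_fn.(pre)
    w_act = phi' .* [1; act]
    y = sum(w_act)
    y, pre, act, w_act
end

# Hypothetical helper: the same network written out equation by equation.
function shallow_by_hand(x1, x2, a, phi, theta)
    h1 = a(theta[1, 1] + theta[1, 2] * x1 + theta[1, 3] * x2)
    h2 = a(theta[2, 1] + theta[2, 2] * x1 + theta[2, 3] * x2)
    h3 = a(theta[3, 1] + theta[3, 2] * x1 + theta[3, 3] * x2)
    phi[1] + phi[2] * h1 + phi[3] * h2 + phi[4] * h3
end

phi = [0.0 -2.0 2.0 1.5]
theta = [
    -4.0 0.9 0.0
    5.0 -0.9 -0.5
    -7 0.5 0.9
]

y_matrix = first(shallow([10, 10], ReLU, phi, theta))
y_hand = shallow_by_hand(10, 10, ReLU, phi, theta)
println("matrix form: ", y_matrix)
println("by hand:     ", y_hand)
println("agree: ", y_matrix ≈ y_hand)
```

At (10, 10) the pre-activations work out to 5, -9 and 7, so only the first and third hidden units contribute and the output should be about -2.0 * 5 + 1.5 * 7 = 0.5.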

using CairoMakie
phi = [
    0.0 -2.0 2.0 1.5
]
theta = [
    -4.0 0.9 0.0
    5.0 -0.9 -0.5
    -7 0.5 0.9
]
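ReLU(preactivation) = ifelse(preactivation < 0, zero(preactivation), preactivation)
# Evaluation grid for both inputs.
x1 = x2 = 0:0.1:10
# Broadcast so that results[i, j] is the output at (x1[i], x2[j]), the layout Makie expects.
# Ref keeps phi and theta from being broadcast elementwise.
results = shallow.(x1, x2', ReLU, phi |> Ref, theta |> Ref)
ys = getindex.(results, 1)
fig = Figure()
ax = Axis(fig[1, 1], xlabel="x1", ylabel="x2", title="y=ϕ₀+ϕ₁h₁+ϕ₂h₂+ϕ₃h₃")
co = contourf!(ax, x1, x2, ys, levels=-10:10)
Colorbar(fig[1, 2], co)
fig  # display the figure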

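The activation function is ReLU, used as is. Several boundary lines are visible in the resulting contour plot.
Let's plot each term $\phi_0,\ \phi_1h_1,\ \phi_2h_2,\ \phi_3h_3$ separately, before they are summed.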

w_acts = getindex.(results, 4)
fig = Figure()
axs = [Axis(fig[i, j]) for i in 1:2 for j in 1:2]
for i in eachindex(axs)
    # Panel i shows the i-th weighted term (ϕ₀, ϕ₁h₁, ϕ₂h₂, ϕ₃h₃).
    contourf!(axs[i], x1, x2, getindex.(w_acts, i), levels=-10:10)
end
fig  # display the figure

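The boundary lines come from the regions that the activation function clamps to 0. In other words, the activation function draws one line per hidden unit, and these lines partition the input space into regions.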
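The partitioning can also be checked numerically. The sketch below prints the boundary line of each hidden unit (where its pre-activation crosses 0) and counts how many distinct on/off patterns of the hidden units occur on the grid. It assumes the theta matrix and the 0:0.1:10 grid from this article; `active_pattern` is a hypothetical helper name.

```julia
theta = [
    -4.0 0.9 0.0
    5.0 -0.9 -0.5
    -7 0.5 0.9
]

# Each hidden unit switches on where its pre-activation is positive,
# so its boundary is the line θ₀ + θ₁x₁ + θ₂x₂ = 0.
for i in axes(theta, 1)
    t0, t1, t2 = theta[i, :]
    println("unit $i boundary: $t0 + $t1*x1 + $t2*x2 = 0")
end

# Label a point by which hidden units are active there.
active_pattern(x1, x2) = Tuple(theta * [1, x1, x2] .> 0)

x1 = x2 = 0:0.1:10
patterns = active_pattern.(x1, x2')
n_regions = length(unique(patterns))
println("distinct activation patterns on the grid: ", n_regions)
```

Three lines can split the plane into at most 7 regions, so the count cannot exceed 7 here. Within each region the network is a single linear function of x1 and x2, which is why the contours only bend at the boundaries.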

Universal approximation theorem

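What do we gain from partitioning the input space into regions? According to the universal approximation theorem, this partitioning makes it possible to approximate any continuous function (loosely paraphrased). The theorem says that, given enough hidden units, a shallow network can approximate any continuous function on a compact domain to arbitrary precision, but it says nothing about depth. The effect of depth in deep learning has not yet been fully explained mathematically.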
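The idea that more hidden units means a finer approximation can be illustrated in 1D without any training. The sketch below is not from the notebook: it builds, by hand, a shallow ReLU network that linearly interpolates a target function at equally spaced knots and reports the maximum error as the number of hidden units grows. The function name `relu_interpolant`, the target `sin`, and the interval [0, 2π] are all choices made for illustration.

```julia
relu(z) = max(z, zero(z))

# Build a 1D shallow ReLU network with `d` hidden units that linearly
# interpolates `f` at d + 1 equally spaced knots on [a, b].
function relu_interpolant(f, a, b, d)
    knots = range(a, b, length=d + 1)
    slopes = diff(f.(knots)) ./ step(knots)
    # Output weights: the first slope, then the change in slope at each knot.
    weights = [slopes[1]; diff(slopes)]
    # One ReLU unit per knot (except the last); each adds a kink at its knot.
    x -> f(a) + sum(weights .* relu.(x .- knots[1:end-1]))
end

xs = range(0, 2π, length=1001)
for d in (4, 16, 64)
    net = relu_interpolant(sin, 0, 2π, d)
    err = maximum(abs.(net.(xs) .- sin.(xs)))
    println("hidden units = $d, max error = $err")
end
```

Each hidden unit contributes one kink, so more units give more linear pieces and a smaller error. This only demonstrates the width side of the theorem; it says nothing about how such weights would be found by training.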

2D input, 2D output

Next we examine a neural network with 2 inputs and 2 outputs.

function shallow(x, activation_fn, phi, theta)
    pre = theta * [1; x]
    act = activation_fn.(pre)
    w_act = phi' .* [1; act]
    y = sum(w_act, dims=1)
    y, pre, act, w_act
end
phi = [
    0.0 -2.0 2.0 1.5
    -2.0 -1.0 -2.0 0.8
]
theta = [
    -4.0 0.9 0.0
    5.0 -0.9 -0.5
    -7 0.5 0.9
]
x1 = x2 = 0:0.1:10
# results[i, j] is the output at (x1[i], x2[j]), the layout contourf expects.
results = shallow.(x1, x2', ReLU, phi |> Ref, theta |> Ref)

To let the shallow function return two values of y, y = sum(w_act) was changed to y = sum(w_act, dims=1).
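The effect of `dims=1` is easiest to see on the shapes. The sketch below uses the 2×4 phi from this section together with illustrative hidden-unit outputs `act` (made-up values, roughly what the network produces at x = (10, 10)), and compares the plain sum with the column-wise sum.

```julia
phi = [
    0.0 -2.0 2.0 1.5
    -2.0 -1.0 -2.0 0.8
]
act = [5.0, 0.0, 7.0]  # illustrative hidden-unit outputs

w_act = phi' .* [1; act]           # 4×2: one column of weighted terms per output
y_total = sum(w_act)               # one number: both outputs mixed together
y_per_output = sum(w_act, dims=1)  # 1×2: one sum per column, i.e. per output

println("size(w_act) = ", size(w_act))
println("sum(w_act) = ", y_total)
println("sum(w_act, dims=1) = ", y_per_output)
# The column-wise sum matches the usual matrix-vector product.
println(vec(y_per_output) ≈ phi * [1; act])
```

Keeping `w_act` as a matrix, rather than computing `phi * [1; act]` directly, preserves the individual weighted terms so they can still be plotted one by one.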

ys = getindex.(results, 1)
y1 = getindex.(ys, 1)
y2 = getindex.(ys, 2)
contourf(x1, x2, y1, levels=-10:10)
contourf(x1, x2, y2, levels=-10:10)

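This produces one contour plot for y1 and one for y2. Both outputs are built from the same hidden units, so the boundary lines are identical; only the contour values differ.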
