
Understanding Deep Learning with Julia: 1.1 Background Mathematics

Last updated at 2024-03-11. Posted at 2024-03-09.

This post works through the Understanding Deep Learning notebooks in Julia.

1.1 -- Background Mathematics

This section reviews the minimum background mathematics needed to understand deep learning.

Linear functions

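If you break deep learning down into smaller and smaller parts, you eventually arrive at linear functions. The advantage of a linear function is that it is fast to compute; the drawback is that it is too simple. That drawback is overcome by using ReLU as the activation function. (See: Understanding Deep Learning with Julia: 3.1 Shallow neural networks I)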

When x is one-dimensional, a linear function is written as

\begin{equation}y=\beta+\omega x.\end{equation}

In Julia, this can be defined as

linear_function(x, β, ω) = β + ω * x

It can be plotted as follows.

# Use the plotting package CairoMakie
using CairoMakie
# A Range from 0 to 10 in steps of 0.01, which behaves like an array
x = 0:0.01:10
β = 0.0
ω = 1.0
lines(x, linear_function.(x, β, ω))

Julia supports broadcasting at the language level: writing a dot after the function name, as in linear_function.(x, β, ω), applies linear_function to every element of the array x.
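As a small check of what the dot syntax does, the following sketch compares the broadcast call with an explicit comprehension. The short range and the parameter values are illustrative choices, not taken from the plot above.

```julia
# Same definition as in the text, repeated so the snippet runs on its own.
linear_function(x, β, ω) = β + ω * x

x = 0:0.5:2   # a short range: 0.0, 0.5, 1.0, 1.5, 2.0 (illustrative values)
β = 1.0
ω = 2.0

# Dot syntax: apply linear_function to every element of x.
y_broadcast = linear_function.(x, β, ω)

# The same computation written as an explicit comprehension.
y_loop = [linear_function(xi, β, ω) for xi in x]

y_broadcast == y_loop   # true
```

The scalars β and ω are reused for every element, while x is iterated element by element.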

When x is two-dimensional, the linear function becomes

\begin{equation}y=\beta+\omega_1 x_1 + \omega_2 x_2.\end{equation}

In Julia, this can be defined as

linear_function(x1, x2, β, ω) = β + ω' * [x1, x2]

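Julia allows this kind of overloading: the four-argument definition is added as a new method of linear_function alongside the three-argument one. The ' operator takes the transpose (strictly speaking, the adjoint, i.e. the conjugate transpose), so ω' * [x1, x2] computes the inner product.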
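The claim that ω' * [x1, x2] is an inner product can be checked with a minimal sketch. The vectors v and z are hypothetical values chosen for illustration; dot comes from the LinearAlgebra standard library.

```julia
using LinearAlgebra  # standard library; provides dot

ω = [1.0, -0.5]
v = [2.0, 4.0]       # illustrative input, not from the article

a = ω' * v           # adjoint (row) times column vector gives a scalar
b = dot(ω, v)        # the same inner product via LinearAlgebra.dot
c = sum(ω .* v)      # written out element by element

a == b == c          # true

# For complex vectors, ' also conjugates, which plain transpose does not.
z = [1.0 + 2.0im, 3.0 - 1.0im]
z' == transpose(conj.(z))   # true
```

For real-valued weights, as used throughout this section, the adjoint and the transpose coincide.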

x1 = 0:0.1:10
x2 = 0:0.1:10
β = 0.0
ω = [1.0, -0.5]
y = linear_function.(x1', x2, β, ω |> Ref)

The resulting y is

101×101 Matrix{Float64}:
  0.0    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8   …  9.1   9.2   9.3   9.4   9.5   9.6   9.7   9.8   9.9   10.0
 -0.05   0.05   0.15   0.25   0.35   0.45   0.55   0.65   0.75     9.05  9.15  9.25  9.35  9.45  9.55  9.65  9.75  9.85   9.95
 -0.1    0.0    0.1    0.2    0.3    0.4    0.5    0.6    0.7      9.0   9.1   9.2   9.3   9.4   9.5   9.6   9.7   9.8    9.9
 -0.15  -0.05   0.05   0.15   0.25   0.35   0.45   0.55   0.65     8.95  9.05  9.15  9.25  9.35  9.45  9.55  9.65  9.75   9.85
  ⋮                                  ⋮                          ⋱                          ⋮                              ⋮
 -4.85  -4.75  -4.65  -4.55  -4.45  -4.35  -4.25  -4.15  -4.05     4.25  4.35  4.45  4.55  4.65  4.75  4.85  4.95  5.05   5.15
 -4.9   -4.8   -4.7   -4.6   -4.5   -4.4   -4.3   -4.2   -4.1      4.2   4.3   4.4   4.5   4.6   4.7   4.8   4.9   5.0    5.1
 -4.95  -4.85  -4.75  -4.65  -4.55  -4.45  -4.35  -4.25  -4.15     4.15  4.25  4.35  4.45  4.55  4.65  4.75  4.85  4.95   5.05
 -5.0   -4.9   -4.8   -4.7   -4.6   -4.5   -4.4   -4.3   -4.2   …  4.1   4.2   4.3   4.4   4.5   4.6   4.7   4.8   4.9    5.0

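In y = linear_function.(x1', x2, β, ω |> Ref), x1 is transposed into a 1×101 row and then broadcasting is applied. Broadcasting a 1×101 row against the 101-element x2 (treated as a 101×1 column) yields a 101×101 matrix. ω |> Ref (the pipe form of Ref(ω)) is used because ω itself should not be broadcast over; wrapping it in Ref makes broadcasting treat the whole vector as a single value.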
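The shape rule described above is easier to see on a tiny grid. The following sketch uses short, illustrative ranges (3 values for x1, 2 values for x2) instead of the 101-point ranges used in the article.

```julia
# Same definition as in the text, repeated so the snippet runs on its own.
linear_function(x1, x2, β, ω) = β + ω' * [x1, x2]

x1 = 0:1:2    # 3 values (illustrative)
x2 = 0:1:1    # 2 values (illustrative)
β = 0.0
ω = [1.0, -0.5]

size(x1')     # (1, 3): the adjoint of a range is a row

# Ref(ω) keeps ω as a single value instead of broadcasting over its elements.
y = linear_function.(x1', x2, β, Ref(ω))

size(y)       # (2, 3): rows follow x2, columns follow x1

# Entry y[i, j] pairs x1[j] with x2[i].
y[2, 3] == linear_function(x1[3], x2[2], β, ω)   # true
```

Because rows follow x2 and columns follow x1, the first row of the article's 101×101 output corresponds to x2 = 0 and the last row to x2 = 10.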

Next, plot the function for several different parameter values.

f = Figure(size=(600, 600))
ax1 = Axis(f[1, 1], title="β=0.0, ω=[1.0, -0.5]", aspect=1)
ax2 = Axis(f[1, 2], title="β=0.0, ω=[0, -0.5]", aspect=1)
ax3 = Axis(f[2, 1], title="β=0.0, ω=[1.0, 0]", aspect=1)
ax4 = Axis(f[2, 2], title="β=-5, ω=[1.0, -0.5]", aspect=1)

y = linear_function.(x1', x2, β, ω |> Ref)
contour!(ax1, x1, x2, y; labels=true, levels=-10:10)

y = linear_function.(x1', x2, β, [0, -0.5] |> Ref)
contour!(ax2, x1, x2, y; labels=true, levels=-10:10)

y = linear_function.(x1', x2, β, [1.0, 0] |> Ref)
contour!(ax3, x1, x2, y; labels=true, levels=-10:10)

y = linear_function.(x1', x2, -5, ω |> Ref)
contour!(ax4, x1, x2, y; labels=true, levels=-10:10)

# Return the figure so that a notebook displays all four panels
f

Next, consider the case where x is three-dimensional, and define two linear functions at the same time.

\begin{align}y_1 &= \beta_1 + \omega_{11} x_1 + \omega_{12} x_2 + \omega_{13} x_3\\
y_2 &= \beta_2 + \omega_{21} x_1 + \omega_{22} x_2 + \omega_{23} x_3.
\end{align}

In matrix form, this is written as

\begin{equation}
\begin{bmatrix} y_1\\ y_2 \end{bmatrix} = \begin{bmatrix}\beta_{1}\\\beta_{2}\end{bmatrix}+ \begin{bmatrix}\omega_{11}&\omega_{12}&\omega_{13}\\\omega_{21}&\omega_{22}&\omega_{23}\end{bmatrix}\begin{bmatrix}x_{1}\\x_{2}\\x_{3}\end{bmatrix},
\end{equation}

\begin{equation}
\mathbf{y} = \boldsymbol\beta +\boldsymbol\Omega\mathbf{x}.
\end{equation}

In Julia, this can be defined as

linear_function(x1, x2, x3, β, Ω) = β .+ Ω * [x1, x2, x3]
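As a sanity check that the matrix form agrees with the two scalar equations, the following sketch evaluates both on the same input. The values of β, Ω and the inputs are arbitrary illustrative choices, picked so the arithmetic is easy to follow by hand.

```julia
# Same definition as in the text, repeated so the snippet runs on its own.
linear_function(x1, x2, x3, β, Ω) = β .+ Ω * [x1, x2, x3]

β = [1.0, 2.0]                 # illustrative values
Ω = [1.0 2.0 3.0; 4.0 5.0 6.0]
x1, x2, x3 = 1, 0, -1

# The two scalar equations, written out term by term.
y1 = β[1] + Ω[1, 1] * x1 + Ω[1, 2] * x2 + Ω[1, 3] * x3
y2 = β[2] + Ω[2, 1] * x1 + Ω[2, 2] * x2 + Ω[2, 3] * x3

# The matrix form computes both at once.
y = linear_function(x1, x2, x3, β, Ω)

y == [y1, y2]   # true
```

Each row of Ω holds the weights of one output, so adding a third linear function only means adding a row to Ω and an entry to β.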

Evaluate it with some arbitrary values.

β = [0.5, 0.2]
Ω = [-1.0 0.1 -0.3; 0.4 0.1 1.2;]

y = linear_function(4, -1, 2, β, Ω)

The output is

2-element Vector{Float64}:
 -4.199999999999999
  4.1

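Julia is a dynamically typed language, but type information plays a central role in the language design, so type inference is very effective. The code above contains no type annotations, yet when linear_function is called the compiler can infer the types of all values inside it from the argument types and emits correctly typed LLVM IR. This is the main reason Julia runs fast.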
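One way to observe this inference is sketched below, using @inferred from the Test standard library and Base.return_types. The argument types passed to Base.return_types are an assumption that matches the call made in this section (Int inputs, a Vector{Float64} for β, a Matrix{Float64} for Ω).

```julia
using Test  # standard library; provides @inferred

# Same definition as in the text, repeated so the snippet runs on its own.
linear_function(x1, x2, x3, β, Ω) = β .+ Ω * [x1, x2, x3]

β = [0.5, 0.2]
Ω = [-1.0 0.1 -0.3; 0.4 0.1 1.2]

# @inferred throws an error if the inferred return type does not match
# the type of the value actually returned; otherwise it returns the value.
y = @inferred linear_function(4, -1, 2, β, Ω)

# Ask the compiler which return type it infers for these argument types.
inferred = Base.return_types(linear_function,
    (Int, Int, Int, Vector{Float64}, Matrix{Float64}))

only(inferred)   # Vector{Float64}
```

For a closer look, the macros @code_warntype and @code_llvm from the InteractiveUtils standard library print the inferred types and the generated LLVM IR for a given call.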
