Learning Activation Functions and Neural Networks from High School Math


1. Basics of high school math functions

1.1 Linear function

  • Formula:
    $y = ax + b$

  • Properties:

    • The graph is a straight line
    • $a$: slope
    • $b$: intercept
  • Example use in AI: corresponds to computing a simple weighted sum

1.2 Parabola (quadratic function)

  • Formula:
    $y = ax^2 + bx + c$

  • Properties:

    • The graph is U-shaped ($a > 0$) or an inverted U ($a < 0$)
    • Vertex: $\left( -\frac{b}{2a}, f\left(-\frac{b}{2a}\right) \right)$
  • Example use in AI: squared error (MSE) and curve approximation

1.3 Square root function

  • Formula:
    $y = \sqrt{x}$

  • Properties:

    • Domain: $x \geq 0$
    • A nonlinearity whose growth gradually levels off
  • Example use in AI: similar in shape to part of the Bent Identity function

1.4 Exponential function

  • Formula:
    $y = a^x$

  • Properties:

    • Grows or decays rapidly
    • The inverse of the logarithmic function (common or natural log)
  • Example use in AI: used in computing the Sigmoid and Softmax functions

1.5 Logarithmic function

  • Formula:
    $y = \log_a x$

  • Properties:

    • Domain: $x > 0$
    • Increases only slowly
  • Example use in AI: used in the Softplus function

1.6 Trigonometric functions

  • Formula:
    $y = \sin x, \cos x$

  • Properties:

    • Periodic
    • Express oscillations and waves
  • Example use in AI: SIREN (Sinusoidal Representation Networks); a sketch plotting all six families follows below
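
To connect these six function families to code, here is a minimal NumPy/matplotlib sketch that evaluates and plots one representative of each family; the coefficients are arbitrary sample values, not taken from the article.

```python
# Plot one sample from each of the six high-school function families in section 1.
import numpy as np
import matplotlib.pyplot as plt

x_all = np.linspace(-4.0, 4.0, 200)
x_pos = np.linspace(0.01, 4.0, 200)   # for sqrt (x >= 0) and log (x > 0)

samples = {
    'linear: y = 2x + 1':       (x_all, 2 * x_all + 1),
    'quadratic: y = x^2 - 2x':  (x_all, x_all**2 - 2 * x_all),
    'square root: y = sqrt(x)': (x_pos, np.sqrt(x_pos)),
    'exponential: y = 2^x':     (x_all, 2.0 ** x_all),
    'logarithm: y = log2(x)':   (x_pos, np.log2(x_pos)),
    'sine: y = sin(x)':         (x_all, np.sin(x_all)),
}

fig, axes = plt.subplots(2, 3, figsize=(10, 6))
for ax, (title, (xv, yv)) in zip(axes.ravel(), samples.items()):
    ax.plot(xv, yv)
    ax.set_title(title)
    ax.grid(True)
fig.tight_layout()
plt.show()
```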


2. Overview of neural networks

  • Forward propagation: the computation in which input data passes through each layer to produce the output
  • Backpropagation: the process of computing gradients from the output error and updating the weights
  • Activation function: gives each layer's output a nonlinearity, raising the network's expressive power (see the sketch below)
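
A quick numerical way to see why the activation function matters: without one, stacking linear layers still produces a single linear map, so depth adds no expressive power. The sketch below checks this with NumPy; the layer sizes and random values are illustrative assumptions, not tied to the demo later in the article.

```python
# Two linear layers with no activation collapse into one linear layer;
# inserting ReLU between them breaks that equivalence.
import numpy as np

rng = np.random.default_rng(0)
x  = rng.normal(size=(5, 4))                  # batch of 5 inputs with 4 features
W1 = rng.normal(size=(4, 3)); b1 = rng.normal(size=3)
W2 = rng.normal(size=(3, 2)); b2 = rng.normal(size=2)

# Without an activation: (x W1 + b1) W2 + b2 == x (W1 W2) + (b1 W2 + b2)
two_layers = (x @ W1 + b1) @ W2 + b2
one_layer  = x @ (W1 @ W2) + (b1 @ W2 + b2)
print(np.allclose(two_layers, one_layer))     # True: still just one linear map

# With ReLU in between, the composition is genuinely nonlinear
with_relu = np.maximum(0.0, x @ W1 + b1) @ W2 + b2
print(np.allclose(with_relu, one_layer))      # False (in general)
```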

3. Representative activation functions and their high school math counterparts

| High school function | Activation function | Formula |
| --- | --- | --- |
| Linear function | ReLU (Rectified Linear Unit) | $f(x) = \max(0, x)$ |
| Parabola | HardSwish | $f(x) = x \cdot \frac{\text{ReLU6}(x+3)}{6}$ |
| Square root | Bent Identity | $f(x) = \frac{\sqrt{x^2+1} - 1}{2} + x$ |
| Exponential | Sigmoid | $f(x) = \frac{1}{1 + e^{-x}}$ |
| Exponential | Softmax | $\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}$ |
| Logarithm | Softplus | $f(x) = \log(1 + e^x)$ |
| Trigonometric | Snake | $f(x) = x + \frac{1}{\alpha} \sin^2(\alpha x)$ |
| Trigonometric | SIREN | $f(x) = \sin(\omega_0 x)$ |
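
As a quick bridge from the table above to code, the sketch below evaluates each listed activation at a single sample input. ReLU6, which appears in the HardSwish row, is $\min(\max(0, x), 6)$; the values of $\alpha$ and $\omega_0$ are arbitrary sample choices, not prescribed by the article.

```python
# Evaluate the activations listed in the table above at a sample input x = 1.5.
import numpy as np

x = 1.5
z = np.array([1.5, 0.0, -1.0])                       # sample logits for softmax

relu       = max(0.0, x)
relu6      = min(max(0.0, x + 3.0), 6.0)             # ReLU6(x+3): clipped to [0, 6]
hardswish  = x * relu6 / 6.0
bent_id    = (np.sqrt(x**2 + 1.0) - 1.0) / 2.0 + x
sigmoid    = 1.0 / (1.0 + np.exp(-x))
softplus   = np.log1p(np.exp(x))
alpha, w0  = 1.0, 30.0                               # sample α (Snake) and ω0 (SIREN)
snake      = x + np.sin(alpha * x)**2 / alpha
siren      = np.sin(w0 * x)
softmax    = np.exp(z - z.max()) / np.exp(z - z.max()).sum()

print(relu, hardswish, bent_id, sigmoid, softplus, snake, siren)
print(softmax, softmax.sum())                        # probabilities summing to 1
```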

4. Example equations for forward and backpropagation (one layer + ReLU)

Forward propagation

  1. Input $x \in \mathbb{R}^n$, weights $W \in \mathbb{R}^{m \times n}$, bias $b \in \mathbb{R}^m$

  2. Linear transformation:

    $$
    z = W x + b
    $$

  3. Activation (ReLU):

    $$
    a = \max(0, z)
    $$

  4. Pass the output $a$ to the next layer (see the worked example below)
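
A tiny worked instance of these four steps, using made-up numbers for $W$, $b$, and $x$ with $n = m = 2$:

```python
# Forward pass z = Wx + b, a = max(0, z) with concrete 2x2 numbers.
import numpy as np

W = np.array([[1.0, -2.0],
              [0.5,  1.0]])     # W in R^{2x2}
b = np.array([0.5, -1.0])       # b in R^2
x = np.array([1.0,  1.0])       # x in R^2

z = W @ x + b                   # Wx = [-1.0, 1.5], so z = [-0.5, 0.5]
a = np.maximum(0.0, z)          # ReLU zeroes the negative entry: a = [0.0, 0.5]
print(z, a)
```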

Backpropagation

  • Gradient computation with respect to the loss function $L$:

    1. Gradient through the activation (ReLU):

      $$
      \frac{\partial L}{\partial z_i} =
      \begin{cases}
      \frac{\partial L}{\partial a_i} & z_i > 0 \\
      0 & z_i \le 0
      \end{cases}
      $$

    2. Gradient with respect to the weights:

      $$
      \frac{\partial L}{\partial W} = \frac{\partial L}{\partial z} \cdot x^T
      $$

    3. Gradient with respect to the bias:

      $$
      \frac{\partial L}{\partial b} = \frac{\partial L}{\partial z}
      $$

    4. Gradient passed back to the input:

      $$
      \frac{\partial L}{\partial x} = W^T \cdot \frac{\partial L}{\partial z}
      $$
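
As a sanity check on these backpropagation formulas, here is a minimal finite-difference gradient check for the one-layer ReLU case. The helper names layer_forward/layer_backward and the MSE-style loss are my own illustrative choices; the code uses the batched row-vector convention $z = xW + b$, so the $W$ and $x$ gradients appear transposed relative to the column-vector formulas above.

```python
# Compare the analytic backprop gradients with central finite differences.
import numpy as np

def layer_forward(x, W, b):
    z = x @ W + b                      # linear step (row-vector/batch convention)
    a = np.maximum(0.0, z)             # ReLU
    return a, z

def layer_backward(x, W, z, dL_da):
    dL_dz = dL_da * (z > 0)            # ReLU gate: gradient passes only where z > 0
    dW = x.T @ dL_dz                   # matches dL/dW = dL/dz * x^T (transposed layout)
    db = dL_dz.sum(axis=0)             # matches dL/db = dL/dz, summed over the batch
    dx = dL_dz @ W.T                   # matches dL/dx = W^T * dL/dz
    return dW, db, dx

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
b = rng.normal(size=2)
y = rng.normal(size=(4, 2))

def loss(W_):
    a, _ = layer_forward(x, W_, b)
    return 0.5 * np.mean((a - y) ** 2)

a, z = layer_forward(x, W, b)
dL_da = (a - y) / a.size               # gradient of 0.5 * mean((a - y)^2)
dW, db, dx = layer_backward(x, W, z, dL_da)

eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 0] += eps
W_minus[0, 0] -= eps
numeric = (loss(W_plus) - loss(W_minus)) / (2 * eps)
print(numeric, dW[0, 0])               # the two values should agree closely
```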


# -*- coding: utf-8 -*-
# Activation plots + "backprop" (derivative) plots: visualize each activation function and its derivative
# Dependencies
import numpy as np
import matplotlib.pyplot as plt

# ----------------------------
# 1) Scalar activations & derivatives
# ----------------------------
def relu(x):
    return np.maximum(0.0, x)
def drelu(x):
    return (x > 0).astype(float)

def hswish(x):
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0
def dhswish(x):
    y = np.zeros_like(x, dtype=float)
    y[x <= -3.0] = 0.0
    y[x >= 3.0]  = 1.0
    m = (x > -3.0) & (x < 3.0)
    y[m] = (2.0*x[m] + 3.0) / 6.0
    return y

def bent_identity(x):
    return x + (np.sqrt(x**2 + 1.0) - 1.0) / 2.0
def dbent_identity(x):
    return 1.0 + x / (2.0*np.sqrt(x**2 + 1.0))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))
def dsigmoid(x):
    s = sigmoid(x)
    return s*(1.0 - s)

def softplus(x):
    # stable: max(0,x)+log1p(exp(-|x|))
    return np.maximum(0.0, x) + np.log1p(np.exp(-np.abs(x)))
def dsoftplus(x):
    return sigmoid(x)  # d/dx softplus = sigmoid

def snake(x, alpha=1.0):
    return x + (1.0/alpha)*np.sin(alpha*x)**2
def dsnake(x, alpha=1.0):
    return 1.0 + np.sin(2.0*alpha*x)  # 1 + 2 sin cos

def siren(x, w0=1.0):
    return np.sin(w0*x)
def dsiren(x, w0=1.0):
    return w0*np.cos(w0*x)

# ----------------------------
# 2) Softmax (vector) & Jacobian diagonal plot
# ----------------------------
def softmax(z, axis=-1):
    z = z - np.max(z, axis=axis, keepdims=True)  # stable
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def softmax_diag_derivative(z):
    """
    Return diag(J) where J_ij = ∂s_i/∂z_j.
    s_i(1-s_i) on the diagonal (diagonal entries only).
    """
    s = softmax(z, axis=-1)
    return s*(1.0 - s)

# ----------------------------
# 3) Plot settings
# ----------------------------
xs = np.linspace(-6.0, 6.0, 800)

# --- ReLU ---
plt.figure()
plt.plot(xs, relu(xs), label='ReLU f(x)')
plt.plot(xs, drelu(xs), label="ReLU' (x)")
plt.title('ReLU: activation & derivative')
plt.xlabel('x'); plt.ylabel('value'); plt.legend(); plt.grid(True); plt.tight_layout()

# --- HardSwish ---
plt.figure()
plt.plot(xs, hswish(xs), label='HardSwish f(x)')
plt.plot(xs, dhswish(xs), label="HardSwish' (x)")
plt.title('HardSwish: activation & derivative')
plt.xlabel('x'); plt.ylabel('value'); plt.legend(); plt.grid(True); plt.tight_layout()

# --- Bent Identity ---
plt.figure()
plt.plot(xs, bent_identity(xs), label='BentIdentity f(x)')
plt.plot(xs, dbent_identity(xs), label="BentIdentity' (x)")
plt.title('Bent Identity: activation & derivative')
plt.xlabel('x'); plt.ylabel('value'); plt.legend(); plt.grid(True); plt.tight_layout()

# --- Sigmoid & Softplus ---
plt.figure()
plt.plot(xs, sigmoid(xs), label='Sigmoid f(x)')
plt.plot(xs, dsigmoid(xs), label="Sigmoid' (x)")
plt.title('Sigmoid: activation & derivative')
plt.xlabel('x'); plt.ylabel('value'); plt.legend(); plt.grid(True); plt.tight_layout()

plt.figure()
plt.plot(xs, softplus(xs), label='Softplus f(x)')
plt.plot(xs, dsoftplus(xs), label="Softplus' (x) = Sigmoid")
plt.title('Softplus: activation & derivative')
plt.xlabel('x'); plt.ylabel('value'); plt.legend(); plt.grid(True); plt.tight_layout()

# --- Snake (alpha=1) ---
alpha = 1.0
plt.figure()
plt.plot(xs, snake(xs, alpha), label=f'Snake f(x), α={alpha}')
plt.plot(xs, dsnake(xs, alpha), label="Snake' (x) = 1 + sin(2αx)")
plt.title('Snake: activation & derivative')
plt.xlabel('x'); plt.ylabel('value'); plt.legend(); plt.grid(True); plt.tight_layout()

# --- SIREN (w0=2) ---
w0 = 2.0
plt.figure()
plt.plot(xs, siren(xs, w0), label=f'SIREN sin(w0 x), w0={w0}')
plt.plot(xs, dsiren(xs, w0), label="SIREN' = w0 cos(w0 x)")
plt.title('SIREN: activation & derivative')
plt.xlabel('x'); plt.ylabel('value'); plt.legend(); plt.grid(True); plt.tight_layout()

# --- Softmax path: z=[t,0,0] ---
ts = np.linspace(-6.0, 6.0, 400)
Z = np.stack([ts, np.zeros_like(ts), np.zeros_like(ts)], axis=1)  # (T,3)
S = softmax(Z, axis=1)     # (T,3)
Ddiag = softmax_diag_derivative(Z)  # (T,3)

plt.figure()
plt.plot(ts, S[:,0], label='softmax_1(t,0,0)')
plt.plot(ts, S[:,1], label='softmax_2(t,0,0)')
plt.plot(ts, S[:,2], label='softmax_3(t,0,0)')
plt.title('Softmax components along z=[t,0,0]')
plt.xlabel('t'); plt.ylabel('probability'); plt.legend(); plt.grid(True); plt.tight_layout()

plt.figure()
plt.plot(ts, Ddiag[:,0], label='∂s1/∂z1 = s1(1-s1)')
plt.plot(ts, Ddiag[:,1], label='∂s2/∂z2 = s2(1-s2)')
plt.plot(ts, Ddiag[:,2], label='∂s3/∂z3 = s3(1-s3)')
plt.title('Softmax diagonal derivatives (Jacobian diag)')
plt.xlabel('t'); plt.ylabel('value'); plt.legend(); plt.grid(True); plt.tight_layout()

# ----------------------------
# 4) Tiny forward-backward demo (1-layer + ReLU)
# ----------------------------
def forward_relu_layer(x, W, b):
    z = x @ W + b
    a = relu(z)
    cache = (x, W, b, z, a)
    return a, cache

def mse_loss(a, y):
    return 0.5*np.mean((a - y)**2)

def backward_relu_layer(dL_da, cache):
    x, W, b, z, a = cache
    dL_dz = dL_da * drelu(z)         # ⊙ ReLU'(z)
    dW = x.T @ dL_dz
    db = dL_dz.sum(axis=0)
    dx = dL_dz @ W.T
    return dW, db, dx, dL_dz

# demo data
np.random.seed(42)
N, D, M = 64, 3, 2
x = np.random.randn(N, D)
true_W = np.array([[1.0, -0.5],[0.3, 0.8],[-0.2, 0.1]])
true_b = np.array([0.2, -0.1])
y = relu(x @ true_W + true_b) + 0.05*np.random.randn(N, M)  # targets (with noise)

# initialize parameters
W = 0.5*np.random.randn(D, M)
b = np.zeros(M)
lr = 0.1

loss_hist = []
for it in range(50):
    a, cache = forward_relu_layer(x, W, b)
    loss = mse_loss(a, y)
    loss_hist.append(loss)
    dL_da = (a - y) / a.size  # gradient of 0.5*mean((a-y)^2), i.e. (a - y)/(N*M)
    dW, db, dx, dL_dz = backward_relu_layer(dL_da, cache)
    W -= lr*dW
    b -= lr*db

# plot the loss history
plt.figure()
plt.plot(loss_hist, label='MSE loss')
plt.title('Training curve (1-layer ReLU)')
plt.xlabel('iteration'); plt.ylabel('loss'); plt.grid(True); plt.legend(); plt.tight_layout()

plt.show()