[For Beginners] What Is Gradient Descent? A Thorough Walkthrough in Python with the Math Explained

Posted at 2025-03-23

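✅ Goals of this article

In this article you will learn:

How gradient descent works and what its update formula means
How to implement gradient descent in plain Python and visualize it (NumPy and Matplotlib are used only for plotting)
Why the method moves toward the optimum, from a mathematical point of view
How the actual iteration unfolds, by pairing formulas with code until it clicks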

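1. What Is Gradient Descent?

● In a nutshell

It is an iterative algorithm for finding a (local) minimum of a differentiable function.
You look at the slope and take small steps downhill toward the valley.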

2. Understanding Gradient Descent Through Its Formula

● Problem setup:

We want to find the minimum of a function $f(x)$.
As a simple example, take:

$$
f(x) = (x - 3)^2
$$

The graph of this function is an upward-opening parabola whose valley (minimum) sits at $x=3$, where $f(3)=0$.

● Update rule (basic form):

$$
x_{t+1} = x_t - \eta \cdot f'(x_t)
$$

$x_t$: the current position
$\eta$: the learning rate (step size)
$f'(x_t)$: the slope of the tangent line at $x_t$ (the derivative)
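
As a worked example of this update rule on $f(x) = (x-3)^2$, whose derivative is $f'(x) = 2(x-3)$, take a starting point $x_0 = 0$ and $\eta = 0.1$ (both values are chosen here purely for illustration):

$$
\begin{aligned}
x_1 &= x_0 - \eta f'(x_0) = 0 - 0.1 \cdot 2(0 - 3) = 0.6 \\
x_2 &= x_1 - \eta f'(x_1) = 0.6 - 0.1 \cdot 2(0.6 - 3) = 1.08
\end{aligned}
$$

Each step moves $x$ closer to 3, and the steps get shorter as the slope flattens.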

3. Implementing Gradient Descent in Python

Step ①: Define the objective function and its derivative

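def f(x):
    # Objective function: a parabola with its minimum at x = 3
    return (x - 3)**2

def df(x):
    # Derivative of f, worked out by hand: f'(x) = 2(x - 3)
    return 2 * (x - 3)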
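
Because the derivative is written by hand, it is worth checking it once against a numerical estimate. The sketch below is one way to do that with a central difference; the helper `numerical_derivative` and the step size `h` are assumptions introduced here, not part of the original code.

```python
def f(x):
    return (x - 3) ** 2

def df(x):
    return 2 * (x - 3)

def numerical_derivative(func, x, h=1e-5):
    # Central difference: (f(x + h) - f(x - h)) / (2h)
    return (func(x + h) - func(x - h)) / (2 * h)

# Compare the hand-written derivative with the numerical estimate at a few points
for x in [-1.0, 0.0, 3.0, 5.5]:
    exact = df(x)
    approx = numerical_derivative(f, x)
    print(f"x = {x:5.2f}: analytic = {exact:8.4f}, numeric = {approx:8.4f}")
```

If the two columns disagree noticeably, the bug is most likely in `df`.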

Step ②: Implement the algorithm

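def gradient_descent(start_x, learning_rate, steps):
    # Run a fixed number of updates and return every visited x (start included)
    x = start_x
    history = [x]
    for _ in range(steps):
        grad = df(x)                  # slope at the current position
        x = x - learning_rate * grad  # step against the slope
        history.append(x)
    return history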
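
The function above always runs a fixed number of steps. A common variation stops early once the slope is nearly zero. The sketch below shows one way to do that; the name `gradient_descent_until` and the `tol` and `max_steps` parameters are hypothetical additions, not part of the original code.

```python
def df(x):
    return 2 * (x - 3)

def gradient_descent_until(start_x, learning_rate, tol=1e-6, max_steps=10_000):
    # Hypothetical variant: stop as soon as |f'(x)| drops below tol
    x = start_x
    history = [x]
    for _ in range(max_steps):
        grad = df(x)
        if abs(grad) < tol:
            break
        x = x - learning_rate * grad
        history.append(x)
    return history

history = gradient_descent_until(start_x=0.0, learning_rate=0.1)
print(f"stopped after {len(history) - 1} updates at x = {history[-1]:.8f}")
```

The `max_steps` cap is there so the loop still terminates when the learning rate is too large for the iteration to settle.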

Step ③: Try it out

history = gradient_descent(start_x=0.0, learning_rate=0.1, steps=25)
for i, x in enumerate(history):
    print(f"Step {i:02d}: x = {x:.6f}, f(x) = {f(x):.6f}")
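
For this particular $f$, the run can be checked against a closed form. Substituting $f'(x) = 2(x-3)$ into the update gives $x_{t+1} - 3 = (1 - 2\eta)(x_t - 3)$, so $x_t = 3 + (1 - 2\eta)^t (x_0 - 3)$. A minimal sketch comparing the two (the helper name `closed_form` is introduced here for illustration):

```python
def df(x):
    return 2 * (x - 3)

def gradient_descent(start_x, learning_rate, steps):
    x = start_x
    history = [x]
    for _ in range(steps):
        x = x - learning_rate * df(x)
        history.append(x)
    return history

def closed_form(start_x, learning_rate, t):
    # x_t = 3 + (1 - 2*eta)^t * (x_0 - 3); valid only for f(x) = (x - 3)^2
    return 3 + (1 - 2 * learning_rate) ** t * (start_x - 3)

history = gradient_descent(start_x=0.0, learning_rate=0.1, steps=25)
max_gap = max(abs(x - closed_form(0.0, 0.1, t)) for t, x in enumerate(history))
print(f"largest gap between iteration and closed form: {max_gap:.2e}")
```

With $\eta = 0.1$ the distance to the minimum shrinks by a factor of 0.8 at every step.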

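4. A Mathematical View (via Taylor Expansion)

The essence of gradient descent is to approximate the function locally by a straight line.

Approximating a single-variable function $f(x)$ by its first-order Taylor expansion gives: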

$$
f(x + \Delta x) \approx f(x) + f'(x)\Delta x
$$

To make the function smaller, we want $\Delta x$ to point opposite to the slope:

$$
\Delta x = -\eta f'(x)
$$

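Substituting this into the approximation gives $f(x + \Delta x) \approx f(x) - \eta f'(x)^2$, which is no larger than $f(x)$ whenever $\eta > 0$, as long as $\eta$ is small enough for the linear approximation to hold.

→ This is where the gradient descent update rule comes from.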
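
How good the linear approximation is depends on the step size. The sketch below compares the first-order estimate of $f(x + \Delta x)$ with the true value for several learning rates; the starting point `x = 0.0` and the list of `eta` values are assumptions chosen for illustration.

```python
def f(x):
    return (x - 3) ** 2

def df(x):
    return 2 * (x - 3)

x = 0.0
for eta in [0.001, 0.01, 0.1, 0.5]:
    dx = -eta * df(x)                  # gradient descent step
    predicted = f(x) + df(x) * dx      # first-order Taylor estimate of f(x + dx)
    actual = f(x + dx)                 # true value after the step
    print(f"eta = {eta:5.3f}: predicted = {predicted:8.4f}, "
          f"actual = {actual:8.4f}, gap = {actual - predicted:.4f}")
```

For this quadratic the gap is exactly $\Delta x^2$, so the estimate is excellent for small $\eta$ and poor for large $\eta$, where it even predicts a negative value for a function that is never negative.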

5. Comparing the Effect of the Learning Rate (η) in Python

import numpy as np
import matplotlib.pyplot as plt

def run_and_plot(lr_list, steps=20):
    x_vals = np.linspace(-1, 7, 100)
    y_vals = f(x_vals)

    plt.figure(figsize=(12, 3))
    for i, lr in enumerate(lr_list):
        hist = gradient_descent(0.0, lr, steps)
        fx_hist = [f(x) for x in hist]
        plt.subplot(1, len(lr_list), i+1)
        plt.plot(x_vals, y_vals, label="f(x)")
        plt.scatter(hist, fx_hist, c='red')
        plt.title(f"Learning rate = {lr}")
        plt.xlabel("x")
        plt.ylabel("f(x)")
        plt.grid(True)
    plt.tight_layout()
    plt.show()

run_and_plot([0.01, 0.1, 0.5])

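What the results show:

Small learning rate (0.01) → stable, but convergence is slow: after 20 steps $x$ has only reached about 1.0
Large learning rate → the iterate can overshoot the minimum and bounce from side to side, or fail to converge at all. For this $f$, $\eta = 0.5$ happens to land exactly on $x = 3$ in a single step; overshooting begins for $\eta > 0.5$ and divergence for $\eta > 1$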
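
The same comparison can be made numerically, without a plot. The sketch below measures the distance to the minimum after 20 steps; the learning rates 0.9 and 1.1 are extra values added here to show overshooting and divergence, and `final_error` is a hypothetical helper.

```python
def df(x):
    return 2 * (x - 3)

def final_error(learning_rate, steps=20, start_x=0.0):
    # Distance from the minimum at x = 3 after a fixed number of updates
    x = start_x
    for _ in range(steps):
        x = x - learning_rate * df(x)
    return abs(x - 3)

errors = {lr: final_error(lr) for lr in [0.01, 0.1, 0.5, 0.9, 1.1]}
for lr, err in errors.items():
    print(f"learning rate = {lr:4.2f}: |x - 3| after 20 steps = {err:.3e}")
```

A learning rate of 1.1 ends up farther from the minimum than the starting point, which is what divergence looks like in numbers.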

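6. What Is the Mathematical Condition for Convergence?

For gradient descent to converge, the learning rate $\eta$ must lie in an appropriate range.

Example: for the quadratic function $f(x) = ax^2$ with $a > 0$, the update becomes $x_{t+1} = (1 - 2a\eta)\,x_t$, so it converges exactly when $|1 - 2a\eta| < 1$, that is:

$$
0 < \eta < \frac{2}{L}
$$

Here $L = f''(x) = 2a$ is the second derivative (curvature), which is also the Lipschitz constant of the derivative $f'$. The narrower range $0 < \eta \le \frac{1}{L}$ additionally keeps the iterates from overshooting the minimum.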
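
A quick numerical check of where convergence stops, assuming $a = 2$ (an arbitrary value chosen here) so that the curvature is $L = 2a = 4$. The helper `run` is hypothetical; it compares learning rates just below and just above $2/L = 0.5$.

```python
def run(a, eta, steps=50, start_x=1.0):
    # Gradient descent on f(x) = a * x**2, whose derivative is 2 * a * x.
    # Each update multiplies x by (1 - 2 * a * eta).
    x = start_x
    for _ in range(steps):
        x = x - eta * 2 * a * x
    return abs(x)

a = 2.0
L = 2 * a            # curvature f''(x) = 2a
threshold = 2 / L    # 0.5 when a = 2

below = run(a, 0.9 * threshold)   # multiplier is -0.8 per step
above = run(a, 1.1 * threshold)   # multiplier is -1.2 per step
print(f"eta = {0.9 * threshold:.2f}: |x| after 50 steps = {below:.3e}")
print(f"eta = {1.1 * threshold:.2f}: |x| after 50 steps = {above:.3e}")
```

Just below the threshold the iterates shrink toward 0; just above it they grow without bound.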

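7. Wrap-up: Gradient Descent Summary

Item	Description
What is it used for?	Minimizing a loss function (training machine learning models)
Core principle	Look at the derivative (slope) and update in the downhill direction
Formula	$x_{t+1} = x_t - \eta f'(x_t)$
Key parameter	Learning rate (diverges if too large, slow if too small)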
