More than 3 years have passed since last update.

【Python】リーマン多様体上の最急降下法

Python

Last updated at 2021-04-18Posted at 2019-09-27

リーマン多様体上での最適化について，最も単純な最急降下法についてまとめる．

【参考】リーマン多様体上の共役勾配法については以下の記事にまとめた．

【Python】リーマン多様体上の共役勾配法 - Qiita

数空間上，リーマン多様体上の最急降下法のアルゴリズム¹

次の最適化問題について，$X$ がそれぞれ数空間 ($X=\mathbb{R}^n$)，リーマン多様体 ($X=\mathcal{M}$) であるときの最急降下法を紹介する．

\begin{align}
\text{minimize}\quad &f(x)\\
\text{subject to}\quad &x\in X
\end{align}

数空間上での最急降下法

初期化: $x_0\in \mathbb{R}^n,\ k\leftarrow 0$

停止条件が満たされるなら，$x_k$ を出力
探索方向 $d_k\leftarrow -\nabla f(x_k)$
直線探索によるステップ幅 $t_k\geq0$ を計算
$x_{k+1}\leftarrow x_k + t_k d_k$ で更新
$k \leftarrow k+1$ として，ステップ 1 へ

リーマン多様体上での最急降下法

初期化: $x_0\in \mathcal{M},\ k\leftarrow 0$

停止条件が満たされるなら，$x_k$ を出力
探索方向 $d_k\leftarrow -\color{red}{\mathop{\mathrm{grad}}}f(x_k)\in T_{x_k}\mathcal{M}$
直線探索によるステップ幅 $t_k\geq0$ を計算
$x_{k+1}\leftarrow \color{red}{R_{x_k}}(t_k d_k)$ で更新
$k \leftarrow k+1$ として，ステップ 1 へ

数空間上，リーマン多様体上の最急降下法の違い¹²

一般に多様体 $\mathcal{M}$ はベクトル空間とは限らないため，線形和が計算できる保証がない，そこで，$\nabla f(x)$ の代わりに $\mathrm{grad},f(x)$，$x_k + t_k d_k$ の代わりに $R_{x_k}(t_k d_k)$ が用いられている．ここでは，これらの用語を説明する．

接空間 (tangent space)

$x\in\mathcal{M}$ における接空間 $T_x\mathcal{M}$ とは，$\mathcal{M}$ 上の点 $x$ における接ベクトル全体の集合のことである．

接ベクトル (tangent vector)

$x\in\mathcal{M}$ における接ベクトル $\xi_x$ とは，$\mathcal{M}$ 上の関数から $\mathbb{R}$ への写像であり，任意の $\mathcal{M}$ 上の関数 $f:\ \mathcal{M}\to\mathbb{R}$ に対して，$x\in\mathcal{M}$ を通る曲線 $\gamma:\ \mathbb{R}\to\mathcal{M}\ (\gamma(0)=x)$ が存在し，

\xi_x f = \dot{\gamma}(0)f := \left.\frac{\mathrm{d}f(\gamma(t))}{\mathrm{d}t}\right|_{t=0}

を満たすものである．

リーマン計量 (Riemannian metric)

リーマン計量とは，$x\in\mathcal{M}$ から接空間 $T_x{\mathcal{M}}$ 上の内積 $g_x(\cdot,\ \cdot)$ への対応関係のことである．

勾配 (gradient)

$x\in\mathcal{M}$ における関数 $f$ の勾配 $\mathrm{grad},(x)$ とは，一意に定まる接空間 $T_x\mathcal{M}$ の元で

g_x(\mathrm{grad}\,f(x),\ \xi) = \mathrm{D}\,f(x)[\xi],\quad\forall\xi\in T_x\mathcal{M}

を満たすものである．ここで，$\mathrm{D},f(x)[\xi]$ は $f$ の点 $x$ における $\xi$ 方向微分である．
※ ユークリッド空間では
$$\mathrm{D},f(x)[\xi] = \lim_{t\to0}\frac{f(x+t\xi) - f(x)}{t} = \nabla f(x)^\top\xi$$

レトラクション (retraction)

レトラクション $R$ とは，$x\in\mathcal{M}$ と $\xi\in T_x\mathcal{M}$ に対して $R_x(\xi)\in\mathcal{M}$ を対応させる関数で，次の 2 条件を満たすものである．ここで，$0_x\in T_x\mathcal M$ は零ベクトル，$\mathrm{id}_{T_x \mathcal M}$ は接空間 $T_x\mathcal M$ 上の恒等写像である．

$R_x(0_x) = x$
$\mathrm{D},R_x(0_x)=\mathrm{id}_{T_x\mathcal M}$

※ ユークリッド空間では，$R_x(\xi) = x + \xi$ とおくと，$R$ は次のようにレトラクションの定義をみたす．

$R_x(0_x) = x$
任意の $\xi\in\mathbb{R}^n$ に対して，$\mathrm{D},R_x(0_x)[\xi] = \lim_{t\to0}\frac{R_x(0_x + t\xi) - R_x(0_x)}{t} = \xi$

数値例: レイリー商¹

レイリー商

R(x) = \frac{x^T A x}{x^T x}

を最小化する問題

\begin{align}
\text{minimize}\quad &R(x)\\
\text{subject to}\quad &x\in X
\end{align}

を最急降下法を用いて計算するプログラムを $X=\mathbb{R}^n$, $X=S^{n-1}$ の 2 通りに対して実装した．ここで，

S^{n-1}:=\{x\in\mathbb{R}^n\mid \|x\|_2=1\}

は $n-1$ 次元球面を表す．

レイリー商 $R(x)$ には，

R(\alpha x) = R(x)

という性質があり，最適解に実数倍の自由度がある．そこで変数 $x$ を $S^{n-1}$ 上に制限することで，最適化問題は

\begin{align}
\text{minimize}\quad &x^T A x\\
\text{subject to}\quad &x \in S^{n-1}
\end{align}

となり，最適解の自由度を取り除くことができる．

球上の接ベクトル空間，内積，勾配，レトラクション

接ベクトル空間

T_x S^{n-1} = \{\xi \in \mathbb{R}^n\mid x^T \xi = 0\}

内積

g_x(\xi, \eta) = \xi^T \eta

勾配

\text{grad}\,f(x) = \nabla f(x) - x \nabla f(x)^T x = (I-xx^T)\nabla f(x)

レトラクション

R_x(\xi) = \frac{x+\xi}{\|x+\xi\|}

ソースコード

ソースコードや実行した notebook は Github に．

from abc import ABCMeta, abstractmethod

import numpy as np


class Manifold(metaclass=ABCMeta):
    @abstractmethod
    def retraction(self, x: np.ndarray, v: np.ndarray):
        raise NotImplementedError('The function retraction is not implemented')


class RealSpace(Manifold):
    def retraction(self, x: np.ndarray, v: np.ndarray) -> np.ndarray:
        return x + v

    def vector_transport(self, x: np.ndarray, v: np.ndarray, a: np.ndarray) -> np.ndarray:
        return a


class Sphere(Manifold):
    def retraction(self, x: np.ndarray, v: np.ndarray) -> np.ndarray:
        """
        Compute R_x(v)
        """
        ret = x + v
        norm = np.linalg.norm(ret)
        return ret / norm if norm != 0 else ret

    def vector_transport(self, x: np.ndarray, v: np.ndarray, a: np.ndarray) -> np.ndarray:
        w = x + v
        return a - np.dot(v, a) / np.dot(w, w) * w


class GradientDescent(metaclass=ABCMeta):
    def __init__(
            self,
            manifold: str = 'sphere',
            initial_step: float = 1.0,
            armijo_param: float = 0.5,
            max_iter: int = 300,
            extended_output: bool = False
        ):
        self.initial_step = initial_step
        self.armijo_param = armijo_param
        self.max_iter = max_iter
        self.extended_output = extended_output

        if manifold == 'sphere':
            self.manifold = Sphere()
        elif manifold == 'real':
            self.manifold = RealSpace()
        else:
            print(f'Manifold {manifold} is not implemented! Use real space instead.')
            self.manifold = RealSpace()

        self.f = list()

    @abstractmethod
    def _f(self, x: np.ndarray):
        raise NotImplementedError('The function _f is not implemented')

    @abstractmethod
    def _df(self, x: np.ndarray):
        raise NotImplementedError('The function _df is not implemented')

    def _step_size(self, x: np.ndarray, d: np.ndarray) -> float:
        """
        Armijo condition
        """
        df = self._df(x)
        g = np.dot(df, d)
        t = self.initial_step
        f = self._f(x)
        while self._f(self.manifold.retraction(x, t * d)) > f + self.armijo_param * t * g:
            t *= 0.5
            # break when step size 't' becomes too small
            if t <= 1e-16:
                t = 0
                break
        return t

    def optimize(self, x: np.ndarray):
        # initialize
        res = np.copy(x)

        # main loop
        for _ in range(self.max_iter):
            d = - self._df(res)
            step_size = self._step_size(res, d)
            # break when step_size == 0
            if step_size == 0:
                break

            # update
            res = self.manifold.retraction(res, step_size * d)

            # store objective function value if necessary
            if self.extended_output:
                self.f.append(self._f(res))
        return res


class RayleighQuotientGD(GradientDescent):
    def __init__(
            self,
            A: np.ndarray,
            initial_step: float = 1.0,
            armijo_param: float = 0.5,
            max_iter: int = 300,
            extended_output: bool = False
    ):
        super().__init__(
            manifold='real',
            initial_step=initial_step,
            armijo_param=armijo_param,
            max_iter=max_iter,
            extended_output=extended_output
        )
        self.A = A

    def _f(self, x: np.ndarray) -> float:
        Ax = np.dot(self.A, x)
        xx = np.dot(x, x)
        return np.dot(Ax, x) / xx

    def _df(self, x: np.ndarray) -> np.ndarray:
        Ax = np.dot(self.A, x)
        xx = np.dot(x, x)
        f = np.dot(Ax, x) / xx
        return 2 * (Ax - f * x) / xx


class RayleighQuotientSphereGD(GradientDescent):
    def __init__(
            self,
            A: np.ndarray,
            initial_step: float = 1.0,
            armijo_param: float = 0.5,
            max_iter: int = 300,
            extended_output: bool = False
    ):
        super().__init__(
            manifold='sphere',
            initial_step=initial_step,
            armijo_param=armijo_param,
            max_iter=max_iter,
            extended_output=extended_output
        )
        self.A = A

    def _f(self, x: np.ndarray) -> float:
        Ax = np.dot(self.A, x)
        return np.dot(Ax, x)

    def _df(self, x: np.ndarray) -> np.ndarray:
        Ax = np.dot(self.A, x)
        f = np.dot(Ax, x)
        return 2 * (Ax - f * x)

金森敬文，鈴木大慈，竹内一郎，佐藤一誠．機械学習のための連続最適化．講談社．(2016) ↩ ↩² ↩³
P. -A. Absil, R. Mahony, R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press. (2008) ↩

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

【Python】 リーマン多様体上の最急降下法

数空間上，リーマン多様体上の最急降下法のアルゴリズム1