【Python】 Constrained LASSO Regression Using the Alternating Direction Method of Multipliers (ADMM)


Among the methods presented in a paper on constrained LASSO regression,1 this article describes the one based on the Alternating Direction Method of Multipliers (ADMM).

Problem Setting

Let $X\in\mathbb{R}^{n\times p}$ and $y\in\mathbb{R}^n$ be the input data. Consider the following linearly constrained LASSO regression problem:

\begin{align}
\mathop{\text{minimize}}_\beta\quad &\frac{1}{2}\|y-X\beta\|_2^2 + \rho \|\beta\|_1 \quad\\
\text{subject to}\quad &A\beta = b\ \text{and}\ G\beta\leq h
\end{align}

ADMM

The Alternating Direction Method of Multipliers (ADMM) is a well-known approach to optimization problems with linear equality constraints of the form

\begin{align}
\mathop{\text{minimize}}_{\beta,z}\quad &f(\beta) + g(z) \quad\\
\text{subject to}\quad &M\beta + Fz = c
\end{align}

In ADMM, the augmented Lagrangian

\mathcal{L}_\tau (\beta,z,\nu) = f(\beta) + g(z) + \nu^T(M\beta + Fz - c) + \frac{\tau}{2}\|M\beta + Fz - c\|_2^2

is minimized alternately with respect to $\beta$ and $z$, and the dual variable $\nu$ is then updated:

\newcommand{\argmin}{\mathop{\rm arg\,min}\limits}
\begin{align}
\beta^{(t+1)} &\leftarrow \argmin_\beta \mathcal{L}_\tau (\beta, z^{(t)}, \nu^{(t)})\\
z^{(t+1)} &\leftarrow \argmin_z \mathcal{L}_\tau (\beta^{(t+1)}, z, \nu^{(t)})\\
\nu^{(t+1)} &\leftarrow \nu^{(t)} + \tau(M\beta^{(t+1)}+Fz^{(t+1)}-c)
\end{align}

Introducing the scaled dual variable $u=\nu/\tau$, the augmented Lagrangian $\mathcal{L}_\tau(\beta, z, \nu)$ can be written as

\mathcal{L}_\tau (\beta,z,\nu) = f(\beta) + g(z) + \frac{\tau}{2}\|M\beta + Fz - c + u\|_2^2 - \frac{\tau}{2}\|u\|_2^2

so the updates can be rewritten as

\newcommand{\argmin}{\mathop{\rm arg\,min}\limits}
\begin{align}
\beta^{(t+1)} &\leftarrow \argmin_\beta \left\{f(\beta) + \frac{\tau}{2}\left\|M\beta + Fz^{(t)} - c + u^{(t)}\right\|_2^2\right\}\\
z^{(t+1)} &\leftarrow \argmin_z \left\{g(z) + \frac{\tau}{2}\left\|M\beta^{(t+1)} + Fz - c + u^{(t)}\right\|_2^2\right\}\\
u^{(t+1)} &\leftarrow u^{(t)} + M\beta^{(t+1)}+Fz^{(t+1)}-c
\end{align}

This is the so-called scaled form of ADMM.
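
A minimal generic sketch of this scaled iteration in Python, where `beta_step` and `z_step` are hypothetical callables that solve the two subproblems (with $\tau$ baked into them):

import numpy as np


def scaled_admm(beta_step, z_step, M, F, c, z0, max_iter=100):
    """Scaled ADMM skeleton (sketch).

    beta_step(v) must return argmin_beta f(beta) + (tau / 2) * ||M @ beta - v||^2,
    z_step(v)    must return argmin_z    g(z)    + (tau / 2) * ||F @ z    - v||^2.
    """
    z, u = z0, np.zeros_like(c)
    for _ in range(max_iter):
        beta = beta_step(c - F @ z - u)  # beta-update
        z = z_step(c - M @ beta - u)     # z-update
        u = u + M @ beta + F @ z - c     # scaled dual update
    return beta, z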

Applying ADMM to Constrained LASSO Regression

Define the constraint set

\mathcal{C} = \left\{\beta\in\mathbb{R}^p \mid A\beta=b,\ G\beta\leq h\right\}

and choose $f, g, M, F, c$ in ADMM as

f(\beta) = \frac{1}{2}\|y-X\beta\|_2^2 + \rho \|\beta\|_1,\quad
g(z) = 
\begin{cases}
\infty & z\notin\mathcal{C}\\
0 & z\in\mathcal{C}
\end{cases},\\
M = I,\quad F = -I,\quad c = 0

This yields the following updates for the constrained LASSO regression problem:

\newcommand{\argmin}{\mathop{\rm arg\,min}\limits}
\begin{align}
\beta^{(t+1)} &\leftarrow \argmin_\beta \left\{\frac{1}{2}\|y-X\beta\|_2^2 + \frac{\tau}{2}\left\|\beta - z^{(t)} + u^{(t)}\right\|_2^2 + \rho \|\beta\|_1\right\}\\
z^{(t+1)} &\leftarrow \text{proj}_\mathcal{C}(\beta^{(t+1)} + u^{(t)})\\
u^{(t+1)} &\leftarrow u^{(t)} + \beta^{(t+1)}-z^{(t+1)}
\end{align}

Updating β

\begin{align}
& \frac{1}{2}\|y-X\beta\|_2^2 + \frac{\tau}{2}\left\|\beta - z^{(t)} + u^{(t)}\right\|_2^2 + \rho \|\beta\|_1\\
=& \frac{1}{2}\left\|\left(
\begin{matrix}
y\\ \sqrt{\tau}(z^{(t)} - u^{(t)})
\end{matrix}
\right)-\left(
\begin{matrix}
X\\ \sqrt{\tau}I_p
\end{matrix}
\right)\beta\right\|_2^2 + \rho \|\beta\|_1
\end{align}

Rewriting the objective in this stacked form shows that the β-update is itself an unconstrained LASSO regression problem.
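
A minimal sketch of this β-update with scikit-learn's `Lasso`: since sklearn minimizes $\frac{1}{2 n_{\mathrm{rows}}}\|w - Q\beta\|_2^2 + \alpha\|\beta\|_1$, the stacked matrix and target are rescaled by $\sqrt{n+p}$ so that the objective matches $\frac{1}{2}\|\cdot\|_2^2 + \rho\|\cdot\|_1$ (the helper name `beta_update` is mine):

import numpy as np
from sklearn.linear_model import Lasso


def beta_update(X, y, z, u, tau, rho):
    """Solve the stacked unconstrained LASSO problem of the beta-update."""
    n, p = X.shape
    scale = np.sqrt(n + p)  # compensates for sklearn dividing the loss by the row count
    Q = np.vstack((X, np.sqrt(tau) * np.eye(p))) * scale
    w = np.hstack((y, np.sqrt(tau) * (z - u))) * scale
    clf = Lasso(alpha=rho, fit_intercept=False)
    clf.fit(Q, w)
    return clf.coef_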

Updating z

\newcommand{\argmin}{\mathop{\rm arg\,min}\limits}
\text{proj}_\mathcal{C}(\beta^{(t+1)} + u^{(t)}) = \argmin_{z\in\mathcal{C}}\left\{\frac{1}{2}\left\|z-(\beta^{(t+1)} + u^{(t)})\right\|_2^2\right\}

Thus the z-update amounts to solving the quadratic program

\begin{align}
\mathop{\text{minimize}}_z\quad &\frac{1}{2}\left\|z-(\beta^{(t+1)} + u^{(t)})\right\|_2^2 \quad\\
\text{subject to}\quad &Az= b\ \text{and}\ Gz\leq h
\end{align}

which can be handed to any off-the-shelf QP solver.
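
A small sketch of this projection using cvxopt's `qp`, with `v` standing for $\beta^{(t+1)} + u^{(t)}$ and `A`, `b`, `G`, `h` given as cvxopt matrices (or None):

import numpy as np
from cvxopt import matrix
from cvxopt.solvers import qp


def project(v, A=None, b=None, G=None, h=None):
    """Euclidean projection of v onto {z : Az = b, Gz <= h}."""
    # (1/2)||z - v||^2 = (1/2) z^T I z + (-v)^T z + const
    P = matrix(np.eye(len(v)))
    q = matrix(-np.asarray(v, dtype=float))
    sol = qp(P=P, q=q, G=G, h=h, A=A, b=b)
    return np.array(sol['x']).flatten()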

Initializing the Variables

\begin{align}
\mathop{\text{minimize}}_\beta\quad &\|\beta\|_1 \quad\\
\text{subject to}\quad &A\beta= b\ \text{and}\ G\beta\leq h
\end{align}

Solving this problem yields an initial point $\beta^{(0)}$ that satisfies the constraints.
By introducing an auxiliary variable $\gamma$, it can be recast as a linear program:

\begin{align}
\mathop{\text{minimize}}_{\beta,\gamma}\quad &\sum_{i=1}^p \gamma_i \quad\\
\text{subject to}\quad &A\beta= b,\ G\beta\leq h,\\
&\ \gamma\geq0,\ \gamma\geq\beta,\ \gamma\geq-\beta
\end{align}
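
A minimal sketch of this linear program with `scipy.optimize.linprog`, stacking the decision variables as $(\beta, \gamma)$. Note that `linprog` bounds every variable to $x \geq 0$ by default, so $\beta$ has to be declared free explicitly:

import numpy as np
from scipy.optimize import linprog


def init_beta(A, b, G, h, p):
    """Feasible initial point: minimize sum(gamma) over (beta, gamma)."""
    c = np.hstack((np.zeros(p), np.ones(p)))  # objective involves gamma only
    eye = np.eye(p)
    A_ub = np.vstack((
        np.hstack((G, np.zeros_like(G))),     # G beta <= h
        np.hstack((eye, -eye)),               # beta <= gamma
        np.hstack((-eye, -eye)),              # -beta <= gamma
    ))
    b_ub = np.hstack((h, np.zeros(2 * p)))
    A_eq = np.hstack((A, np.zeros_like(A)))   # A beta = b
    bounds = [(None, None)] * p + [(0, None)] * p  # beta free, gamma >= 0
    res = linprog(c=c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b, bounds=bounds)
    return res.x[:p]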

Source Code

The full implementation in Python is given below. scikit-learn solves the LASSO in the β-update, cvxopt solves the quadratic program in the z-update, and scipy solves the linear program that initializes β.

from cvxopt import matrix
from cvxopt.solvers import qp
import numpy as np
from scipy.optimize import linprog
from sklearn.linear_model import Lasso


class ConstrainedLasso:
    """
    Problem:
        minimize 0.5 * ||Xβ-y||^2 + ρ||β||_1
        subject to Aβ = b, Gβ ≤ h

    Algorithm:
        Alternating Direction Method of Multipliers (ADMM)
    """
    def __init__(
            self,
            A: np.ndarray = None,
            b: np.ndarray = None,
            G: np.ndarray = None,
            h: np.ndarray = None,
            rho: float = 1.0,
            tau: float = None,
            max_iter: int = 300,
            extended_output: bool = False
    ):
        """
        Parameters
        ----------
        A : np.ndarray, optional (default=None)
            The equality constraint matrix.

        b : np.ndarray, optional (default=None)
            The equality constraint vector.

        G : np.ndarray, optional (default=None)
            The inequality constraint matrix.

        h : np.ndarray, optional (default=None)
            The inequality constraint vector.

        rho : float, optional (default=1.0)
            Constant that multiplies the L1 term.

        tau : float, optional (default=None)
            Penalty parameter of the augmented Lagrangian. If None, it is set to the number of samples in `fit`.

        max_iter : int, optional (default=300)
            The maximum number of iterations.

        extended_output : bool, optional (default=False)
            If set to True, the objective function value is saved in `self.f` at every iteration.
        """
        if (A is None or b is None) and (G is None or h is None):
            raise ValueError('Invalid input for __init__: (A, b) or (G, h) must not be None!')

        if A is None or b is None:
            self.A = None
            self.b = None
            self.G = matrix(G)
            self.h = matrix(h)
        elif G is None or h is None:
            self.A = matrix(A)
            self.b = matrix(b)
            self.G = None
            self.h = None
        else:
            self.A = matrix(A)
            self.b = matrix(b)
            self.G = matrix(G)
            self.h = matrix(h)

        self.rho = rho
        self.tau = tau
        self.max_iter = max_iter
        self.extended_output = extended_output

        # Lasso
        self.clf = Lasso(alpha=rho, fit_intercept=False)

        self.f = list()

    def _linear_programming(self, n_features: int) -> np.ndarray:
        """
        Solve the following problem.

        Problem:
            minimize ||β||_1 subject to Aβ=b, Gβ≤h

        Solver:
            scipy.optimize.linprog

        Parameters
        ----------
        n_features : int
            The dimension of the decision variables.

        Returns
        ----------
        : np.ndarray, shape = (n_features, )
            The values of the decision variables that minimize the objective function while satisfying the constraints.
        """
        # equality constraint matrix and vector
        c = np.hstack((np.zeros(n_features), np.ones(n_features)))
        A_eq = None
        b_eq = None
        if self.A is not None and self.b is not None:
            A, b = np.array(self.A), np.array(self.b).flatten()
            A_eq = np.hstack((A, np.zeros_like(A)))
            b_eq = b

        # inequality constraint matrix and vector
        eye = np.eye(n_features)
        A_ub = np.vstack((
            np.hstack((eye, -eye)),
            np.hstack((-eye, -eye)),
            np.hstack((np.zeros((n_features, n_features)), -eye))
        ))
        b_ub = np.zeros(n_features * 3)
        if self.G is not None and self.h is not None:
            G = np.array(self.G)
            h = np.array(self.h).flatten()
            A_ub = np.vstack((np.hstack((G, np.zeros_like(G))), A_ub))
            b_ub = np.hstack((h, b_ub))

        # beta is free; gamma >= 0 (linprog's default bounds would force beta >= 0 too)
        bounds = [(None, None)] * n_features + [(0, None)] * n_features
        return linprog(c=c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds).x[:n_features]

    def _projection(self, P: np.ndarray, q: np.ndarray) -> np.ndarray:
        """
        Projection onto the constraint set.

        Problem:
            minimize ||x - q||^2 subject to Ax = b, Gx ≤ h

        Solver:
            cvxopt.solvers.qp

        Parameters
        ----------
        P : np.ndarray, shape = (n_features, n_features)
            Coefficient matrix.

        q: np.ndarray, shape = (n_features, )
            Coefficient vector.

        Returns
        ----------
        : np.ndarray, shape = (n_features, )
        """
        sol = qp(P=matrix(P), q=matrix(-q), G=self.G, h=self.h, A=self.A, b=self.b)
        return np.array(sol['x']).flatten()

    def fit(self, X: np.ndarray, y: np.ndarray) -> None:
        """
        Parameters
        ----------
        X : np.ndarray, shape = (n_samples, n_features)
            Data.

        y : np.ndarray, shape = (n_samples, )
            Target.
        """
        n_samples, n_features = X.shape

        # initialize tau if necessary
        if self.tau is None:
            self.tau = n_samples
        sqrt_tau = np.sqrt(self.tau)

        # initialize constants; the scale factor compensates for sklearn's Lasso
        # dividing the squared loss by the number of rows of the stacked matrix
        scale = np.sqrt(n_samples + n_features)
        Q = np.vstack((X, np.eye(n_features) * sqrt_tau)) * scale
        p = y * scale

        P = np.eye(n_features)

        # initialize variables
        beta = self._linear_programming(n_features)
        z = np.copy(beta)
        u = np.zeros_like(beta)

        # save objective function value if necessary
        if self.extended_output:
            self.f.append(0.5 * np.linalg.norm(y - X.dot(beta)) ** 2 + np.sum(np.abs(beta)) * self.rho)

        # main loop
        for _ in range(self.max_iter):
            w = np.hstack((p, (z - u) * sqrt_tau * scale))
            self.clf.fit(Q, w)
            beta = self.clf.coef_
            z = self._projection(P=P, q=beta+u)
            u = u + beta - z

            # save objective function value if necessary
            if self.extended_output:
                self.f.append(0.5 * np.linalg.norm(y - X.dot(beta)) ** 2 + np.sum(np.abs(beta)) * self.rho)

        # save result
        self.coef_ = z
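
A hypothetical usage example; the synthetic data and the constraints (nonnegative coefficients that sum to one) are chosen purely for illustration:

if __name__ == '__main__':
    np.random.seed(0)
    n, p = 100, 10

    # synthetic data with a sparse, feasible ground truth
    X = np.random.randn(n, p)
    beta_true = np.zeros(p)
    beta_true[:3] = [0.5, 0.3, 0.2]
    y = X.dot(beta_true) + 0.01 * np.random.randn(n)

    # constraints: sum(beta) == 1 (A, b) and beta >= 0 (G, h)
    A = np.ones((1, p))
    b = np.ones(1)
    G = -np.eye(p)
    h = np.zeros(p)

    model = ConstrainedLasso(A=A, b=b, G=G, h=h, rho=0.1)
    model.fit(X, y)
    print(model.coef_)  # estimated coefficients, satisfying the constraints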