Among the methods introduced in a paper[1] on constrained LASSO regression, this article presents the one based on the Alternating Direction Method of Multipliers (ADMM).
The Problem
Let $X\in\mathbb{R}^{n\times p}$ and $y\in\mathbb{R}^{n}$ be the input data. Consider the following LASSO regression problem with linear constraints.
\begin{align}
\mathop{\text{minimize}}_\beta\quad &\frac{1}{2}\|y-X\beta\|_2^2 + \rho \|\beta\|_1 \quad\\
\text{subject to}\quad &A\beta = b\ \text{and}\ G\beta\leq h
\end{align}
ADMM
The optimization problem with linear equality constraints
\begin{align}
\mathop{\text{minimize}}_{\beta,z}\quad &f(\beta) + g(z) \quad\\
\text{subject to}\quad &M\beta + Fz = c
\end{align}
is commonly solved by the Alternating Direction Method of Multipliers (ADMM). ADMM works with the augmented Lagrangian
\mathcal{L}_\tau (\beta,z,\nu) = f(\beta) + g(z) + \nu^T(M\beta + Fz - c) + \frac{\tau}{2}\|M\beta + Fz - c\|_2^2
and updates the variables by minimizing it alternately with respect to $\beta$ and $z$:
\newcommand{\argmin}{\mathop{\rm arg\,min}\limits}
\begin{align}
\beta^{(t+1)} &\leftarrow \argmin_\beta \mathcal{L}_\tau (\beta, z^{(t)}, \nu^{(t)})\\
z^{(t+1)} &\leftarrow \argmin_z \mathcal{L}_\tau (\beta^{(t+1)}, z, \nu^{(t)})\\
\nu^{(t+1)} &\leftarrow \nu^{(t)} + \tau(M\beta^{(t+1)}+Fz^{(t+1)}-c)
\end{align}
Here, setting $u=\nu/\tau$, $\mathcal{L}_\tau(\beta, z, \nu)$ can be written as
\mathcal{L}_\tau (\beta,z,\nu) = f(\beta) + g(z) + \frac{\tau}{2}\|M\beta + Fz - c + u\|_2^2 - \frac{\tau}{2}\|u\|_2^2
so the update rules can be rewritten as
\newcommand{\argmin}{\mathop{\rm arg\,min}\limits}
\begin{align}
\beta^{(t+1)} &\leftarrow \argmin_\beta \left\{f(\beta) + \frac{\tau}{2}\left\|M\beta + Fz^{(t)} - c + u^{(t)}\right\|_2^2\right\}\\
z^{(t+1)} &\leftarrow \argmin_z \left\{g(z) + \frac{\tau}{2}\left\|M\beta^{(t+1)} + Fz - c + u^{(t)}\right\|_2^2\right\}\\
u^{(t+1)} &\leftarrow u^{(t)} + M\beta^{(t+1)}+Fz^{(t+1)}-c
\end{align}
This is the so-called scaled form of ADMM.
Applying ADMM to Constrained LASSO Regression
Define the constraint set as
\mathcal{C} = \left\{\beta\in\mathbb{R}^p \mid A\beta=b,\ G\beta\leq h\right\}
and choose $f, g, M, F, c$ in ADMM as
f(\beta) = \frac{1}{2}\|y-X\beta\|_2^2 + \rho \|\beta\|_1,\quad
g(z) =
\begin{cases}
\infty & z\notin\mathcal{C}\\
0 & z\in\mathcal{C}
\end{cases},\\
M = I,\quad F = -I,\quad c = 0
With these choices, we obtain the following update rules for the constrained LASSO regression problem.
\newcommand{\argmin}{\mathop{\rm arg\,min}\limits}
\begin{align}
\beta^{(t+1)} &\leftarrow \argmin_\beta \left\{\frac{1}{2}\|y-X\beta\|_2^2 + \frac{\tau}{2}\left\|\beta - z^{(t)} + u^{(t)}\right\|_2^2 + \rho \|\beta\|_1\right\}\\
z^{(t+1)} &\leftarrow \text{proj}_\mathcal{C}(\beta^{(t+1)} + u^{(t)})\\
u^{(t+1)} &\leftarrow u^{(t)} + \beta^{(t+1)}-z^{(t+1)}
\end{align}
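Putting the three updates together gives a loop of only a few lines. Below is a minimal sketch, assuming two hypothetical helpers, lasso_update (solves the β-subproblem for a given target $z^{(t)}-u^{(t)}$) and project_C (Euclidean projection onto $\mathcal{C}$), both of which are made concrete in the following sections:

import numpy as np

def admm_constrained_lasso(beta0, lasso_update, project_C, n_iter=300):
    # beta0: a feasible starting point (see "Initializing the Variables" below);
    # lasso_update and project_C are hypothetical callbacks for the two subproblems
    beta, z, u = beta0, beta0.copy(), np.zeros_like(beta0)
    for _ in range(n_iter):
        beta = lasso_update(z - u)  # β-update: unconstrained LASSO
        z = project_C(beta + u)     # z-update: projection onto C
        u = u + beta - z            # scaled dual update
    return z                        # z is feasible by construction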
Updating β
The objective of the β-update can be transformed as
\begin{align}
& \frac{1}{2}\|y-X\beta\|_2^2 + \frac{\tau}{2}\left\|\beta - z^{(t)} + u^{(t)}\right\|_2^2 + \rho \|\beta\|_1\\
=& \frac{1}{2}\left\|\left(
\begin{matrix}
y\\ \sqrt{\tau}(z^{(t)} - u^{(t)})
\end{matrix}
\right)-\left(
\begin{matrix}
X\\ \sqrt{\tau}I_p
\end{matrix}
\right)\beta\right\|_2^2 + \rho \|\beta\|_1
\end{align}
so the β-update reduces to an unconstrained LASSO regression problem, which an off-the-shelf LASSO solver can handle.
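The identity is easy to verify numerically; here is a minimal check on synthetic data (all names below are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
n, p, tau = 20, 5, 2.0
X, y = rng.normal(size=(n, p)), rng.normal(size=n)
beta, z, u = rng.normal(size=p), rng.normal(size=p), rng.normal(size=p)

# left: least-squares term plus the augmented quadratic penalty
# (the L1 term is omitted since it is common to both sides)
lhs = 0.5 * np.linalg.norm(y - X @ beta) ** 2 + 0.5 * tau * np.linalg.norm(beta - z + u) ** 2
# right: a single stacked least-squares term
y_stack = np.concatenate([y, np.sqrt(tau) * (z - u)])
X_stack = np.vstack([X, np.sqrt(tau) * np.eye(p)])
rhs = 0.5 * np.linalg.norm(y_stack - X_stack @ beta) ** 2
assert np.isclose(lhs, rhs)

One caveat when delegating this to sklearn's Lasso: it minimizes $\frac{1}{2n_{\mathrm{rows}}}\|w-Q\beta\|_2^2 + \alpha\|\beta\|_1$, where $n_{\mathrm{rows}}$ is the row count of the design matrix it receives ($n+p$ after stacking), so the stacked system has to be rescaled by $\sqrt{n+p}$; the implementation below does this.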
Updating z
\newcommand{\argmin}{\mathop{\rm arg\,min}\limits}
\text{proj}_\mathcal{C}(\beta^{(t+1)} + u^{(t)}) = \argmin_{z\in\mathcal{C}}\left\{\frac{1}{2}\left\|z-(\beta^{(t+1)} + u^{(t)})\right\|_2^2\right\}
so it suffices to solve the quadratic program
\begin{align}
\mathop{\text{minimize}}_z\quad &\frac{1}{2}\left\|z-(\beta^{(t+1)} + u^{(t)})\right\|_2^2 \quad\\
\text{subject to}\quad &Az= b\ \text{and}\ Gz\leq h
\end{align}
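For reference, a minimal sketch of this projection with cvxopt's qp (the helper names are made up; the class below wraps the same call):

import numpy as np
from cvxopt import matrix
from cvxopt.solvers import qp

def _m(a):
    # convert to a cvxopt float matrix, passing None through
    return None if a is None else matrix(np.asarray(a, dtype=float))

def project_onto_C(v, A=None, b=None, G=None, h=None):
    # Euclidean projection of v onto {z : Az = b, Gz <= h};
    # qp minimizes 0.5 z^T P z + q^T z, so P = I and q = -v give 0.5 ||z - v||^2 up to a constant
    sol = qp(P=matrix(np.eye(v.shape[0])), q=_m(-v), G=_m(G), h=_m(h), A=_m(A), b=_m(b))
    return np.array(sol['x']).ravel()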
Initializing the Variables
\begin{align}
\mathop{\text{minimize}}_\beta\quad &\|\beta\|_1 \quad\\
\text{subject to}\quad &A\beta= b\ \text{and}\ G\beta\leq h
\end{align}
Solving this problem yields an initial point $\beta^{(0)}$ that satisfies the constraints.
By introducing a new variable $\gamma$, this problem can be viewed as a linear program.
\begin{align}
\mathop{\text{minimize}}_{\beta,\gamma}\quad &\sum_{i=1}^p \gamma_i \quad\\
\text{subject to}\quad &A\beta= b,\ G\beta\leq h,\\
&\ \gamma\geq0,\ \gamma\geq\beta,\ \gamma\geq-\beta
\end{align}
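A minimal standalone sketch of this LP with scipy.optimize.linprog (the function name is made up; the class below uses the same construction). Note that linprog's default variable bounds are (0, None), so they must be relaxed explicitly:

import numpy as np
from scipy.optimize import linprog

def l1_feasible_point(p, A=None, b=None, G=None, h=None):
    # decision vector is (β, γ) ∈ R^{2p}; minimize sum(γ)
    c = np.concatenate([np.zeros(p), np.ones(p)])
    eye = np.eye(p)
    blocks = [np.hstack([eye, -eye]),               #  β - γ ≤ 0
              np.hstack([-eye, -eye]),              # -β - γ ≤ 0
              np.hstack([np.zeros((p, p)), -eye])]  #     -γ ≤ 0
    b_ub = np.zeros(3 * p)
    if G is not None:
        blocks.insert(0, np.hstack([G, np.zeros_like(G)]))  # Gβ ≤ h
        b_ub = np.concatenate([h, b_ub])
    A_eq = None if A is None else np.hstack([A, np.zeros_like(A)])  # Aβ = b
    res = linprog(c, A_ub=np.vstack(blocks), b_ub=b_ub, A_eq=A_eq, b_eq=b,
                  bounds=(None, None))  # default bounds (0, None) would force β ≥ 0
    return res.x[:p]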
Source Code
The implementation is in Python: sklearn solves the LASSO in the β-update, cvxopt the quadratic program in the z-update, and scipy the linear program used to initialize β.
from cvxopt import matrix
from cvxopt.solvers import qp
import numpy as np
from scipy.optimize import linprog
from sklearn.linear_model import Lasso
class ConstrainedLasso:
    """
    Problem:
        minimize 0.5 * ||Xβ-y||^2 + ρ||β||_1
        subject to Aβ = b, Gβ ≤ h
    Algorithm:
        Alternating Direction Method of Multipliers (ADMM)
    """
    def __init__(
        self,
        A: np.ndarray = None,
        b: np.ndarray = None,
        G: np.ndarray = None,
        h: np.ndarray = None,
        rho: float = 1.0,
        tau: float = None,
        max_iter: int = 300,
        extended_output: bool = False
    ):
        """
        Parameters
        ----------
        A : np.ndarray, optional (default=None)
            The equality constraint matrix.
        b : np.ndarray, optional (default=None)
            The equality constraint vector.
        G : np.ndarray, optional (default=None)
            The inequality constraint matrix.
        h : np.ndarray, optional (default=None)
            The inequality constraint vector.
        rho : float, optional (default=1.0)
            Constant that multiplies the L1 term.
        tau : float, optional (default=None)
            Penalty parameter of the augmented Lagrangian.
        max_iter : int, optional (default=300)
            The maximum number of iterations.
        extended_output : bool, optional (default=False)
            If set to True, objective function values are saved in `self.f`.
        """
        if (A is None or b is None) and (G is None or h is None):
            raise ValueError('Invalid input for __init__: (A, b) or (G, h) must not be None!')
        # cvxopt's qp expects dense float ('d') matrices, so cast explicitly
        if A is None or b is None:
            self.A = None
            self.b = None
            self.G = matrix(G.astype(float))
            self.h = matrix(h.astype(float))
        elif G is None or h is None:
            self.A = matrix(A.astype(float))
            self.b = matrix(b.astype(float))
            self.G = None
            self.h = None
        else:
            self.A = matrix(A.astype(float))
            self.b = matrix(b.astype(float))
            self.G = matrix(G.astype(float))
            self.h = matrix(h.astype(float))
        self.rho = rho
        self.tau = tau
        self.max_iter = max_iter
        self.extended_output = extended_output
        # unconstrained LASSO solver reused in every β-update
        self.clf = Lasso(alpha=rho, fit_intercept=False)
        # objective function values per iteration (filled if extended_output is True)
        self.f = list()
    def _linear_programming(self, n_features: int) -> np.ndarray:
        """
        Solve the following problem.
        Problem:
            minimize ||β||_1 subject to Aβ=b, Gβ≤h
        Solver:
            scipy.optimize.linprog
        Parameters
        ----------
        n_features : int
            The dimension of the decision variables.
        Returns
        -------
        : np.ndarray, shape = (n_features, )
            The values of the decision variables that minimize the objective function while satisfying the constraints.
        """
        # decision vector is (β, γ); the objective is sum(γ)
        c = np.hstack((np.zeros(n_features), np.ones(n_features)))
        # equality constraint matrix and vector: Aβ = b, padded with zeros for γ
        A_eq = None
        b_eq = None
        if self.A is not None and self.b is not None:
            A, b = np.array(self.A), np.array(self.b).flatten()
            A_eq = np.hstack((A, np.zeros_like(A)))
            b_eq = b
        # inequality constraints: β - γ ≤ 0, -β - γ ≤ 0, -γ ≤ 0
        eye = np.eye(n_features)
        A_ub = np.vstack((
            np.hstack((eye, -eye)),
            np.hstack((-eye, -eye)),
            np.hstack((np.zeros((n_features, n_features)), -eye))
        ))
        b_ub = np.zeros(n_features * 3)
        if self.G is not None and self.h is not None:
            G = np.array(self.G)
            h = np.array(self.h).flatten()
            A_ub = np.vstack((np.hstack((G, np.zeros_like(G))), A_ub))
            b_ub = np.hstack((h, b_ub))
        # bounds=(None, None) is essential: linprog defaults to (0, None),
        # which would silently force β ≥ 0
        res = linprog(c=c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(None, None))
        if res.x is None:
            raise ValueError('Initialization failed: the constraint set appears to be infeasible.')
        return res.x[:n_features]
    def _projection(self, P: np.ndarray, q: np.ndarray) -> np.ndarray:
        """
        Projection onto the constraint set.
        Problem:
            minimize ||x - q||^2 subject to Ax = b, Gx ≤ h
        Solver:
            cvxopt.solvers.qp
        Parameters
        ----------
        P : np.ndarray, shape = (n_features, n_features)
            Coefficient matrix.
        q : np.ndarray, shape = (n_features, )
            Coefficient vector.
        Returns
        -------
        : np.ndarray, shape = (n_features, )
        """
        # qp minimizes 0.5 x^T P x + q^T x; with P = I and linear term -q,
        # this equals 0.5 ||x - q||^2 up to a constant
        sol = qp(P=matrix(P), q=matrix(-q), G=self.G, h=self.h, A=self.A, b=self.b)
        return np.array(sol['x']).flatten()
    def fit(self, X: np.ndarray, y: np.ndarray) -> None:
        """
        Parameters
        ----------
        X : np.ndarray, shape = (n_samples, n_features)
            Data.
        y : np.ndarray, shape = (n_samples, )
            Target.
        """
        n_samples, n_features = X.shape
        # initialize tau if necessary
        if self.tau is None:
            self.tau = n_samples
        tau_sqrt = np.sqrt(self.tau)
        # sklearn's Lasso minimizes 1/(2 * n_rows) ||w - Qβ||^2 + ρ||β||_1, where n_rows
        # is the row count of the stacked design matrix (n_samples + n_features),
        # so rescale by sqrt(n_samples + n_features) to recover 0.5 ||w - Qβ||^2 + ρ||β||_1
        scale = np.sqrt(n_samples + n_features)
        Q = np.vstack((X, np.eye(n_features) * tau_sqrt)) * scale
        p = y * scale
        P = np.eye(n_features)
        # initialize variables: β^(0) is a feasible point found by linear programming
        beta = self._linear_programming(n_features)
        z = np.copy(beta)
        u = np.zeros_like(beta)
        # save objective function value if necessary
        if self.extended_output:
            self.f.append(0.5 * np.linalg.norm(y - X.dot(beta)) ** 2 + np.sum(np.abs(beta)) * self.rho)
        # main loop
        for _ in range(self.max_iter):
            # β-update: unconstrained LASSO on the stacked system
            w = np.hstack((p, (z - u) * tau_sqrt * scale))
            self.clf.fit(Q, w)
            beta = self.clf.coef_
            # z-update: Euclidean projection onto the constraint set
            z = self._projection(P=P, q=beta + u)
            # scaled dual update
            u = u + beta - z
            # save objective function value if necessary
            if self.extended_output:
                self.f.append(0.5 * np.linalg.norm(y - X.dot(beta)) ** 2 + np.sum(np.abs(beta)) * self.rho)
        # save result: z satisfies the constraints by construction
        self.coef_ = z
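Finally, a hypothetical usage example (the data and constraints are made up): a LASSO fit whose coefficients are constrained to the probability simplex, i.e. $\sum_i\beta_i = 1$ and $\beta\geq 0$, the latter expressed as $G\beta\leq h$ with $G=-I$, $h=0$:

import numpy as np
from cvxopt import solvers

solvers.options['show_progress'] = False  # silence cvxopt's per-iteration output

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [0.5, 0.3, 0.2]
y = X @ beta_true + 0.01 * rng.normal(size=n)

clf = ConstrainedLasso(
    A=np.ones((1, p)), b=np.ones(1),  # sum(β) = 1
    G=-np.eye(p), h=np.zeros(p),      # β ≥ 0
    rho=0.01,
)
clf.fit(X, y)
print(clf.coef_.sum())  # ≈ 1.0: the equality constraint holds
print(clf.coef_.min())  # ≥ 0 up to solver tolerance: the inequality holds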