
Machine Learning × Optimization: Smart Predict then Optimize

Posted at 2025-05-29

Traditional PO and SPO

Loss function in traditional prediction for optimization (PO)

Also called the Two-stage Method¹

\ell_{\mathrm{pred}}(\hat{c}, c) := \|\hat{c} - c\|^2

Loss function in Smart Predict-Then-Optimize (SPO)

Also called End-to-End Predict-Then-Optimize²

\ell_{\mathrm{SPO}}(\hat{c}, c) := c^\top w^*(\hat{c}) - z^*(c)

After the prediction step, we use the predicted cost vector $\hat{c}$ to solve the optimization problem:

w^*(\hat{c}) := \arg\min_{w \in \mathcal{S}} \hat{c}^\top w, \qquad z^*(c) := \min_{w \in \mathcal{S}} c^\top w

(Here we use the minimization convention of Elmachtoub and Grigas¹, so that $\ell_{\mathrm{SPO}} \geq 0$.)
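
To make the two losses concrete, here is a minimal numpy sketch (my own illustration, not code from the referenced papers). It uses a toy feasible region given by the vertices of a unit square, exploiting the fact that a linear objective over a polytope attains its optimum at a vertex; all names (`w_star`, `z_star`, `loss_pred`, `loss_spo`) are mine.

```python
import numpy as np

# Toy feasible region S, given by the vertices of a polytope. A linear
# objective attains its optimum at a vertex, so enumeration solves the LP.
S = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

def w_star(c):
    """w*(c) = argmin_{w in S} c^T w (minimization convention)."""
    return S[np.argmin(S @ c)]

def z_star(c):
    """z*(c) = min_{w in S} c^T w."""
    return float(np.min(S @ c))

def loss_pred(c_hat, c):
    """Two-stage (PO) loss: squared prediction error."""
    return float(np.sum((c_hat - c) ** 2))

def loss_spo(c_hat, c):
    """SPO loss: regret, in true cost, of the decision induced by c_hat."""
    return float(c @ w_star(c_hat)) - z_star(c)

c_true = np.array([1.0, -2.0])
c_hat  = np.array([1.5, -1.0])
print(loss_pred(c_hat, c_true))  # 1.25: nonzero prediction error
print(loss_spo(c_hat, c_true))   # 0.0: the induced decision is still optimal
```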

Why SPO?

(Figure: geometric illustration of two predictions equidistant from the true cost)

Geometric Insight

Even if the predicted cost vectors $\hat{c}_A$ and $\hat{c}_B$ are equally close to the true cost $c$ in Euclidean distance:

\| \hat{c}_A - c \|_2 = \| \hat{c}_B - c \|_2

they may lead to different decisions:

w^*(\hat{c}_A) \ne w^*(\hat{c}_B)

which can result in different objective values under the true cost:

c^\top w^*(\hat{c}_A) \ne c^\top w^*(\hat{c}_B)

This illustrates that minimizing the prediction error (e.g., using L2 loss) does not necessarily lead to optimal decisions — a core motivation for using the SPO framework.
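
A quick numeric version of this picture, under the same toy setup as the sketch above (the specific vectors are chosen purely for illustration): $\hat{c}_A$ and $\hat{c}_B$ are equally far from $c$, yet only one of them induces the optimal decision.

```python
import numpy as np

# Same toy setup: S given by its vertices, LP solved by enumeration.
S = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
def w_star(c):
    return S[np.argmin(S @ c)]

c   = np.array([-1.0, -2.0])   # true cost; w*(c) = [1, 1], z*(c) = -3
c_A = np.array([ 0.5, -2.0])   # hypothetical prediction A
c_B = np.array([-1.0, -0.5])   # hypothetical prediction B

print(np.linalg.norm(c_A - c), np.linalg.norm(c_B - c))  # 1.5 and 1.5: equidistant
print(w_star(c_A), w_star(c_B))                          # [0. 1.] vs [1. 1.]
print(c @ w_star(c_A), c @ w_star(c_B))                  # -2.0 vs -3.0 (optimal)
```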

Challenges in SPO

SPO Loss

\ell_{\mathrm{SPO}}(\hat{c}, c) := c^\top w^*(\hat{c}) - z^*(c)

Backward Propagation in SPO

(Figure: SPO framework)

To compute the gradient of the SPO loss $\ell$ with respect to the model parameters $\theta$, we apply the chain rule:

\frac{\partial \ell}{\partial \theta} =
\frac{\partial \ell}{\partial w^*(\hat{c})} \cdot
\frac{\partial w^*(\hat{c})}{\partial \hat{c}} \cdot
\frac{\partial \hat{c}}{\partial \theta}, \quad \hat{c} = f_\theta(x)

Here:

  • $\hat{c}$ is the predicted cost vector,
  • $f_\theta(x)$ is the prediction model parameterized by $\theta$,
  • $w^*(\hat{c})$ is the decision obtained by solving the optimization problem with $\hat{c}$.

Since $w^*(\hat{c})$ is the solution to an optimization problem,
its derivative $\frac{\partial w^*(\hat{c})}{\partial \hat{c}}$ is often non-smooth or undefined, making the overall gradient computation challenging.
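
One way to see the difficulty numerically (same toy setup as before): $w^*(\hat{c})$ is piecewise constant in $\hat{c}$, so its Jacobian is zero almost everywhere and undefined at decision boundaries.

```python
import numpy as np

S = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
def w_star(c):
    return S[np.argmin(S @ c)]

# Sweep one coordinate of c_hat: the induced decision jumps discontinuously,
# so dw*/dc_hat is 0 almost everywhere and undefined at the jump.
for t in np.linspace(-0.2, 0.2, 5):
    c_hat = np.array([t, -1.0])
    print(f"c_hat = [{t:+.2f}, -1.00] -> w* = {w_star(c_hat)}")
```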

SPO → SPO+

Using the fact that:

\hat{c}^\top w = z^*(\hat{c}) \quad \text{for all } w \in W^*(\hat{c}),

we derive:

\ell_{\text{SPO}}(\hat{c}, c) 
= \max_{w \in W^*(\hat{c})} \left( c^\top w - z^*(c) \right) 
= \max_{w \in W^*(\hat{c})} \left( c^\top w - z^*(c) - \alpha \hat{c}^\top w + \alpha z^*(\hat{c}) \right) 
= \max_{w \in W^*(\hat{c})} \left( c^\top w - \alpha \hat{c}^\top w \right) + \alpha z^*(\hat{c}) - z^*(c)

The influence of $\alpha$:

\begin{cases}
\alpha \uparrow & \Rightarrow \quad \text{the inner maximizer leans toward the predicted cost, i.e. } w \to w^*(\hat{c}) \\
\alpha \downarrow & \Rightarrow \quad \text{the inner maximizer leans toward the true-cost term } c^\top w
\end{cases}

Surrogate Upper Bound

\ell_{\mathrm{SPO}}(\hat{c}, c) \leq \inf_{\alpha} 
\left\{
\max_{w \in S} \left( c^\top w - \alpha \hat{c}^\top w \right) + \alpha z^*(\hat{c})
\right\} - z^*(c)
  • The feasible region of $w$ is enlarged from $W^*(\hat{c})$ to $S$
  • $\alpha$ does not influence the feasible region
  • As $\alpha \to \infty$, the bound is monotonically non-increasing and recovers the SPO loss, because the inner maximizer satisfies
w \to w^*(\hat{c}), \quad \hat{c}^\top w \to z^*(\hat{c})
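
A small numerical check of this monotonicity claim, under the same toy setup (the $\alpha$ values are arbitrary): the bound decreases with $\alpha$ and, for this example, settles at the SPO loss.

```python
import numpy as np

S = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z_star = lambda c: float(np.min(S @ c))

c     = np.array([-1.0, -2.0])
c_hat = np.array([ 0.5, -2.0])

def bound(alpha):
    # max_{w in S} (c^T w - alpha * c_hat^T w) + alpha * z*(c_hat) - z*(c)
    inner = float(np.max(S @ c - alpha * (S @ c_hat)))
    return inner + alpha * z_star(c_hat) - z_star(c)

spo = float(c @ S[np.argmin(S @ c_hat)]) - z_star(c)  # true SPO loss
for a in [0.5, 1.0, 2.0, 4.0, 8.0]:
    print(f"alpha = {a:>4}: bound = {bound(a):.3f}  (SPO loss = {spo})")
```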

SPO ERM Problem

Empirical Risk Minimization (ERM):

\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^n \ell_{\text{SPO}} \left( f(x_i), c_i \right)

SPO-based ERM Reformulation:

\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^n \lim_{\alpha_i \to \infty} \left\{
\max_{w \in S} \left( c_i^\top w - \alpha_i f(x_i)^\top w \right)
+ \alpha_i z^*(f(x_i)) - z^*(c_i)
\right\}
  • $f(x_i) = \hat{c}_i$ is the predicted cost for sample $i$
  • For every pair $(c_i, \hat{c}_i)$, taking $\alpha_i$ large enough drives the inner maximizer $w$ into $W^*(\hat{c}_i)$, so the limit recovers $\ell_{\text{SPO}}$

SPO ERM Reformulation (Step-by-Step)

\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^n \lim_{\alpha_i \to \infty} \left[
\max_{w \in S} \left( c_i^\top w - \alpha_i f(x_i)^\top w \right) 
+ \alpha_i z^*(f(x_i)) - z^*(c_i)
\right]
= \min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^n \lim_{\alpha_i \to \infty} \left[
\max_{w \in S} \left( c_i^\top w - \alpha_i f(x_i)^\top w \right) 
+ \alpha_i f(x_i)^\top w^*(\alpha_i f(x_i)) - z^*(c_i)
\right]
= \min_{f \in \mathcal{H}} \frac{1}{n} \lim_{\alpha \to \infty} \left(
\sum_{i=1}^n \max_{w \in S} \left( c_i^\top w - \alpha f(x_i)^\top w \right) 
+ \alpha f(x_i)^\top w^*(\alpha f(x_i)) - z^*(c_i)
\right)
\leq \min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^n \max_{w \in S} \left(
c_i^\top w - 2 f(x_i)^\top w + 2 f(x_i)^\top w^*(2 f(x_i)) - z^*(c_i)
\right)
\leq \min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^n \max_{w \in S} \left(
c_i^\top w - 2 f(x_i)^\top w + 2 f(x_i)^\top w^*(c_i) - z^*(c_i)
\right)

The first inequality fixes $\alpha_i = 2$: since the bound is non-increasing in $\alpha$, the limit is bounded above by its value at any finite $\alpha$. The second uses $f(x_i)^\top w^*(2 f(x_i)) = z^*(f(x_i)) \leq f(x_i)^\top w^*(c_i)$, which holds because $w^*(c_i) \in S$. The final expression is exactly the SPO+ loss.

SPO+

Definition:

\ell_{\text{SPO+}}(\hat{c}, c) := \max_{w \in \mathcal{S}} (c - 2\hat{c})^\top w + 2\hat{c}^\top w^*(c) - z^*(c)

Where:

  • $\hat{c}$: predicted cost vector
  • $c$: true cost vector
  • $w \in \mathcal{S}$: any feasible decision in the constraint set $\mathcal{S}$
  • $w^*(c)$: optimal solution under the true cost
  • $z^*(c)$: optimal value under the true cost
  • $\max_{w \in \mathcal{S}} (c - 2\hat{c})^\top w$: a linear program over $\mathcal{S}$

Differentiability of SPO+

SPO+ Loss

The SPO+ loss:

\ell_{\text{SPO+}}(\hat{c}, c) = \max_{w \in \mathcal{S}} (c - 2\hat{c})^\top w + 2\hat{c}^\top w^*(c) - c^\top w^*(c)

Let:

w^+ \in \arg\max_{w \in \mathcal{S}} (c - 2\hat{c})^\top w

Then one subgradient of the SPO+ loss is:

\nabla_{\hat{c}} \ell_{\text{SPO+}}(\hat{c}, c) = -2w^+ + 2w^*(c) = 2\left(w^*(c) - w^+\right)
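
As a sketch (same vertex-enumeration toy LP as earlier; the function name `spo_plus` is my own), the loss and this subgradient are a few lines of numpy:

```python
import numpy as np

S = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

def spo_plus(c_hat, c):
    """SPO+ loss and one subgradient with respect to c_hat."""
    w_plus = S[np.argmax(S @ (c - 2 * c_hat))]   # w+ = argmax_{w in S} (c - 2c_hat)^T w
    w_true = S[np.argmin(S @ c)]                 # w*(c)
    loss = float((c - 2 * c_hat) @ w_plus + 2 * c_hat @ w_true - c @ w_true)
    grad = 2 * (w_true - w_plus)                 # = -2 w+ + 2 w*(c)
    return loss, grad

c     = np.array([-1.0, -2.0])
c_hat = np.array([ 0.5, -2.0])
loss, grad = spo_plus(c_hat, c)
print(loss, grad)  # 2.0, [2. 0.]: SPO+ upper-bounds the SPO loss of 1 here
```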

Notes

  • The loss is convex, since it's a maximum over linear functions.
  • Hence, a subgradient exists at every point.
  • This makes the SPO+ loss amenable to subgradient descent and compatible with automatic differentiation in deep learning frameworks.
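
To make the last note concrete, here is a minimal subgradient-descent training loop on synthetic data (my own sketch, not code from the papers): a linear model $\hat{c} = \Theta x$ is trained by applying the chain rule $\partial \ell / \partial \Theta = g\, x^\top$ with the subgradient $g = 2(w^*(c) - w^+)$ above.

```python
import numpy as np

rng = np.random.default_rng(0)
S = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Synthetic data: true costs depend linearly on features (an assumption
# made purely for this demo).
Theta_true = np.array([[1.0, -2.0], [-3.0, 1.0]])
X = rng.normal(size=(200, 2))
C = X @ Theta_true.T + 0.1 * rng.normal(size=(200, 2))

Theta = np.zeros((2, 2))                 # linear model: c_hat = Theta @ x
lr = 0.05
for epoch in range(30):
    total = 0.0
    for x, c in zip(X, C):
        c_hat = Theta @ x
        w_plus = S[np.argmax(S @ (c - 2 * c_hat))]
        w_true = S[np.argmin(S @ c)]
        total += float((c - 2 * c_hat) @ w_plus + 2 * c_hat @ w_true - c @ w_true)
        g = 2 * (w_true - w_plus)        # subgradient w.r.t. c_hat
        Theta -= lr * np.outer(g, x)     # chain rule: dL/dTheta = g x^T
    if epoch % 10 == 0:
        print(f"epoch {epoch}: mean SPO+ loss = {total / len(X):.3f}")
```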
  1. Adam N. Elmachtoub, Paul Grigas (2021). Smart “Predict, then Optimize”. Management Science, 68(1): 9–26.

  2. Kotary et al. (2023). Predict-Then-Optimize by Proxy: Learning Joint Models of Prediction and Optimization.
