Table of Contents
- Introduction
- Background
- Overview of the Multi-Dimensional Kalman Filter
- Derivation of the Multi-Dimensional Kalman Filter
- Integrating the Kalman Filter with STS Models
- Recursive Equations
- Covariance Matrix Update
- Algorithm Summary
1. Introduction
This article provides a self-contained derivation of the Multi-Dimensional Kalman Filter and shows how to integrate it with Structural Time Series (STS) models for time-series forecasting and state estimation.
The treatment applies strictly to linear-Gaussian state-space setups. Extensions (the Extended and Unscented Kalman Filters) exist for mild non-linearities but can be less robust and introduce approximation errors. Bayesian STS, on the other hand, can more flexibly incorporate non-linear or non-Gaussian components by placing priors on these features and sampling them with MCMC, though at a higher computational cost.
2. Background
2.1. Structural Time Series (STS) Models
Structural Time Series (STS) models decompose time-series data into various structural components, making it easier to understand and forecast underlying patterns. The primary components typically include:
- Trend Component $(\ell_t)$: captures the long-term progression of the series.
- Seasonal Components $(s_{k,t})$: account for periodic fluctuations at different frequencies.
- Regression Components $(\boldsymbol{\beta}_t)$: incorporate the influence of external regressors.
- Noise Components: represent unobserved shocks or irregularities.
The STS model can be expressed in a state-space framework, facilitating the application of filtering and smoothing techniques such as the Kalman Filter.
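For concreteness, one common additive specification (assumed here purely for illustration) combines a local-level trend with seasonal, regression, and noise terms:
\begin{align}
y_t &= \ell_t + \sum_{k=1}^{K} s_{k,t} + \mathbf{r}_t^\top \boldsymbol{\beta}_t + \varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}(0, \sigma_{\varepsilon}^2), \\[4pt]
\ell_t &= \ell_{t-1} + \eta_t,
\qquad \eta_t \sim \mathcal{N}(0, \sigma_{\ell}^2),
\end{align}
where $(\mathbf{r}_t)$ denotes the external regressors at time $(t)$. Each component then occupies one or more entries of the state vector once the model is cast in state-space form (Section 5).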
2.2. Multi-Dimensional Kalman Filter
The Multi-Dimensional Kalman Filter extends the original Kalman Filter to handle multiple state variables and observations. It employs matrix and vector operations to efficiently estimate the state of systems with multiple interacting components.
- State Vector
\begin{align} \mathbf{x}_t \quad (\text{dimension: } n \times 1) \end{align}
Represents $(n)$ state variables at time $(t)$.
- State Transition Matrix
\begin{align} \mathbf{F}_t \quad (\text{dimension: } n \times n) \end{align}
Governs the evolution of the state vector.
- Control-Input Matrix (optional)
\begin{align} \mathbf{B}_t \quad (\text{dimension: } n \times m) \end{align}
Maps a control input vector $(\mathbf{u}_t \in \mathbb{R}^{m})$ into the state space.
- Observation Matrix
\begin{align} \mathbf{H}_t \quad (\text{dimension: } p \times n) \end{align}
Maps the state vector to the observation vector.
- Process Noise Covariance
\begin{align} \mathbf{Q}_t \quad (\text{dimension: } n \times n) \end{align}
Captures the uncertainty in the state transition.
- Measurement Noise Covariance
\begin{align} \mathbf{R}_t \quad (\text{dimension: } p \times p) \end{align}
Captures the uncertainty in observations.
- Observation Vector
\begin{align} \mathbf{z}_t \quad (\text{dimension: } p \times 1) \end{align}
Represents the measured output at time $(t)$.
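Because much of the setup above is dimension bookkeeping, a small container with a shape check is handy when implementing the filter. The sketch below is a minimal, illustrative example assuming time-invariant NumPy matrices; the class and method names are not from any particular library:

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class StateSpaceModel:
    """Container for the matrices of a (time-invariant) linear-Gaussian model."""
    F: np.ndarray                   # state transition, shape (n, n)
    H: np.ndarray                   # observation matrix, shape (p, n)
    Q: np.ndarray                   # process noise covariance, shape (n, n)
    R: np.ndarray                   # measurement noise covariance, shape (p, p)
    B: Optional[np.ndarray] = None  # optional control-input matrix, shape (n, m)

    def validate(self) -> None:
        """Raise if the dimensions listed above are mutually inconsistent."""
        n, p = self.F.shape[0], self.H.shape[0]
        assert self.F.shape == (n, n), "F must be n x n"
        assert self.H.shape == (p, n), "H must be p x n"
        assert self.Q.shape == (n, n), "Q must be n x n"
        assert self.R.shape == (p, p), "R must be p x p"
        if self.B is not None:
            assert self.B.shape[0] == n, "B must have n rows"
```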
2.3. Assumptions
For the Kalman Filter and STS models to function optimally, the following assumptions are typically made:
- Linearity: The state transition and observation equations are linear.
- Gaussian Noise: Both process noise $(\mathbf{w}_t)$ and measurement noise $(\mathbf{v}_t)$ are Gaussian.
- Initial State Gaussianity: The initial state satisfies
\begin{align} \mathbf{x}_0 \sim \mathcal{N}(\mathbf{\hat{x}}_0, \mathbf{P}_0), \end{align}
where $(\mathbf{\hat{x}}_0)$ is $(n \times 1)$ and $(\mathbf{P}_0)$ is of dimension $(n \times n)$.
- Independence: The noises $(\mathbf{w}_t)$ and $(\mathbf{v}_t)$ are independent of each other and across time steps.
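Under these assumptions, the complete linear-Gaussian state-space model used throughout the derivation can be written compactly as:
\begin{align}
\mathbf{x}_k &= \mathbf{F}_k\,\mathbf{x}_{k-1} + \mathbf{B}_k\,\mathbf{u}_k + \mathbf{w}_k,
\qquad \mathbf{w}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}_k), \\[4pt]
\mathbf{z}_k &= \mathbf{H}_k\,\mathbf{x}_k + \mathbf{v}_k,
\qquad \mathbf{v}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_k).
\end{align}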
3. Overview of the Multi-Dimensional Kalman Filter
The Kalman Filter operates in two main steps:
- Prediction Step: estimates the current state and its covariance based on the previous state estimate.
- Update Step: refines the prediction using the new measurement.
Throughout the filtering process, we maintain two estimates:
- Prior (Predicted) Estimate
\begin{align} \mathbf{\hat{x}}_{k|k-1} \quad (\text{dimension: } n \times 1) \end{align}
Estimate of the state $(\mathbf{x}_k)$ before incorporating the measurement at time $(k)$.
- Posterior (Updated) Estimate
\begin{align} \mathbf{\hat{x}}_{k|k} \quad (\text{dimension: } n \times 1) \end{align}
Estimate of the state $(\mathbf{x}_k)$ after incorporating the measurement at time $(k)$.
We also maintain two covariance matrices:
- Prior Covariance
\begin{align} \mathbf{P}_{k|k-1} \quad (\text{dimension: } n \times n) \end{align}
- Posterior Covariance
\begin{align} \mathbf{P}_{k|k} \quad (\text{dimension: } n \times n) \end{align}
4. Derivation of the Multi-Dimensional Kalman Filter
4.1. Bayesian Formulation
At each time step $(k)$, the goal is to find the posterior distribution:
\begin{align}
p(\mathbf{x}_k \,\mid\, \mathbf{z}_{1:k}),
\end{align}
where $(\mathbf{z}_{1:k} = \{\mathbf{z}_1, \mathbf{z}_2, \dots, \mathbf{z}_k\})$ represents all measurements up to time $(k)$.
We assume:
- The state transition is
\begin{align} p(\mathbf{x}_k \,\mid\, \mathbf{x}_{k-1}) = \mathcal{N}(\mathbf{F}_k\,\mathbf{x}_{k-1} + \mathbf{B}_k\,\mathbf{u}_k, \, \mathbf{Q}_k). \end{align}
- The measurement equation is
\begin{align} p(\mathbf{z}_k \,\mid\, \mathbf{x}_k) = \mathcal{N}(\mathbf{H}_k\,\mathbf{x}_k, \, \mathbf{R}_k). \end{align}
By applying Bayes' Theorem with these Gaussian assumptions, the posterior remains Gaussian at each step.
4.1.1. Simple Derivation of the Chapman–Kolmogorov Equation
In general Bayesian filtering (including the Kalman Filter), the Chapman–Kolmogorov equation underlies the prediction step:
\begin{align}
p(\mathbf{x}_k \,\mid\, \mathbf{z}_{1:k-1})
= \int p(\mathbf{x}_k \,\mid\, \mathbf{x}_{k-1})
\,p(\mathbf{x}_{k-1} \,\mid\, \mathbf{z}_{1:k-1})
\, d\mathbf{x}_{k-1}.
\end{align}
The Chapman–Kolmogorov equation tells us how to marginalize out $(\mathbf{x}_{k-1})$ to find the prior distribution for $(\mathbf{x}_k)$. In the linear-Gaussian case, the result stays Gaussian with known mean and covariance formulas—exactly what leads to the Kalman Filter’s prediction equations.
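Because both densities inside the integral are Gaussian and the transition is linear, the marginalization can be evaluated in closed form:
\begin{align}
p(\mathbf{x}_k \,\mid\, \mathbf{z}_{1:k-1})
= \mathcal{N}\Bigl(\mathbf{F}_k\,\mathbf{\hat{x}}_{k-1|k-1} + \mathbf{B}_k\,\mathbf{u}_k,\;
\mathbf{F}_k\,\mathbf{P}_{k-1|k-1}\,\mathbf{F}_k^\top + \mathbf{Q}_k\Bigr),
\end{align}
which is precisely the pair of prediction equations derived step by step in the next subsection.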
4.2. Prediction Step
We want to calculate:
\begin{align}
p(\mathbf{x}_k \,\mid\, \mathbf{z}_{1:k-1})
= \int
p(\mathbf{x}_k \,\mid\, \mathbf{x}_{k-1}) \,
p(\mathbf{x}_{k-1} \,\mid\, \mathbf{z}_{1:k-1})
\, d\mathbf{x}_{k-1}.
\end{align}
- Predicting the Mean (State Estimate)
The predicted (prior) mean is:
\begin{align} \mathbf{\hat{x}}_{k|k-1} &= \mathbb{E}\bigl[\mathbf{x}_k \,\mid\, \mathbf{z}_{1:k-1}\bigr]. \end{align}
Using the integral form:
\begin{align} \mathbf{\hat{x}}_{k|k-1} &= \int \mathbf{x}_k \, p(\mathbf{x}_k \,\mid\, \mathbf{z}_{1:k-1}) \, d\mathbf{x}_k \\ &= \int \!\!\int \mathbf{x}_k\, p(\mathbf{x}_k \,\mid\, \mathbf{x}_{k-1}) p(\mathbf{x}_{k-1} \,\mid\, \mathbf{z}_{1:k-1}) \, d\mathbf{x}_{k-1} \, d\mathbf{x}_k. \end{align}
Evaluating the inner integral over $(\mathbf{x}_k)$ uses the conditional mean of the Gaussian transition density:
\begin{align} \mathbb{E}[\mathbf{x}_k \,\mid\, \mathbf{x}_{k-1}] = \mathbf{F}_k \,\mathbf{x}_{k-1} + \mathbf{B}_k \,\mathbf{u}_k. \end{align}
Hence:
\begin{align} \mathbf{\hat{x}}_{k|k-1} &= \mathbf{F}_k \,\mathbb{E}\bigl[\mathbf{x}_{k-1} \,\mid\, \mathbf{z}_{1:k-1}\bigr] + \mathbf{B}_k\,\mathbf{u}_k \\ &= \mathbf{F}_k \,\mathbf{\hat{x}}_{k-1|k-1} + \mathbf{B}_k\,\mathbf{u}_k. \end{align}
- Predicting the Covariance
The prior covariance is:
\begin{align} \mathbf{P}_{k|k-1} &= \mathbb{E}\bigl[(\mathbf{x}_k - \mathbf{\hat{x}}_{k|k-1}) (\mathbf{x}_k - \mathbf{\hat{x}}_{k|k-1})^\top \,\mid\, \mathbf{z}_{1:k-1}\bigr]. \end{align}
4.2.1. Intuitive Derivation for the Covariance Prediction
Starting from the state-space model:
\begin{align} \mathbf{x}_k &= \mathbf{F}_k\,\mathbf{x}_{k-1} + \mathbf{B}_k\,\mathbf{u}_k + \mathbf{w}_k, \quad \mathbf{w}_k \sim \mathcal{N}(\mathbf{0}, \,\mathbf{Q}_k), \end{align}
and
\begin{align} \mathbf{\hat{x}}_{k|k-1} &= \mathbf{F}_k \,\mathbf{\hat{x}}_{k-1|k-1} + \mathbf{B}_k \,\mathbf{u}_k, \end{align}
we define:
\begin{align} \mathbf{P}_{k|k-1} \;=\; \mathbb{E}\bigl[(\mathbf{x}_k - \mathbf{\hat{x}}_{k|k-1}) (\mathbf{x}_k - \mathbf{\hat{x}}_{k|k-1})^\top \,\bigm|\, \mathbf{z}_{1:k-1}\bigr]. \end{align}
- Rewrite the prediction error $(\mathbf{x}_k - \mathbf{\hat{x}}_{k|k-1})$:
\begin{align} \mathbf{x}_k - \mathbf{\hat{x}}_{k|k-1} &= \Bigl(\mathbf{F}_k\,\mathbf{x}_{k-1} + \mathbf{B}_k\,\mathbf{u}_k + \mathbf{w}_k\Bigr) \;-\; \Bigl(\mathbf{F}_k\,\mathbf{\hat{x}}_{k-1|k-1} + \mathbf{B}_k\,\mathbf{u}_k\Bigr) \\[6pt] &= \mathbf{F}_k\Bigl(\mathbf{x}_{k-1} - \mathbf{\hat{x}}_{k-1|k-1}\Bigr) + \mathbf{w}_k. \end{align}
- Expand the expectation:
\begin{align} \mathbf{P}_{k|k-1} &= \mathbb{E}\Bigl[ \bigl(\mathbf{F}_k(\mathbf{x}_{k-1} - \mathbf{\hat{x}}_{k-1|k-1}) + \mathbf{w}_k \bigr) \bigl(\mathbf{F}_k(\mathbf{x}_{k-1} - \mathbf{\hat{x}}_{k-1|k-1}) + \mathbf{w}_k \bigr)^\top \Bigr]. \end{align}
Expanding the product gives four terms: two quadratic terms and two cross terms between the estimation error and the process noise $(\mathbf{w}_k)$.
- Use the fact that $(\mathbf{w}_k)$ is zero-mean Gaussian noise, uncorrelated with the estimation error at time $(k-1)$:
\begin{align} \mathbb{E}[\mathbf{w}_k\,\mathbf{w}_k^\top] = \mathbf{Q}_k, \qquad \mathbb{E}\bigl[\mathbf{w}_k\,(\mathbf{x}_{k-1} - \mathbf{\hat{x}}_{k-1|k-1})^\top\bigr] = \mathbf{0}, \end{align}
so both cross terms vanish in expectation.
Hence, we are left with:
\begin{align} \mathbf{P}_{k|k-1} &= \mathbf{F}_k \,\mathbb{E}\bigl[ (\mathbf{x}_{k-1} - \mathbf{\hat{x}}_{k-1|k-1}) (\mathbf{x}_{k-1} - \mathbf{\hat{x}}_{k-1|k-1})^\top \bigr]\, \mathbf{F}_k^\top \;+\; \mathbf{Q}_k. \end{align}
The bracketed expectation is simply $(\mathbf{P}_{k-1|k-1})$.
- Hence, exploiting Gaussian properties and uncorrelated noise, we arrive at the well-known covariance prediction formula:
\begin{align}
\mathbf{P}_{k|k-1}
= \mathbf{F}_k \,\mathbf{P}_{k-1|k-1}\,\mathbf{F}_k^\top
+ \mathbf{Q}_k.
\end{align}
Putting it all together, the Prediction Step equations are:
\begin{align}
\mathbf{\hat{x}}_{k|k-1}
&= \mathbf{F}_k \,\mathbf{\hat{x}}_{k-1|k-1}
+ \mathbf{B}_k\,\mathbf{u}_k,\\[4pt]
\mathbf{P}_{k|k-1}
&= \mathbf{F}_k \,\mathbf{P}_{k-1|k-1}\,\mathbf{F}_k^\top
+ \mathbf{Q}_k.
\end{align}
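As a quick sanity check of these formulas, the short NumPy sketch below implements the prediction step and verifies the covariance formula by Monte Carlo simulation; the example matrices (a constant-velocity $(\mathbf{F})$) are assumed purely for illustration:

```python
import numpy as np


def kf_predict(x_post, P_post, F, Q, B=None, u=None):
    """Prediction step: propagate the posterior mean and covariance through
    x_k = F x_{k-1} + B u_k + w_k,  w_k ~ N(0, Q)."""
    x_prior = F @ x_post if (B is None or u is None) else F @ x_post + B @ u
    P_prior = F @ P_post @ F.T + Q
    return x_prior, P_prior


# Monte Carlo sanity check of P_prior = F P F^T + Q (example matrices are assumed).
rng = np.random.default_rng(0)
F = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = np.diag([0.05, 0.01])
x_post, P_post = np.array([0.0, 1.0]), np.diag([0.5, 0.2])

samples = rng.multivariate_normal(x_post, P_post, size=200_000)
noise = rng.multivariate_normal(np.zeros(2), Q, size=200_000)
empirical_cov = np.cov(samples @ F.T + noise, rowvar=False)

_, P_prior = kf_predict(x_post, P_post, F, Q)
print(np.allclose(empirical_cov, P_prior, atol=1e-2))  # True up to sampling noise
```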
4.3. Update Step
Now we incorporate the new measurement $(\mathbf{z}_k)$ to obtain
\begin{align}
p(\mathbf{x}_k \mid \mathbf{z}_{1:k})
\end{align}
- Measurement Model
\begin{align} \mathbf{z}_k = \mathbf{H}_k \,\mathbf{x}_k + \mathbf{v}_k, \quad \mathbf{v}_k \sim \mathcal{N}(\mathbf{0},\,\mathbf{R}_k). \end{align}
- Posterior Distribution (Bayes' Rule)
By definition,
\begin{align} p(\mathbf{x}_k \,\mid\, \mathbf{z}_{1:k}) &= \frac{p(\mathbf{z}_k \,\mid\, \mathbf{x}_k)\, p(\mathbf{x}_k \,\mid\, \mathbf{z}_{1:k-1})} {p(\mathbf{z}_k \,\mid\, \mathbf{z}_{1:k-1})}. \end{align}
Each term here is Gaussian, so the resulting distribution remains Gaussian with a closed-form mean and covariance.
- Kalman Gain
Let
\begin{align} \mathbf{y}_k &= \mathbf{z}_k - \mathbf{H}_k \,\mathbf{\hat{x}}_{k|k-1} \quad (\text{innovation or residual}). \end{align}
The covariance of $(\mathbf{y}_k)$ is
\begin{align} \mathbf{S}_k &= \mathbf{H}_k \,\mathbf{P}_{k|k-1}\,\mathbf{H}_k^\top + \mathbf{R}_k. \end{align}
The Kalman Gain $(\mathbf{K}_k)$ (dimension $(n \times p)$) is:
\begin{align} \mathbf{K}_k &= \mathbf{P}_{k|k-1} \,\mathbf{H}_k^\top \,\mathbf{S}_k^{-1}. \end{align}
Equivalently,
\begin{align} \mathbf{K}_k &= \mathbf{P}_{k|k-1}\,\mathbf{H}_k^\top \bigl(\mathbf{H}_k\,\mathbf{P}_{k|k-1}\,\mathbf{H}_k^\top + \mathbf{R}_k\bigr)^{-1}. \end{align}
- Posterior State Estimate
\begin{align} \mathbf{\hat{x}}_{k|k} &= \mathbf{\hat{x}}_{k|k-1} + \mathbf{K}_k \bigl(\mathbf{z}_k - \mathbf{H}_k\,\mathbf{\hat{x}}_{k|k-1}\bigr). \end{align}
- Posterior Covariance
\begin{align} \mathbf{P}_{k|k} &= \bigl(\mathbf{I} - \mathbf{K}_k\,\mathbf{H}_k\bigr)\, \mathbf{P}_{k|k-1}. \end{align}
Alternatively, the numerically more stable Joseph Form:
\begin{align} \mathbf{P}_{k|k} &= \bigl(\mathbf{I} - \mathbf{K}_k\,\mathbf{H}_k\bigr)\, \mathbf{P}_{k|k-1}\, \bigl(\mathbf{I} - \mathbf{K}_k\,\mathbf{H}_k\bigr)^\top + \mathbf{K}_k \,\mathbf{R}_k \,\mathbf{K}_k^\top. \end{align}
Hence, the Update Step equations are:
\begin{align}
\mathbf{K}_k
&= \mathbf{P}_{k|k-1}\,\mathbf{H}_k^\top
\bigl(\mathbf{H}_k\,\mathbf{P}_{k|k-1}\,\mathbf{H}_k^\top + \mathbf{R}_k\bigr)^{-1}, \\[4pt]
\mathbf{\hat{x}}_{k|k}
&= \mathbf{\hat{x}}_{k|k-1}
+ \mathbf{K}_k\Bigl(\mathbf{z}_k
- \mathbf{H}_k\,\mathbf{\hat{x}}_{k|k-1}\Bigr), \\[4pt]
\mathbf{P}_{k|k}
&= \bigl(\mathbf{I} - \mathbf{K}_k\,\mathbf{H}_k\bigr)\,\mathbf{P}_{k|k-1}.
\end{align}
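The update equations translate directly into a few lines of NumPy. The sketch below is illustrative only (the function name and the `joseph` flag are assumptions, not from any library); note that it solves a linear system rather than forming $(\mathbf{S}_k^{-1})$ explicitly:

```python
import numpy as np


def kf_update(x_prior, P_prior, z, H, R, joseph=True):
    """Update step: fold the measurement z into the prior estimate."""
    y = z - H @ x_prior                        # innovation
    S = H @ P_prior @ H.T + R                  # innovation covariance
    K = np.linalg.solve(S, H @ P_prior).T      # Kalman gain P H^T S^{-1} (S, P symmetric)
    x_post = x_prior + K @ y
    I = np.eye(P_prior.shape[0])
    if joseph:
        # Joseph form: symmetric and positive semi-definite for any gain.
        P_post = (I - K @ H) @ P_prior @ (I - K @ H).T + K @ R @ K.T
    else:
        P_post = (I - K @ H) @ P_prior
    return x_post, P_post
```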
5. Integrating the Kalman Filter with STS Models
5.1. STS as State-Space Models
STS models can be framed within a state-space architecture, allowing the application of filtering techniques like the Kalman Filter for state estimation. Each structural component of the STS model (e.g., trend, seasonality, regression) corresponds to elements within the state vector.
Example State Vector:
\begin{align}
\mathbf{x}_t
= \begin{bmatrix}
\ell_t \\
s_{1,t} \\
s_{2,t} \\
\vdots \\
s_{K,t} \\
\boldsymbol{\beta}_t
\end{bmatrix},
\quad (\text{dimension: } n \times 1).
\end{align}
5.2. Applying the Kalman Filter to STS
Once the STS model is framed as a state-space model, the Kalman Filter can be directly applied to estimate the latent states and their uncertainties:
- State Estimation: Estimate latent components (trend, seasonality, regression effects) at each time step.
- Parameter Estimation: Infer model parameters, such as the variances of the noise components, typically by maximizing the likelihood built from the filter's innovations (see the expression after this list).
- Forecasting: Predict future observations based on estimated states and forecasted control inputs (if any).
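For the parameter-estimation point above, a standard route is maximum likelihood via the prediction error decomposition: the filter already produces the innovations $(\mathbf{y}_k)$ and their covariances $(\mathbf{S}_k)$, from which the log-likelihood of the observed series follows directly:
\begin{align}
\log p(\mathbf{z}_{1:T})
= \sum_{k=1}^{T} \log p(\mathbf{z}_k \,\mid\, \mathbf{z}_{1:k-1})
= -\tfrac{1}{2} \sum_{k=1}^{T}
\Bigl( p \log 2\pi + \log\bigl\lvert \mathbf{S}_k \bigr\rvert
+ \mathbf{y}_k^\top \mathbf{S}_k^{-1} \mathbf{y}_k \Bigr).
\end{align}
Maximizing this expression over the unknown noise variances (e.g., $(\sigma_{\ell}^2)$, $(\sigma_s^2)$, $(\sigma_{\varepsilon}^2)$) yields maximum likelihood estimates of the STS parameters.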
5.3. Example: Decomposing STS Components
Consider an STS model with a local-level trend $(\ell_t)$ and a single seasonal component $(s_t)$. Then:
\begin{align}
\mathbf{x}_t
&= \begin{bmatrix}
\ell_t \\
s_t
\end{bmatrix},
\quad (\text{dimension: } 2 \times 1), \\[4pt]
\mathbf{F}_t
&= \begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix},
\quad (\text{dimension: } 2 \times 2), \\[4pt]
\mathbf{Q}_t
&= \begin{bmatrix}
\sigma_{\ell}^2 & 0 \\
0 & \sigma_s^2
\end{bmatrix},
\quad (\text{dimension: } 2 \times 2), \\[4pt]
\mathbf{H}_t
&= \begin{bmatrix}
1 & 1
\end{bmatrix},
\quad (\text{dimension: } 1 \times 2), \\[4pt]
\mathbf{R}_t
&= \sigma_{\varepsilon}^2,
\quad (\text{dimension: } 1 \times 1).
\end{align}
Then the observation is:
\begin{align}
y_t = \ell_t + s_t + \varepsilon_t,
\quad
\varepsilon_t \sim \mathcal{N}(0,\sigma_{\varepsilon}^2).
\end{align}
Applying the Kalman Filter follows the same Prediction and Update steps shown above, allowing joint estimation of both trend and seasonal components.
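The example can be simulated and filtered end to end with plain NumPy. The sketch below is illustrative only; the noise variances, series length, and initial guesses are assumed, not taken from any dataset:

```python
import numpy as np

rng = np.random.default_rng(42)

# Matrices from the example above; the variances are assumed for illustration.
F = np.eye(2)
Q = np.diag([0.01, 0.05])      # [sigma_ell^2, sigma_s^2]
H = np.array([[1.0, 1.0]])
R = np.array([[0.25]])         # sigma_eps^2

# Simulate T observations y_t = ell_t + s_t + eps_t from the model.
T = 200
x_true = np.array([1.0, 0.0])
zs = []
for _ in range(T):
    x_true = F @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    zs.append(H @ x_true + rng.normal(0.0, np.sqrt(R[0, 0]), size=1))

# Run the filter: Prediction and Update at every time step.
x_hat, P = np.zeros(2), 10.0 * np.eye(2)       # vague initial guess
filtered = []
for z in zs:
    x_hat, P = F @ x_hat, F @ P @ F.T + Q      # prediction step
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x_hat = x_hat + K @ (z - H @ x_hat)        # posterior mean
    P = (np.eye(2) - K @ H) @ P                # posterior covariance
    filtered.append(x_hat.copy())

filtered = np.array(filtered)  # columns: estimated level and seasonal component
```

Note that with $(\mathbf{F}_t = \mathbf{I})$ both components are random walks observed only through their sum, so the filter mainly tracks their combined level; a richer seasonal specification would replace the seasonal block of $(\mathbf{F}_t)$ with a periodic transition.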
6. Recursive Equations
Combining the Prediction and Update steps, the multi-dimensional Kalman Filter can be written in a concise recursive form for each time $(k)$:
\begin{align}
&\textbf{Prediction:}\\
&\quad \mathbf{\hat{x}}_{k|k-1}
= \mathbf{F}_k \,\mathbf{\hat{x}}_{k-1|k-1}
+ \mathbf{B}_k \,\mathbf{u}_k, \\[3pt]
&\quad \mathbf{P}_{k|k-1}
= \mathbf{F}_k \,\mathbf{P}_{k-1|k-1}\,\mathbf{F}_k^\top
+ \mathbf{Q}_k, \\[6pt]
&\textbf{Update:}\\
&\quad \mathbf{K}_k
= \mathbf{P}_{k|k-1}\,\mathbf{H}_k^\top\,
\bigl(\mathbf{H}_k\,\mathbf{P}_{k|k-1}\,\mathbf{H}_k^\top
+ \mathbf{R}_k\bigr)^{-1}, \\[3pt]
&\quad \mathbf{\hat{x}}_{k|k}
= \mathbf{\hat{x}}_{k|k-1}
+ \mathbf{K}_k\,\bigl(\mathbf{z}_k
- \mathbf{H}_k\,\mathbf{\hat{x}}_{k|k-1}\bigr), \\[3pt]
&\quad \mathbf{P}_{k|k}
= \bigl(\mathbf{I}
- \mathbf{K}_k\,\mathbf{H}_k\bigr)\,
\mathbf{P}_{k|k-1}.
\end{align}
7. Covariance Matrix Update
The covariance update determines how the uncertainty in the state estimate changes after each measurement, so it must be computed carefully. As shown above, one can also use the Joseph Form for improved numerical stability:
\begin{align}
\mathbf{P}_{k|k}
&= \bigl(\mathbf{I}
- \mathbf{K}_k\,\mathbf{H}_k\bigr)\,
\mathbf{P}_{k|k-1}\,
\bigl(\mathbf{I}
- \mathbf{K}_k\,\mathbf{H}_k\bigr)^\top
+ \mathbf{K}_k\,\mathbf{R}_k\,\mathbf{K}_k^\top.
\end{align}
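A short algebraic check shows why the Joseph Form is consistent with the simpler expression. Expanding the quadratic and grouping the $(\mathbf{K}_k(\cdot)\mathbf{K}_k^\top)$ terms gives
\begin{align}
\bigl(\mathbf{I} - \mathbf{K}_k\mathbf{H}_k\bigr)\mathbf{P}_{k|k-1}\bigl(\mathbf{I} - \mathbf{K}_k\mathbf{H}_k\bigr)^\top + \mathbf{K}_k\mathbf{R}_k\mathbf{K}_k^\top
&= \bigl(\mathbf{I} - \mathbf{K}_k\mathbf{H}_k\bigr)\mathbf{P}_{k|k-1}
- \mathbf{P}_{k|k-1}\mathbf{H}_k^\top\mathbf{K}_k^\top
+ \mathbf{K}_k\bigl(\mathbf{H}_k\mathbf{P}_{k|k-1}\mathbf{H}_k^\top + \mathbf{R}_k\bigr)\mathbf{K}_k^\top \\[4pt]
&= \bigl(\mathbf{I} - \mathbf{K}_k\mathbf{H}_k\bigr)\mathbf{P}_{k|k-1}
- \mathbf{P}_{k|k-1}\mathbf{H}_k^\top\mathbf{K}_k^\top
+ \mathbf{K}_k\,\mathbf{S}_k\,\mathbf{K}_k^\top.
\end{align}
Substituting the optimal gain $(\mathbf{K}_k = \mathbf{P}_{k|k-1}\mathbf{H}_k^\top\mathbf{S}_k^{-1})$ makes $(\mathbf{K}_k\mathbf{S}_k\mathbf{K}_k^\top = \mathbf{P}_{k|k-1}\mathbf{H}_k^\top\mathbf{K}_k^\top)$, so the last two terms cancel and the expression reduces to $((\mathbf{I} - \mathbf{K}_k\mathbf{H}_k)\mathbf{P}_{k|k-1})$. Unlike the simple form, however, the Joseph Form is a sum of two symmetric positive semi-definite terms, so it remains symmetric and positive semi-definite even when the gain is perturbed by round-off.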
8. Algorithm Summary
- Initialization
\begin{align} \mathbf{\hat{x}}_{0|0}, \quad \mathbf{P}_{0|0}. \end{align}
Here,
\begin{align} \mathbf{\hat{x}}_{0|0}\in \mathbb{R}^{n}, \qquad \mathbf{P}_{0|0}\in \mathbb{R}^{n \times n}. \end{align}
- For each time step $( k = 1, 2, \dots, T )$:
  - Prediction Step
\begin{align} \mathbf{\hat{x}}_{k|k-1} &= \mathbf{F}_k\,\mathbf{\hat{x}}_{k-1|k-1} + \mathbf{B}_k\,\mathbf{u}_k, \\[4pt] \mathbf{P}_{k|k-1} &= \mathbf{F}_k\,\mathbf{P}_{k-1|k-1}\,\mathbf{F}_k^\top + \mathbf{Q}_k. \end{align}
  - Update Step
\begin{align} \mathbf{K}_k &= \mathbf{P}_{k|k-1}\,\mathbf{H}_k^\top \bigl(\mathbf{H}_k\,\mathbf{P}_{k|k-1}\,\mathbf{H}_k^\top + \mathbf{R}_k\bigr)^{-1}, \\[4pt] \mathbf{\hat{x}}_{k|k} &= \mathbf{\hat{x}}_{k|k-1} + \mathbf{K}_k\bigl(\mathbf{z}_k - \mathbf{H}_k\,\mathbf{\hat{x}}_{k|k-1}\bigr), \\[4pt] \mathbf{P}_{k|k} &= \bigl(\mathbf{I} - \mathbf{K}_k\,\mathbf{H}_k\bigr)\, \mathbf{P}_{k|k-1}. \end{align}
- Output
- $(\mathbf{\hat{x}}_{k|k}\in \mathbb{R}^{n})$: the optimal state estimate at time $(k)$.
- $(\mathbf{P}_{k|k}\in \mathbb{R}^{n \times n})$: the associated uncertainty.
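For reference, the whole summary fits in one short function. This is an illustrative sketch under the same assumptions as above (time-invariant matrices, no missing observations), not a production implementation:

```python
import numpy as np


def kalman_filter(zs, F, H, Q, R, x0, P0, B=None, us=None):
    """Run the filter summarized above over measurements zs of shape (T, p).

    Assumes time-invariant matrices; a time-varying model would index
    F, H, Q, R by k inside the loop. Returns filtered means and covariances."""
    n = x0.shape[0]
    x, P = x0.astype(float), P0.astype(float)
    means, covs = [], []
    for k, z in enumerate(zs):
        # Prediction step
        x = F @ x if (B is None or us is None) else F @ x + B @ us[k]
        P = F @ P @ F.T + Q
        # Update step
        S = H @ P @ H.T + R
        K = np.linalg.solve(S, H @ P).T          # Kalman gain P H^T S^{-1}
        x = x + K @ (z - H @ x)
        P = (np.eye(n) - K @ H) @ P
        means.append(x.copy())
        covs.append(P.copy())
    return np.array(means), np.array(covs)
```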