More than 5 years have passed since last update.

隠れマルコフモデルの勉強：『続わかりやすいパターン認識』第8章 - 2 -

Last updated at 2015-06-03Posted at 2015-06-03

モデル
P(x) の計算
状態変数の推計
パラメータの推計

P(x) を計算する

すべてのパラメータがわかっている場合に、ある出力列 $x^n$ が得られる確率を求めます。パラメータがすべてわかっているので、同時確率 $\mathbb{P}(x^n, s^n)$ の計算は簡単です（７章参照）。$\mathbb{P}(x^n)$ を求めるには、これをあり得るすべての $s^n$ について足し上げる、つまり周辺化すればいい。

\mathbb{P}(x^n) = \sum_{s^n} \mathbb{P}(x^n, s^n)

式に書くと簡単ですが、「ありうる全ての $s^n$」というのがたくさんあるのが問題になります。各期に $c$ 通りの場合があってそれが $n$ 期あるので、全部で $c^n$ 通りの列があります。時系列が短いうちはいいですが、長くなるにつれて計算量が指数的に増えてしまいます。
そこで、再帰的な方法によって効率的に計算することを考えます。

前向きアルゴリズム

$\alpha_t(s_t) = \mathbb{P}(x^t, s_t)$ という関数を考えます。出力には $t$ 期までの列を、状態変数は $t$ 期だけを抜き出しています。もしこの関数がすべての $t$ についてわかっているなら、求めたい確率は次の式で求められます。

\mathbb{P}(x^n) = \sum_{s_n} \mathbb{P}(x^n, s_n) = \sum_{s_n} \alpha_n(s_n)

ここで、１つ目の等号は全確率の法則によります。そこで、$\alpha_t(s_t)$ を求めることにします。漸化式の要領で、 $\alpha_t$ を $\alpha_{t-1}$ で表すように変形していきます。

\begin{align*}
\alpha_t(s_t) &= \mathbb{P}(x^t, s_t) \\
&= \sum_{s_{t-1}} \mathbb{P}(x^t, s_t | s_{t-1}) \cdot \mathbb{P}(s_{t-1})
&& \text{（全確率の法則）}\\
&= \sum_{s_{t-1}} \mathbb{P}(x_t, s_t | s_{t-1}, x^{t-1}) \cdot \mathbb{P}(x^{t-1} | s_{t-1}) \cdot\mathbb{P}(s_{t-1}) \\
&= \sum_{s_{t-1}} \mathbb{P}(x_t, s_t | s_{t-1}, x^{t-1}) \cdot \mathbb{P}(x^{t-1}, s_{t-1})  && \text{（条件付き確率の定義）}\\
&= \sum_{s_{t-1}} \mathbb{P}(x_t, s_t | s_{t-1}) \cdot \alpha_{t-1} (s_{t-1})\\
&= \sum_{s_{t-1}} \mathbb{P}(x_t |s_t, s_{t-1}) \cdot \mathbb{P}(s_t | s_{t-1}) \cdot \alpha_{t-1} (s_{t-1}) \\
&= \sum_{s_{t-1}} \mathbb{P}(x_t |s_t) \cdot \mathbb{P}(s_t | s_{t-1}) \cdot \alpha_{t-1} (s_{t-1}) && \text{（条件付き独立の仮定）}\\
&= b(s_t, x_t) \cdot \sum_{s_{t-1}} a(s_{t-1}, s_t) \cdot \alpha_{t-1} (s_{t-1})
\end{align*}

また、$t = 1$ においては、　$\alpha_1(s_1) = \mathbb{P}(x_1, s_1) = \mathbb{P}(x_1|s_1) \mathbb{P}(s_1) = \rho(s_1)\cdot b(s_1, x_1)$ となり、これが初期条件となります。まとめると

\begin{align*}
\alpha_1(s_1) &= \rho(s_1)\cdot b(s_1, x_1) \\
\alpha_t(s_t) &= b(s_t, x_t) \cdot \sum_{s_{t-1}} a(s_{t-1}, s_t) \cdot \alpha_{t-1} (s_{t-1}) \;\;\;\;\text{for $t > 1$}

\end{align*}

後ろ向きアルゴリズム

代替案として、$\beta_t(s_t) = \mathbb{P}(x^{t+1, n}| s_t)$ を考えます。ここで、$x^{t_1, t_2}$ は、$t_1$期から $t_2$ 期までの列、つまり $x^{t+1, n} = (x_{t+1}, x_{t_2}, \dots, x_n)$ を表すこととします。直接は使わないですが、 $\alpha_t(s_t) \cdot \beta_t(s_t) = \mathbb{P}(x^n, s_t)$ という関係があります。

$\beta_t(s_t)$ を求める逐次的な方法を考えます。この場合、$\beta_{t-1}$ を $\beta_{t}$ で表し、時間を遡っていく式が有用です。

\begin{align*}
\beta_{t-1}(s_{t-1}) &= \mathbb{P}(x^{t, n}| s_{t-1}) \\
&= \sum_{s_{t}} \mathbb{P}(x^{t, n}| s_{t-1}, s_{t}) \cdot\mathbb{P}
(s_{t} | s_{t-1}) && \text{（全確率の法則）}\\
&= \sum_{s_{t}} \mathbb{P}(x_t| s_{t-1}, s_{t})\cdot \mathbb{P}(x^{t+1, n}| x_t, s_{t-1}, s_{t})\cdot \mathbb{P}
(s_{t+1} | s_t) \\
&= \sum_{s_{t}} \mathbb{P}(x_t| s_{t})\cdot \mathbb{P}(x^{t+1, n}| s_t)\cdot \mathbb{P}
(s_{t} | s_{t-1}) && \text{（条件付き独立の仮定）}\\
&= \sum_{s_t} a(s_{t-1}, s_t)\cdot b(s_t, x_t)\cdot\beta_{t}(s_t)
\end{align*}

初期条件としては、

\begin{align*}
\beta_{n-1}(s_{n-1}) &= \mathbb{P}(x_n | s_{n-1})\\
&= \sum_{s_{n}} \mathbb{P}(x_n | s_n)\cdot \mathbb{P} (s_n | s_{n-1})\\ 
&= \sum_{s_n} a(s_{n-1}, s_n) \cdot b(s_n, x_n)
\end{align*}

で求められます。ところが、これを上の式と見比べると、$\beta_n(s_n) = 1$ としても同じことになることがわかります。形式的には、$\beta_n(s_n) = \mathbb{P}( . | s_n)$ と、対応する $x$ の列が空になりますが、便宜上これを確率１とおくと座りが良いようです。
まとめると、

\begin{align*}
\beta_n(s_n) &= 1 \\
\beta_{t-1} (s_{t-1}) &=  \sum_{s_t} a(s_{t-1}, s_t)\cdot b(s_t, x_t)\cdot\beta_{t}(s_t) \;\;\;\;\text{for $t \le n$}
\end{align*}

最後に、$\beta_t$ を使って $\mathbb{P}(x^n)$ を計算する方法を考えます。

\begin{align*}
\mathbb{P}(x^n) &= \sum_{s_1}\mathbb{P}(x_1, x^{2, n}|s_1)\cdot\mathbb{P}(s_1) \\
&= \sum_{s_1} \mathbb{P}(x^{2,n} | s_1, x_1) \cdot \mathbb{P}(x_1|s_1)\cdot\mathbb{P}(s_1)\\
&= \sum_{s_1} \mathbb{P}(x^{2,n} | s_1) \cdot \mathbb{P}(x_1|s_1)\cdot\mathbb{P}(s_1)\\
&= \sum_{s_1} \rho(s_1) \cdot \beta_1(s_1)\cdot b(s_1, x_1) 
\end{align*}

したがって、$\beta_n$ から逐次的な計算により $\beta_1$ を求め、最後の式に当てはめることで目的の確率を得ることができます。

シミュレーション

本の157頁にならって、下記のパラメータを仮定します。

\begin{align*}
A &= \begin{pmatrix}
     0.1 & 0.7 & 0.2 \\
     0.2 & 0.1 & 0.7 \\
     0.7 & 0.2 & 0.1
    \end{pmatrix}\\
B &= \begin{pmatrix}
     0.9 & 0.1 \\
     0.6 & 0.4 \\
     0.1 & 0.9
    \end{pmatrix}\\
\rho &= \left(\frac{1}{3}, \frac{1}{3}, \frac{1}{3} \right)'
\end{align*}

次の関数は、総当り計算によって　$P(x^n)$　を計算します(もっと効率的に書ける気もしますが)。

総当り法

P1 <- function(x, A, B, rho) {
    # computes P(x) = sum_s P(x, s)
    N <- length(x)  # size of data
    J <- nrow(B)    # number of possible state

    out <- 0
    s <- rep_len(1, N)
    v <- 0
    while (v < J^N) {
        prob_s <- 1    # P(s)
        prob_xs <- 1   # P(x|s)
        for (n in 1:N) {
            if (n == 1) {
                prob_s <- rho[s[1]]
            } else {
                prob_s <- prob_s * A[s[n-1], s[n]]
            }
            prob_xs <- prob_xs * B[s[n], x[n]]
        }
        p <- prob_s * prob_xs
        out <- out + p

        # next s
        s[1] <- s[1] + 1
        for (n in 2:N)
            if (s[n-1] > J) {
                s[n] <- s[n] + 1
                s[n-1] <- 1
            }
        v <- v + 1
    }
    return(out)
}

前向き・後ろ向きアルゴリズムです。

前向きアルゴリズム

P2 <- function(x, A, B, rho) {
    # forward algorithm
    N <- length(x)  # size of data
    alpha <- rho * B[, x[1]]  # alpha1(i) = rho_i * b(i, x_1)
    for (n in 2:N)
        alpha <- (matrix(alpha, nrow=1, ncol=length(alpha)) %*% A) * B[, x[n]]
    sum(alpha)
}

後ろ向きアルゴリズム

P3 <- function(x, A, B, rho) {
    # backward algorithm
    N <- length(x)  # size of data
    beta <- 1       # beta_n
    for (n in N:2) {
        # compute upto beta_1
        beta <- A %*% matrix(B[, x[n]] * beta, ncol=1, nrow=nrow(B))
    }
    sum( rho * beta * B[, x[1]] )
}

前向きアルゴリズムをRcppを使って書いた場合です。

前向きアルゴリズム（Rcpp）

library(Rcpp)
cppFunction(
"double P4(NumericVector x, NumericMatrix A, NumericMatrix B, NumericVector rho) {
    int N = x.size();
    int J = A.nrow();
    NumericVector alpha = rho * B(_, x[0]-1);
    NumericVector AA(J);
    for (int n = 1; n < N; ++n) {
        for (int j = 0; j < J; ++j) {
            AA[j] = sum(alpha * A(_, j));
        }
        alpha = AA * B(_, x[n]-1);
    }
    return sum(alpha);
}")

本に書いてあるのと同じ結果（159頁）が得られます。

x <- c(1, 2, 1)
P1(x, A, B, rho)
P2(x, A, B, rho)
P3(x, A, B, rho)
P4(x, A, B, rho)
# [1] 0.1733733

前向き・後ろ向きアルゴリズムはほぼ同じ速度。Rcppはさすがに速いです。

library(microbenchmark)
x <- floor(runif(6)*ncol(B))+1
microbenchmark(P1(x, A, B, rho), P2(x, A, B, rho), P3(x, A, B, rho), P4(x, A, B, rho))
# Unit: microseconds
#             expr       min        lq        mean     median         uq       max neval
# P1(x, A, B, rho) 18255.558 19512.578 20393.38242 20233.7780 21048.2000 24471.599   100
# P2(x, A, B, rho)    26.514    30.790    44.14065    34.6385    59.0130   110.757   100
# P3(x, A, B, rho)    31.218    35.494    53.88204    45.1155    67.3525   132.138   100
# P4(x, A, B, rho)     2.994     4.277    12.04699    12.1885    19.6710    38.487   100

データの長さを10000にしてもなかなかの速度でした。

x <- floor(runif(1e4)*ncol(B))+1
microbenchmark(P2(x, A, B, rho), P3(x, A, B, rho), P4(x, A, B, rho))
# Unit: milliseconds
#             expr       min        lq      mean    median        uq        max neval
# P2(x, A, B, rho) 49.685564 52.302878 56.122484 55.090603 57.895860 102.706219   100
# P3(x, A, B, rho) 55.790848 59.025019 62.427306 62.561312 65.007361  74.874726   100
# P4(x, A, B, rho)  1.211906  1.285031  1.397712  1.324801  1.444965   1.961543   100

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up