This article is a Private article. Only a writer and users who know the URL can access it.
Please change open range to public in publish setting if you want to share this article with other users.

More than 5 years have passed since last update.

Stanによる混合ガンマ分布の推定

Stan

Posted at 2017-02-05

混合ガウス分布の推定はよくやられているけど、混合ガンマ分布の推定はちょっと難しいみたい。

https://ja.wikipedia.org/wiki/ガンマ分布

Stan で書いてみた。

gamma_mixture.stan

data {
  int N;
  real values[N];
  int cluster_num;
}
transformed data {
  real min_val = min(values);
  real max_val = max(values);
  real var_val = variance(values);
}
parameters {
  positive_ordered[cluster_num] mu;
  positive_ordered[cluster_num] k;
  simplex[cluster_num] ratio; 
}
transformed parameters {
  real<lower=min_val/var_val> rate[cluster_num];
  for (j in 1:cluster_num) {
    rate[j] = k[j] / mu[j];
  }
}
model {
  real loglikes[cluster_num];
  for(i in 1:N) {
    for (j in 1:cluster_num) {
      loglikes[j] = log(ratio[j]) + gamma_lpdf(values[i] | k[j], rate[j]);
    }
    target += log_sum_exp(loglikes);
  }
  for (j in 1:cluster_num) {
    mu[j] ~ uniform(min_val, max_val);
  }
}

一般の混合ガンマ分布の推定を書きたかったけど、難しいみたいなので次の制約を入れた。
M 個のガンマ分布に対して、それぞれの期待値を $\mu_i$ と表し、

$$
\mu_1 < \mu_2 < ... < \mu_M
$$

となるように番号を振る。このとき、形状パラメータ $k_i$ に対して

$$
k_1 < k_2 < ... < k_M
$$

が成り立つという制約を入れる。
これは尺度パラメータ $\theta_i$ の逆数 $r_i$ に対して次の制約を置いたものと等価である。

$$
r_1 < \frac{\mu_2}{\mu_1}r_2 < \frac{\mu_3}{\mu_1}r_3 < ... < \frac{\mu_M}{\mu_1}r_M
$$

したがって、この制約は $\theta_1 = \cdots = \theta_M$ を含む。

2 クラス推定

library(rstan)

N <- 1000

d1 <- rgamma(N, 1, 1)
d2 <- rgamma(N, 4, 1/2)
values <- c(d1, d2)

standata <- list(N = length(values), values = values, cluster_num = 2)

fit <- stan("gamma_mixture.stan", data = standata, chains = 3)

print(fit, pars=c("k", "rate"), digits_summary = 1)

結果

        mean se_mean  sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
k[1]     1.0       0 0.0  0.9 1.0 1.0 1.0   1.1  1906    1
k[2]     5.1       0 0.4  4.3 4.8 5.0 5.3   5.9  1515    1
rate[1]  1.0       0 0.1  0.8 1.0 1.0 1.1   1.2  1358    1
rate[2]  0.6       0 0.0  0.5 0.6 0.6 0.7   0.7  1856    1

3 クラス推定

library(rstan)

N <- 1000

d1 <- rgamma(N, 1, 1)
d2 <- rgamma(N, 4, 1/2)
d3 <- rgamma(N, 8, 1/5)
values <- c(d1, d2, d3)

standata <- list(N = length(values), values = values, cluster_num = 3)

fit <- stan("gamma_mixture.stan", data = standata, chains = 3)

print(fit, pars=c("k", "rate"), digits_summary = 1)

結果

        mean se_mean  sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
k[1]     1.0       0 0.1  0.9 0.9 1.0 1.0   1.1  1441    1
k[2]     4.7       0 0.6  3.6 4.3 4.7 5.1   5.9   938    1
k[3]     7.9       0 0.5  6.9 7.5 7.9 8.3   9.0  1947    1
rate[1]  0.9       0 0.1  0.7 0.8 0.9 1.0   1.1   998    1
rate[2]  0.6       0 0.1  0.4 0.5 0.6 0.6   0.7   986    1
rate[3]  0.2       0 0.0  0.2 0.2 0.2 0.2   0.2  2187    1

うまくいったみたい。
ちょっとずれているが、サンプルサイズ増やせばもっと正確に推定できるはず。

Enjoy!

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up