階層VARモデルの実装

Posted at 2025-03-09

はじめに

こんにちは。事業会社で働いている新卒データサイエンティストです。

今回は、ベクトル自己回帰モデル（VAR）の階層化を考えます。VARにおいてサンプルを複数のグループに分けることができるとき、それらをグループごとに推定するのではなく、階層ベイズの考え方で組み込むことを目指します。元のアイデアはhttps://rpubs.com/jimsavage/hierarchical_var です。

VAR(p)モデル

$Y_t = (Y_{t1}, Y_{t2}, \cdots, Y_{tq})'$は時点$t$につき$q$個の変数を含むとする。このとき、基本的なVAR($p$)モデルは以下で表される。

$Y_t = \beta_1 Y_{t-1} + \cdots + \beta_pY_{t-p} + u_t, \quad -①$

$where\quad u_t \sim i.i.d.N(0,\Sigma).$

ただし、

$\beta_1, \cdots, \beta_p$は$q\times q$の行列
$\Sigma$は$q\times q$の分散共分散行列

である。

よりコンパクトに書けば、

$Y_t = BX_{t} + u_t, \quad -②$

$where\quad u_t \sim i.i.d.N(0,\Sigma).$

ただし、

$B=(B_1, B_2, \cdots, B_p)$は$q\times qp$の行列
$X_t=(Y_{t-1}^{'}, \cdots, Y_{t-p}^{'})$は$qp$次元

である。

階層化

いま、サンプルが複数のグループに分けられるとする。グループが$g$個存在するとき、$②$は以下のように書き換えられる。

$Y_{g,t} = BX_{g,t} + u_{g,t}, \quad -②'$

$where\quad u_{g,t} \sim multi\space normal(0,\Sigma).$

ただし、

$\Sigma = diag(\tau_g)\cdot \Omega_g \cdot diag(\tau_g)$
$\Omega_g = \rho\Omega_{global} + (1-\rho)\Omega_{g,local}, \quad where \space \rho \in (0,1).$

である。

実装（p=1）

一般的なwebサービスを考える。100日間のデータがあり、アプリのダウンロード数とWebの訪問者数が売上に与える影響に興味があるとしよう。サンプルを5つの年齢層（18-24歳、25-34歳、35-44歳、45歳-54歳, 55歳以上）に分け、これを階層化する。

簡単のため、$p=1$のときを考える。

まずはデータを生成。

set.seed(123) 

n_days <- 100
age_groups <- c("18-24", "25-34", "35-44", "45-54", "55+")

data <- expand.grid(
  date = seq(Sys.Date() - n_days + 1, Sys.Date(), by = "day"),
  age_group = age_groups
) |>
  dplyr::mutate(
    downloads = rpois(dplyr::n(), lambda = sample(50:200, dplyr::n(), replace = TRUE)),
    visitors = downloads * sample(3:7, dplyr::n(), replace = TRUE),
    sales = round(downloads * runif(dplyr::n(), min = 0.1, max = 0.5))
  ) |>
  dplyr::arrange(date) |>
  dplyr::mutate(time = rep(1:(dplyr::n()/5), each = 5)) |>
  dplyr::mutate(age_groups_num = rep(1:5, length.out = dplyr::n()))

次に、Stanコードを書く。

data {
  int time_type;
  int kpi_type;
  int group_type; 
  int<lower=1, upper=time_type> time[time_type]; 
  int<lower=1, upper=group_type> group[time_type];
  matrix[time_type, kpi_type] kpi;
}

parameters {
  // individual-level parameters
  corr_matrix[kpi_type] Omega_local[group_type]; 
  vector<lower = 0>[kpi_type] tau[group_type]; 
  
  matrix[kpi_type, kpi_type] z_beta[group_type];
  vector[kpi_type] z_intercept[group_type];
  
  // hierarchical priors (individual parameters distributed around these)
  real<lower = 0, upper = 1> rho;
  corr_matrix[kpi_type] Omega_global;
  
  vector[kpi_type] tau_location;
  vector<lower =0>[kpi_type] tau_scale;
  
  matrix[kpi_type, kpi_type] beta_location;
  matrix<lower = 0>[kpi_type, kpi_type] beta_scale;
  
  vector[kpi_type] intercept_location;
  vector<lower = 0>[kpi_type] intercept_scale;
}

transformed parameters {
  matrix[kpi_type, kpi_type] beta[group_type]; 
  vector[kpi_type] intercept[group_type];
  
  corr_matrix[kpi_type] Omega[group_type];
  
  for(g in 1:group_type) {
    intercept[g] = intercept_location + intercept_scale .*z_intercept[g];
    beta[g] = beta_location + beta_scale .*z_beta[g];
    Omega[g] = rho*Omega_global + (1-rho)*Omega_local[g];
  }
    
}
model {
  // hyperpriors
  rho ~ beta(2, 2);
  tau_location ~ cauchy(0, 1);
  tau_scale ~ cauchy(0, 1);
  intercept_location ~ normal(0, 1);
  intercept_scale ~ cauchy(0, 1);
  to_vector(beta_location) ~ normal(0, .5);
  to_vector(beta_scale) ~ cauchy(0, .5);
  Omega_global ~ lkj_corr(1);

  
  // hierarchical priors
  for(g in 1:group_type) {
    // non-centered parameterization
    z_intercept[g] ~ normal(0, 1);
    to_vector(z_beta[g]) ~ normal(0, 1);
    tau[g] ~ normal(tau_location, tau_scale);
    Omega_local[g] ~ lkj_corr(10);
  }
  
  // likelihood
  for(t in 1:time_type) {
    if(time[t]>1) {
      kpi[t] ~ multi_normal(intercept[group[t]] + beta[group[t]]*kpi[t-1]', 
                          quad_form_diag(Omega[group[t]], tau[group[t]]));
    }
  }
}

続いて事後分布の推定。変分ベイズで近似しています。

stan_data <- list(
  time_type = nrow(data),
  kpi_type = 3,
  group_type = 5,
  max_lag = 4,
  time = data$time,
  group = data$age_groups_num,
  kpi = data[,3:5]
)

model <- "Hierarchical_VAR.stan"
stan_model <- rstan::stan_model(file = "Hierarchical_VAR.stan")
fit_vb <- rstan::vb(stan_model, data = stan_data, iter = 10000, tol_rel_obj = 1e-3)

事後分布のサンプルを取り出す。

posterior_samples <- rstan::extract(fit_vb)

beta_df <- tibble::tibble(
  beta_mean = asplit(posterior_samples$beta, 2) |> 
    purrr::map_dbl(mean),
  beta_lower = asplit(posterior_samples$beta, 2) |> 
    purrr::map_dbl(~quantile(., probs = 0.05)),
  beta_upper = asplit(posterior_samples$beta, 2) |> 
    purrr::map_dbl(~quantile(., probs = 0.95)),
) |>
  dplyr::mutate(age_group = c("18-24", "25-34", "35-44", "45-54", "55+"))

head(beta_df)

各年齢層について、係数の90%信用区間を取り出した。

# A tibble: 5 × 4
  beta_mean beta_lower beta_upper age_group
      <dbl>      <dbl>      <dbl> <chr>    
1     0.399   -0.00228       1.29 18-24    
2     0.346    0.00143       1.22 25-34    
3     0.373   -0.0212        1.23 35-44    
4     0.351   -0.0349        1.22 45-54    
5     0.435   -0.0346        1.30 55+

おわりに

今回は$p=1$のケースを扱いました。本Stanコードでは、ラグが増えるごとにtransformed parametersブロック内の係数$\beta$が増えてしまうことが難点ですが…それは今後の課題にしたいと思います。

参考文献

https://rpubs.com/jimsavage/hierarchical_var
(2025年3月9日最終取得)

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up