0
1

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

stanで確率的主成分分析

Last updated at Posted at 2024-01-19

多変量時系列モデルの中で使う可能性があり、調べてコードがどこにもなかった(bingも)ので。
参考はビショップ本の下巻の終わりの方など。

YをG次元の確率変数(データ)、xを圧縮したい次元(例では1次元)のベクトル確率変数、$\epsilon$を正規ノイズとして

\textbf{Y}=W*\textbf{x}+\mu+\epsilon

というモデル。Y-$μ$をWの列ベクトルの線形結合で表現し、xやWを推定する。分散最大の軸を決めるという定義でなく、こう書かれると因子分析とよく似ているが、εの分散が共通なのがPCA。潜在変数が無相関(→covmatが対角行列)なのは同じ。

データの生成から、単純な1次元

N=300
x=rnorm(N)
y=0.8*x+3+2*rnorm(N)

qplot(x,y)

library(rstan)
lst=list(Y=tibble(x,y),
         N=length(x),G=2,D=1
)

1ce929f9-915c-47eb-8fd0-bdfcd2dc5fc9.png

以下、stanコード。多変量分布と行列の書き方で微妙に格闘。


data {
  int<lower = 1> N;
  int<lower = 1> D;
  int G;
  vector[G] Y[N];
}

transformed data{
  matrix[D, D] I;
}

parameters {
  vector[D] X[N];
  vector[G] mu;
  matrix[G,D] W;
  
  real<lower = 0> s;
}
model {
  for (n in 1:N){
    X[n] ~ multi_normal(rep_vector(0, D),diag_matrix(rep_vector(1, D)));
  }
  
  for (n in 1:N){
    Y[n] ~ multi_normal(W * X[n] + mu, diag_matrix(rep_vector(s,G)));
  }
}
fit=stan(model_code = scode,
         data=lst,
         warmup=300,iter=1000,
         chain=1)

X=get_posterior_mean(fit,"X")
mu=get_posterior_mean(fit,"mu")
W=get_posterior_mean(fit,"W")
Y=t(W%*%t(X))+matrix(mu,nrow=lst$N,ncol=2,byrow=TRUE)

qplot(x,y)+geom_point(aes(Y[,1],Y[,2],col="red"))


a44c5c19-b18d-4962-8c0f-b363bb996411.png

0
1
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
1

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?