Q. I want to group by the numbers embedded in a variable name

Last updated at 2021-12-04, posted at 2021-12-04

Introduction

The concrete problem arises when array-valued parameters from Stan output are brought into R.

A fit object that contains array-valued parameters prints as follows.

> fit
   variable    mean  median    sd   mad      q5     q95 rhat ess_bulk ess_tail
 lp__       -492.44 -492.03 11.36 11.31 -511.69 -474.75 1.00      760     1384
 theta[1,1]    0.34    0.29  0.24  0.26    0.02    0.78 1.00     6872     2172
 theta[2,1]    0.33    0.28  0.24  0.27    0.02    0.78 1.00     7548     2512
 theta[3,1]    0.34    0.30  0.23  0.27    0.02    0.77 1.00     6163     2511
 theta[4,1]    0.33    0.29  0.24  0.27    0.02    0.78 1.00     6515     2574
 theta[5,1]    0.33    0.30  0.24  0.27    0.03    0.78 1.00     5928     2481
 theta[6,1]    0.34    0.30  0.24  0.27    0.02    0.79 1.00     7291     1701
 theta[7,1]    0.45    0.43  0.28  0.35    0.04    0.93 1.00     3659     2281
 theta[8,1]    0.59    0.63  0.27  0.32    0.09    0.97 1.00     4358     2337
 theta[9,1]    0.59    0.62  0.27  0.33    0.10    0.97 1.00     4639     2536

Extracting the draws and converting them to a tibble still gives one column per array element, like this.

> df <- fit.stanfit %>% rstan::extract() %>% as.data.frame %>% as_tibble
> df
# A tibble: 4,000 × 1,013
   theta.1.1 theta.2.1 theta.3.1 theta.4.1 theta.5.1 theta.6.1 theta.7.1 theta.8.1 theta.9.1 theta.10.1 theta.11.1 theta.12.1
       <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>      <dbl>      <dbl>      <dbl>
 1    0.120     0.0796    0.584     0.264    0.00837    0.592     0.484     0.279      0.530      0.814      0.969      0.510
 2    0.293     0.309     0.0590    0.0541   0.0484     0.0178    0.183     0.163      0.292      0.979      0.949      0.633
 3    0.406     0.200     0.0668    0.158    0.244      0.433     0.446     0.0483     0.262      0.179      0.954      0.146
 4    0.133     0.583     0.216     0.480    0.400      0.0277    0.749     0.499      0.982      0.288      0.708      0.145
 5    0.0257    0.242     0.384     0.0892   0.0582     0.261     0.904     0.636      0.868      0.965      0.624      0.974
 6    0.382     0.575     0.362     0.585    0.0867     0.138     0.0604    0.0633     0.192      0.678      0.614      0.513
 7    0.881     0.372     0.476     0.575    0.607      0.582     0.183     0.647      0.860      0.749      0.558      0.967
 8    0.371     0.166     0.304     0.289    0.440      0.499     0.490     0.746      0.621      0.575      0.308      0.903
 9    0.642     0.356     0.241     0.239    0.0686     0.336     0.128     0.738      0.590      0.984      0.813      0.817

Suppose we pivot this to long format and manage to get this far.

> df %>% rowid_to_column("iter") %>% pivot_longer(-iter)
# A tibble: 4,052,000 × 3
    iter name         value
   <int> <chr>        <dbl>
 1     1 theta.1.1  0.120  
 2     1 theta.2.1  0.0796 
 3     1 theta.3.1  0.584  
 4     1 theta.4.1  0.264  
 5     1 theta.5.1  0.00837
 6     1 theta.6.1  0.592  
 7     1 theta.7.1  0.484  
 8     1 theta.8.1  0.279  
 9     1 theta.9.1  0.530  
10     1 theta.10.1 0.814

From here, group_by on name followed by summarise gives the EAP and similar summaries, but...

> df %>% rowid_to_column("iter") %>% pivot_longer(-iter) %>% 
+   group_by(name) %>% summarise(EAP = mean(value))
# A tibble: 1,013 × 2
   name         EAP
   <chr>      <dbl>
 1 lp__     -492.  
 2 lp.1.1    -12.6 
 3 lp.1.2     -1.82
 4 lp.10.1    -1.50
 5 lp.10.2    -5.68
 6 lp.100.1   -1.14
 7 lp.100.2  -10.3 
 8 lp.101.1   -1.25
 9 lp.101.2  -11.0 
10 lp.102.1   -1.37
# … with 1,003 more rows
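
The fit object itself is not reproduced here, so as a minimal sketch the same pipeline can be tried on a tiny made-up table of draws. The `draws` tibble and its values are assumptions for illustration only; the column names imitate the `theta.i.j` pattern shown above.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up stand-in for the extracted draws: 3 iterations, 3 columns
draws <- tibble(
  theta.1.1 = c(0.1, 0.2, 0.3),
  theta.2.1 = c(0.4, 0.5, 0.6),
  lp__      = c(-10, -11, -12)
)

eap <- draws %>%
  rowid_to_column("iter") %>%   # number the draws
  pivot_longer(-iter) %>%       # one row per (iter, parameter name)
  group_by(name) %>%
  summarise(EAP = mean(value))  # posterior mean per parameter name

eap
```

Each array element is still identified only by a single string in `name`, which is the limitation discussed next.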

At this point it occurred to me: I would like to split array[i,j] into the array name and separate index variables i and j.
That is today's topic.

Q. I want to group by the numbers embedded in a variable name

Take the following data frame, sample, as an example.

sample <- data.frame(
  name = c("theta.1.1","gamma.1.2","theta.2.1","lambda.2.2"),
  value = c(0.120,0.0796,0.584,0.264)
)

Its contents:

> sample
        name  value
1  theta.1.1 0.1200
2  gamma.1.2 0.0796
3  theta.2.1 0.5840
4 lambda.2.2 0.2640

The goal is this:

        name  value variables val1 val2
1  theta.1.1 0.1200     theta    1    1
2  gamma.1.2 0.0796     gamma    1    2
3  theta.2.1 0.5840     theta    2    1
4 lambda.2.2 0.2640    lambda    2    2

So, how do we get there?
The answer is covered in a separate post.
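
As one possible approach (a sketch only, and not necessarily the one taken in the answer post), `tidyr::separate()` can split `name` on the dots. The output column names `variables`, `val1`, and `val2` are taken from the goal table above; the use of `separate()` and `relocate()` is my assumption.

```r
library(dplyr)
library(tidyr)

sample <- data.frame(
  name = c("theta.1.1", "gamma.1.2", "theta.2.1", "lambda.2.2"),
  value = c(0.120, 0.0796, 0.584, 0.264)
)

result <- sample %>%
  separate(
    name,
    into = c("variables", "val1", "val2"),
    sep = "\\.",      # split on literal dots
    remove = FALSE,   # keep the original name column
    convert = TRUE    # turn the index columns into integers
  ) %>%
  relocate(value, .after = name)  # match the column order of the goal table

result
```

With the pieces in their own columns, something like `group_by(variables, val2)` becomes possible. Note that a name without indices, such as `lp__`, has fewer than three pieces, so `separate()` fills the missing ones with NA and warns unless `fill = "right"` is set.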
