More than 3 years have passed since last update.

glmとかglmerとかの結果を表形式で表示&CSVで出力 (easystats)

Last updated at 2020-05-25Posted at 2020-05-02

こちらのツイートで紹介されていてとても便利そうだったのでメモ。
公式サイトはこちらです　→　https://easystats.github.io/report/
※2020年5月9日追記：実はCSVで出力できることに気づいたので加筆しました。

glm等の解析結果を確認したいときにはprintやsummaryを用いますが、これらで出力される解析結果の要約は読みにくい形式になっています。
easystatsのreportパッケージを使うと読みやすい表形式や引用しやすい文章形式で要約を確認できて非常に便利です。
またその表をCSV形式で保存できるので超便利です。

現時点ではデータフレームに加え、cor.test、t.test、glm、lme4::merMod等の解析結果に対して使用することができます。

ただし要約時にいくつかの統計量（R2等）をperformance::performanceで別途計算しているようなので、計算の導出過程を知りたい方はソースコードやそちらのパッケージを参照するのがよいと思います。

インストール

remotes::install_github("easystats/report")

remotesが入っていない場合はインストールした上で上記を実行。

使用例

reportパッケージではreport::report関数で要約を出力し、report::table_short、report::table_longで結果を表形式に整形します。また、report::text_short、report::text_longで文章形式に整形できます。
個人的には表形式がかなりおすすめなのでここではreport::table_short、report::table_longの使用例を示したいと思います。

またコードについては各パッケージのサンプルコードを参考にしました。

データフレーム

`table_short`

> iris %>% report() %>% table_short()
Variable     |      Level | n_Obs | Mean |   SD |  Min |  Max | n_Missing
-------------------------------------------------------------------------
Sepal.Length |            |   150 | 5.84 | 0.83 | 4.30 | 7.90 |         0
Sepal.Width  |            |   150 | 3.06 | 0.44 | 2.00 | 4.40 |         0
Petal.Length |            |   150 | 3.76 | 1.77 | 1.00 | 6.90 |         0
Petal.Width  |            |   150 | 1.20 | 0.76 | 0.10 | 2.50 |         0
Species      |     setosa |    50 |      |      |      |      |          
Species      | versicolor |    50 |      |      |      |      |          
Species      |  virginica |    50 |      |      |      |      |

`table_long`

> iris %>% report() %>% table_long()
Variable     |      Level | n_Obs | Mean |   SD | Median |  MAD |  Min |  Max | Skewness | Kurtosis | n_Missing | percentage_Obs
--------------------------------------------------------------------------------------------------------------------------------
Sepal.Length |            |   150 | 5.84 | 0.83 |   5.80 | 1.04 | 4.30 | 7.90 |     0.31 |    -0.57 |         0 |               
Sepal.Width  |            |   150 | 3.06 | 0.44 |   3.00 | 0.44 | 2.00 | 4.40 |     0.32 |     0.18 |         0 |               
Petal.Length |            |   150 | 3.76 | 1.77 |   4.35 | 1.85 | 1.00 | 6.90 |    -0.27 |    -1.40 |         0 |               
Petal.Width  |            |   150 | 1.20 | 0.76 |   1.30 | 1.04 | 0.10 | 2.50 |    -0.10 |    -1.34 |         0 |               
Species      |     setosa |    50 |      |      |        |      |      |      |          |          |           |          33.33
Species      | versicolor |    50 |      |      |        |      |      |      |          |          |           |          33.33
Species      |  virginica |    50 |      |      |        |      |      |      |          |          |           |          33.33

各種統計量が見やすく確認できますね。

`cor.test`

`table_short`

> cor.test(iris$Sepal.Length, iris$Sepal.Width) %>% report() %>% table_short()
Parameter1        |       Parameter2 |     r |     t |  df |    p | CI_low | CI_high
------------------------------------------------------------------------------------
iris$Sepal.Length | iris$Sepal.Width | -0.12 | -1.44 | 148 | 0.15 |  -0.27 |    0.04

`table_long`

> cor.test(iris$Sepal.Length, iris$Sepal.Width) %>% report() %>% table_long()
Parameter1        |       Parameter2 |     r |     t |  df |    p | CI_low | CI_high |  Method
----------------------------------------------------------------------------------------------
iris$Sepal.Length | iris$Sepal.Width | -0.12 | -1.44 | 148 | 0.15 |  -0.27 |    0.04 | Pearson

結果は1行にまとまっていますが表形式になっているので見やすいです。

`t.test`

`table_short`

> t.test(mtcars$mpg ~ mtcars$am) %>% report() %>% table_short()
Difference |     t |    df |    p | CI_low | CI_high | Cohens_d
---------------------------------------------------------------
      7.24 | -3.77 | 18.33 | 0.00 | -11.28 |   -3.21 |    -1.76

`table_long`

> t.test(mtcars$mpg ~ mtcars$am) %>% report() %>% table_long()
Parameter  |     Group | Mean_Group1 | Mean_Group2 | Difference |     t |    df |    p | CI_low | CI_high |                  Method | Cohens_d
----------------------------------------------------------------------------------------------------------------------------------------------
mtcars$mpg | mtcars$am |       17.15 |       24.39 |       7.24 | -3.77 | 18.33 | 0.00 | -11.28 |   -3.21 | Welch Two Sample t-test |    -1.76

cor.testと同様、結果は1行にまとまっていますが表形式になっているので見やすいです。

`glm`

`table_short`

> glm(Sepal.Length ~ Petal.Length + Species, data=iris) %>% report() %>% table_short()
Waiting for profiling to be done...
Waiting for profiling to be done...
Waiting for profiling to be done...
Parameter         | Coefficient | CI_low | CI_high |    p | Std_Coefficient |  Fit
----------------------------------------------------------------------------------
(Intercept)       |        3.68 |   3.48 |    3.89 | 0.00 |            1.50 |     
Petal.Length      |        0.90 |   0.78 |    1.03 | 0.00 |            1.93 |     
Speciesversicolor |       -1.60 |  -1.98 |   -1.22 | 0.00 |           -1.93 |     
Speciesvirginica  |       -2.12 |  -2.65 |   -1.58 | 0.00 |           -2.56 |     
                  |             |        |         |      |                 |     
R2_Nagelkerke     |             |        |         |      |                 | 0.88

`table_long`

> glm(Sepal.Length ~ Petal.Length + Species, data=iris) %>% report() %>% table_long()
Waiting for profiling to be done...
Waiting for profiling to be done...
Waiting for profiling to be done...
Parameter         | Coefficient |   SE | CI_low | CI_high |     t | df_error |    p | Std_Coefficient |    Fit
--------------------------------------------------------------------------------------------------------------
(Intercept)       |        3.68 | 0.11 |   3.48 |    3.89 | 34.72 |      146 | 0.00 |            1.50 |       
Petal.Length      |        0.90 | 0.06 |   0.78 |    1.03 | 13.96 |      146 | 0.00 |            1.93 |       
Speciesversicolor |       -1.60 | 0.19 |  -1.98 |   -1.22 | -8.28 |      146 | 0.00 |           -1.93 |       
Speciesvirginica  |       -2.12 | 0.27 |  -2.65 |   -1.58 | -7.74 |      146 | 0.00 |           -2.56 |       
                  |             |      |        |         |       |          |      |                 |       
AIC               |             |      |        |         |       |          |      |                 | 106.23
BIC               |             |      |        |         |       |          |      |                 | 121.29
R2_Nagelkerke     |             |      |        |         |       |          |      |                 |   0.88
RMSE              |             |      |        |         |       |          |      |                 |   0.33

summaryだとごちゃごちゃしていて見にくいですがreportを使うと一番知りたい予測子についての情報を一覧で確認できます。
またAICやR2を算出してくれるのは非常に便利です。

`glmer`

`table_short`

> glmer(cbind(incidence, size - incidence) ~ period + (1 | herd), data = cbpp, family = binomial) %>% report() %>% table_short()
Can't calculate log-loss.
Can't calculate proper scoring rules for models without integer response values.
Can't calculate log-loss.
Can't calculate proper scoring rules for models without integer response values.
Can't calculate log-loss.
Can't calculate proper scoring rules for models without integer response values.
Parameter        | Coefficient | CI_low | CI_high |    p | Std_Coefficient |  Fit
---------------------------------------------------------------------------------
(Intercept)      |       -1.40 |  -1.85 |   -0.95 | 0.00 |           -1.40 |     
period2          |       -0.99 |  -1.59 |   -0.40 | 0.00 |           -0.99 |     
period3          |       -1.13 |  -1.76 |   -0.50 | 0.00 |           -1.13 |     
period4          |       -1.58 |  -2.41 |   -0.75 | 0.00 |           -1.58 |     
                 |             |        |         |      |                 |     
R2 (conditional) |             |        |         |      |                 | 0.19
R2 (marginal)    |             |        |         |      |                 | 0.09

`table_long`

> glmer(cbind(incidence, size - incidence) ~ period + (1 | herd), data = cbpp, family = binomial) %>% report() %>% table_long()
Can't calculate log-loss.
Can't calculate proper scoring rules for models without integer response values.
Can't calculate log-loss.
Can't calculate proper scoring rules for models without integer response values.
Can't calculate log-loss.
Can't calculate proper scoring rules for models without integer response values.
Parameter        | Coefficient |   SE | CI_low | CI_high |     z | df_error |    p | Std_Coefficient |    Fit
-------------------------------------------------------------------------------------------------------------
(Intercept)      |       -1.40 | 0.23 |  -1.85 |   -0.95 | -6.05 |       51 | 0.00 |           -1.40 |       
period2          |       -0.99 | 0.30 |  -1.59 |   -0.40 | -3.27 |       51 | 0.00 |           -0.99 |       
period3          |       -1.13 | 0.32 |  -1.76 |   -0.50 | -3.49 |       51 | 0.00 |           -1.13 |       
period4          |       -1.58 | 0.42 |  -2.41 |   -0.75 | -3.74 |       51 | 0.00 |           -1.58 |       
                 |             |      |        |         |       |          |      |                 |       
AIC              |             |      |        |         |       |          |      |                 | 194.05
BIC              |             |      |        |         |       |          |      |                 | 204.18
R2 (conditional) |             |      |        |         |       |          |      |                 |   0.19
R2 (marginal)    |             |      |        |         |       |          |      |                 |   0.09
ICC              |             |      |        |         |       |          |      |                 |   0.11
RMSE             |             |      |        |         |       |          |      |                 |   1.15
LOGLOSS          |             |      |        |         |       |          |      |                 |

エラーが気になるところではありますが、こちらもglmと同様に結果が非常に見やすく出力されています。
加えて、R2をmarginalとconditional両方とも出してくれているので混合モデルを使う際には非常にありがたいです。

CSVで保存

コンソール出力でプレビューする際には罫線などが入って見やすくなっているのですが、report() %>% table_long()の出力の中身自体はデータフレームです。

> glm(Sepal.Length ~ Petal.Length + Species, data=iris) %>% report() %>% table_short() %>% str()
Waiting for profiling to be done...
Waiting for profiling to be done...
Waiting for profiling to be done...
Classes ‘report_table’ and 'data.frame':	6 obs. of  5 variables:
 $ Parameter      : chr  "(Intercept)" "Petal.Length" "Speciesversicolor" "Speciesvirginica" ...
 $ Coefficient    : num  3.684 0.905 -1.601 -2.118 NA ...
 $ p              : num  1.97e-72 1.12e-28 7.37e-14 1.48e-12 NA ...
 $ Std_Coefficient: num  1.5 1.93 -1.93 -2.56 NA ...
 $ Fit            : num  NA NA NA NA NA ...

よって以下のようなコマンドでCSVファイルに出力することが出来ます。

glm(Sepal.Length ~ Petal.Length + Species, data=iris) %>% report() %>% table_short() %>% write.csv("glm_table.csv")

作成されたCSVファイルは冒頭に示したようなものになります。
glmやglmerの結果からこのような表を作るのはけっこう面倒なのですごくありがたいです。

こちらの記事（easystatsについて①: パッケージ群の紹介）によるとreportパッケージが含まれるeasystatsには他にもけっこう便利パッケージがあるみたいです。
日本語の解説が増えると嬉しい笑

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up