
Running a Vuong test in R

Posted at 2021-12-06

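I had been doing academic data analysis in Python, but then I needed to run a Vuong test, which Python's statsmodels does not support.

So, reluctantly, I did the analysis in R, and I'm writing down how I did it here, partly as a memo to myself.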

The content of this post reflects my personal views and not the official views of the organization I belong to.

Purpose of this article

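This is a personal memo and is not written with any particular reader in mind, but I'd be glad if it helps someone.
It may contain mistakes, so please bear with me.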

For reference, I'm using Windows 10.

Workflow

1. Preprocess as much as possible in Python and export a CSV file

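I'm more used to Python, so I created the CSV file in Python.

When calling pandas' to_csv, pass index=False so the DataFrame index is not written as an extra column; this saves cleanup work in R.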
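Why index=False matters on the R side: with pandas' default index=True, the index is written as a first column with an empty header, read.csv renames it to X, and a formula such as y ~ . would then quietly use it as a predictor. A minimal sketch with a tiny hypothetical file (its contents are made up for illustration):

```r
# Hypothetical CSV as pandas writes it with the default index=True:
# the first header cell is empty because the index has no name.
tmp <- tempfile(fileext = ".csv")
writeLines(c(",y,x1", "0,1,2.5", "1,0,3.1", "2,3,0.7"), tmp)

with_index <- read.csv(tmp)
names(with_index)   # "X" "y" "x1": the index became a column named X

# With y ~ ., the X column would enter the model as a predictor
attr(terms(y ~ ., data = with_index), "term.labels")   # "X" "x1"

# If a file already contains the index, it can be used as row names instead
no_index <- read.csv(tmp, row.names = 1)
names(no_index)     # "y" "x1"
```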

2. Installing pscl

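I installed the package from the R console on Windows.
pscl, the package I needed this time, is on CRAN, so the method described at the following link worked.

Installing packages in the R system

For details on the pscl library, see here.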
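The same step as R code, as a minimal sketch that assumes a CRAN mirror is reachable: it installs pscl only when it is missing and then loads it.

```r
# Install pscl from CRAN only if it is not already available
if (!requireNamespace("pscl", quietly = TRUE)) {
  install.packages("pscl")
}

# Load it so that zeroinfl() and vuong() can be called directly
library(pscl)
```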

3. Reading the CSV file in R

When reading the CSV file in R, setting fileEncoding to "UTF-8-BOM" made it load correctly.
Reference: Tips from data loading to preprocessing in R

data <- read.csv("hoge.csv", header = TRUE, fileEncoding = "UTF-8-BOM")
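"UTF-8-BOM" tells R to strip a byte-order mark (the bytes EF BB BF) at the start of the file. pandas writes one when to_csv is called with encoding="utf-8-sig", which is presumably how the file got it (an assumption; the encoding used on the Python side is not stated above). A self-contained sketch with a hypothetical file:

```r
# Write a small hypothetical CSV that starts with a UTF-8 byte-order mark
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("y,x1\n1,2.5\n0,3.1\n")), tmp)

# With fileEncoding = "UTF-8-BOM" the mark is dropped,
# so the first column name stays clean
dat <- read.csv(tmp, header = TRUE, fileEncoding = "UTF-8-BOM")
names(dat)   # "y" "x1"
```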

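4. Storing the regression results

Fit a negative binomial regression model and a zero-inflated negative binomial regression model, and store both results for the Vuong test.

For zero-inflated negative binomial regression, the UCLA tutorial is easy to follow:
ZERO-INFLATED NEGATIVE BINOMIAL REGRESSION | R DATA ANALYSIS EXAMPLES

Here is the actual code.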

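# Load pscl for zeroinfl() (MASS::glm.nb() is called with its namespace prefix)
library(pscl)

# Convert categorical explanatory variables to factors
# (possibly unnecessary; it matters when categories are coded as numbers)
data <- within(data, {
  hoge1 <- factor(hoge1)
  hoge2 <- factor(hoge2)
})

# Fit both models and store the results
nb1 <- MASS::glm.nb(hoge ~ ., data = data)
zinb <- zeroinfl(hoge ~ . | ., data = data, dist = "negbin", EM = TRUE)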

Source: vuong: Vuong's non-nested hypothesis test

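This is very basic, but writing "y ~ ." means:

Response variable ... y
Explanatory variables ... every column except y

Reference: 71. Regression analysis and multiple regression analysis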
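To check what "." expands to for a given data frame, terms() can report the term labels. A minimal sketch on a hypothetical toy data frame. In zeroinfl(), the part after | is the formula for the zero-inflation component, so hoge ~ . | . uses every other column in both the count part and the zero part.

```r
# Hypothetical toy data: y is the response, x1 and x2 are predictors
toy <- data.frame(y  = c(0, 2, 1, 0),
                  x1 = c(1.5, 0.3, 2.2, 1.1),
                  x2 = c("a", "b", "a", "b"))

# terms() expands "." into all columns of `data` except the response
term_labels <- attr(terms(y ~ ., data = toy), "term.labels")
print(term_labels)   # "x1" "x2"
```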

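The zeroinfl call in the source includes "EM = TRUE", but it raised an error in my environment, so I removed it.
(According to the pscl documentation, this option only controls whether the EM algorithm is used to compute starting values, but depending on the situation, removing it might still affect the estimates.)

I didn't use it this time, but the mpath package also appears to provide a Vuong test.
Reference: vuong.test: Vuong's non-nested hypothesis test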
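To try steps 4 and 5 end to end before plugging in your own data, pscl ships the bioChemists data (art is the number of articles a PhD student published), which its documentation uses in model-fitting examples. A sketch under that assumption, fitted without EM = TRUE as described above; its output will of course differ from the output shown in the next step:

```r
library(pscl)   # zeroinfl(), vuong(), and the bioChemists data

data("bioChemists", package = "pscl")

# Negative binomial model and zero-inflated negative binomial model,
# each using all other columns as explanatory variables
nb1  <- MASS::glm.nb(art ~ ., data = bioChemists)
zinb <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")

# model1 = nb1, model2 = zinb
vuong(nb1, zinb)
```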

5. Running the Vuong test

Run the Vuong test on the two stored models.

vuong(nb1, zinb)

Source: vuong: Vuong's non-nested hypothesis test

The output looks like this:

              Vuong z-statistic             H_A    p-value
Raw                  -0.3378267 model2 > model1    0.36775
AIC-corrected         4.5566296 model1 > model2  2.599e-06
BIC-corrected        19.4932729 model1 > model2 < 2.22e-16
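As a reading aid for the three rows, here is the statistic in its usual textbook form (my summary of the standard definition, not taken from pscl's source). Here \hat{P}_j(y_i \mid x_i) is model j's predicted probability of the observed count, k_j its number of estimated parameters, n the number of observations, and s_m the sample standard deviation of the m_i:

```latex
m_i = \log \hat{P}_1(y_i \mid x_i) - \log \hat{P}_2(y_i \mid x_i), \qquad i = 1, \dots, n

z = \frac{\sum_{i=1}^{n} m_i - c}{\sqrt{n}\, s_m},
\qquad
c =
\begin{cases}
0 & \text{Raw} \\
k_1 - k_2 & \text{AIC-corrected} \\
\tfrac{1}{2}(k_1 - k_2)\log n & \text{BIC-corrected}
\end{cases}
```

Since model1 (nb1) has fewer parameters than model2 (zinb), k_1 - k_2 < 0 and both corrections push z upward, which matches the output: -0.34 (Raw) becomes 4.56 (AIC) and 19.49 (BIC, whose penalty is larger because log n / 2 exceeds 1 for n > e^2 ≈ 7.4). The p-values also appear to be one-sided: for example, pnorm(-0.3378) is about 0.368.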

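The result is interpreted as follows. In the call above, model1 is nb1 (negative binomial) and model2 is zinb (zero-inflated negative binomial).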

The Vuong non-nested test is based on a comparison of the predicted probabilities of two models that do not nest. Examples include comparisons of zero-inflated count models with their non-zero-inflated analogs (e.g., zero-inflated Poisson versus ordinary Poisson, or zero-inflated negative-binomial versus ordinary negative-binomial). A large, positive test statistic provides evidence of the superiority of model 1 over model 2, while a large, negative test statistic is evidence of the superiority of model 2 over model 1. Under the null that the models are indistinguishable, the test statistic is asymptotically distributed standard normal.
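To make the quoted description concrete, here is a from-scratch sketch. vuong_z is a hypothetical helper, not part of pscl, and it takes the per-observation log-likelihoods of the two models directly. For a glm.nb fit these could be computed as dnbinom(fit$y, size = fit$theta, mu = fitted(fit), log = TRUE), assuming no weights or offsets. The toy numbers are made up:

```r
# Hypothetical helper, not part of pscl: Vuong z-statistics computed from the
# per-observation log-likelihoods of two fitted models (ll1, ll2) and their
# numbers of estimated parameters (k1, k2).
vuong_z <- function(ll1, ll2, k1, k2) {
  stopifnot(length(ll1) == length(ll2))
  m <- ll1 - ll2                 # pointwise log-likelihood ratio
  n <- length(m)
  # Amount subtracted from sum(m): no correction, AIC-type, BIC-type
  adj <- c(Raw = 0,
           AIC = k1 - k2,
           BIC = (k1 - k2) * log(n) / 2)
  z <- (sum(m) - adj) / (sqrt(n) * sd(m))
  # One-sided p-value in the direction indicated by the sign of z
  p <- ifelse(z > 0, pnorm(z, lower.tail = FALSE), pnorm(z))
  data.frame(z = z,
             H_A = ifelse(z > 0, "model1 > model2", "model2 > model1"),
             p = p)
}

# Made-up log-likelihoods for four observations;
# model 1 has 3 parameters, model 2 has 5
ll1 <- c(-1.0, -2.0, -1.5, -0.5)
ll2 <- c(-1.2, -2.1, -1.4, -0.9)
res <- vuong_z(ll1, ll2, k1 = 3, k2 = 5)
print(res)
```

Because model 1 has fewer parameters in this toy case, the AIC and BIC rows move z upward, the same pattern as in the output above.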

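Other notes

The Vuong test is meant for non-nested models and is commonly used to compare an ordinary count model with its zero-inflated counterpart. However, even experts seem to disagree on whether these two models are actually non-nested.
(From the few discussions I have read, the prevailing view today seems to be that they are not non-nested, which calls this use of the Vuong test into question.)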

Are a zero-truncated Poisson and basic Poisson nested or non-nested?

Why Vuong test was removed? - Statalist

The Misuse of The Vuong Test For Non-Nested Models to Test for Zero-Inflation
