
Notes on factor analysis in R

Posted at 2023-09-22

I only use R about once a year, so every time I end up relearning the basics. These are notes for myself.

First, load the data. This time I read a CSV file.

x <- read.csv("test.csv", header = TRUE, fileEncoding = "UTF-8-BOM")

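fileEncoding = "UTF-8-BOM" is the option to add when reading fails on Windows. It tells R that the file is UTF-8 and starts with a byte-order mark (BOM), as files saved from Excel as "CSV UTF-8" do; without it, the BOM can end up glued to the first column name.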
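To see what the option does, here is a small sketch that writes a three-row CSV with a BOM to a temporary file and reads it twice. The file contents and column names are made up for illustration; the mangled name in the first read depends on the OS and locale, so it is only printed, not checked.

```r
# Write a tiny CSV that starts with the UTF-8 byte-order mark (EF BB BF).
path <- tempfile(fileext = ".csv")
bom  <- as.raw(c(0xEF, 0xBB, 0xBF))
body <- charToRaw("ID,Group,Q001\n1,1,3\n2,2,4\n")
writeBin(c(bom, body), path)

# Without the hint, the BOM is read as part of the first header field,
# so the first column name comes out garbled.
plain <- read.csv(path, header = TRUE)
print(names(plain))

# With "UTF-8-BOM" the mark is stripped and the header reads cleanly.
with_bom <- read.csv(path, header = TRUE, fileEncoding = "UTF-8-BOM")
print(names(with_bom))
```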

Factor analysis

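1. Extract only the columns needed for the analysis (this time, a factor analysis of the 39 items Q1 to Q39). Column 1 holds the ID and column 2 the group, so take columns 3 onward (3 to 41).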

df <- x[3:41]
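A minimal sketch of the same selection on a made-up data frame with the layout described above (ID, Group, then Q001 to Q039; the values are random). It also shows selecting the item columns by name pattern, an alternative that keeps working if columns are later added or reordered; the "^Q[0-9]+$" pattern assumes every item column is named like Q001.

```r
# Hypothetical data with the same layout as the CSV in the text.
set.seed(1)
items <- as.data.frame(matrix(sample(1:5, 10 * 39, replace = TRUE), nrow = 10))
names(items) <- sprintf("Q%03d", 1:39)
x <- data.frame(ID = 1:10, Group = rep(1:5, each = 2), items)

# By position, as in the text: columns 3 through 41 (39 columns).
df <- x[3:41]

# By name pattern: picks the same columns without relying on positions.
df_by_name <- x[grep("^Q[0-9]+$", names(x))]

print(dim(df))
print(identical(df, df_by_name))
```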

2. Compute the eigenvalues and draw a scree plot.

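r_mat <- cor(df)                # correlation matrix of the items
eigen <- eigen(r_mat)$values    # eigenvalues, largest first
plot(eigen, type = "b")         # scree plot: points joined by lines
abline(h = 1, lty = 2)          # dashed reference line at eigenvalue 1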
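As a sanity check on what these eigenvalues are, here is a sketch on simulated data (an assumption standing in for df: 300 respondents, 6 items, 2 latent factors). The eigenvalues of a correlation matrix come back in decreasing order and add up to the number of items, because every diagonal entry is 1; with two clear factors, exactly two of them stand above 1.

```r
set.seed(42)
n <- 300
f <- matrix(rnorm(n * 2), ncol = 2)            # latent factor scores
L <- rbind(c(0.8, 0), c(0.8, 0), c(0.8, 0),    # items 1-3 load on factor 1
           c(0, 0.8), c(0, 0.8), c(0, 0.8))    # items 4-6 load on factor 2
df <- as.data.frame(f %*% t(L) + matrix(rnorm(n * 6, sd = 0.6), ncol = 6))
names(df) <- sprintf("Q%03d", 1:6)

ev <- eigen(cor(df))$values
print(round(ev, 2))   # two large values, four small ones
print(sum(ev))        # equals the number of items (6), up to rounding
```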

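3. Take the number of eigenvalues of 1 or more as the number of factors (the Kaiser-Guttman criterion) and run the factor analysis. There are many ways to choose the number of factors; this time I just decide it mechanically.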

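n_fact <- sum(eigen >= 1)   # number of eigenvalues >= 1

# Factor analysis (maximum likelihood, promax rotation)
fa <- factanal(x = df, factors = n_fact, rotation = "promax")
print(fa, cutoff = 0)       # cutoff = 0: print every loading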

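factanal estimates the model by maximum likelihood.
rotation = "promax" requests a promax rotation, an oblique rotation that lets the factors correlate. cutoff = 0 prints all factor loadings (by default, loadings below 0.1 in absolute value are left blank).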
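The whole procedure can be tried end to end without the original CSV. This sketch simulates a stand-in for df (500 respondents, 6 items, 2 latent factors; all numbers are assumptions), applies the same eigenvalue rule and factanal call, and then prints the loadings on their own. sort = TRUE is an existing argument of R's print method for loadings that groups items by the factor they load on most; the 0.3 cutoff is an arbitrary choice for readability.

```r
set.seed(123)
n <- 500
f <- matrix(rnorm(n * 2), ncol = 2)                  # latent factor scores
L <- cbind(c(0.8, 0.7, 0.6, 0, 0, 0),                # loadings on factor 1
           c(0, 0, 0, 0.8, 0.7, 0.6))                # loadings on factor 2
noise_sd <- sqrt(1 - rowSums(L^2))                   # gives each item variance 1
X <- f %*% t(L) + sweep(matrix(rnorm(n * 6), ncol = 6), 2, noise_sd, "*")
df <- as.data.frame(X)
names(df) <- sprintf("Q%03d", 1:6)

# Same steps as in the text: Kaiser-Guttman count, then ML + promax.
ev <- eigen(cor(df))$values
n_fact <- sum(ev >= 1)
fa <- factanal(x = df, factors = n_fact, rotation = "promax")
print(fa, cutoff = 0)

# Loadings only: sorted by factor, small values blanked out.
print(fa$loadings, cutoff = 0.3, sort = TRUE)
```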

The output looks like this. The analysis was run with 6 factors.

Call:
factanal(x = df, factors = n_fact, rotation = "promax")

Uniquenesses:
 Q001  Q002  Q003  Q004  Q005  Q006  Q007  Q008  Q009  Q010 ...
0.424 0.273 0.235 0.466 0.426 0.478 0.564 0.627 0.573 0.615 ...
Loadings:
     Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
Q001  0.701  -0.035  -0.097  -0.044   0.048   0.217 
Q002  0.203   0.041   0.010  -0.004  -0.099   0.763 
Q003  0.115  -0.003   0.072   0.057   0.027   0.746 
Q004  0.000  -0.102   0.176   0.214   0.509   0.016 
Q005  0.235  -0.026   0.076   0.409   0.180  -0.043 
Q006  0.253  -0.074   0.133   0.059   0.419   0.018 
Q007  0.475   0.204   0.199  -0.068  -0.072  -0.061 
Q008  0.525   0.159   0.108  -0.047  -0.124   0.014 
Q009  0.434   0.262   0.122  -0.023  -0.067  -0.037 
Q010  0.005   0.405  -0.080  -0.150   0.387   0.111 
... (omitted) ...

               Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
SS loadings      4.717   3.735   2.615   2.598   2.280   1.339
Proportion Var   0.121   0.096   0.067   0.067   0.058   0.034
Cumulative Var   0.121   0.217   0.284   0.350   0.409   0.443

Factor Correlations:
        Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
Factor1   1.000  -0.514   0.751   0.672  -0.692  -0.689
Factor2  -0.514   1.000  -0.351  -0.180   0.384   0.170
Factor3   0.751  -0.351   1.000   0.607  -0.592  -0.590
Factor4   0.672  -0.180   0.607   1.000  -0.641  -0.567
Factor5  -0.692   0.384  -0.592  -0.641   1.000   0.620
Factor6  -0.689   0.170  -0.590  -0.567   0.620   1.000

Test of the hypothesis that 6 factors are sufficient.
The chi square statistic is 1976.27 on 522 degrees of freedom.
The p-value is 1.15e-167

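Uniquenesses: the uniqueness of each item. The communality is 1 - uniqueness; a larger communality means a larger part of the item is explained by the common factors.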
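A small sketch of computing communalities from a fitted model. It uses ability.cov, a covariance matrix of six ability tests that ships with R and appears in factanal's own help examples, instead of the original data. It also shows that the rotation does not affect the uniquenesses, because factanal fits the model first and only rotates the loadings afterwards.

```r
# Two-factor ML fit with promax, as in the text but on R's example data.
fa <- factanal(factors = 2, covmat = ability.cov, rotation = "promax")

uniq <- fa$uniquenesses
communality <- 1 - uniq
print(round(cbind(uniqueness = uniq, communality = communality), 3))

# Same model without rotation: identical uniquenesses.
fa0 <- factanal(factors = 2, covmat = ability.cov, rotation = "none")
print(all.equal(fa$uniquenesses, fa0$uniquenesses))
```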

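Loadings: factor loadings. Each item is most strongly associated with the factor on which its loading has the largest absolute value.
SS loadings: sum of squared factor loadings for each factor
Proportion Var: proportion of the total variance explained by each factor
Cumulative Var: cumulative proportion of variance explained
Factor Correlations: correlation matrix of the factors (printed only for oblique rotations such as promax)
p-value: test of model fit. The null hypothesis is "the model fits" (here, "6 factors are sufficient"), so p < 0.05 rejects it, meaning the model "cannot be said to fit". This time p is far below 0.05, so the result is "cannot be said to fit". Too bad. Note that a large p-value only means that no misfit was detected, not that the fit is proven, and with large samples this chi-square test tends to reject even small discrepancies.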
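The summary rows and the test can be recomputed from the fitted object, which helps when pulling the numbers into a report. This sketch again uses R's built-in ability.cov data rather than the original survey. The component names STATISTIC, dof and PVAL are the ones factanal returns; the degrees of freedom follow ((p - k)^2 - (p + k)) / 2, which for the output above (p = 39 items, k = 6 factors) gives 522. With an oblique rotation such as promax the factors overlap, so the proportions are best read as approximate.

```r
fa <- factanal(factors = 2, covmat = ability.cov, rotation = "promax")

L <- unclass(fa$loadings)   # plain p x k matrix of loadings
ss   <- colSums(L^2)        # "SS loadings"
prop <- ss / nrow(L)        # "Proportion Var" (items are standardized)
cum  <- cumsum(prop)        # "Cumulative Var"
print(round(rbind(ss, prop, cum), 3))

# Degrees of freedom of the chi-square test.
p <- nrow(L)
k <- fa$factors
print(((p - k)^2 - (p + k)) / 2)   # compare with fa$dof

# The printed p-value is the upper tail of the chi-square distribution.
print(pchisq(fa$STATISTIC, fa$dof, lower.tail = FALSE))   # compare with fa$PVAL
```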

Other commands I used

Take a subset for each group and analyze them one by one in a for loop.

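for (i in 1:5) {
  cat("### Group =", i, "\n")
  df <- subset(x, Group == i)   # rows of group i (all columns are kept)
  # ... analysis as above (select the item columns first, e.g. df[3:41])
}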

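cat: prints its arguments one after another, separated by spaces.
subset: extracts the rows whose Group column equals the given value, i.e. the data of one group.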
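A runnable sketch of the per-group loop on simulated data (3 groups of 200 respondents and 6 items; the data, group count and column positions are assumptions). It loops over the groups actually present instead of hard-coding 1:5, keeps only the item columns, and stores each fit in a list so the results can be compared afterwards.

```r
set.seed(7)
n <- 600
f <- matrix(rnorm(n * 2), ncol = 2)                 # latent factor scores
L <- cbind(c(0.8, 0.8, 0.8, 0, 0, 0), c(0, 0, 0, 0.8, 0.8, 0.8))
items <- f %*% t(L) + matrix(rnorm(n * 6, sd = 0.6), ncol = 6)
colnames(items) <- sprintf("Q%03d", 1:6)
x <- data.frame(ID = 1:n, Group = rep(1:3, length.out = n), items)

results <- list()
for (i in sort(unique(x$Group))) {
  cat("### Group =", i, "\n")
  df <- subset(x, Group == i)[3:8]                  # item columns only
  n_fact <- sum(eigen(cor(df))$values >= 1)         # Kaiser-Guttman count
  fa <- factanal(df, factors = n_fact, rotation = "promax")
  cat("factors:", n_fact, " p-value:", signif(fa$PVAL, 3), "\n")
  results[[as.character(i)]] <- fa                  # keep each fit
}
```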
