R Advent Calendar 2024

Learning R, Step 18: Unbiased Variance

Posted at 2024-12-15

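Introduction

This is Step 18 of the "Learning R" series. This time we look at unbiased variance.

What is unbiased variance?

Unbiased variance is a measure of how spread out sample data is. When the variance of the whole population is unknown, the variance computed from a sample is used as an estimate of it; dividing by n - 1 instead of n is what makes that estimate unbiased.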

\begin{align}
s^2 &= \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \\
& s^2 \text{: unbiased variance}, \\
& n \text{: number of data points}, \\
& x_i \text{: value of each data point}, \\
& \bar{x} \text{: sample mean}.
\end{align}
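
To see why the divisor is n - 1 rather than n, a small simulation can help. The sketch below is illustrative only: the normal population, its true variance of 4, the sample size, the seed, and all variable names are assumptions made for this example, not part of the original article.

```r
# Illustrative simulation (assumed setup, not from the original article):
# draw many small samples from a normal population whose true variance is 4.
set.seed(42)              # arbitrary seed, for reproducibility only
true_variance <- 4
n <- 5                    # size of each sample
n_reps <- 10000           # number of simulated samples

# One row per simulated sample
samples <- matrix(rnorm(n * n_reps, mean = 0, sd = sqrt(true_variance)),
                  nrow = n_reps, ncol = n)

# Sum of squared deviations from each sample's own mean
sum_sq <- apply(samples, 1, function(x) sum((x - mean(x))^2))

biased_estimates   <- sum_sq / n        # divide by n
unbiased_estimates <- sum_sq / (n - 1)  # divide by n - 1

cat("True variance:              ", true_variance, "\n")
cat("Mean of estimates (1/n):    ", mean(biased_estimates), "\n")
cat("Mean of estimates (1/(n-1)):", mean(unbiased_estimates), "\n")
```

Under these assumptions, the average of the 1/(n-1) estimates should come out close to the true variance, while the average of the 1/n estimates should sit lower, near (n-1)/n times the true variance.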

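A problem: computing the unbiased variance

The test scores of five students in a class are as follows. Find the unbiased variance of these scores.

Student	Score ( x )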
A	60
B	70
C	80
D	90
E	100

Solution

Compute the sample mean \bar{x}
The mean is computed with the following formula:

\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i

\bar{x} = \frac{60 + 70 + 80 + 90 + 100}{5} = 80

Compute the squared deviations

Student	Score ( x )	Deviation ( x_i - \bar{x} )	Squared deviation ( (x_i - \bar{x})^2 )
A	60	( 60 - 80 = -20 )	( (-20)^2 = 400 )
B	70	( 70 - 80 = -10 )	( (-10)^2 = 100 )
C	80	( 80 - 80 = 0 )	( 0^2 = 0 )
D	90	( 90 - 80 = 10 )	( 10^2 = 100 )
E	100	( 100 - 80 = 20 )	( 20^2 = 400 )

Compute the sum of the squared deviations

\sum_{i=1}^n (x_i - \bar{x})^2 = 400 + 100 + 0 + 100 + 400 = 1000
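
The deviation table and its total can also be reproduced as a data frame. This is a minimal sketch; the names `score_table`, `student`, `score`, `deviation`, and `squared_deviation` are chosen here for illustration and do not come from the original article.

```r
# Scores keyed by student label (labels taken from the table above)
scores <- c(A = 60, B = 70, C = 80, D = 90, E = 100)

# Build the table: one row per student, with deviation from the sample mean
score_table <- data.frame(
  student   = names(scores),
  score     = unname(scores),
  deviation = unname(scores - mean(scores))
)
score_table$squared_deviation <- score_table$deviation^2

print(score_table)

# Total of the last column; this matches the hand calculation of 1000
cat("Sum of squared deviations:", sum(score_table$squared_deviation), "\n")
```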

Substitute into the unbiased variance formula

s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2

s^2 = \frac{1}{5-1} \cdot 1000 = \frac{1}{4} \cdot 1000 = 250

Therefore, the unbiased variance is 250.

Implementing it in R

# Define the score data as a vector
scores <- c(60, 70, 80, 90, 100)

# Compute the sample mean
mean_score <- mean(scores)

# Compute the deviations from the mean
deviations <- scores - mean_score

# Square the deviations
squared_deviations <- deviations^2

# Sum the squared deviations
sum_squared_deviations <- sum(squared_deviations)

# Compute the unbiased variance
n <- length(scores)  # sample size
unbiased_variance <- sum_squared_deviations / (n - 1)

# Print the results
cat("Sample mean:", mean_score, "\n")
cat("Sum of squared deviations:", sum_squared_deviations, "\n")
cat("Unbiased variance:", unbiased_variance, "\n")

Execution result

~/develop/R/r_study/unbiased_deviations  (main)$ Rscript test.R
Sample mean: 80 
Sum of squared deviations: 1000 
Unbiased variance: 250
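
As a cross-check, R's built-in `var()` also divides by n - 1, so it should agree with the step-by-step calculation. The sketch below assumes the same five scores; the names `manual_variance` and `builtin_variance` are introduced only for this comparison.

```r
scores <- c(60, 70, 80, 90, 100)

# Step-by-step formula: sum of squared deviations divided by n - 1
manual_variance <- sum((scores - mean(scores))^2) / (length(scores) - 1)

# Built-in sample variance from the stats package (also uses n - 1)
builtin_variance <- var(scores)

cat("Manual calculation:", manual_variance, "\n")
cat("var(scores):       ", builtin_variance, "\n")

# Stops with an error if the two values differ beyond floating-point tolerance
stopifnot(isTRUE(all.equal(manual_variance, builtin_variance)))
```

In everyday use, `var()` alone is usually enough; the manual version is mainly useful for seeing what the function does.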
