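The Japanese Industrial Standards (JIS) apparently define the statistical concept of the mode as "for a discrete distribution, the value of the random variable at which the probability function attains its maximum, and for a continuous distribution, the value at which the density function does. If the distribution is multimodal, the values of the random variable that give the respective local maxima."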
最頻値 (Mode) - Wikipedia
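As a small illustration of that definition, here is a minimal sketch in R. The two distributions (a binomial with size 10 and probability 0.3, and a gamma with shape 3 and rate 2) are arbitrary choices made for this example and do not come from the standard. The names `discrete_mode` and `continuous_mode` are likewise only illustrative.

```r
# Discrete case: the mode is the value that maximises the probability function.
# Binomial(size = 10, prob = 0.3) is an arbitrary example distribution.
x <- 0:10
p <- dbinom(x, size = 10, prob = 0.3)
discrete_mode <- x[which.max(p)]

# Continuous case: the mode is the value that maximises the density function.
# Gamma(shape = 3, rate = 2) is again only an example; its mode is known
# in closed form as (shape - 1) / rate, which lets us check the search.
continuous_mode <- optimize(dgamma, interval = c(0, 10),
                            shape = 3, rate = 2, maximum = TRUE)$maximum

print(discrete_mode)
print(continuous_mode)
```

For a multimodal distribution the same idea applies to each local maximum separately, so a single `which.max()` or `optimize()` call would not be enough there.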
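The mode is often described as "the value that occurs most frequently in a data set or probability distribution", but in practice there is also a very human side to it: the person processing the data "goes out to grab the big lumps". Building a histogram, the visible form of the frequency table concept, is exactly that kind of process.
An example in the statistical language R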
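K <- 100   # number of repetitions
N <- 1000  # points per repetition
# Despite the name, each entry is the sample mean of sqrt(x^2 + y^2), not an estimate of pi
pi.est <- c(NULL)
for (k in seq(1,K)) {
  # N points drawn uniformly from the square [-1, 1] x [-1, 1]
  x <- runif(N, min=-1, max=1)
  y <- runif(N, min=-1, max=1)
  # mean distance of the points from the origin
  Data <- sum(sqrt(x*x+y*y))/N
  pi.est <- c(Data, pi.est)
}
hist(pi.est, breaks=50)
rug(pi.est)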
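The same simulation can be sketched more compactly with `replicate()`. This is only an alternative formulation, not the code used to produce the data in this post. The seed value and the name `mean_dist` are assumptions added for the example. The loop above prepends each new value, so the order differs, which makes no difference to the histogram.

```r
set.seed(1)  # arbitrary seed (an assumption), so that reruns give the same numbers
K <- 100     # number of repetitions
N <- 1000    # points per repetition

# Each repetition draws N points uniformly from the square [-1, 1] x [-1, 1]
# and returns their mean distance from the origin.
mean_dist <- replicate(K, {
  x <- runif(N, min = -1, max = 1)
  y <- runif(N, min = -1, max = 1)
  mean(sqrt(x^2 + y^2))
})

hist(mean_dist, breaks = 50)
rug(mean_dist)
```

The expected distance from the centre of this square to a uniformly drawn point is (sqrt(2) + log(1 + sqrt(2))) / 3, roughly 0.765, which is why the simulated values cluster around that number.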
The original data from the first run actually looked like this.
# Data saved for reproducibility
TestData01<-c(0.7575710,0.7626254,0.7720857,0.7636143,0.7559374,0.7636892 ,0.7697000,0.7744773,0.7718450,0.7661308,0.7643701,0.7773141,0.7601370,0.7625076,0.7657443,0.7642925,0.7652136,0.7626871,0.7651091,0.7838558,0.7577962,0.7679088,0.7691799,0.7778908,0.7651222,0.7745353,0.7630110,0.7704386,0.7693323,0.7626189,0.7589299,0.7469987,0.7606626,0.7728721,0.7608035,0.7766716,0.7748260,0.7754264,0.7671104,0.7588898,0.7763099,0.7590662,0.7713618,0.7706218,0.7752811,0.7783880,0.7727876,0.7694160,0.7689929,0.7661827,0.7569213,0.7565509,0.7594678,0.7557077,0.7661841,0.7831111,0.7671762,0.7670367,0.7456229,0.7633547,0.7806836,0.7537715,0.7705932,0.7695142,0.7594232,0.7629710,0.7571859,0.7698099,0.7585800,0.7565154,0.7619804,0.7723939,0.7621805,0.7769550,0.7552302,0.7477231,0.7605412,0.7620618,0.7601785,0.7479306,0.7770422,0.7799119,0.7709682,0.7657947,0.7612600,0.7643751,0.7653982,0.7593435,0.7823729,0.7659631,0.7529565,0.7603398,0.7592395,0.7607167,0.7566061,0.7573243,0.7640064,0.7661753,0.7667829,0.7758546)
# Redraw the histogram and the rug plot
hist(TestData01, breaks=50)
rug(TestData01)
From here on it is just a matter of "reshaping" this, over and over.
Building a frequency table in R
h<-hist(TestData01)
h
$breaks
[1] 0.745 0.750 0.755 0.760 0.765 0.770 0.775 0.780 0.785
$counts
[1] 4 2 19 24 23 13 11 4
$density
[1] 8 4 38 48 46 26 22 8
$mids
[1] 0.7475 0.7525 0.7575 0.7625 0.7675 0.7725 0.7775 0.7825
$xname
[1] "TestData01"
$equidist
[1] TRUE
attr(,"class")
[1] "histogram"
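To make the relationship between these fields concrete, the sketch below rebuilds `$mids` and `$density` from the `$breaks` and `$counts` values printed above and then picks out the modal class. The variable names (`width`, `modal_class`, and so on) are only illustrative.

```r
# Values copied from the $breaks and $counts output above
breaks <- c(0.745, 0.750, 0.755, 0.760, 0.765, 0.770, 0.775, 0.780, 0.785)
counts <- c(4, 2, 19, 24, 23, 13, 11, 4)

width   <- diff(breaks)                               # class widths
mids    <- (head(breaks, -1) + tail(breaks, -1)) / 2  # class midpoints
density <- counts / (sum(counts) * width)             # relative frequency per unit width

# The modal class is the class with the highest count
modal_class <- which.max(counts)
modal_label <- paste(breaks[modal_class], "~", breaks[modal_class + 1])
modal_mid   <- mids[modal_class]

print(density)
print(modal_label)
print(modal_mid)
```

With this class width of 0.005, the modal class is the fourth one, 0.76 ~ 0.765, with midpoint 0.7625.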
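# breaks holds the values that delimit the classes, and counts holds the frequencies.
# A single number passed as breaks is only a suggestion; hist() chooses "pretty" boundaries.
h <- hist(TestData01, breaks=50)
n <- length(h$counts) # number of classes
# Label each class as "lower ~ upper" from consecutive break values
class_names <- paste(h$breaks[1:n], "~", h$breaks[2:(n+1)])
frequency_table <- data.frame(class=class_names, frequency=h$counts)
library(xtable)
print(xtable(frequency_table), type = "html")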
| | class | frequency |
|---|---|---|
1 | 0.745 ~ 0.746 | 1 |
2 | 0.746 ~ 0.747 | 1 |
3 | 0.747 ~ 0.748 | 2 |
4 | 0.748 ~ 0.749 | 0 |
5 | 0.749 ~ 0.75 | 0 |
6 | 0.75 ~ 0.751 | 0 |
7 | 0.751 ~ 0.752 | 0 |
8 | 0.752 ~ 0.753 | 1 |
9 | 0.753 ~ 0.754 | 1 |
10 | 0.754 ~ 0.755 | 0 |
11 | 0.755 ~ 0.756 | 3 |
12 | 0.756 ~ 0.757 | 4 |
13 | 0.757 ~ 0.758 | 4 |
14 | 0.758 ~ 0.759 | 3 |
15 | 0.759 ~ 0.76 | 5 |
16 | 0.76 ~ 0.761 | 7 |
17 | 0.761 ~ 0.762 | 2 |
18 | 0.762 ~ 0.763 | 7 |
19 | 0.763 ~ 0.764 | 4 |
20 | 0.764 ~ 0.765 | 4 |
21 | 0.765 ~ 0.766 | 7 |
22 | 0.766 ~ 0.767 | 5 |
23 | 0.767 ~ 0.768 | 4 |
24 | 0.768 ~ 0.769 | 1 |
25 | 0.769 ~ 0.77 | 6 |
26 | 0.77 ~ 0.771 | 4 |
27 | 0.771 ~ 0.772 | 2 |
28 | 0.772 ~ 0.773 | 4 |
29 | 0.773 ~ 0.774 | 0 |
30 | 0.774 ~ 0.775 | 3 |
31 | 0.775 ~ 0.776 | 3 |
32 | 0.776 ~ 0.777 | 3 |
33 | 0.777 ~ 0.778 | 3 |
34 | 0.778 ~ 0.779 | 1 |
35 | 0.779 ~ 0.78 | 1 |
36 | 0.78 ~ 0.781 | 1 |
37 | 0.781 ~ 0.782 | 0 |
38 | 0.782 ~ 0.783 | 1 |
39 | 0.783 ~ 0.784 | 2 |
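The table shows how much the "mode" depends on the chosen class width. The sketch below copies the 39 frequencies from the table and looks for the modal class. It then merges every five neighbouring classes, which turns the width of 0.001 back into a width of 0.005. The names `fine_counts`, `group`, and `coarse_counts` are only illustrative.

```r
# Frequencies copied from the table above (39 classes of width 0.001)
fine_counts <- c(1, 1, 2, 0, 0, 0, 0, 1, 1, 0,
                 3, 4, 4, 3, 5, 7, 2, 7, 4, 4,
                 7, 5, 4, 1, 6, 4, 2, 4, 0, 3,
                 3, 3, 3, 1, 1, 1, 0, 1, 2)

# All classes that share the highest frequency
fine_modes <- which(fine_counts == max(fine_counts))

# Merge every 5 consecutive classes (the last group has only 4)
group <- (seq_along(fine_counts) - 1) %/% 5
coarse_counts <- as.vector(tapply(fine_counts, group, sum))
coarse_mode <- which(coarse_counts == max(coarse_counts))

print(fine_modes)
print(coarse_counts)
print(coarse_mode)
```

At width 0.001 three classes (rows 16, 18, and 21) tie with a frequency of 7, so there is no single modal class. After merging, the counts become 4, 2, 19, 24, 23, 13, 11, 4, the same as the default `hist()` output earlier, and the fourth class is the only maximum.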
Well, how should I put it... a "world of traditional craftsmanship"?