More than 3 years have passed since last update.

VBAユーザがPython・Rを使ってみた：確率分布

Last updated at 2021-02-11Posted at 2021-02-11

はじめに

機械学習の勉強を始めたVBAユーザです。
備忘録としてPython・Rの文法をVBAと比較しながらまとめています。

今回は、確率分布についてまとめます。
乱数の発生については、別記事にしました。

確率分布
- 連続確率分布
- 離散確率分布
まとめ
- 一覧
- プログラム全体

確率分布

今回は、確率分布（probability distribution）についてまとめます。

連続確率分布

まず、連続確率分布（continuous probability distribution）から説明します。

確率密度関数

連続確率分布の確率密度関数（probability density function; PDF）についてです。
連続確率分布の例として（1次元）正規分布（normal distribution）（ガウス分布（Gaussian distribution））の場合で説明します。

確率変数 $X$ が平均 $\mu$, 分散 $\sigma^2 >0 $ の（1次元）正規分布に従う $X \sim N(\mu, \sigma^2)$ とします。$X$ の確率密度関数 $f_{X}(x)$ は次の形（ガウス関数）です。

f_{X}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{\frac{-(x-\mu)^2}{2\sigma^2}} = {\frac{1}{\sqrt{2\pi\sigma^2}}} \exp \!\left( - {\frac{(x-\mu)^2}{2\sigma^2}} \right) \quad (x \in \mathbb{R})

Python
Pythonでは、科学技術計算ライブラリの SciPy の統計関係のサブパッケージ scipy.stats を使います。scipy.stats から使用する関数をインポートしておきます。
計算に使うので、Numpy もインポートしておきます。また、ヒストグラムを描画するために Matplotlib の matplotlib.pyplot もインポートしておきます。

Python3

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

m = 50
s = 10
x = np.linspace(0, 100, num=100+1)
print(x)
# [  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.
#   14.  15.  16.  17.  18.  19.  20.  21.  22.  23.  24.  25.  26.  27.
#   ...
#   84.  85.  86.  87.  88.  89.  90.  91.  92.  93.  94.  95.  96.  97.
#   98.  99. 100.]
y = norm.pdf(x, m, s)
print(y)
# [1.48671951e-07 2.43896075e-07 3.96129909e-07 6.36982518e-07
#  1.01408521e-06 1.59837411e-06 2.49424713e-06 3.85351967e-06
#   ...
#  1.01408521e-06 6.36982518e-07 3.96129909e-07 2.43896075e-07
#  1.48671951e-07]
plt.figure()
plt.plot(x, y)
plt.show()

m <- 50
s <- 10
x <- 0:100
x
#   [1]   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
#  [21]  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39
#  ...
#  [81]  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99
# [101] 100
y <- dnorm(x, m, s)
y
#  [1] 1.486720e-07 2.438961e-07 3.961299e-07 6.369825e-07 1.014085e-06 1.598374e-06
#  [7] 2.494247e-06 3.853520e-06 5.894307e-06 8.926166e-06 1.338302e-05 1.986555e-05
#  ...
# [97] 1.014085e-06 6.369825e-07 3.961299e-07 2.438961e-07 1.486720e-07
plot(x, y, type="l")

注意）Rでグラフを描画するには、標準パッケージでもできますが、ggplot2を使った方が「一貫性のある文法で合理的に」描けます。（こちらの記事参照。）

VBA

VBA

累積分布関数と分位関数

連続分布の累積分布関数（cumulative distribution function; CDF）（分布関数（distribution function））$F_{X}(x)$、$1-F_{X}(x)$ である相補累積分布関数（complementary cumulative distribution function; CCDF)（上側確率（upper-tail probability）, 生存関数（survival function））、累積分布関数の逆関数 ${F_{X}}^{-1}$ である分位関数（quantile function）（分位点関数）です。

\begin{eqnarray*}
F_{X}(x) &=& \mathrm{P}(X \le x) = \int_{-\infty}^{x} f_{X}(t)\,dt \quad (x \in \mathbb{R}) \\
\bar{F_X}(x) &=& \mathrm{P}(X > x) = 1-F_{X}(x) = \int_{x}^{\infty} f_{X}(t)\\
G_{X}(p) &=& {F_{X}}^{-1}(p) \quad (p \in [0,1])
\end{eqnarray*}

Python

Python3

m = 0
s = 1
x = np.linspace(-3.5, 3.5, num=100+1)
y = norm.pdf(x, m, s)   # Probability density function
z = norm.cdf(x, m, s)   # Cumulative distribution function
p = np.arange(0, 1+0.1, 0.1)
print(p)
# [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
q = norm.ppf(p, m, s)   # Percent point function (percentiles) = inverse of cdf
print(q)
# [       -inf -1.28155157 -0.84162123 -0.52440051 -0.2533471   0.
#   0.2533471   0.52440051  0.84162123  1.28155157         inf]
plt.figure()
plt.plot(x, y, label='pdf')
plt.plot(x, z, label='cdf')
plt.plot(q, p, marker='o', linestyle='', color='red', label='ppf') 
for i in range(len(p)):
    plt.axhline(y=p[i], xmin=0.05, xmax=0.95, linewidth=0.5, color='red', alpha=0.6)
plt.vlines(x=q, ymin=0, ymax=p, linewidth=0.5, color='red', alpha=0.6)
plt.legend(loc='upper left')
# plt.grid()
plt.show()

m <- 0
s <- 1
x <- seq(-3.5, 3.5, length=100+1)
x
y <- dnorm(x, m, s)
z <- pnorm(x, m, s)
plot(x, y, xlim=c(-3.5, 3.5), ylim=c(0,1), type="l", col=1)
par(new=T)
plot(x, z, xlim=c(-3.5, 3.5), ylim=c(0,1), type="l", col=2)

p <- seq(0, 1, by=0.1)
p
# [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
q <- qnorm(p, m, s)
q
# [1]       -Inf -1.2815516 -0.8416212 -0.5244005 -0.2533471  0.0000000
# [7]  0.2533471  0.5244005  0.8416212  1.2815516        Inf

VBA

VBA

モーメント等

確率分布のモーメント（moment）（積率）についてです。
確率密度関数 $f(x)$ のモーメント

$\mu =\mu_{1}^{(0)}$ は $x$ の平均値。
$\sigma^2 = \mu_{2} = \mu_{2}^{(0)} - (\mu_{1}^{(0)} )^2 $ は分散、$\sigma ={\sqrt {\mu_2}}$ は標準偏差。
$\gamma _{1}=\mu _{3}/\sigma ^{3}$は歪度。
$\gamma _{2}=\mu _{4}/\sigma ^{4}-3$ は尖度。

Python

Python3

m = 0
s = 1

print(norm.mean(m, s), norm.var(m, s), norm.std(m, s), norm.median(m, s))
# 0.0 1.0 1.0 0.0
mean, var, skew, kurt = norm.stats(m, s, moments='mvsk')
print(mean, var, skew, kurt)
for n in range(1, 5+1, 1):
    moment = norm.moment(n, m, s)
    print('Moment({}) = {}'.format(n, moment))
# Moment(1) = 0.0
# Moment(2) = 1.0
# Moment(3) = 0.0
# Moment(4) = 3.0
# Moment(5) = -1.5022529663236356e-14

x = np.linspace(-3.5, 3.5, num=100+1)
y = norm.pdf(x, m, s)
mm = norm.mean(m, s)
ss = norm.std(m, s)
plt.figure()
plt.plot(x, y, label='pdf')
plt.axvline(x= mm, linestyle="-", color='red')
plt.axvline(x= ss, linestyle=':', color='red')
plt.axvline(x=-ss, linestyle=':', color='red')
plt.xlim(-3.5, 3.5)
plt.ylim(0, 0.5)
plt.show()

VBA

VBA

連続確率分布の例

Python

Python3

from scipy.stats import norm, t, chi2, f

plt.figure(figsize=(12,8))

# 正規分布 N(m,s)
x = np.linspace(-5, 5, num=100)
mlist = [0, 0, 0, 1, 2]
slist = [1, 2, 3, 1, 1]
plt.subplot(2, 2, 1)
for i in range(len(mlist)):
    m = mlist[i]
    s = slist[i]
    y = norm.pdf(x, m, s)
    plt.plot(x, y, label='N({}, {})'.format(m, s))
plt.legend()

# t分布 t(n)
x = np.linspace(-5, 5, num=100)
plt.subplot(2, 2, 2)
for n in range(1,5+1):
    y = t.pdf(x, n)
    plt.plot(x, y, label='t({})'.format(n))
plt.legend()

# カイ2乗分布 χ^2(n)
x = np.linspace(0, 10, num=100)
nlist = [1, 2, 3, 5, 10]
plt.subplot(2, 2, 3)
for n in nlist:
    y = chi2.pdf(x, n)
    plt.plot(x, y, label='Chi2({})'.format(n))
plt.legend()

# F分布 F(n,m)
x = np.linspace(0, 3, num=100)
plt.subplot(2, 2, 4)
nlist = [1, 1, 2, 5, 10, 10, 50,  50]
mlist = [1, 5, 5, 5, 10, 50, 50, 100]
for i in range(len(mlist)):
    n = nlist[i]
    m = mlist[i]
    y = f.pdf(x, n, m)
    plt.plot(x, y, label='F({}, {})'.format(n,m))
plt.legend()

plt.show()

# 正規分布 N(m,s)
x <- seq(-5, 5, length=100+1)
mlist <- c(0, 0, 0, 1, 2)
slist <- c(1, 2, 3, 1, 1)
for (i in 1:5) {
  m <- mlist[i]
  s <- slist[i]
  y <- dnorm(x, m, s)
  plot(x, y, xlim=c(-5,5), ylim=c(0, 0.4), type="l", col=i)
  par(new=T)
}
par(new=F)

# t分布 t(n)
x <- seq(-5, 5, length=100+1)
nlist <- 1:5
for (i in 1:length(nlist)) {
  n <- nlist[i]
  y <- dt(x, df=n)
  plot(x, y, xlim=c(-5,5), ylim=c(0,0.4), type="l", col=i)
  par(new=T)
}
par(new=F)

# カイ2乗分布 χ^2(n)
x <- seq(0, 10, length=100+1)
nlist <- c(1, 2, 3, 5, 10)
for (i in 1:length(nlist)) {
  n <- nlist[i]
  y <- dchisq(x, df=n)
  plot(x, y, xlim=c(0,10), ylim=c(0,1.2), type="l", col=i)
  par(new=T)
}
par(new=F)

# F分布 F(m,n)
x <- seq(0, 3, length=100+1)
nlist <- c(1, 1, 2, 5, 10, 10, 50,  50)
mlist <- c(1, 5, 5, 5, 10, 50, 50, 100)
for (i in 1:length(nlist)) {
  n <- nlist[i]
  m <- mlist[i]
  y <- df(x, df1=n, df2=m)
  plot(x, y, xlim=c(0,3), ylim=c(0,2), type="l", col=i)
  par(new=T)
}
par(new=F)

VBA

VBA

離散確率分布

次に、離散確率分布（discrete probability distribution）についてです。

確率質量関数

連続確率分布の確率密度関数（probability density function; PDF）にあたる離散確率分布の確率質量関数（probability mass function; PMF）です。

Python

Python3

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

n = 10
p = 0.5
x = np.arange(0, 10+1, 1)
print(x)
# [ 0  1  2  3  4  5  6  7  8  9 10]
y = binom.pmf(x, n, p)
print(y)
# [0.00097656 0.00976563 0.04394531 0.1171875  0.20507813 0.24609375
#  0.20507813 0.1171875  0.04394531 0.00976563 0.00097656]
plt.figure()
plt.plot(x, y, marker='o')
plt.show()

plt.figure()
plt.plot(x, y, linestyle='', marker='o', markersize=8)
plt.vlines(x, ymin=0, ymax=y, linestyles='-', linewidth=5, color='#1f77b4', alpha=0.5)
plt.show()

# pmf
n <- 10
p <- 0.5
x <- 0:10
x
# [1]  0  1  2  3  4  5  6  7  8  9 10
y <- dbinom(x, size=n, prob=p)
y
# [1] 0.0009765625 0.0097656250 0.0439453125 0.1171875000 0.2050781250 0.2460937500
# [7] 0.2050781250 0.1171875000 0.0439453125 0.0097656250 0.0009765625
plot(x, y)
plot(x, y, type="b")
plot(x, y, type="o")
plot(x, y, type="h")

plot(x, y)
par(new=T)
plot(x, y, type="h")

VBA

VBA

離散確率分布の累積分布関数と分位関数

離散確率分布の累積分布関数です。

離散確率分布の例として二項分布（binomial distribution） $B(n,p)$ の場合で説明します。

\mathrm{P}(X = k) = {n \choose k}p^{k}(1-p)^{n-k} \quad (k=0,1,2, \dots ,n) \\
\\
{}_{n}\!\mathrm{C} _{k} = {n \choose k} = {\frac{n!}{k!\,(n-k)!}}

Python

Python3

n = 10
p = 0.5
x = np.arange(0, 10+1, 1)
y = binom.pmf(x, n, p)   # Probability mass function
z = binom.cdf(x, n, p)   # Cumulative distribution function
pp = np.arange(0.1, 1+0.1, 0.1)
print(pp)
# [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
q = binom.ppf(pp, n, p)  # Percent point function (percentiles) = inverse of cdf
print(q)
# [ 3.  4.  4.  5.  5.  5.  6.  6.  7. 10.]
plt.figure()
plt.plot(x, y, marker='o', label='pmf')
# plt.plot(x, z, marker='.', label='cdf')
plt.step(x, z, marker='.', where='post', label='cdf')
plt.plot(q, pp, marker='o', linestyle='', color='red', label='ppf') 
for i in range(len(pp)):
    plt.axhline(y=pp[i], xmin=0.05, xmax=0.95, linewidth=0.5, color='red', alpha=0.6)
plt.vlines(x=q, ymin=0, ymax=pp, linewidth=0.5, color='red', alpha=0.6)
plt.legend(loc='upper left')
# plt.grid()
plt.show()

# cdf
n <- 10
p <- 0.5
x <- 0:10
y <- dbinom(x, size=n, prob=p)
z <- pbinom(x, size=n, prob=p)
plot(x, y, xlim=c(0,10), ylim=c(0,1), type="o", col="blue")
par(new=T)
plot(x, z, xlim=c(0,10), ylim=c(0,1), type="o", col="red")

# ppf
pp = seq(0.1, 1, by=0.1)
pp
# [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
q <- qbinom(pp, size=n, prob=p)
q
# [1]  3  4  4  5  5  5  6  6  7 10

VBA

VBA

離散確率分布のモーメント等

Python

Python3

n = 10
p = 0.5
mean, var, skew, kurt = binom.stats(n, p, moments='mvsk')
print(mean, var, skew, kurt)
# 5.0 2.5 0.0 -0.2

VBA

VBA

離散確率分布の例

Python

Python3

from scipy.stats import binom, poisson

plt.figure(figsize=(12,4))

# 二項分布 B(n,p)
x = np.arange(0, 10+1, 1)
n = 10
plt.subplot(1, 2, 1)
for i in range(10-1):
    p = 0.0 + (i+1)/10
    y = binom.pmf(x, n, p)
    plt.plot(x, y, label='B({}, {})'.format(n,p), marker='o')
plt.legend()

# ポワソン分布 Po(L)
x = np.arange(0, 20+1, 1)
Llist = [1, 2, 5, 10, 15]
plt.subplot(1, 2, 2)
for L in Llist:
    y = poisson.pmf(x, L)
    plt.plot(x, y, label='Po({})'.format(L), marker='.')
plt.legend()

plt.show()

# 二項分布 B(n,p)
x <- seq(0, 10, 1)
n <- 10
for (i in 1:9) {
  p <- 0.0 + i/10
  print(p)
  y <- dbinom(x, size=n, prob=p)
  plot(x, y, xlim=c(0,10), ylim=c(0,0.4), type="o", col=i)
  par(new=T)
}
par(new=F)

# ポワソン分布 Po(L)
x <- seq(0, 20, 1)
Llist <- c(1, 2, 5, 10, 15)
for (i in 1:length(Llist)) {
  L <- Llist[i]
  print(L)
  y <- dpois(x, lambda=L)
  plot(x, y, xlim=c(0,20), ylim=c(0,0.4), type="o", col=i)
  par(new=T)
}
par(new=F)

VBA

VBA

まとめ

一覧

各言語で使用する関数等を一覧にまとめます。比較のために、EXCELでの計算も示しました。

確率分布

$X$ ：確率変数
$f_{X}(x)$ ：確率密度関数（probability density function; PDF）
$F_{X}(x)$ ：累積分布関数（cumulative distribution function; CDF）
$\bar{F_X}(x)$ ：相補累積分布関数（complementary cumulative distribution function; CCDF)（上側確率（upper-tail probability）, 生存関数（survival function））
$G_{X}(p)$ ：分位関数（quantile function）（分位点関数）

\begin{eqnarray*}
F_{X}(x) &=& \mathrm{P}(X \le x) = \int_{-\infty}^{x} f_{X}(t)\,dt \quad (x \in \mathbb{R}) \\
\bar{F_X}(x) &=& \mathrm{P}(X > x) = 1-F_{X}(x) = \int_{x}^{\infty} f_{X}(t)\\
G_{X}(p) &=& {F_{X}}^{-1}(p) \quad (p \in [0,1])
\end{eqnarray*}

	Python	R
確率密度関数（pdf）$f(x)$	`xxx.pdf(x, ...)`	`dxxx(x, ...)`
確率分布関数（cdf）$F(x)$	`xxx.cdf(x, ...)`	`pxxx(x, ...)`
分位関数（ppf）$G(p)$	`xxx.ppf(p, ...)`	`qxxx(p, ...)`
乱数（random numbers）	`xxx.rvs(..., size=N)`	`rxxx(n=N, ...)`
注意）`xxx`部分は各分布の名称、`...`は各分布のパラメータ。例えば、正規分布$N(m,s^2)$なら、Pythonでは`norm.pdf(x, m, s)`, `norm.cdf(x, m, s)`, `norm.ppf(p, m, s)`, `norm.rvs(m, s, size=N)`、Rでは`dnorm(x, m, s)`, `pnorm(x, m, s)`, `qnorm(p, m, s)`, `rnorm(n=N, mean=m, sd=s)` など。
注意）乱数については、こちら参照。
注意）Pythonでは、`scipy.stats`から、必要な関数をインポートしたうえで使用。
`from scipy.stats import uniform, norm, chi2, t, f, beta, gamma, lognorm, expon`、`from scipy.stats import bernoulli, binom, poisson, nbinom, geom, hypergeom, randint`

プログラム全体

参考までに使ったプログラムの全体を示します。

Python

Python3

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

m = 50
s = 10
x = np.linspace(0, 100, num=100+1)
print(x)
# [  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.
#   14.  15.  16.  17.  18.  19.  20.  21.  22.  23.  24.  25.  26.  27.
#   ...
#   84.  85.  86.  87.  88.  89.  90.  91.  92.  93.  94.  95.  96.  97.
#   98.  99. 100.]
y = norm.pdf(x, m, s)
print(y)
# [1.48671951e-07 2.43896075e-07 3.96129909e-07 6.36982518e-07
#  1.01408521e-06 1.59837411e-06 2.49424713e-06 3.85351967e-06
#   ...
#  1.01408521e-06 6.36982518e-07 3.96129909e-07 2.43896075e-07
#  1.48671951e-07]
plt.figure()
plt.plot(x, y)
plt.show()


m = 0
s = 1
x = np.linspace(-3.5, 3.5, num=100+1)
y = norm.pdf(x, m, s)   # Probability density function
z = norm.cdf(x, m, s)   # Cumulative distribution function
p = np.arange(0, 1+0.1, 0.1)
print(p)
# [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
q = norm.ppf(p, m, s)   # Percent point function (percentiles) = inverse of cdf
print(q)
# [       -inf -1.28155157 -0.84162123 -0.52440051 -0.2533471   0.
#   0.2533471   0.52440051  0.84162123  1.28155157         inf]
plt.figure()
plt.plot(x, y, label='pdf')
plt.plot(x, z, label='cdf')
plt.plot(q, p, marker='o', linestyle='', color='red', label='ppf') 
for i in range(len(p)):
    plt.axhline(y=p[i], xmin=0.05, xmax=0.95, linewidth=0.5, color='red', alpha=0.6)
plt.vlines(x=q, ymin=0, ymax=p, linewidth=0.5, color='red', alpha=0.6)
plt.legend(loc='upper left')
# plt.grid()
plt.show()


m = 0
s = 1

print(norm.mean(m, s), norm.var(m, s), norm.std(m, s), norm.median(m, s))
# 0.0 1.0 1.0 0.0
mean, var, skew, kurt = norm.stats(m, s, moments='mvsk')
print(mean, var, skew, kurt)
for n in range(1, 5+1, 1):
    moment = norm.moment(n, m, s)
    print('Moment({}) = {}'.format(n, moment))
# Moment(1) = 0.0
# Moment(2) = 1.0
# Moment(3) = 0.0
# Moment(4) = 3.0
# Moment(5) = -1.5022529663236356e-14

x = np.linspace(-3.5, 3.5, num=100+1)
y = norm.pdf(x, m, s)
mm = norm.mean(m, s)
ss = norm.std(m, s)
plt.figure()
plt.plot(x, y, label='pdf')
plt.axvline(x= mm, linestyle="-", color='red')
plt.axvline(x= ss, linestyle=':', color='red')
plt.axvline(x=-ss, linestyle=':', color='red')
plt.xlim(-3.5, 3.5)
plt.ylim(0, 0.5)
plt.show()


from scipy.stats import norm, t, chi2, f

plt.figure(figsize=(12,8))

# 正規分布 N(m,s)
x = np.linspace(-5, 5, num=100)
mlist = [0, 0, 0, 1, 2]
slist = [1, 2, 3, 1, 1]
plt.subplot(2, 2, 1)
for i in range(len(mlist)):
    m = mlist[i]
    s = slist[i]
    y = norm.pdf(x, m, s)
    plt.plot(x, y, label='N({}, {})'.format(m, s))
plt.legend()

# t分布 t(n)
x = np.linspace(-5, 5, num=100)
plt.subplot(2, 2, 2)
for n in range(1,5+1):
    y = t.pdf(x, n)
    plt.plot(x, y, label='t({})'.format(n))
plt.legend()

# カイ2乗分布 χ^2(n)
x = np.linspace(0, 10, num=100)
nlist = [1, 2, 3, 5, 10]
plt.subplot(2, 2, 3)
for n in nlist:
    y = chi2.pdf(x, n)
    plt.plot(x, y, label='Chi2({})'.format(n))
plt.legend()

# F分布 F(n,m)
x = np.linspace(0, 3, num=100)
plt.subplot(2, 2, 4)
nlist = [1, 1, 2, 5, 10, 10, 50,  50]
mlist = [1, 5, 5, 5, 10, 50, 50, 100]
for i in range(len(mlist)):
    n = nlist[i]
    m = mlist[i]
    y = f.pdf(x, n, m)
    plt.plot(x, y, label='F({}, {})'.format(n,m))
plt.legend()

plt.show()


import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

n = 10
p = 0.5
x = np.arange(0, 10+1, 1)
print(x)
# [ 0  1  2  3  4  5  6  7  8  9 10]
y = binom.pmf(x, n, p)
print(y)
# [0.00097656 0.00976563 0.04394531 0.1171875  0.20507813 0.24609375
#  0.20507813 0.1171875  0.04394531 0.00976563 0.00097656]
plt.figure()
plt.plot(x, y, marker='o')
plt.show()

plt.figure()
plt.plot(x, y, linestyle='', marker='o', markersize=8)
plt.vlines(x, ymin=0, ymax=y, linestyles='-', linewidth=5, color='#1f77b4', alpha=0.5)
plt.show()


n = 10
p = 0.5
x = np.arange(0, 10+1, 1)
y = binom.pmf(x, n, p)   # Probability mass function
z = binom.cdf(x, n, p)   # Cumulative distribution function
pp = np.arange(0.1, 1+0.1, 0.1)
print(pp)
# [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
q = binom.ppf(pp, n, p)  # Percent point function (percentiles) = inverse of cdf
print(q)
# [ 3.  4.  4.  5.  5.  5.  6.  6.  7. 10.]
plt.figure()
plt.plot(x, y, marker='o', label='pmf')
# plt.plot(x, z, marker='.', label='cdf')
plt.step(x, z, marker='.', where='post', label='cdf')
plt.plot(q, pp, marker='o', linestyle='', color='red', label='ppf') 
for i in range(len(pp)):
    plt.axhline(y=pp[i], xmin=0.05, xmax=0.95, linewidth=0.5, color='red', alpha=0.6)
plt.vlines(x=q, ymin=0, ymax=pp, linewidth=0.5, color='red', alpha=0.6)
plt.legend(loc='upper left')
# plt.grid()
plt.show()


n = 10
p = 0.5
mean, var, skew, kurt = binom.stats(n, p, moments='mvsk')
print(mean, var, skew, kurt)
# 5.0 2.5 0.0 -0.2


from scipy.stats import binom, poisson

plt.figure(figsize=(12,4))

# 二項分布 B(n,p)
x = np.arange(0, 10+1, 1)
n = 10
plt.subplot(1, 2, 1)
for i in range(10-1):
    p = 0.0 + (i+1)/10
    y = binom.pmf(x, n, p)
    plt.plot(x, y, label='B({}, {})'.format(n,p), marker='o')
plt.legend()

# ポワソン分布 Po(L)
x = np.arange(0, 20+1, 1)
Llist = [1, 2, 5, 10, 15]
plt.subplot(1, 2, 2)
for L in Llist:
    y = poisson.pmf(x, L)
    plt.plot(x, y, label='Po({})'.format(L), marker='.')
plt.legend()

plt.show()

# pdf
m <- 50
s <- 10
x <- 0:100
x
#   [1]   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
#  [21]  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39
#  ...
#  [81]  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99
# [101] 100
y <- dnorm(x, m, s)
y
#  [1] 1.486720e-07 2.438961e-07 3.961299e-07 6.369825e-07 1.014085e-06 1.598374e-06
#  [7] 2.494247e-06 3.853520e-06 5.894307e-06 8.926166e-06 1.338302e-05 1.986555e-05
#  ...
# [97] 1.014085e-06 6.369825e-07 3.961299e-07 2.438961e-07 1.486720e-07
plot(x, y, type="l")


# cdf
m <- 0
s <- 1
x <- seq(-3.5, 3.5, length=100+1)
x
y <- dnorm(x, m, s)
z <- pnorm(x, m, s)
plot(x, y, xlim=c(-3.5, 3.5), ylim=c(0,1), type="l", col=1)
par(new=T)
plot(x, z, xlim=c(-3.5, 3.5), ylim=c(0,1), type="l", col=2)

# ppf
p <- seq(0, 1, by=0.1)
p
# [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
q <- qnorm(p, m, s)
q
# [1]       -Inf -1.2815516 -0.8416212 -0.5244005 -0.2533471  0.0000000
# [7]  0.2533471  0.5244005  0.8416212  1.2815516        Inf


# 正規分布 N(m,s)
x <- seq(-5, 5, length=100+1)
mlist <- c(0, 0, 0, 1, 2)
slist <- c(1, 2, 3, 1, 1)
for (i in 1:5) {
  m <- mlist[i]
  s <- slist[i]
  y <- dnorm(x, m, s)
  plot(x, y, xlim=c(-5,5), ylim=c(0, 0.4), type="l", col=i)
  par(new=T)
}
par(new=F)

# t分布 t(n)
x <- seq(-5, 5, length=100+1)
nlist <- 1:5
for (i in 1:length(nlist)) {
  n <- nlist[i]
  y <- dt(x, df=n)
  plot(x, y, xlim=c(-5,5), ylim=c(0,0.4), type="l", col=i)
  par(new=T)
}
par(new=F)

# カイ2乗分布 χ^2(n)
x <- seq(0, 10, length=100+1)
nlist <- c(1, 2, 3, 5, 10)
for (i in 1:length(nlist)) {
  n <- nlist[i]
  y <- dchisq(x, df=n)
  plot(x, y, xlim=c(0,10), ylim=c(0,1.2), type="l", col=i)
  par(new=T)
}
par(new=F)

# F分布 F(m,n)
x <- seq(0, 3, length=100+1)
nlist <- c(1, 1, 2, 5, 10, 10, 50,  50)
mlist <- c(1, 5, 5, 5, 10, 50, 50, 100)
for (i in 1:length(nlist)) {
  n <- nlist[i]
  m <- mlist[i]
  y <- df(x, df1=n, df2=m)
  plot(x, y, xlim=c(0,3), ylim=c(0,2), type="l", col=i)
  par(new=T)
}
par(new=F)


# pmf
n <- 10
p <- 0.5
x <- 0:10
x
# [1]  0  1  2  3  4  5  6  7  8  9 10
y <- dbinom(x, size=n, prob=p)
y
# [1] 0.0009765625 0.0097656250 0.0439453125 0.1171875000 0.2050781250 0.2460937500
# [7] 0.2050781250 0.1171875000 0.0439453125 0.0097656250 0.0009765625
plot(x, y)
plot(x, y, type="b")
plot(x, y, type="o")
plot(x, y, type="h")

plot(x, y)
par(new=T)
plot(x, y, type="h")


# cdf
n <- 10
p <- 0.5
x <- 0:10
y <- dbinom(x, size=n, prob=p)
z <- pbinom(x, size=n, prob=p)
plot(x, y, xlim=c(0,10), ylim=c(0,1), type="o", col="blue")
par(new=T)
plot(x, z, xlim=c(0,10), ylim=c(0,1), type="o", col="red")

# ppf
pp = seq(0.1, 1, by=0.1)
pp
# [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
q <- qbinom(pp, size=n, prob=p)
q
# [1]  3  4  4  5  5  5  6  6  7 10


# 二項分布 B(n,p)
x <- seq(0, 10, 1)
n <- 10
for (i in 1:9) {
  p <- 0.0 + i/10
  print(p)
  y <- dbinom(x, size=n, prob=p)
  plot(x, y, xlim=c(0,10), ylim=c(0,0.4), type="o", col=i)
  par(new=T)
}
par(new=F)

# ポワソン分布 Po(L)
x <- seq(0, 20, 1)
Llist <- c(1, 2, 5, 10, 15)
for (i in 1:length(Llist)) {
  L <- Llist[i]
  print(L)
  y <- dpois(x, lambda=L)
  plot(x, y, xlim=c(0,20), ylim=c(0,0.4), type="o", col=i)
  par(new=T)
}
par(new=F)

VBA

VBA

参考

Python
- Python チュートリアル
- SciPy.org
  - Numpy and Scipy Documentation
  - SciPy Reference
    - API Reference
      - Statistical functions (scipy.stats)
    - SciPy Tutorial
      - Statistics (scipy.stats)
R
- RjpWiki
- R-Tips
  - 60. 確率分布と乱数
  - 61. データの分布とヒストグラム・密度推定
VBA
- Excel VBA リファレンス
  - WorksheetFunction オブジェクト
    - メソッド
- Visual Basic 言語リファレンス
Excel
- リファレンス
  - 統計関数 (リファレンス)

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up