To myself one month from now, who always forgets the method names and argument handling for probability distributions in Python and R

Posted at 2020-03-22

Purpose of this article

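Python's method names (pdf, cdf, ppf, rvs) are direct and easy to understand.
R's argument names for distribution parameters are intuitive, whereas in Python, deciding what goes into a=, b= and what goes into loc=, scale= is mentally taxing when translating from a formula.
Either is fine once you get used to it.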
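As a reminder of what that translation looks like in practice, here is a minimal sketch using a gamma and a uniform distribution. The parameter values (shape 2, rate 3, interval [2, 5]) are arbitrary choices for illustration and are not from the original note; the R calls in the comments are the usual equivalents.

```python
import math
from scipy import stats

# Gamma with shape k = 2 and rate 3 (R: dgamma(0.5, shape = 2, rate = 3)).
# scipy takes the shape as a= and the inverse of the rate as scale=.
shape, rate = 2.0, 3.0
x = 0.5
gamma_pdf = stats.gamma.pdf(x, a=shape, scale=1.0 / rate)

# Same density written out from the formula
# rate^k * x^(k-1) * exp(-rate * x) / Gamma(k), as a cross-check.
gamma_by_hand = rate**shape * x**(shape - 1) * math.exp(-rate * x) / math.gamma(shape)

# Uniform on [2, 5] (R: punif(3, min = 2, max = 5)).
# scipy uses loc = lower bound and scale = width, not the upper bound.
lower, upper = 2.0, 5.0
unif_cdf = stats.uniform.cdf(3.0, loc=lower, scale=upper - lower)

print(gamma_pdf, gamma_by_hand, unif_cdf)
```

The two gamma values should agree, and the uniform CDF at 3 should be one third, since 3 lies one third of the way from 2 to 5.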

Example with the normal distribution

Setup for Python: from scipy import stats

Python	R	return value	meaning
stats.norm.pdf(0)	dnorm(x = 0)	0.3989423	height of the density curve at x = 0; probability density function
stats.norm.ppf(0.99)	qnorm(p = 0.99)	2.326348	x at which the integral of the density from -Inf to x equals 0.99; percent point function (quantile function)
stats.norm.cdf(2.326348)	pnorm(q = 2.326348)	0.99	inverse of ppf; cumulative distribution function
stats.norm.rvs(size=5)	rnorm(n = 5)	5 values	generate random variates
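The Python column of the table can be checked with a short script like the one below. This is only a sketch; the variable names are made up for illustration, and random_state=0 is added solely so that the random draw is reproducible.

```python
from scipy import stats

# Height of the standard normal density at x = 0.
density_at_zero = stats.norm.pdf(0)

# x such that P(X <= x) = 0.99.
x_99 = stats.norm.ppf(0.99)

# cdf undoes ppf, so this gives back 0.99.
p_back = stats.norm.cdf(x_99)

# Five random variates from the standard normal distribution.
samples = stats.norm.rvs(size=5, random_state=0)

print(round(density_at_zero, 7), round(x_99, 6), round(p_back, 2), samples.shape)
```

The rounded values should match the "return value" column, and samples should have length 5.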

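In Python, the loc= and scale= arguments shown below work the same way for every continuous distribution (discrete distributions take only loc=)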

# python
>>> stats.norm.rvs(size=5, loc=100, scale=10)
array([ 87.94461629, 101.62097735, 114.57610849, 109.68581233,
        92.61449913])

# R
rnorm(5, mean=100, sd=10)
# [1]  88.97331 116.73641  87.71554  97.24457  95.40161
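The same loc= and scale= keywords also work with the other methods, not just rvs. The sketch below reuses mean 100 and sd 10 from the example above; the variable names are hypothetical, and the "frozen" distribution at the end is one possible way to avoid repeating the parameters.

```python
from scipy import stats

# loc= and scale= are accepted by ppf and cdf in the same way as by rvs.
x_99 = stats.norm.ppf(0.99, loc=100, scale=10)    # R: qnorm(0.99, mean = 100, sd = 10)
p_back = stats.norm.cdf(x_99, loc=100, scale=10)  # R: pnorm(x_99, mean = 100, sd = 10)

# For the normal distribution, loc and scale shift and stretch the standard result.
shifted = 100 + 10 * stats.norm.ppf(0.99)

# A frozen distribution stores loc and scale once.
frozen = stats.norm(loc=100, scale=10)
draws = frozen.rvs(size=5, random_state=0)  # random_state only for reproducibility

print(x_99, shifted, p_back, frozen.mean(), frozen.std(), draws.shape)
```

Here x_99 and shifted should be equal, which is a handy way to remember that loc= is the mean and scale= is the standard deviation for norm.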

That's all.