More than 1 year has passed since last update.

scipy.stats: 幾何平均と調和平均 gmean, hmean

Last updated at 2022-06-22Posted at 2022-05-26

1.　幾何平均 `gmean`

gmean(a, axis=0, dtype=None, weights=None)

from scipy.stats import gmean
x = [1.2, 3.1, 4.6, 2.2, 5.7]
gmean(x)

2.9263054868319522

gmean 関数を使わずに計算すると以下のようになる。

import numpy as np
np.exp(np.mean(np.log(x)))

2.9263054868319522

np.prod(x)**(1/len(x)) # 非推奨

2.9263054868319527

以下のようなテストデータ（平均値=50，標準偏差=10)で幾何平均を求めてみる。

from scipy.stats import norm
z = norm.rvs(loc=50, scale=10, size=200)
import matplotlib.pyplot as plt
plt.hist(z, bins=30);

サンプルサイズ 200 で，教科書的な定義式通りの np.prod(z)**(1/len(z)) は途中でオーバーフローしてしまい inf を返す。

gmean(z), np.exp(np.mean(np.log(z))), np.prod(z)**(1/len(z)), np.prod(z)

(49.19589704885539, 49.19589704885539, inf, inf)

hmean(a, axis=0, dtype=None)

from scipy.stats import hmean
x = [1.2, 3.1, 4.6, 2.2, 5.7]
hmean(x)

2.4958950838709315

hmean 関数を使わずに計算すると以下のようになる。

1/np.mean(1/np.array(x))

2.4958950838709315

`np.array` を使ったのは，`x` がリストのとき `1/x` はエラーになるので，numpy.ndarray に変換するためである。

どうせ `numpy` を使うなら，逆数を返す `np.reciprocal` を使えば，統一性がある。

np.reciprocal(np.mean(np.reciprocal(x)))

2.4958950838709315