0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 3 years have passed since last update.

ベータ分布でベイズ推定

Last updated at Posted at 2022-03-21

映画の評点などのレビュー数が少ない場合のベイズ推定です。

ここでは0-5点になるものとします。
ライブラリはscipyとpymc(v4)を使います。

レビューの点数の期待値は4.00、分散は0.5とします。
[0,1]区間に正規化した (レビュー点数 - 0)/(5 - 0)がベータ分布に従うとします。
パラメータの分布(事前分布)はalpha,beta共に正規分布とします。
まず、レビューの期待値からベータ分布のパラメータを推定して、事前分布に期待値4.00と分散1.5の情報を折り込みます。

import numpy as np
from scipy import stats, optimize

def beta_E(a, b, alpha, beta):
  return a + (b -a) * alpha / (alpha + beta) 

def beta_V(a, b, alpha, beta):
  return (b - a) * (b - a) * alpha * beta / ((alpha + beta) * (alpha + beta) * (alpha + beta + 1))

a = -0.1
b = 5.1
E = 4.00
V = 0.5

sol = optimize.root(fun, x0=[E, V])
x = sol.x
alpha_0 = x[0]
beta_0 = x[1]
print('alpha_0: {}'.format(alpha_0))
print('beta_0: {}'.format(beta_0))
print('E_0: {}'.format(beta_E(a, b, alpha_0, beta_0)))
print('V_0: {}'.format(beta_V(a, b, alpha_0, beta_0)))

結果

alpha_0: 1.582179487161132
beta_0: 0.42448717948189413
E_0: 4.000000000000737
V_0: 0.500000000011058

期待値、分散が一致しています。

ベイズ推定してみましょう

観測値が1個で平均の4.0の場合

observed_X = [4.0]
observed_Y = [(x - a)/(b - a) for x in observed_X]

with pm.Model() as model:
  alpha = pm.Normal('alpha', mu = alpha_0, sigma = 5 * abs(alpha_0))
  beta = pm.Normal('beta', mu = beta_0, sigma = 5 * abs(beta_0))
  Y = pm.Beta('Y', alpha=alpha, beta=beta, observed=observed_Y)
with model:
  start = {'alpha':  alpha_0, 'beta': beta_0}
  step = pm.Metropolis()
  trace = pm.sample(draws=10000, tune=5000, initvals=start, step=step, chains=4)
  alpha = pm.summary(trace).to_dict()['mean']['alpha']
  beta = pm.summary(trace).to_dict()['mean']['beta']
  print('alpha: {}'.format(alpha))
  print('beta: {}'.format(beta))
  print('E: {}'.format(beta_E(a, b, alpha, beta)))
  print('V: {}'.format(beta_V(a, b, alpha, beta)))

結果

alpha: 7.827
beta: 2.156
E: 3.9769708504457566
V: 0.41687672830133243

平均は3.97で4.00に近い

observed_X = [3.0, 3.0, 3.0] の場合、

alpha: 20.771
beta: 13.522
E: 3.0495990435365816
V: 0.18298078631297066

平均は3.04まで下がっている

observed_X = [4.0, 4.0 ,4.0]の場合、

alpha: 39.544
beta: 10.711
E: 3.9917082877325636
V: 0.08847539526504697

平均は3.99で4.00とほぼ一致で問題なし

observed_X = [2.0] の場合、

alpha: 9.191
beta: 11.187
E: 2.2453332024732555
V: 0.3131791946036

平均は2.24で2.0に近づいている

グラフ付きの実際に動くコード

import matplotlib.pyplot as plt
import math
import numpy as np
from scipy import stats, optimize
import pymc as pm

def beta_E(a, b, alpha, beta):
  return a + (b -a) * alpha / (alpha + beta) 

def beta_V(a, b, alpha, beta):
  return (b - a) * (b - a) * alpha * beta / ((alpha + beta) * (alpha + beta) * (alpha + beta + 1))

a = -0.1
b = 5.1
E = 4.00
V = 0.5

def fun(x):
  return [beta_E(a, b, x[0], x[1]) - E, beta_V(a, b, x[0], x[1]) - V]

def main():

  fig = plt.figure()
  axes= fig.subplots(2)

  sol = optimize.root(fun, x0=[E, V])
  x = sol.x
  alpha_0 = x[0]
  beta_0 = x[1]
  print('alpha_0: {}'.format(alpha_0))
  print('beta_0: {}'.format(beta_0))
  print('E_0: {}'.format(beta_E(a, b, alpha_0, beta_0)))
  print('V_0: {}'.format(beta_V(a, b, alpha_0, beta_0)))

  y = np.linspace(-0.1, 5.1, 200)
  y0_pdf = stats.beta.pdf((y - a) / (b - a), alpha_0, beta_0)
  axes[0].set_title("Distribution")
  axes[0].plot(y, y0_pdf)

  observed_X = [2.0] #試したい観測値に変える
  observed_Y = [(x - a)/(b - a) for x in observed_X]

  with pm.Model() as model:
    alpha = pm.Normal('alpha', mu = alpha_0, sigma = 5 * abs(alpha_0))
    beta = pm.Normal('beta', mu = beta_0, sigma = 5 * abs(beta_0))
    Y = pm.Beta('Y', alpha=alpha, beta=beta, observed=observed_Y)
  with model:
    start = {'alpha':  alpha_0, 'beta': beta_0}
    step = pm.Metropolis()
    trace = pm.sample(draws=10000, tune=5000, initvals=start, step=step, chains=4)
    alpha = pm.summary(trace).to_dict()['mean']['alpha']
    beta = pm.summary(trace).to_dict()['mean']['beta']
    print('alpha: {}'.format(alpha))
    print('beta: {}'.format(beta))
    print('E: {}'.format(beta_E(a, b, alpha, beta)))
    print('V: {}'.format(beta_V(a, b, alpha, beta)))
    y_pdf = stats.beta.pdf((y - a) / (b - a), alpha, beta)
    axes[1].set_title("After Distribution")
    axes[1].plot(y, y_pdf)
    fig.savefig("bayes.png")

if __name__ == '__main__':
  main()

グラフ

observed_X = [2.0]

bayes2.png

そこそこいい感じ?

0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?