LoginSignup
0
0

対数変換による確率密度関数の変換

Last updated at Posted at 2024-05-14

動機

個人所得や都市の人口など、確率変数が大きな値と小さな値を取るときに対数変換を経由して確率密度関数を求めることができる。

対数変換による確率密度関数の変換公式

変数変換$y = \log x$にともなう確率変数$X$から$Y$への変換により、$X$の確率密度関数(PDF) $p(x)$は$Y$の確率密度関数$q(y)$に次のように変換される。

p(x) = q(y) /x
導出

これは次のように導出される。累積分布関数を

P(X \leq x) = \int_{-\infty}^{x} p(t)dt
P(Y \leq y) = \int_{-\infty}^{y} q(s)ds

で定義する。変数変換$y = \log x, Y = \log X$により

P(X \leq x) = P(Y \leq \log x)

なので

q(y) = \frac{dP(Y \leq y)}{dy} = \frac{dP(X \leq x)}{dx} \frac{dx}{dy} = p(x) x

となる。

Python code

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

def pdfs(series, bins, ax=None, scale='linear', kwargs={}):
  cnts = pd.cut(series, bins).value_counts()
  p = cnts / series.shape[0]
  vals = p / np.diff(bins)
  if scale=='linear':
    mb = np.array([ (bins[i]+bins[i+1])/2. for i in np.arange(bins.shape[0]-1) ])
  elif scale=='log':
    mb = np.array([ np.sqrt(bins[i]*bins[i+1]) for i in np.arange(bins.shape[0]-1) ])
  else:
    raise ValueError("scale should be either 'linear' or 'log'")
  if ax!=None:
    ax.plot(mb, vals, **kwargs)
  return mb, vals, cnts, p

def analytical_pdf_pareto(a, xmin, x):
  return a/xmin * x**(-(a+1))

if __name__ == '__main__':

  a, xmin = 2.0, 1.0
  sample_size = 100000
  
  series = (np.random.pareto(a, sample_size) + 1) * xmin
  ln_transformed_series = np.log(series)

  ### calculate bins 
  ln_bin_width = 0.1

  ln_transformed_bins = np.arange(ln_transformed_series.min(), ln_transformed_series.max(), bin_width)
  bins = np.exp(ln_transformed_bins)

  ### calculate PDF of original series
  original_bin, original_pdf, _, __ = pdfs(series = series, bins = bins)
  
  ### calculate PDF of log-transformed series
  ln_bin, ln_pdf, _, __ = pdfs(series = ln_transformed_series, bins = ln_transformed_bins)

  ### plot PDF
  fig, ax = plt.subplots()
  ax.set_xlabel(r'$x$')
  ax.set_ylabel(r'$p(x)$')
  ax.set_xscale('log')
  ax.set_yscale('log')
  
  ax.plot(original_bin, original_pdf, 
  c='b', marker='o', label='PDF of original series')
  
  ax.plot(original_bin, ln_pdf / original_pdf, 
  c='r', marker='x', label='PDF via log-transformation')
  
  ax.plot(original_bin, analytical_pdf_pareto(a, xmin, original_bin),
  c='k', ls='--')

  ax.legend(loc='lower left')
  fig.savefig('probability_distribution.png')
0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0