More than 5 years have passed since last update.

多変量正規分布をPythonでplotして理解する

Posted at 2020-01-04

はじめに

統計を勉強していた際に出てきた「多変量正規分布」のイメージを掴むためpythonでplotしてみました。今回は可視化して際にわかりやすいよう$n$数を2にして二次元正規分布をplotしています。

参考

多変量正規分布の理解とそのplotを行うに当たって下記を参考にさせていただきました。

多変量正規分布の概要

$n$変数の多変量正規分布は下記のように表されます。


f(\vec{x}) = \frac{1}{\sqrt{(2\pi)^n |\sum|}}exp \left \{-\frac{1}{2}{}^t (\vec{x}-\vec{\mu}) {\sum}^{-1} (\vec{x}-\vec{\mu}) \right \}

変数が$n$個あるためデータを$n$次元ベクトル表記で表します。さらに平均値$\mu$も変数の数だけ存在するため同様にベクトル表記で表します。


{ \begin{equation}\vec{x}=\begin{pmatrix}x_1 \\ x_2 \\ \vdots \\ x_n \\  \end{pmatrix}, \vec{\mu}=\begin{pmatrix}\mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \\  \end{pmatrix}   \end{equation}
}

ある一つの要素$x_{i}$が確率変数$X_{i}$のデータを表し、平均値$\mu_i$が確率変数$X_{i}$の平均値を表します。
続いて分散についてですが、多変量となる場合は各データの分布のみならずデータ間の相関を考慮する必要があるため**分散共分散行列$\sum$**を用います。


{ \begin{equation}\ \ \ \Sigma =  \begin{pmatrix} \sigma_{1}^2 & \cdots & \sigma_{1i} & \cdots & \sigma_{1n}\\ \vdots & \ddots & & & \vdots \\ \sigma_{i1} & & \sigma_{i}^2 & & \sigma_{in} \\ \vdots & & & \ddots & \vdots \\ \sigma_{n1} & \cdots & \sigma_{ni} & \cdots & \sigma_{n}^2 \end{pmatrix} \end{equation}
}

$\sigma^2_i$は$i$番目の変数の分散で、$\sigma_{ij} =\sigma_{ji} (i≠j)$は$i$番目と$j$番目の変数間の共分散になっています。
そして$n$が$2$の時の二次元正規分布は下記のように表されます。

N_2 \left ( \begin{pmatrix}  \mu_x \\  \mu_y \\  \end{pmatrix} , \begin{pmatrix}  \sigma_{x}^2 & \sigma_{xy}\\  \sigma_{xy} & \sigma_{y}^2\\  \end{pmatrix} \right  )

それでは二次元正規分布をplotしてみたいと思います。

二次元正規分布のplot

二次元正規分布をplotするスクリプトが下記になります。
まずは2変数とも標準正規分布に従い、互いに独立な場合で出力してみます。

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
from matplotlib import cm

# 関数に投入するデータを作成
x = y = np.arange(-20, 20, 0.5)
X, Y = np.meshgrid(x, y)

z = np.c_[X.ravel(),Y.ravel()]

# 二次元正規分布の確率密度を返す関数
def gaussian(x):
    #分散共分散行列の行列式
    det = np.linalg.det(sigma)
    print(det)
    #分散共分散行列の逆行列
    inv = np.linalg.inv(sigma)
    n = x.ndim
    print(inv)
    return np.exp(-np.diag((x - mu)@inv@(x - mu).T)/2.0) / (np.sqrt((2 * np.pi) ** n * det))

# 2変数の平均値を指定
mu = np.array([0,0])
# 2変数の分散共分散行列を指定
sigma = np.array([[1,0],[0,1]])

Z = gaussian(z)
shape = X.shape
Z = Z.reshape(shape)

# 二次元正規分布をplot
fig = plt.figure(figsize = (15, 15))
ax = fig.add_subplot(111, projection='3d')
    
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm)
plt.show()

出力結果は下記です。2変数とも正規分布のため偏りのない尖ったグラフになります。

それでは異なる形のグラフもplotします。
2変数の分布が下記となる場合の二次元正規分布をplotしてみましょう。

# 2変数の平均値を指定
mu = np.array([3,1])
# 2変数の分散共分散行列を指定
sigma = np.array([[10,5],[5,10]])

下記は先ほどのplotと同様です。


Z = gaussian(z)
shape = X.shape
Z = Z.reshape(shape)

# 二次元正規分布をplot
fig = plt.figure(figsize = (15, 15))
ax = fig.add_subplot(111, projection='3d')
    
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm)
plt.show()

出力結果は下記になります。今回は互いに相関している分布をplotしたため、少し斜めに歪んだ形になっていることがわかります。

数式上ではわかりにくかったことも、こうやって可視化するとイメージが掴みやすいですね。

統計を勉強していると中々数式だけでイメージが掴めないことも多いので、pythonで自分で書いてみたりplotして可視化するのを積極的にやっていきたいと思います。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

多変量正規分布をPythonでplotして理解する

はじめに

参考

多変量正規分布の概要

二次元正規分布のplot

Next