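The function gendat2 below generates an n × 2 data matrix whose sample correlation coefficient, computed as np.corrcoef(x.T)[0,1], is exactly r (up to floating-point rounding).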
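import numpy as np
from scipy.linalg import eigvalsh, inv, svd, cholesky
from scipy.stats import zscore

def gendat2(n, r):
    # Target 2 x 2 correlation matrix
    rmat = np.array([[1, r], [r, 1]])
    if np.any(eigvalsh(rmat) <= 0):
        raise ValueError("'r' is not positive definite.")
    # n x 2 matrix of standard normal draws
    x = np.random.randn(n, rmat.shape[0])
    x = zscore(x, ddof=1)  # standardize each column (unbiased variance)
    r2 = np.corrcoef(x.T)  # sample correlation matrix of x
    solver2 = inv(r2)
    vec, val, junk = svd(r2, full_matrices=False)
    # coeff maps x to columns whose sample correlation is exactly zero
    coeff = solver2 @ (np.sqrt(val) * vec)
    # the upper Cholesky factor of rmat then imposes the target correlation
    z = x @ coeff @ cholesky(rmat)
    return z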
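The two matrix products at the end of gendat2 can be checked separately. The sketch below is an illustration, not part of the original function: it uses NumPy only, and the names `whiten`, `upper`, `target`, the seed, and the sample size of 50 are all assumptions chosen for the example. It first removes the sample correlation from the standardized data, then imposes the target correlation with the upper Cholesky factor.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, assumed for reproducibility
x = rng.standard_normal((50, 2))
x = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # standardize each column

r2 = np.corrcoef(x.T)           # sample correlation matrix of x
val, vec = np.linalg.eigh(r2)   # r2 = vec @ diag(val) @ vec.T
whiten = vec / np.sqrt(val)     # vec @ diag(val ** -0.5)

y = x @ whiten                  # step 1: columns of y are uncorrelated
identity_check = np.corrcoef(y.T)

target = np.array([[1.0, 0.5], [0.5, 1.0]])
upper = np.linalg.cholesky(target).T  # upper.T @ upper == target
z = y @ upper                   # step 2: impose the target correlation
result_check = np.corrcoef(z.T)

print(identity_check)
print(result_check)
```

Because the sample covariance of `y` is the identity matrix, the sample covariance of `z` equals `upper.T @ upper`, which is the target matrix. This is why the result matches the requested correlation in the sample itself and not only in expectation.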
A 10 × 2 data matrix with correlation coefficient 0.5
x = gendat2(10, 0.5)
np.corrcoef(x.T)[0,1]
0.4999999999999999
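The value printed above differs from 0.5 only by floating-point rounding. For contrast, the sketch below draws ordinary samples from a bivariate normal distribution whose population correlation is 0.5. With only 10 rows, the sample correlation of such draws scatters noticeably from one draw to the next. The seed, the number of repetitions, and the name `sample_r` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, assumed for reproducibility
cov = np.array([[1.0, 0.5], [0.5, 1.0]])  # population correlation 0.5

# Sample correlation of five independent 10 x 2 draws
sample_r = [
    np.corrcoef(rng.multivariate_normal([0.0, 0.0], cov, size=10).T)[0, 1]
    for _ in range(5)
]
print(sample_r)
```

Each entry of `sample_r` only estimates 0.5, whereas gendat2 fixes the sample correlation itself.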
A 1000 × 2 data matrix with correlation coefficient 0.63
x = gendat2(1000, 0.63)
X = x[:, 0]
Y = x[:, 1]
coeff = np.corrcoef(X, Y)[0,1]
Plot the data as a scatter plot.
import matplotlib.pyplot as plt
plt.figure(figsize=(5,5))
plt.title("correlation coefficient = {0:.3f}".format(coeff))
plt.scatter(X, Y)
# plt.xlim([0, 1])
# plt.ylim([0, 1])
plt.grid()
plt.show()