StatsModels
線形回帰、ロジスティック回帰、一般化線形モデル、ARIMAモデル、自己相関関数の算出などの統計モデルがいろいろ使えるパッケージです。
API一覧
https://www.statsmodels.org/stable/api.html
1. install
pipで入ります
https://www.statsmodels.org/stable/install.html
pip install statsmodels
2. 線形回帰
statsmodels.api.OLS()でordinary least squareによる線形回帰モデルを作成できます。xを変数のみで構成すればy切片なし、statsmodels.api.add_constant()でxに定数列を追加すればy切片ありでの回帰となります。
import numpy as np
import statsmodels.api as sm
spector_data = sm.datasets.spector.load(as_pandas=False)
x = spector_data.exog
xc = sm.add_constant(x, prepend=False)
y = spector_data.endog
print(xc.shape, y.shape)
# Fit and summarize OLS model
model = sm.OLS(y, xc)
res = model.fit()
print(res.summary())
>>> res.params # 係数
array([ 0.46385168, 0.01049512, 0.37855479, -1.49801712])
>>> res.pvalues # P値
array([0.00784052, 0.59436148, 0.01108768, 0.00792932])
>>> res.aic, res.bic # 赤池情報量基準、ベイズ情報量基準
(33.95649234217083, 39.81943595336974)
>>> res.bse # 標準誤差
array([0.16195635, 0.01948285, 0.13917274, 0.52388862])
>>> res.resid # 残差
array([ 0.05426921, -0.07340692, -0.27529932, 0.01762875, 0.42221284,
-0.00701576, 0.03936941, -0.05363477, -0.16983152, 0.37535999,
0.06818476, -0.28335827, -0.39932119, 0.72348259, -0.41225249,
0.0276562 , -0.03995305, -0.01409045, -0.56914272, 0.39131297,
-0.06696482, 0.14645583, -0.36800073, -0.78153024, 0.22554445,
0.52339378, 0.36858806, -0.37090458, 0.20600614, 0.0226678 ,
-0.53887544, 0.8114495 ])
推定はpredict()です
result.predict(xc)
array([-0.05426921, 0.07340692, 0.27529932, -0.01762875, 0.57778716,
0.00701576, -0.03936941, 0.05363477, 0.16983152, 0.62464001,
-0.06818476, 0.28335827, 0.39932119, 0.27651741, 0.41225249,
-0.0276562 , 0.03995305, 0.01409045, 0.56914272, 0.60868703,
0.06696482, 0.85354417, 0.36800073, 0.78153024, 0.77445555,
0.47660622, 0.63141194, 0.37090458, 0.79399386, 0.9773322 ,
0.53887544, 0.1885505 ])
3. ロジスティック回帰
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
# Load the data from Spector and Mazzeo (1980)
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog)
y = spector_data.endog
x = spector_data.exog
# Follow statsmodles ipython notebook
model = sm.Logit(y, x)
res = model.fit(disp=0)
print(res.summary())
>>> res.params
array([-13.02134686, 2.82611259, 0.09515766, 2.37868766])
>>> res.pvalues
array([0.00827746, 0.02523911, 0.50143424, 0.0254552 ])
>>> res.aic, res.bic
(33.779268444262826, 39.642212055461734)
>>> res.bse
array([4.93132421, 1.26294108, 0.14155421, 1.06456425])
>>> res.resid_dev
array([-0.23211021, -0.35027122, -0.64396264, -0.22909819, 1.06047795,
-0.26638437, -0.23178275, -0.32537884, -0.48538752, 0.85555565,
-0.22259715, -0.64918082, -0.88199929, 1.81326864, -0.94639849,
-0.24758297, -0.3320177 , -0.28054444, -1.33513084, 0.91030269,
-0.35592175, 0.44718924, -0.74400503, -1.95507406, 0.59395382,
1.20963752, 0.95233204, -0.85678568, 0.58707192, 0.33529199,
-1.22731092, 2.09663887])
>>> res.predict(x)
array([0.02657799, 0.05950125, 0.18725993, 0.02590164, 0.56989295,
0.03485827, 0.02650406, 0.051559 , 0.11112666, 0.69351131,
0.02447037, 0.18999744, 0.32223955, 0.19321116, 0.36098992,
0.03018375, 0.05362641, 0.03858834, 0.58987249, 0.66078584,
0.06137585, 0.90484727, 0.24177245, 0.85209089, 0.83829051,
0.48113304, 0.63542059, 0.30721866, 0.84170413, 0.94534025,
0.5291172 , 0.11103084])
4. 一般化線形モデル
分布とリンク関数は以下の組み合わせから選びます
また、分布とリンク関数についての詳細は以下にまとまってます
https://www.statsmodels.org/stable/glm.html#families
sm.GLM()のfamily=sm.families.Gamma()の部分が分布とリンク関数を指定する部分です。下記ではガンマ分布でリンク関数が指定されていないのでデフォルトのinverseが使われますが、logを使う場合はsm.families.Gaussian(sm.families.links.log)のようにします。
import statsmodels.api as sm
data = sm.datasets.scotland.load(as_pandas=False)
x = sm.add_constant(data.exog)
y = data.endog
model = sm.GLM(y, x, family=sm.families.Gamma())
res = model.fit()
res.summary()
>>> res.params
[-1.77652703e-02 4.96176830e-05 2.03442259e-03 -7.18142874e-05
1.11852013e-04 -1.46751504e-07 -5.18683112e-04 -2.42717498e-06]
>>> res.scale
0.003584283173493321
>>> res.deviance
0.08738851641699877
>>> res.pearson_chi2
0.08602279616383915
>>> res.llf
-83.01720216107174
>>> res.predict(x)
array([57.80431482, 53.2733447 , 50.56347993, 58.33003783, 70.46562169,
56.88801284, 66.81878401, 66.03410393, 57.92937473, 63.23216907,
53.9914785 , 61.28993391, 64.81036393, 63.47546816, 60.69696114,
74.83508176, 56.56991106, 72.01804172, 64.35676519, 52.02445881,
64.24933079, 71.15070332, 45.73479688, 54.93318588, 66.98031261,
52.02479973, 56.18413736, 58.12267471, 67.37947398, 60.49162862,
73.82609217, 69.61515621])