LoginSignup
3
2

More than 3 years have passed since last update.

pythonで線形回帰やGLMが使えるStatsModelsの使い方メモ

Posted at

StatsModels

線形回帰、ロジスティック回帰、一般化線形モデル、ARIMAモデル、自己相関関数の算出などの統計モデルがいろいろ使えるパッケージです。

API一覧
https://www.statsmodels.org/stable/api.html

1. install

pipで入ります
https://www.statsmodels.org/stable/install.html

terminal
pip install statsmodels

2. 線形回帰

statsmodels.api.OLS()でordinary least squareによる線形回帰モデルを作成できます。xを変数のみで構成すればy切片なし、statsmodels.api.add_constant()でxに定数列を追加すればy切片ありでの回帰となります。

python
import numpy as np

import statsmodels.api as sm

spector_data = sm.datasets.spector.load(as_pandas=False)
x = spector_data.exog
xc = sm.add_constant(x, prepend=False)
y = spector_data.endog
print(xc.shape, y.shape)

# Fit and summarize OLS model
model = sm.OLS(y, xc)
res = model.fit()

print(res.summary())

image.png
それぞれの値を取り出すことも出来ます

python
>>> res.params  # 係数
array([ 0.46385168,  0.01049512,  0.37855479, -1.49801712])

>>> res.pvalues  # P値
array([0.00784052, 0.59436148, 0.01108768, 0.00792932])

>>> res.aic, res.bic  # 赤池情報量基準、ベイズ情報量基準
(33.95649234217083, 39.81943595336974)

>>> res.bse  # 標準誤差
array([0.16195635, 0.01948285, 0.13917274, 0.52388862])

>>> res.resid  # 残差
array([ 0.05426921, -0.07340692, -0.27529932,  0.01762875,  0.42221284,
       -0.00701576,  0.03936941, -0.05363477, -0.16983152,  0.37535999,
        0.06818476, -0.28335827, -0.39932119,  0.72348259, -0.41225249,
        0.0276562 , -0.03995305, -0.01409045, -0.56914272,  0.39131297,
       -0.06696482,  0.14645583, -0.36800073, -0.78153024,  0.22554445,
        0.52339378,  0.36858806, -0.37090458,  0.20600614,  0.0226678 ,
       -0.53887544,  0.8114495 ])

推定はpredict()です

python
result.predict(xc)
result
array([-0.05426921,  0.07340692,  0.27529932, -0.01762875,  0.57778716,
        0.00701576, -0.03936941,  0.05363477,  0.16983152,  0.62464001,
       -0.06818476,  0.28335827,  0.39932119,  0.27651741,  0.41225249,
       -0.0276562 ,  0.03995305,  0.01409045,  0.56914272,  0.60868703,
        0.06696482,  0.85354417,  0.36800073,  0.78153024,  0.77445555,
        0.47660622,  0.63141194,  0.37090458,  0.79399386,  0.9773322 ,
        0.53887544,  0.1885505 ])

3. ロジスティック回帰

python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Load the data from Spector and Mazzeo (1980)
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog)

y = spector_data.endog
x = spector_data.exog

# Follow statsmodles ipython notebook
model = sm.Logit(y, x)
res = model.fit(disp=0)

print(res.summary())

image.png
同様に諸々の値も取得できます

python
>>> res.params
array([-13.02134686,   2.82611259,   0.09515766,   2.37868766])

>>> res.pvalues
array([0.00827746, 0.02523911, 0.50143424, 0.0254552 ])

>>> res.aic, res.bic
(33.779268444262826, 39.642212055461734)

>>> res.bse
array([4.93132421, 1.26294108, 0.14155421, 1.06456425])

>>> res.resid_dev
array([-0.23211021, -0.35027122, -0.64396264, -0.22909819,  1.06047795,
       -0.26638437, -0.23178275, -0.32537884, -0.48538752,  0.85555565,
       -0.22259715, -0.64918082, -0.88199929,  1.81326864, -0.94639849,
       -0.24758297, -0.3320177 , -0.28054444, -1.33513084,  0.91030269,
       -0.35592175,  0.44718924, -0.74400503, -1.95507406,  0.59395382,
        1.20963752,  0.95233204, -0.85678568,  0.58707192,  0.33529199,
       -1.22731092,  2.09663887])
python
>>> res.predict(x)
array([0.02657799, 0.05950125, 0.18725993, 0.02590164, 0.56989295,
       0.03485827, 0.02650406, 0.051559  , 0.11112666, 0.69351131,
       0.02447037, 0.18999744, 0.32223955, 0.19321116, 0.36098992,
       0.03018375, 0.05362641, 0.03858834, 0.58987249, 0.66078584,
       0.06137585, 0.90484727, 0.24177245, 0.85209089, 0.83829051,
       0.48113304, 0.63542059, 0.30721866, 0.84170413, 0.94534025,
       0.5291172 , 0.11103084])

4. 一般化線形モデル

分布とリンク関数は以下の組み合わせから選びます
image.png
また、分布とリンク関数についての詳細は以下にまとまってます
https://www.statsmodels.org/stable/glm.html#families

sm.GLM()のfamily=sm.families.Gamma()の部分が分布とリンク関数を指定する部分です。下記ではガンマ分布でリンク関数が指定されていないのでデフォルトのinverseが使われますが、logを使う場合はsm.families.Gaussian(sm.families.links.log)のようにします。

python
import statsmodels.api as sm
data = sm.datasets.scotland.load(as_pandas=False)
x = sm.add_constant(data.exog)
y = data.endog

model = sm.GLM(y, x, family=sm.families.Gamma())
res = model.fit()
res.summary()

image.png

python
>>> res.params
[-1.77652703e-02  4.96176830e-05  2.03442259e-03 -7.18142874e-05
  1.11852013e-04 -1.46751504e-07 -5.18683112e-04 -2.42717498e-06]

>>> res.scale
0.003584283173493321

>>> res.deviance
0.08738851641699877

>>> res.pearson_chi2
0.08602279616383915

>>> res.llf
-83.01720216107174
python
>>> res.predict(x)
array([57.80431482, 53.2733447 , 50.56347993, 58.33003783, 70.46562169,
       56.88801284, 66.81878401, 66.03410393, 57.92937473, 63.23216907,
       53.9914785 , 61.28993391, 64.81036393, 63.47546816, 60.69696114,
       74.83508176, 56.56991106, 72.01804172, 64.35676519, 52.02445881,
       64.24933079, 71.15070332, 45.73479688, 54.93318588, 66.98031261,
       52.02479973, 56.18413736, 58.12267471, 67.37947398, 60.49162862,
       73.82609217, 69.61515621])
3
2
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
3
2