1
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 1 year has passed since last update.

scipy.stats: 回帰分析 linregress

Posted at

scipy.stats: 回帰直線 linregress

単回帰分析を行う。

linregress(x, y=None, alternative='two-sided')

from scipy.stats import linregress
import numpy as np

x = np.arange(15)
y = x**2

res = linregress(x, y)
res
LinregressResult(slope=13.999999999999998, intercept=-30.333333333333314, rvalue=0.9644093612193901, pvalue=6.916724428470525e-09, stderr=1.0645812948447575, intercept_stderr=8.757219244080245)

名前付きタプルで,以下のものが返される。

  • slope=13.999999999999998 は回帰直線の傾き
  • intercept=-30.333333333333314 は回帰直線の切片
  • rvalue=0.9644093612193901 は二変数間のピアソンの積率相関係数
  • pvalue=6.916724428470525e-09 は,相関係数=0の検定,および回帰直線の傾き=0の検定,および重相関係数=0の検定の $p$ 値(3 つの検定はすべて同じ)
  • stderr=1.0645812948447575 は,回帰直線の傾きの標準誤差
  • intercept_stderr=8.757219244080245 は,回帰直線の切片の標準誤差

R で分析すると以下のようになる。

> x = 0:14
> y = x^2
> res = lm(y ~ x)
> summary(res)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-18.667 -14.667  -2.667  11.833  30.333 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -30.333      8.757  -3.464   0.0042
x             14.000      1.065  13.151 6.92e-09

Residual standard error: 17.81 on 13 degrees of freedom
Multiple R-squared:  0.9301,	Adjusted R-squared:  0.9247 
F-statistic: 172.9 on 1 and 13 DF,  p-value: 6.917e-09

R の出力にあって linregress にないものを計算する

  • Multiple R-squared 決定係数
res.rvalue**2
0.9300854160075921
  • Adjusted R-squared 自由度の調整済み決定係数
n = len(x)
((n-1)*res.rvalue**2-1)/(n-2)
0.9247073710850992
  • t value of Intercept 切片の検定の $t$ 値
t_intercept = res.intercept / res.intercept_stderr
t_intercept
-3.4638088287943933
  • Pr(>|t|) of t value of Intercept 切片の検定の $p$ 値
from scipy.stats import t
t.sf(np.abs(t_intercept), n-2) * 2
0.00419543767710724
  • t value of x 傾きの検定の $t$ 値
t_slope = res.slope / res.stderr
t_slope
13.150710112788095
  • Pr(>|t|) of t value of x(ie. slope) 傾きの検定の $p$ 値
t.sf(abs(t_slope), n-2) * 2
6.916724428470525e-09
  • Residual standard error 残差標準誤差と自由度
predicted = x * res.slope + res.intercept
Se = sum((y - predicted)**2)
St = np.var(y) * n
Sr = St - Se
df = n-2; np.sqrt(Se/df), df

(17.813852287849848, 13)
  • F-statistic, df1, df2, p-value 回帰の分散分析の $F$ 値,自由度,$p$ 値
from scipy.stats import f
F = Sr*(n-2)/Se
F, 1, n-2, f.sf(F, 1, n-2)
(172.94117647058823, 1, 13, 6.916724428470249e-09)

これだけ余計なことするなら,最初から関数書いたほうが速い

1
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
1
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?