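Introduction
To try the Python version of CausalImpact, I ran the example code from its GitHub repository, but it failed with an error. This is a note on how I resolved it.
Program and error message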
On Google Colab, I first ran
!pip install causalimpact
and then ran the following program (copied and pasted from GitHub):
import pandas as pd
from causalimpact import CausalImpact
data = pd.read_csv('https://raw.githubusercontent.com/WillianFuks/tfcausalimpact/master/tests/fixtures/arma_data.csv')[['y', 'X']]
data.iloc[70:, 0] += 5
pre_period = [0, 69]
post_period = [70, 99]
ci = CausalImpact(data, pre_period, post_period)
print(ci.summary())
print(ci.summary(output='report'))
ci.plot()
This produced the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-29-aeae0ba2a7e5> in <cell line: 12>()
10
11 ci = CausalImpact(data, pre_period, post_period)
---> 12 print(ci.summary())
13 print(ci.summary(output='report'))
14 ci.plot()
/usr/local/lib/python3.9/dist-packages/causalimpact/analysis.py in summary(self, output, width, path)
727 confidence = "{}%".format(int((1 - alpha) * 100))
728 post_period = self.params["post_period"]
--> 729 post_inf = self.inferences.loc[post_period[0] : post_period[1], :]
730 post_point_resp = post_inf.loc[:, "response"]
731 post_point_pred = post_inf.loc[:, "point_pred"]
AttributeError: 'NoneType' object has no attribute 'loc'
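The traceback shows that `self.inferences` is still `None` when `summary()` reads it. As a rough illustration of that failure mode, here is a toy sketch. The class `LazyAnalysis` and its fields are hypothetical and are not the library's code; it only mimics an object whose results stay empty until a separate `run()` step fills them in.

```python
from types import SimpleNamespace


class LazyAnalysis:
    """Hypothetical stand-in: results exist only after run() is called."""

    def __init__(self, values):
        self.values = list(values)
        self.inferences = None  # nothing is computed at construction time

    def run(self):
        # The separate fitting step populates the results.
        mean = sum(self.values) / len(self.values)
        self.inferences = SimpleNamespace(mean=mean)

    def summary(self):
        # Attribute access on None raises AttributeError if run() was skipped.
        return "mean = {:.1f}".format(self.inferences.mean)


analysis = LazyAnalysis([1.0, 2.0, 3.0])
try:
    analysis.summary()
except AttributeError as exc:
    print("before run():", exc)

analysis.run()
print("after run():", analysis.summary())
```

Calling `summary()` before `run()` fails in the same way as the traceback above, and succeeds once `run()` has been called.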
Solution
Call ci.run() before ci.summary():
import pandas as pd
from causalimpact import CausalImpact
data = pd.read_csv('https://raw.githubusercontent.com/WillianFuks/tfcausalimpact/master/tests/fixtures/arma_data.csv')[['y', 'X']]
data.iloc[70:, 0] += 5
pre_period = [0, 69]
post_period = [70, 99]
ci = CausalImpact(data, pre_period, post_period)
ci.run()
print(ci.summary())
print(ci.summary(output='report'))
ci.plot()
(Output)
                          Average         Cumulative
Actual                    125             3756
Predicted                 120             3613
95% CI                    [118, 122]      [3554, 3672]
Absolute Effect           4               143
95% CI                    [6, 2]          [202, 84]
Relative Effect           4.0%            4.0%
95% CI                    [5.6%, 2.3%]    [5.6%, 2.3%]
P-value                   0.0%
Prob. of Causal Effect    100.0%
None
During the post-intervention period, the response variable had an average value of approx. 125.
By contrast, in the absence of an intervention, we would have expected an average response of 120. The 95% interval of
this counterfactual prediction is [118, 122]. Subtracting this prediction from the observed response yields an estimate
of the causal effect the intervention had on the response variable. This effect is 4 with a 95% interval of [6, 2]. For
a discussion of the significance of this effect, see below.
Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully
interpreted), the response variable had an overall value of 3756. By contrast, had the intervention not taken place,
we would have expected a sum of 3613. The 95% interval of this prediction is [3554, 3672]
The above results are given in terms of absolute numbers. In relative terms, the response variable showed an increase
of 4.0%. The 95% interval of this percentage is [5.6%, 2.3%]
This means that the positive effect observed during the intervention period is statistically significant and unlikely
to be due to random fluctuations. It should be noted, however, that the question of whether this increase also bears
substantive significance can only be answered by comparing the absolute effect 4 to the original goal of the underlying
intervention.
The probability of obtaining this effect by chance is very small (Bayesian one-sided tail-area
probability 0.0). This means the causal effect can be considered statistically
significant.
None
The numbers differ slightly, but the result is similar to the output shown on GitHub.
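Addendum
It turns out the real problem was that the GitHub example installs tfcausalimpact, whereas I had inadvertently installed causalimpact. Both packages are imported under the same name (from causalimpact import CausalImpact), so the import succeeds either way and the mix-up is easy to miss.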
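Because both distributions share one import name, the import line alone does not reveal which one is installed. A minimal sketch for checking this from a notebook cell, assuming Python 3.8 or later for `importlib.metadata`; the helper name `installed_version` is my own and is not part of either package:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as they are passed to `pip install` in this post.
for name in ("causalimpact", "tfcausalimpact"):
    print(name, "->", installed_version(name))
```

If `causalimpact` reports a version and `tfcausalimpact` reports `None`, the environment is in the state that produced the error above.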
After restarting the Google Colab runtime and running
!pip install tfcausalimpact
the GitHub code (copied as-is, without ci.run()) ran successfully and produced output:
import pandas as pd
from causalimpact import CausalImpact
data = pd.read_csv('https://raw.githubusercontent.com/WillianFuks/tfcausalimpact/master/tests/fixtures/arma_data.csv')[['y', 'X']]
data.iloc[70:, 0] += 5
pre_period = [0, 69]
post_period = [70, 99]
ci = CausalImpact(data, pre_period, post_period)
print(ci.summary())
print(ci.summary(output='report'))
ci.plot()
(Output)
WARNING:tensorflow:From /usr/local/lib/python3.9/dist-packages/causalimpact/model.py:408: calling one_step_predictive (from tensorflow_probability.python.sts.forecast) with timesteps_are_event_shape=True is deprecated and will be removed after 2021-12-31.
Instructions for updating:
`Predictive distributions returned by`tfp.sts.one_step_predictive` will soon compute per-timestep probabilities (treating timesteps as part of the batch shape) instead of a single probability for an entire series (the current approach, in which timesteps are treated as event shape). Please update your code to pass `timesteps_are_event_shape=False` (this will soon be the default) and to explicitly sum over the per-timestep log probabilities if this is required.
Posterior Inference {Causal Impact}
                          Average            Cumulative
Actual                    125.23             3756.86
Prediction (s.d.)         120.42 (0.35)      3612.49 (10.42)
95% CI                    [119.74, 121.1]    [3592.06, 3632.91]
Absolute effect (s.d.)    4.81 (0.35)        144.37 (10.42)
95% CI                    [4.13, 5.49]       [123.95, 164.8]
Relative effect (s.d.)    4.0% (0.29%)       4.0% (0.29%)
95% CI                    [3.43%, 4.56%]     [3.43%, 4.56%]
Posterior tail-area probability p: 0.0
Posterior prob. of a causal effect: 100.0%
For more details run the command: print(impact.summary('report'))
Analysis report {CausalImpact}
During the post-intervention period, the response variable had
an average value of approx. 125.23. By contrast, in the absence of an
intervention, we would have expected an average response of 120.42.
The 95% interval of this counterfactual prediction is [119.74, 121.1].
Subtracting this prediction from the observed response yields
an estimate of the causal effect the intervention had on the
response variable. This effect is 4.81 with a 95% interval of
[4.13, 5.49]. For a discussion of the significance of this effect,
see below.
Summing up the individual data points during the post-intervention
period (which can only sometimes be meaningfully interpreted), the
response variable had an overall value of 3756.86.
By contrast, had the intervention not taken place, we would have expected
a sum of 3612.49. The 95% interval of this prediction is [3592.06, 3632.91].
The above results are given in terms of absolute numbers. In relative
terms, the response variable showed an increase of +4.0%. The 95%
interval of this percentage is [3.43%, 4.56%].
This means that the positive effect observed during the intervention
period is statistically significant and unlikely to be due to random
fluctuations. It should be noted, however, that the question of whether
this increase also bears substantive significance can only be answered
by comparing the absolute effect (4.81) to the original goal
of the underlying intervention.
The probability of obtaining this effect by chance is very small
(Bayesian one-sided tail-area probability p = 0.0).
This means the causal effect can be considered statistically
significant.
This output appears to match the GitHub output more closely than the earlier one.
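As a quick sanity check on how the rows of the tfcausalimpact summary relate to each other, the arithmetic can be reproduced from the printed numbers alone. This is only a sketch using the rounded values shown above, not the library's internal computation; the variable names are my own.

```python
# Rounded values copied from the tfcausalimpact summary above.
actual_avg, predicted_avg = 125.23, 120.42
actual_cum, predicted_cum = 3756.86, 3612.49

# Absolute effect = actual - predicted
abs_effect_avg = round(actual_avg - predicted_avg, 2)
abs_effect_cum = round(actual_cum - predicted_cum, 2)

# Relative effect = absolute effect / predicted, as a percentage
rel_effect_avg = round(abs_effect_avg / predicted_avg * 100, 1)
rel_effect_cum = round(abs_effect_cum / predicted_cum * 100, 1)

print(abs_effect_avg, abs_effect_cum)  # 4.81 144.37
print(rel_effect_avg, rel_effect_cum)  # 4.0 4.0
```

The results agree with the "Absolute effect" and "Relative effect" rows of the summary.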
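Closing
Thank you for reading to the end.
I also write articles on Zenn.
They are mainly about Python and data analysis, so I would be glad if you took a look.
I have also collected the slides from my past lightning talks and study-group presentations at the link below, so please have a look at those as well.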