
Prophet Time Series Analysis Library: Official Documentation Translation 10 (Non-Daily Data)

Last updated at 2018-10-31. Posted at 2018-10-28.

This series translates the official documentation of Prophet, the time series analysis library published by Facebook.

Published 2018/10/28. Links to the original text are below.
・Prophet overview and features: https://facebook.github.io/prophet/
・Official documentation: https://facebook.github.io/prophet/docs/quick_start.html

This article covers non-daily data.

Sub-daily data

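Prophet can also forecast time series with sub-daily observations when the ds column of the dataframe holds timestamps. The timestamps must be in the format YYYY-MM-DD HH:MM:SS; see the example csv file read in the code below. When sub-daily data are used, daily seasonality is fitted automatically. Here we fit Prophet to data with 5-minute resolution (daily temperatures at Yosemite National Park).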

# R
library(prophet)
df <- read.csv('../examples/example_yosemite_temps.csv')
m <- prophet(df, changepoint.prior.scale=0.01)
future <- make_future_dataframe(m, periods = 300, freq = 60 * 60)
fcst <- predict(m, future)
plot(m, fcst)

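# Python
import pandas as pd
from prophet import Prophet  # the package was named fbprophet before version 1.0
df = pd.read_csv('../examples/example_yosemite_temps.csv')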
m = Prophet(changepoint_prior_scale=0.01).fit(df)
future = m.make_future_dataframe(periods=300, freq='H')
fcst = m.predict(future)
fig = m.plot(fcst)

The daily seasonality shows up in the components plot.

# R
prophet_plot_components(m, fcst)

# Python
fig = m.plot_components(fcst)
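
The timestamp format requirement stated at the top of this section can be met by parsing raw strings explicitly before fitting. The following is a minimal sketch, assuming pandas; the `raw` frame, its `time` and `temp` column names, and its values are hypothetical and are not taken from the Yosemite dataset. It converts timestamps written in another style into the `ds` column and renders them as YYYY-MM-DD HH:MM:SS.

```python
import pandas as pd

# Hypothetical raw readings whose timestamps are not in the required format.
raw = pd.DataFrame({
    "time": ["2017/05/01 00:00", "2017/05/01 00:05", "2017/05/01 00:10"],
    "temp": [27.8, 27.0, 26.8],
})

# Parse with an explicit format and use the ds / y column names Prophet expects.
df = pd.DataFrame({
    "ds": pd.to_datetime(raw["time"], format="%Y/%m/%d %H:%M"),
    "y": raw["temp"],
})

# Render ds as YYYY-MM-DD HH:MM:SS strings, for example before writing a csv.
ds_text = df["ds"].dt.strftime("%Y-%m-%d %H:%M:%S")
print(ds_text.tolist())
```

Passing an explicit `format` makes a malformed timestamp raise an error instead of being silently misread.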

Data with regular gaps

Suppose the dataset above contained only temperature observations from 12 a.m. to 6 a.m.

# R
library(dplyr)  # provides %>%, mutate and filter
df2 <- df %>%
  mutate(ds = as.POSIXct(ds, tz="GMT")) %>%
  filter(as.numeric(format(ds, "%H")) < 6)
m <- prophet(df2)
future <- make_future_dataframe(m, periods = 300, freq = 60 * 60)
fcst <- predict(m, future)
plot(m, fcst)

# Python
df2 = df.copy()
df2['ds'] = pd.to_datetime(df2['ds'])
df2 = df2[df2['ds'].dt.hour < 6]
m = Prophet().fit(df2)
future = m.make_future_dataframe(periods=300, freq='H')
fcst = m.predict(future)
fig = m.plot(fcst)

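The forecast looks poor: the future shows much larger fluctuations than the earlier forecast that used data from all hours of the day. The problem is that Prophet is fitting a full daily cycle to a series that only has data for part of the day (12 a.m. to 6 a.m.). The daily seasonality is therefore unconstrained for the hours with no data and is not estimated well there, so such a forecast can hardly be trusted. The solution is to make predictions only for the time windows that have historical data. Below, this is done by restricting the future dataframe to times from 12 a.m. to 6 a.m.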

# R
future2 <- future %>% 
  filter(as.numeric(format(ds, "%H")) < 6)
fcst <- predict(m, future2)
plot(m, fcst)

# Python
future2 = future.copy()
future2 = future2[future2['ds'].dt.hour < 6]
fcst = m.predict(future2)
fig = m.plot(fcst)
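
When the observed window is not known in advance, the hours to keep can be read from the history itself. The following is a minimal sketch, assuming pandas; `history` and `future` are hypothetical stand-ins built with `pd.date_range` (the latter for the frame `make_future_dataframe` would return), so the snippet runs without fitting a model.

```python
import pandas as pd

# Hypothetical history: 5-minute readings, but only between 00:00 and 05:55 each day.
all_times = pd.date_range("2017-05-01", periods=3 * 288, freq="5min")
history = pd.DataFrame({"ds": all_times[all_times.hour < 6]})

# Stand-in for a future frame: 48 consecutive hourly timestamps.
future = pd.DataFrame({"ds": pd.date_range("2017-05-04", periods=48, freq="60min")})

# Keep only future rows whose hour of day also occurs in the history.
observed_hours = set(history["ds"].dt.hour)
future_kept = future[future["ds"].dt.hour.isin(observed_hours)]

print(len(future), len(future_kept))
```

The filtered frame can then be passed to `predict` in place of the full one, as in the code above.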

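The same approach applies to other datasets with regular gaps. For example, if the history contains only weekdays, the weekly seasonality cannot be estimated well for weekends, so predictions should be made for weekdays only.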
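
The weekday case can be sketched in the same way as the hour-of-day filter. This is a minimal example, assuming pandas; the `future` frame here is a hypothetical stand-in for the output of `make_future_dataframe` with daily frequency, and the start date is arbitrary.

```python
import pandas as pd

# Stand-in for a daily future frame: 14 consecutive days starting on a Monday.
future = pd.DataFrame({"ds": pd.date_range("2018-10-01", periods=14, freq="D")})

# Keep Monday (0) through Friday (4); drop Saturday (5) and Sunday (6).
future_weekdays = future[future["ds"].dt.dayofweek < 5]

print(len(future), len(future_weekdays))
```

Two full weeks contain ten weekdays, so ten rows remain for prediction.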

Monthly data

Prophet can also be fitted to monthly data. However, the underlying model is continuous-time, so fitting it to monthly data and then asking for daily forecasts can produce strange results. Here we forecast US retail sales for the next 10 years.

# R
df <- read.csv('../examples/example_retail_sales.csv')
m <- prophet(df, seasonality.mode = 'multiplicative')
future <- make_future_dataframe(m, periods = 3652)
fcst <- predict(m, future)
plot(m, fcst)

# Python
df = pd.read_csv('../examples/example_retail_sales.csv')
m = Prophet(seasonality_mode='multiplicative').fit(df)
future = m.make_future_dataframe(periods=3652)
fcst = m.predict(future)
fig = m.plot(fcst)

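This is the same problem as in the dataset with regular gaps above. When the yearly seasonality is fitted, data exist only for the first day of each month, so the seasonality for the remaining days is unidentifiable and overfit. Running MCMC sampling makes this visible, because it shows the uncertainty in the seasonality.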

# R
m <- prophet(df, seasonality.mode = 'multiplicative', mcmc.samples = 300)
fcst <- predict(m, future)
prophet_plot_components(m, fcst)

# Python
m = Prophet(seasonality_mode='multiplicative', mcmc_samples=300).fit(df)
fcst = m.predict(future)
fig = m.plot_components(fcst)

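The seasonality has low uncertainty around the start of each month, where data points exist, but the uncertainty is very large in between. When fitting Prophet to monthly data, make monthly forecasts only. This is done by passing the frequency to make_future_dataframe through the freq argument.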

# R
future <- make_future_dataframe(m, periods = 120, freq = 'month')
fcst <- predict(m, future)
plot(m, fcst)

# Python
# 'MS' is month start, matching the first-of-month history ('M' would mean month end)
future = m.make_future_dataframe(periods=120, freq='MS')
fcst = m.predict(future)
fig = m.plot(fcst)
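
Because the history is stamped on the first day of each month, future dates should fall on month starts as well. The following is a minimal sketch, assuming pandas and a hypothetical last observation date, of how the pandas `'MS'` (month start) frequency alias generates such timestamps.

```python
import pandas as pd

# Hypothetical last observation of a monthly history stamped on the first of the month.
last_observed = pd.Timestamp("2016-05-01")

# 'MS' means month start: build 13 month starts and drop the first,
# which is the last observed date itself.
future_ds = pd.date_range(last_observed, periods=13, freq="MS")[1:]

print(future_ds[0].date(), future_ds[-1].date())
```

Every generated timestamp lands on day 1, where the seasonality is constrained by data.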
