Learning electricity usage with H2O.ai AutoML and predicting usage a few days ahead with a weather forecast API

Posted at 2021-01-30, last updated at 2021-01-31

Introduction

I've covered this topic many times already, but this time I'll try predicting electricity usage with H2O.ai's AutoML.

H2O documentation

Environment

macOS 10.14
Python 3

Installation

Java

Download Java for macOS from Oracle's official site and install it.

H2O

Install it by following the URL below.

First, install the required libraries.

pip install requests
pip install tabulate
pip install "colorama>=0.3.8"
pip install future

Incidentally, tabulate was the only one missing on my Mac.
After installing the required libraries, install H2O.

pip install -f http://h2o-release.s3.amazonaws.com/h2o/latest_stable_Py.html h2o

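Loading the data

First, download the actual electricity usage records from Chugoku Electric Power and the hourly temperature data for Hiroshima and Matsue from the Japan Meteorological Agency, and save them as CSV files.

Then load the saved CSV files with the following code.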

from glob import glob
import pandas as pd

##### Load the actual electricity usage

files = sorted(glob("juyo-*.csv"))

df_kw = pd.concat(
    [pd.read_csv(f, encoding="Shift_JIS", skiprows=2) for f in files],
    ignore_index=True,
)
# Usage is reported in 万kW (units of 10,000 kW); multiply by 10 to get MW
df_kw["MW"] = df_kw["実績(万kW)"] * 10

# Use the combined DATE + TIME columns as a DatetimeIndex
df_kw.index = pd.to_datetime(df_kw["DATE"] + " " + df_kw["TIME"])

##### Load the temperature data

# Read every temperature CSV for one city and index the rows by timestamp
def read_temp(city):
    files = sorted(glob("data-{}-*.csv".format(city)))

    df_tmp = pd.concat(
        [pd.read_csv(f, encoding="Shift_JIS", skiprows=4) for f in files],
        ignore_index=True,
    )
    # Columns: timestamp, temperature, quality flag, homogeneity number
    df_tmp.columns = ["DATETIME", "TEMP", "品質情報", "均質番号"]

    df_tmp["DATETIME"] = pd.to_datetime(df_tmp["DATETIME"])
    df_tmp.index = df_tmp["DATETIME"]

    return df_tmp

# Hiroshima
df_tmp_hiroshima = read_temp("hiroshima")

# Matsue
df_tmp_matsue = read_temp("matsue")

##### Merge the data

# Copy the usage data
df = df_kw.copy()

# Add the Hiroshima and Matsue temperatures; rows are matched by timestamp (index alignment)
df["TMP_hiroshima"] = df_tmp_hiroshima["TEMP"]
df["TMP_matsue"] = df_tmp_matsue["TEMP"]
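Adding the temperature columns to `df` by plain column assignment relies on pandas index alignment. Each temperature lands on the row with the same timestamp, and timestamps missing from the temperature files become NaN, which the `dropna()` in the next step removes. The sketch below reproduces that behaviour with plain dictionaries to make it explicit. It also applies the 万kW to MW conversion used for the `MW` column (1万kW = 10,000 kW = 10 MW). The helper names (`to_mw`, `align_on_timestamp`, `drop_missing`) and the sample values are hypothetical.

```python
from datetime import datetime

def to_mw(man_kw):
    """Convert a value in 万kW (units of 10,000 kW) to MW: 1万kW = 10 MW."""
    return man_kw * 10

def align_on_timestamp(base, other):
    """Left-join `other` onto `base` by key, like pandas index alignment.

    Keys missing from `other` get None (pandas would use NaN)."""
    return {ts: (value, other.get(ts)) for ts, value in base.items()}

def drop_missing(rows):
    """Keep only rows without missing values, like DataFrame.dropna()."""
    return {ts: vals for ts, vals in rows.items() if None not in vals}

# Hypothetical sample data: demand converted to MW, Hiroshima temperature in °C
demand = {
    datetime(2020, 1, 1, 0): to_mw(750),
    datetime(2020, 1, 1, 1): to_mw(720),
}
temp_hiroshima = {datetime(2020, 1, 1, 0): 3.1}  # the 01:00 reading is missing

joined = align_on_timestamp(demand, temp_hiroshima)
clean = drop_missing(joined)
print(joined)
print(clean)
```

The 01:00 row survives the join with a missing temperature and is only removed by the `dropna`-style filter. This matches what happens to hours that appear in the usage data but not in the temperature data.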

Data preprocessing

# Extract month, day of month, weekday (0 = Monday) and hour as separate features
df["MONTH"] = df.index.month
df["DAY"] = df.index.day
df["WEEK"] = df.index.weekday
df["HOUR"] = df.index.hour

cols = ['MW', 'TMP_hiroshima', 'TMP_matsue', 'MONTH', 'DAY', 'WEEK', 'HOUR']
df = df[cols]

# Drop rows where the usage or a temperature is missing
df = df.dropna()

# One-hot encoding (dtype=float keeps the dummy columns numeric: 0.0 / 1.0)

cols = ["MONTH","WEEK","HOUR"]
for col in cols:
    df = df.join(pd.get_dummies(df[col], prefix=col, dtype=float))

# Split by date: up to 2020-11-30 for training, December 2020 onward for validation

df_valid = df.loc["2020-12-01":]
df = df.loc[:"2020-11-30"]

# Column names of the features used for training

x_cols = ['TMP_hiroshima', 'TMP_matsue', 'DAY']
x_cols += ["MONTH_{}".format(i + 1) for i in range(12)]
x_cols += ["WEEK_{}".format(i) for i in range(7)]
x_cols += ["HOUR_{}".format(i) for i in range(24)]

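Training

import h2o
from h2o.automl import H2OAutoML

# Start (or connect to) a local H2O cluster
h2o.init()

# Convert the features plus the target column to an H2OFrame
kw = h2o.H2OFrame(df[x_cols + ["MW"]])

predictors = kw.columns[:-1]  # every column except MW
response = "MW"

# Random 80/20 split of the training period into train and test
train, test = kw.split_frame(ratios = [0.8], seed = 1234)

# Let AutoML search for models for up to 60 seconds
aml = H2OAutoML(max_runtime_secs = 60)
aml.train(x = predictors, y = response,
          training_frame = train,
          leaderboard_frame = test)

# Models ranked on the test frame
lb = aml.leaderboard
print(lb)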

Output

AutoML progress: |████████████████████████████████████████████████████████| 100%
model_id	mean_residual_deviance	rmse	mse	mae	rmsle
StackedEnsemble_AllModels_AutoML_20210130_220026	55238.9	235.029	55238.9	179.543	0.0360048
StackedEnsemble_BestOfFamily_AutoML_20210130_220026	55576.7	235.747	55576.7	180	0.0361164
XGBoost_grid__1_AutoML_20210130_220026_model_1	55711.4	236.033	55711.4	180.202	0.0361703
XGBoost_2_AutoML_20210130_220026	78478.5	280.14	78478.5	216.709	0.0425244
GBM_grid__1_AutoML_20210130_220026_model_1	81868.2	286.126	81868.2	219.097	0.0433825
XGBoost_1_AutoML_20210130_220026	117797	343.216	117797	263.161	0.0515385
XGBoost_3_AutoML_20210130_220026	134151	366.266	134151	278.607	0.0557366
DRF_1_AutoML_20210130_220026	140472	374.796	140472	287.559	0.0551513
DeepLearning_grid__1_AutoML_20210130_220026_model_1	168062	409.953	168062	295.05	0.0587799
GBM_grid__1_AutoML_20210130_220026_model_2	222553	471.755	222553	368.56	0.0710886
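For a Gaussian regression like this one, H2O's `mean_residual_deviance` equals the MSE, which is why those two columns in the leaderboard are identical. `rmse` is simply the square root of the MSE. The sketch below defines the leaderboard metrics in plain Python using their standard formulas, with RMSLE computed on log(1 + y). It then checks the MSE/RMSE relationship against the top row. The function names are illustrative, not H2O's API.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: same unit as the target (MW here)."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmsle(y_true, y_pred):
    """Root mean squared log error using log1p(y) = log(1 + y)."""
    return math.sqrt(sum((math.log1p(p) - math.log1p(t)) ** 2
                         for t, p in zip(y_true, y_pred)) / len(y_true))

# The best model reports mse = 55238.9; its square root should match rmse = 235.029
print(round(math.sqrt(55238.9), 3))
```

Because RMSE is in MW, the leading model's typical error is roughly 235 MW. RMSLE, being a log-ratio, reads as a relative error of a few percent.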

Evaluation

from sklearn.metrics import r2_score

# Actual vs. predicted MW on the 20% test split
y_test = test.as_data_frame().MW
y_pred = aml.predict(test).as_data_frame().predict

print(r2_score(y_test, y_pred))

Output

stackedensemble prediction progress: |████████████████████████████████████| 100%
0.9570278780866834
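`r2_score` computes the coefficient of determination, R² = 1 − SS_res / SS_tot. SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual values from their mean. A plain-Python sketch of the same formula follows; `r2` is a hypothetical helper, not the scikit-learn function, and the MW values are made up.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

actual = [7000.0, 7400.0, 8100.0, 7600.0]            # hypothetical MW values
print(r2(actual, actual))                            # perfect predictions: 1.0
print(r2(actual, [sum(actual) / len(actual)] * 4))   # always predicting the mean: 0.0
```

A score of 1.0 means perfect predictions, and 0.0 means no better than always predicting the average. Scores can go negative for models worse than that baseline, so 0.957 indicates the model explains most of the variance in the test split.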

Validation

# Predict December 2020, which was excluded from training
kw_valid = h2o.H2OFrame(df_valid[x_cols])
y_pred = aml.predict(kw_valid).as_data_frame().predict
y_valid = df_valid["MW"]

print(r2_score(y_valid, y_pred))

Output

Parse progress: |█████████████████████████████████████████████████████████| 100%
stackedensemble prediction progress: |████████████████████████████████████| 100%
0.759670901243868

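On the random test split R² was 0.957, but on December 2020, which the model never saw, it drops to about 0.76. Well, that's about what I expected.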
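One likely reason for the gap is that the two scores come from different kinds of hold-out. `split_frame` assigns rows to train/test at random, so test hours sit between training hours from the same days. `df_valid`, by contrast, is a contiguous later period. The sketch below contrasts the two split styles on hypothetical hourly timestamps. `random_split` is a generic per-row random split, not H2O's implementation, and `chronological_split` mirrors the date cutoff used in the article; both names are hypothetical.

```python
import random
from datetime import datetime, timedelta

def random_split(rows, ratio, seed):
    """Assign each row to train with probability `ratio`, ignoring time order."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        (train if rng.random() < ratio else test).append(row)
    return train, test

def chronological_split(rows, cutoff):
    """Train on rows before `cutoff`; validate on rows from `cutoff` onward."""
    train = [r for r in rows if r < cutoff]
    valid = [r for r in rows if r >= cutoff]
    return train, valid

# Hypothetical hourly timestamps from 2020-11-29 00:00 to 2020-12-02 23:00
start = datetime(2020, 11, 29)
hours = [start + timedelta(hours=h) for h in range(96)]

r_train, r_test = random_split(hours, 0.8, seed=1234)
c_train, c_valid = chronological_split(hours, datetime(2020, 12, 1))

# Random split: some test hours fall between training hours
print(bool(r_test) and min(r_test) < max(r_train))
# Chronological split: every validation hour comes after every training hour
print(max(c_train) < min(c_valid))
```

With a random split, the model has effectively seen the neighbouring hours of each test row. The chronological split is closer to the real forecasting task, where only the past is available.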

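Prediction

Using the API below, predict the electricity usage for the day after tomorrow.

I published a test version of a service that provides weather forecast values and more through an API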

import io
import requests

# Replace with your own API token
headers = {'Authorization': 'Token [your API token]'}
url = "https://iot.blueomega.jp/spnext/api/v1/weather/forecast.csv"

df_pred = pd.DataFrame()
target_date = ""  # optional date to request; leave empty to use the API's default

for city in ["Hiroshima", "Matsue"]:

    payload = {"point": city}
    if target_date:
        payload["date"] = target_date

    rs = requests.get(url, headers=headers, params=payload)
    rs.raise_for_status()  # fail early on authentication or HTTP errors
    # Use a separate name so the training DataFrame `df` is not overwritten
    df_fc = pd.read_csv(io.StringIO(rs.content.decode('utf-8')), header=0)

    # The CSV can contain several rows for the same forecast_time; keep the first one
    df_fc = df_fc.drop_duplicates(subset="forecast_time", keep="first")

    df_fc.index = pd.to_datetime(df_fc["forecast_time"])

    # Both cities' temperatures are aligned on the forecast timestamp
    df_pred["TMP_{}".format(city.lower())] = df_fc["temperature"]

# Same calendar features as in training
df_pred["MONTH"] = df_pred.index.month
df_pred["DAY"] = df_pred.index.day
df_pred["WEEK"] = df_pred.index.weekday
df_pred["HOUR"] = df_pred.index.hour

cols = ["MONTH","WEEK","HOUR"]
for col in cols:
    df_pred = df_pred.join(pd.get_dummies(df_pred[col], prefix=col, dtype=float))

# The forecast spans only a few days, so add every dummy column that did not appear, filled with 0
for col in x_cols:
    if col not in df_pred.columns:
        df_pred[col] = 0.0

kw_pred = h2o.H2OFrame(df_pred[x_cols])

df_pred["MW"] = aml.predict(kw_pred).as_data_frame().predict.tolist()
df_pred.MW.plot(figsize=(15,5))
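The forecast CSV can contain more than one row for the same `forecast_time`, and the code keeps only the first of them before indexing by time. A plain-Python sketch of that "first row per key" rule follows. `keep_first_per_time` is a hypothetical helper and the rows are made-up data shaped like the API's `forecast_time` / `temperature` columns. The sketch assumes duplicates should be collapsed wherever they occur, as pandas' `drop_duplicates(keep="first")` does.

```python
def keep_first_per_time(rows):
    """Keep the first row for each forecast_time, preserving the original order."""
    seen = set()
    result = []
    for row in rows:
        if row["forecast_time"] not in seen:
            seen.add(row["forecast_time"])
            result.append(row)
    return result

# Hypothetical API rows: 09:00 appears twice with different temperatures
rows = [
    {"forecast_time": "2021-02-01 09:00", "temperature": 5.0},
    {"forecast_time": "2021-02-01 09:00", "temperature": 5.4},
    {"forecast_time": "2021-02-01 12:00", "temperature": 8.1},
]
deduped = keep_first_per_time(rows)
print([r["temperature"] for r in deduped])
```

Without this step the timestamp index would contain duplicates, and assigning the temperature column into `df_pred` by index alignment would fail or mis-align rows.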

Done! The plot shows the predicted electricity usage (MW) over the forecast period.
