
Predicting Bitcoin Charts with an LSTM

Last updated at 2021-03-30 Posted at 2021-03-30

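Introduction##

In this article, I (a Python beginner) use Kaggle's bitcoin dataset and walk through the process of predicting the next hour of the chart from the past 24 hours of data.
I am learning as I go based on my own understanding, so please point out anything that is wrong.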

References##

LSTM implementation article 1
LSTM implementation article 2
LSTM implementation article 3
What is an LSTM? What is an RNN?

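Data acquisition and preprocessing##

There were two ways to get the data: the Kaggle dataset or the Poloniex API. I could not retrieve much data from Poloniex (perhaps because I was working in a Jupyter notebook?), so I used the Kaggle dataset this time.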

import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
# Import Keras classes from tensorflow.keras so they match the keras module used below
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM

def get_chart():
    df = pd.read_csv("/Users/nishiyamaippei/Downloads/bitstampUSD_1-min_data_2012-01-01_to_2020-12-31.csv")
    # Drop every column except Timestamp and Close
    df.drop(['Open','Volume_(BTC)','Volume_(Currency)', 'Weighted_Price','High','Low'],axis=1, inplace=True)
    # Convert Unix time to Japan Standard Time
    df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='s', utc=True).dt.tz_convert('Asia/Tokyo')
    # Use Timestamp as the index
    df = df.set_index('Timestamp')
    # Resample from 1-minute rows to 1-hour rows (mean of Close within each hour)
    df = df.resample('1H').mean()
    # Drop hours that contain no data
    df = df.dropna()
    return df

df = get_chart()

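This implementation uses only Close (the closing price) from df, so the other columns are dropped.
At first I used df.groupby to group by Timestamp, but I could not build the training data properly that way, so I switched to df.set_index.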
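To make the resampling step more concrete, here is a minimal sketch on made-up data (the timestamps and prices are invented for the example, not taken from the Kaggle dataset). The frequency is written as `60min`, which should behave the same as `1H` here. It also shows `.last()` as a possible alternative to `.mean()`, in case the last price of each hour is wanted rather than the hourly average.

```python
import pandas as pd

# Toy 1-minute data: 120 rows covering two full hours (values are made up).
idx = pd.date_range("2020-01-01 00:00", periods=120, freq="min")
toy = pd.DataFrame({"Close": range(120)}, index=idx)

# Same idea as in get_chart(): group into hourly buckets and average them.
hourly_mean = toy.resample("60min").mean()

# Alternative: keep the last price seen in each hour.
hourly_last = toy.resample("60min").last()

print(hourly_mean)
print(hourly_last)
```

Averaging smooths out movement within each hour, while `.last()` keeps the price at the end of each hour; which one fits better depends on what "the next hour of the chart" is meant to represent.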

Creating the training data##

I considered using scikit-learn's train_test_split() to create the training data, but since I had never written a split by hand, I decided to write my own function.

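def df_to_list(df):
    # Lists that will hold the input windows and their targets
    x, y = [], []
    values = df.to_numpy()  # shape (rows, 1); slicing an array is faster than slicing the DataFrame
    dl = len(values)
    hl = 24  # window length: 24 hours of history
    for i in range(dl - hl):
        # 24 hours of data as the input, the following hour as the target
        x.append(values[i : i+hl])
        y.append(values[i+hl])

    x = np.array(x)
    y = np.array(y)

    # Split point: the first 80% of the windows are for training, the rest for testing
    row = int(round(0.8 * len(x)))

    x_train, x_test = x[: row], x[row :]
    y_train, y_test = y[: row], y[row :]

    return x_train, x_test, y_train, y_test

x_train, x_test, y_train, y_test = df_to_list(df)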
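The windowing is easier to check on a tiny series. The sketch below uses a hypothetical helper `make_windows` (not part of the article's code) with a window of 3 instead of 24, so the shapes and the first sample can be inspected by eye.

```python
import numpy as np

def make_windows(values, window):
    """Return (x, y): x[i] holds `window` consecutive rows, y[i] is the row right after them."""
    n = len(values) - window
    x = np.array([values[i : i + window] for i in range(n)])
    y = np.array([values[i + window] for i in range(n)])
    return x, y

series = np.arange(10).reshape(-1, 1)  # toy series 0..9 with one feature column
x, y = make_windows(series, window=3)

print(x.shape, y.shape)     # (7, 3, 1) (7, 1)
print(x[0].ravel(), y[0])   # [0 1 2] [3]

# Chronological 80/20 split: no shuffling, so the test windows come later in time.
split = int(round(0.8 * len(x)))
x_train, x_test = x[:split], x[split:]
y_train, y_test = y[:split], y[split:]
```

The resulting `(samples, timesteps, features)` shape is the layout a Keras LSTM layer expects as input.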

Standardization##

mean = np.mean(x_train)           # keep the training mean
std = np.std(x_train)             # keep the training standard deviation
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std

ymean = np.mean(y_train)
ystd = np.std(y_train)
y_train = (y_train - ymean) / ystd
y_test = (y_test - ymean) / ystd
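The point of this step is that the mean and standard deviation come from the training data only, and the same two numbers are reused for the test data and later for converting predictions back to prices. A minimal sketch with made-up numbers:

```python
import numpy as np

train = np.array([10.0, 20.0, 30.0])  # made-up training values
test = np.array([40.0])               # made-up test value

# Statistics are computed from the training data only.
mean, std = train.mean(), train.std()

train_z = (train - mean) / std
test_z = (test - mean) / std          # reuse the training statistics

# The inverse transform must use the same mean and std that were used to scale.
restored = test_z * std + mean
print(restored)
```

Scaling with one pair of statistics and inverting with another gives values that are slightly off, which is worth keeping in mind when inputs and targets are standardized separately, as they are here.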

Building and training the model##

# Build the model
def model_build(neurons, output_size, activation="linear", dropout=0.4, loss="mean_squared_error", optimizer="rmsprop", metrics=['mse']):
    model = keras.Sequential()
    model.add(LSTM(neurons, return_sequences=True, input_shape=(None, 1)))
    model.add(Dropout(dropout))
    model.add(LSTM(neurons, return_sequences=False))
    model.add(Dense(units=output_size, activation=activation))
    model.compile(loss=loss, optimizer=optimizer, metrics=metrics)
    return model

# Train
model = model_build(neurons=50, output_size=1)
model_history = model.fit(x_train, y_train, epochs=10, batch_size=30, validation_data=(x_test, y_test))
model.summary()

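Since this is a regression task, I used a linear activation function and MSE as the loss function for training.

Here is how training progressed.
The loss decreases steadily, so I believe the model is learning properly.
One thing that concerns me is that val_loss is close to 0 from the very first epoch.

Visualizing loss per epoch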

import matplotlib.pyplot as plt

fig, ax1 = plt.subplots(1, 1)

ax1.plot(model_history.epoch, model_history.history['loss'])
ax1.set_title('training-data')
ax1.set_xlabel('epochs')
ax1.set_ylabel('loss')
plt.show()

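Prediction and visualization##

pred = model.predict(x_test)

# Undo the standardization. The predictions are targets, so use the target statistics (ymean, ystd).
pred = pred * ystd + ymean
y_test = y_test * ystd + ymean

pred = np.array(pred)

Visualizing the original data and the predictions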

plt.figure(figsize=(15.0, 6.0))
plt.plot(pred, color='b', label='predict')
plt.plot(y_test, color='y', label='real', linewidth=1)
plt.xlabel('hour')
plt.ylabel('price')
plt.grid(True)
plt.legend()
plt.show()

From a distance, the predictions move almost exactly like the ground truth.

Zooming in, however, the predictions look like the ground truth plotted one step late.
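A prediction that trails the ground truth by one step is what a naive "the next value equals the current value" forecast looks like, so one way to judge the model is to compare its error against that baseline. The sketch below assumes a synthetic random walk rather than the bitcoin data, and `rmse` is a helper defined only for this example.

```python
import numpy as np

def rmse(a, b):
    """Root mean squared error between two equally shaped arrays."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

rng = np.random.default_rng(0)
prices = 100.0 + np.cumsum(rng.normal(0.0, 1.0, size=500))  # synthetic random walk

actual = prices[1:]        # the value to predict at each step
persistence = prices[:-1]  # naive forecast: repeat the previous value

baseline_error = rmse(actual, persistence)
print(f"persistence RMSE: {baseline_error:.3f}")

# With the article's variables, the same comparison would be roughly:
#   rmse(y_test[1:], y_test[:-1])   # baseline
#   rmse(y_test, pred)              # model
```

If the model's RMSE is not clearly lower than the baseline's, it has probably learned little beyond copying the most recent price.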

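Summary##

This chart-prediction project gave me a chance to implement an LSTM, which I had never worked with before, and to work with Kaggle data, so I want to keep studying regularly.
Going forward, I would also like to try visualization libraries other than matplotlib, such as seaborn and plotly.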
