
About Me

  1. Second-year (B2) science undergraduate
  2. Majoring in information science
  3. One year at a web-application company
  4. No prior experience with Python data analysis

Motivation

In a statistics class, my professor recommended Kaggle.
I gave it a try, but it was complete gibberish to me and I gave up. Since I had also been studying investing a little, I instead read books and blogs and built something that actually predicts stock prices.

Environment

Jupyter Notebook
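For reference, the script below imports the following libraries. The article does not pin versions, so this setup command is an assumption; adjust it to your environment:

```shell
# Install the libraries used by predict.py (versions unpinned; an assumption, not from the article)
pip install pandas numpy matplotlib yfinance scikit-learn keras tensorflow
```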

predict.py
import pandas as pd
import warnings
from pandas.errors import PerformanceWarning
import matplotlib.pyplot as plt
import yfinance as yf
from sklearn.preprocessing import StandardScaler
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import accuracy_score
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout, Input
from keras.callbacks import EarlyStopping

# Ignore PerformanceWarning
warnings.filterwarnings("ignore", category=PerformanceWarning)

# Fetch daily OHLC data
ticker = "AMZN"
data = yf.download(ticker, start="2003-01-01", end="2023-03-15", interval="1d")
df = data.copy()
df['Date'] = pd.to_datetime(df.index)
df = df.reset_index(drop=True)
df = df.drop(columns=['Volume', 'Adj Close'])

# Add date features
df['weekday'] = df['Date'].dt.weekday
df['weeks'] = df['Date'].dt.isocalendar().week
df['year'] = df['Date'].dt.isocalendar().year
df = df[['Date', 'year', 'weeks', 'weekday', 'High', 'Low', 'Open', 'Close']]
df.sort_values(by='Date', ascending=True, inplace=True)

# Add the difference from the NEXT day's close and the Up label (shift(-1) looks one day ahead)
df_shift = df.shift(-1)
df['delta_Close'] = df_shift['Close'] - df['Close']
df['Up'] = (df['delta_Close'] > 0).astype(int)
df = df.drop(columns=['delta_Close'])

# Add the day-over-day change ratio and the candle Body
df_shift = df.shift(1)
df['Close_ratio'] = (df['Close'] - df_shift['Close']) / df_shift['Close']
df['Body'] = df['Open'] - df['Close']

# Keep only weeks with exactly 5 trading days
weekly_counts = df.groupby(['year', 'weeks'])['Date'].count()
valid_weeks = weekly_counts[weekly_counts == 5].index
df_filtered = df.set_index(['year', 'weeks'])
df_filtered = df_filtered.loc[valid_weeks].reset_index()
df_filtered = df_filtered[df_filtered['weekday'] != 4]  # exclude Friday, leaving 4 rows (Mon-Thu) per week

# Select the required columns
columns_to_keep = ['Date', 'weekday', 'High', 'Low', 'Open', 'Close', 'Close_ratio', 'Body', 'Up']
df_filtered = df_filtered[columns_to_keep]
df_filtered.set_index('Date', inplace=True)

# Split into training and validation data
df_train = df_filtered['2003-01-01':'2020-12-31']
df_val = df_filtered['2021-01-01':]
X_train = df_train[['weekday', 'High', 'Low', 'Open', 'Close', 'Close_ratio', 'Body']]
y_train = df_train['Up']
X_val = df_val[['weekday', 'High', 'Low', 'Open', 'Close', 'Close_ratio', 'Body']]
y_val = df_val['Up']

# Standardize each non-overlapping 4-day window and convert to a NumPy array
def std_to_np(df):
    df_list = []
    df = np.array(df)
    for i in range(0, len(df) - 3, 4):
        df_s = df[i:i+4]
        scl = StandardScaler()
        df_std = scl.fit_transform(df_s)
        df_list.append(df_std)
    return np.array(df_list)

X_train_np_array = std_to_np(X_train)
X_val_np_array = std_to_np(X_val)

# Thin out the target: keep one label per 4-day window (Thursday's Up, i.e. whether Friday closes higher)
y_train_new = np.array(y_train[3::4])
y_val_new = np.array(y_val[3::4])

# Build and compile the LSTM model
def lstm_comp(df):
    model = Sequential()
    model.add(Input(shape=(df.shape[1], df.shape[2])))
    model.add(LSTM(256, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(256, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model


# Cross-validation
valid_scores = []
tscv = TimeSeriesSplit(n_splits=4)
early_stopping = EarlyStopping(monitor='loss', patience=3, restore_best_weights=True)

for fold, (train_indices, valid_indices) in enumerate(tscv.split(X_train_np_array)):
    X_train_fold, X_valid_fold = X_train_np_array[train_indices], X_train_np_array[valid_indices]
    y_train_fold, y_valid_fold = y_train_new[train_indices], y_train_new[valid_indices]

    model = lstm_comp(X_train_fold)
    model.fit(X_train_fold, y_train_fold, epochs=50, batch_size=64, callbacks=[early_stopping])

    y_valid_pred = model.predict(X_valid_fold).flatten()
    y_valid_pred = np.where(y_valid_pred < 0.5, 0, 1)
    score = accuracy_score(y_valid_fold, y_valid_pred)
    print(f'fold {fold} accuracy: {score}')
    valid_scores.append(score)

# Print the scores
print(f'valid_scores: {valid_scores}')
cv_score = np.mean(valid_scores)
print(f'CV score: {cv_score}')

# Build a fresh model on the full training array
model = lstm_comp(X_train_np_array)

# Train the model
result = model.fit(X_train_np_array, y_train_new, epochs=10, batch_size=64)

# Predict on the validation data with the trained model
pred = model.predict(X_val_np_array)
print(pred[:10])  # raw sigmoid outputs

# Round predictions to 0 or 1 (threshold 0.5: closer to 1 means the price is predicted to rise, closer to 0 that it is not)
pred = np.where(pred < 0.5, 0, 1)

# Check the first 10 rounded predictions
print(pred[:10])

# Compute the accuracy of the predictions against the actual labels
print('accuracy = ', accuracy_score(y_true=y_val_new, y_pred=pred))
Output:
accuracy =  0.5319148936170213
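As a sanity check on the data shaping, the windowing that std_to_np performs (non-overlapping 4-day windows, each standardized independently per column) can be illustrated with a toy array; the values here are made up:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# 8 trading days (two Mon-Thu weeks) with 2 features each; toy values
X = np.arange(16, dtype=float).reshape(8, 2)

windows = []
for i in range(0, len(X) - 3, 4):
    week = X[i:i+4]  # one 4-day window
    # standardize each column within the window, as std_to_np does
    windows.append(StandardScaler().fit_transform(week))

arr = np.array(windows)
print(arr.shape)         # (2, 4, 2): weeks x days x features, the 3-D shape the LSTM expects
print(arr.mean(axis=1))  # per-window column means are ~0 after scaling
```

This is why y_train[3::4] works: each window of 4 rows is paired with the Up label of its last (Thursday) row.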

Closing Thoughts

I tried predicting Amazon's stock price with an LSTM. It was my first attempt, so it took a while, but implementing it was fun. At 53%, the accuracy is not high, but since the prediction uses only price history, with no company fundamentals or economic context, I think that is about what you would expect. I would like to keep improving it until it can predict at around 60%.

Addendum

I plan to revise the finer details and add more thorough explanations over time (no schedule yet, as I am busy).
I would like to go as far as turning this into a web app, but improving accuracy comes first.
I could not figure out why the warning was raised, so I suppressed it forcibly.
