Scraping stock charts with Selenium on Google Colaboratory and predicting the stock price N days later with a CNN

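Overview

Some people trade stocks by reading charts, so I wondered whether stock prices could be predicted from chart images, and I tried it.
To state the conclusion first: predicting five days ahead gave an accuracy of around 60%, so I can't say the prediction worked...

Still, I did manage to scrape images with Selenium, so I think that part was worth it.

  • The environment is Google Colaboratory.
  • Selenium is used to scrape the chart images and the price data (CSV).
  • The WebDriver is PhantomJS. (Support appears to have been discontinued, but it worked... Use it at your own risk.)
  • From a stock chart image, the price N days later is classified into two classes (went up or went down).
  • The instrument is the Nikkei 225.
  • The framework is Chainer.

Getting stock chart images with Selenium

Installing Selenium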

!pip install selenium

Downloading the WebDriver (PhantomJS)

About WebDriver

Google Chrome and Firefox did not work for me for some reason,
so I use PhantomJS, even though it prints a warning.

%%bash
mkdir ~/src 
cd ~/src
wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
%%bash
ls ~/src
phantomjs-2.1.1-linux-x86_64.tar.bz2

Extracting PhantomJS and putting it on the PATH

I don't fully understand the details, but moving the extracted phantomjs binary into /usr/local/bin/ seems to put it on the PATH.

%%bash
cd ~/src
tar jxvf phantomjs-2.1.1-linux-x86_64.tar.bz2
cd phantomjs-2.1.1-linux-x86_64/bin/
mv phantomjs /usr/local/bin/

Check that it is on the PATH.

If phantomjs> is displayed, it worked.

%%bash
phantomjs
phantomjs> 
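
The same PATH check could also be done from Python, which avoids dropping into the PhantomJS prompt. This is a minimal sketch using only the standard library; the helper name `is_on_path` is made up for illustration.

```python
import shutil

def is_on_path(command):
    """Return True if `command` resolves to an executable on the current PATH."""
    return shutil.which(command) is not None

# On a machine prepared as described above, this would be expected to report True.
print("phantomjs on PATH:", is_on_path("phantomjs"))
```

`shutil.which` searches the directories listed in the PATH environment variable, so it reports what a shell would find when asked to run the same command.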

Fetching the Google page with Selenium.

from selenium import webdriver

# PhantomJS runs without a display, so no browser options are needed
browser = webdriver.PhantomJS()

browser.implicitly_wait(3)
browser.get('https://www.google.com')
browser.save_screenshot("google.com.png")
browser.quit()
/usr/local/lib/python3.6/dist-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
from PIL import Image

Image.open('google.com.png')

output_19_0.png

Getting the stock chart images with Selenium.

Reference site

Getting the Nikkei 225 CSV data

import requests
from selenium import webdriver

browser = webdriver.PhantomJS()
browser.implicitly_wait(3)

nikkei225_url = 'https://finance.yahoo.com/quote/%5EN225/history?period1=1356966000&period2=1528642800&interval=1d&filter=history&frequency=1d'
browser.get(nikkei225_url)

download_link = browser.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[2]/span[2]/a').get_attribute('href')

session = requests.Session()
cookies = browser.get_cookies()

for cookie in cookies:
  session.cookies.set(cookie['name'], cookie['value'])
response = session.get(download_link)

with open('nikkei225.csv', 'wb') as f:
  f.write(response.content)

browser.quit()
/usr/local/lib/python3.6/dist-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
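
The `period1` and `period2` values in the URL above are Unix timestamps. A small sketch of how they could be built from calendar dates follows. Treating them as midnight in Japan time is my assumption (it happens to reproduce both numbers exactly), and the helper names `to_period` and `build_history_url` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the date range is expressed as midnight in Japan Standard Time.
JST = timezone(timedelta(hours=9))

def to_period(year, month, day):
    """Unix timestamp in seconds for midnight JST on the given date."""
    return int(datetime(year, month, day, tzinfo=JST).timestamp())

def build_history_url(start, end):
    """Rebuild the history URL used above from two (year, month, day) tuples."""
    return (
        'https://finance.yahoo.com/quote/%5EN225/history'
        '?period1={}&period2={}&interval=1d&filter=history&frequency=1d'
    ).format(to_period(*start), to_period(*end))

print(build_history_url((2013, 1, 1), (2018, 6, 11)))
```

With a helper like this, changing the download range becomes a matter of editing two dates instead of two opaque integers.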

Checking that the Nikkei 225 CSV data was retrieved

import pandas as pd
df = pd.read_csv("nikkei225.csv")
df.tail()
Date Open High Low Close Adj Close Volume
1342 2018-06-05 22552.169922 22602.130859 22470.039063 22539.539063 22539.539063 65800.0
1343 2018-06-06 22520.310547 22662.820313 22498.589844 22625.730469 22625.730469 67900.0
1344 2018-06-07 22748.720703 22856.369141 22732.179688 22823.259766 22823.259766 72200.0
1345 2018-06-08 22799.380859 22879.000000 22694.500000 22694.500000 22694.500000 85200.0
1346 2018-06-11 22686.949219 22856.080078 22667.300781 22804.039063 22804.039063 55700.0
df.describe()
Open High Low Close Adj Close Volume
count 1333.000000 1333.000000 1333.000000 1333.000000 1333.000000 1333.000000
mean 17504.436762 17603.798866 17399.636172 17503.307298 17503.307298 145364.516129
std 2951.455511 2945.909275 2953.304942 2951.124517 2951.124517 60485.992544
min 10405.669922 10602.120117 10398.610352 10486.990234 10486.990234 0.000000
25% 15204.309570 15326.780273 15124.360352 15224.110352 15224.110352 109000.000000
50% 17306.640625 17400.769531 17162.210938 17290.490234 17290.490234 135300.000000
75% 19737.210938 19840.000000 19679.369141 19746.199219 19746.199219 167900.000000
max 24078.929688 24129.339844 23917.140625 24124.150391 24124.150391 595200.000000

Adding the close N days later and its difference from the current close

import numpy as np
# Drop rows that have no Close value
df = df.dropna(subset=['Close'])
# # Add a next-day close column
# df['NextDayClose'] = df['Close'].shift(-1).fillna(0)
# # Add the next-day price change
# df['NextDayUpPrice'] = (df['NextDayClose'] - df['Close']).fillna(0)
# # Add the next-day percentage change
# df['NextDayUpLate'] = (df['NextDayUpPrice'] / df['Close'] * 100).fillna(0)

# shift(-5) pulls in the value from 5 rows (trading days) ahead
past_n_day = -5
# Add a column with the close N days later
df['N_DaysLaterClose'] = df['Close'].shift(past_n_day).fillna(0)
# Add the price change N days later
df['N_DaysLaterUpPrice'] = (df['N_DaysLaterClose'] - df['Close']).fillna(0)
# Add the percentage change N days later
df['N_DaysLaterUpLate'] = (df['N_DaysLaterUpPrice'] / df['Close'] * 100).fillna(0)
df.head()
Date Open High Low Close Adj Close Volume N_DaysLaterClose N_DaysLaterUpPrice N_DaysLaterUpLate
0 2013-01-04 10604.500000 10734.230469 10602.240234 10688.110352 10688.110352 219000.0 10801.570313 113.459961 1.061553
1 2013-01-07 10743.690430 10743.690430 10589.700195 10599.009766 10599.009766 187700.0 10879.080078 280.070312 2.642420
2 2013-01-08 10544.209961 10602.120117 10463.429688 10508.059570 10508.059570 211400.0 10600.440430 92.380860 0.879143
3 2013-01-09 10405.669922 10620.700195 10398.610352 10578.570313 10578.570313 215000.0 10609.639648 31.069335 0.293701
4 2013-01-10 10635.110352 10686.120117 10619.650391 10652.639648 10652.639648 268500.0 10913.299805 260.660157 2.446907
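
One thing to watch with `shift(...).fillna(0)`: the last N rows have no future close, so they end up with a future close of 0 and a large negative change. In this article those rows are filtered out later by the date cutoff, but a variant that drops them directly could look like the sketch below. The function name `add_n_days_later` and the toy prices are made up for illustration.

```python
import pandas as pd

def add_n_days_later(df, n=5):
    """Add N-days-later columns and drop rows whose future close is unknown."""
    out = df.copy()
    out['N_DaysLaterClose'] = out['Close'].shift(-n)
    out['N_DaysLaterUpPrice'] = out['N_DaysLaterClose'] - out['Close']
    out['N_DaysLaterUpLate'] = out['N_DaysLaterUpPrice'] / out['Close'] * 100
    # Rows near the end have no close N days ahead, so remove them
    return out.dropna(subset=['N_DaysLaterClose'])

# Toy closing prices, not real market data
toy = pd.DataFrame({'Close': [100.0, 101.0, 99.0, 102.0, 104.0, 103.0, 105.0]})
labeled = add_n_days_later(toy, n=5)
print(labeled)
```

With seven rows and n=5, only the first two rows have a close five rows ahead, so only those two remain.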

Building the list of dates to chart

target_list = df.where(df['Date'] <= '2018-03-01').dropna()
target_list.tail()
Date Open High Low Close Adj Close Volume N_DaysLaterClose N_DaysLaterUpPrice N_DaysLaterUpLate
1270 2018-02-23 21789.720703 21903.390625 21741.630859 21892.779297 21892.779297 64600.0 21181.640625 -711.138672 -3.248280
1271 2018-02-26 22134.640625 22226.529297 22040.869141 22153.630859 22153.630859 64600.0 21042.089844 -1111.541015 -5.017421
1272 2018-02-27 22391.669922 22502.050781 22325.070313 22389.859375 22389.859375 74300.0 21417.759766 -972.099609 -4.341696
1273 2018-02-28 22292.529297 22380.279297 22068.240234 22068.240234 22068.240234 88800.0 21252.720703 -815.519531 -3.695444
1274 2018-03-01 21901.130859 21901.130859 21645.220703 21724.470703 21724.470703 90300.0 21368.070313 -356.400390 -1.640548

Defining the method that fetches the stock chart

import os
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.firefox.options import Options
import base64
from time import sleep

Defining the method that builds the image file path and file name

def create_dir_and_file_name(code, year=1, month=1, day=1):
  file_name = "{}_{}_{}.png".format(year, month, day)
  chart_file_path = "./stock_chart/{}/".format(code)
  return chart_file_path, file_name

Ideally the year, month, and day would be picked dynamically by walking through the dropdowns...
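
As a rough idea of what a computed mapping might look like, here is a sketch that derives the dropdown position from the year. It assumes the dropdown lists years newest first, starting at 2018 in position 1. I have not verified that against the page, and `year_to_dropdown_index` is a made-up helper name.

```python
def year_to_dropdown_index(year, newest_year=2018):
    """1-based position of `year` in a dropdown that lists years newest first."""
    if year > newest_year:
        raise ValueError('year {} is newer than the first dropdown entry'.format(year))
    return newest_year - year + 1

for y in range(2018, 2012, -1):
    print(y, year_to_dropdown_index(y))
```

If the real dropdown is ordered differently, only the arithmetic in this one function would need to change.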

# Position of each year in the year dropdown
# NOTE: 2014 and 2013 reuse positions 3 and 2; check them against the actual dropdown order
YEAR_DIC = {2018:1, 2017:2, 2016:3, 2015:4, 2014:3, 2013:2}
def save_chart_by_selenium(browser, code, year=1, month=1, day=1):

  # Open the Nikkei chart page
  browser.get('https://www.nikkei.com/markets/chart/#!/' + code)

  # Set the chart period to 3 months
  browser.find_element_by_id("ViewTerm3m").click()

  # Set the end date (for example 2018-01-01)
  # Open the period-selection popup
  browser.find_element_by_id("TermConfBtn").click()

  # Get the popup
  popup = browser.find_element_by_class_name("popup")
  # Click the "To" date selector
  popup.find_element_by_css_selector("p.EnDateSelect.daybox_2.font14").click()

  # Click the year dropdown
  browser.find_element_by_xpath("/html/body/div[3]/div/div/div/div[2]/div[1]/div/a").click()
  # Choose the year
  browser.find_element_by_xpath('/html/body/div[3]/div/div/div/div[2]/div[1]/div/ul/li[{}]'.format(YEAR_DIC[year])).click()

  # Click the month dropdown
  browser.find_element_by_xpath("/html/body/div[3]/div/div/div/div[2]/div[2]/div/a").click()
  # Choose the month
  browser.find_element_by_xpath('/html/body/div[3]/div/div/div/div[2]/div[2]/div/ul/li[{}]'.format(month)).click()

  # Click the day dropdown
  browser.find_element_by_xpath("/html/body/div[3]/div/div/div/div[2]/div[3]/div/a").click()
  # Choose the day
  browser.find_element_by_xpath('/html/body/div[3]/div/div/div/div[2]/div[3]/div/ul/li[{}]'.format(day)).click()

  # Click the "apply" button
  popup.find_element_by_css_selector("p.SetTermConf.btn01.font14").click()
  # Without a sleep the chart does not seem to update; a 3-second sleep was just enough
  sleep(3) 

  # Convert the canvas to PNG; substring(22) strips the 'data:image/png;base64,' prefix
  chart_canvas = browser.find_element_by_css_selector("#ChartGraphTop")
  chart_canvas_base64 = browser.execute_script("return arguments[0].toDataURL('image/png').substring(22);", chart_canvas)
  chart_canvas_png = base64.b64decode(chart_canvas_base64)

  # Destination path
  chart_file_path, chart_file_name = create_dir_and_file_name(code, year, month, day)

  os.makedirs(chart_file_path, exist_ok=True)

  with open(chart_file_path + chart_file_name, 'wb') as f:
    f.write(chart_canvas_png)

  return chart_file_path + chart_file_name
  #browser.save_screenshot("test.png")
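
The canvas-to-PNG step relies on the shape of a data URL: a header, a comma, then the base64 payload. A standalone sketch of that decoding, splitting on the comma rather than on a fixed character offset, might look like this. The helper name `decode_png_data_url` and the fake image bytes are made up for illustration.

```python
import base64

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'

def decode_png_data_url(data_url):
    """Decode a 'data:image/png;base64,...' string into raw PNG bytes."""
    header, _, payload = data_url.partition(',')
    if header != 'data:image/png;base64':
        raise ValueError('unexpected data URL header: ' + header)
    png = base64.b64decode(payload, validate=True)
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError('decoded bytes do not start with the PNG signature')
    return png

# Stand-in for what canvas.toDataURL('image/png') returns (not a real image)
fake_png = PNG_SIGNATURE + b'chart-bytes'
data_url = 'data:image/png;base64,' + base64.b64encode(fake_png).decode('ascii')
print(decode_png_data_url(data_url) == fake_png)
```

Returning the full data URL from `execute_script` and decoding it this way would avoid depending on the exact prefix length.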

Checking that it works

browser = webdriver.PhantomJS()
browser.implicitly_wait(3)

save_chart_by_selenium(browser, "0101", 2018, 8, 13)

Image.open('./stock_chart/0101/2018_8_13.png')
/usr/local/lib/python3.6/dist-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '

output_41_1.png

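Authenticating with Google Drive

The scraping takes quite a while, so I want to save the results to Google Drive once it finishes.

Authenticate beforehand so that the files can be saved.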

!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

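Fetching the chart images

Fetching the data takes more than an hour.
That is because there are about 1,300 dates and each one includes a 3-second sleep (roughly 1,300 × 3 s ≈ 65 minutes of waiting alone).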
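
One way the fixed sleep might be shortened is to poll for a condition and continue as soon as it holds, which is the idea behind Selenium's `WebDriverWait`. The sketch below shows that polling pattern in plain Python. The helper `wait_until` and the toy predicate are made up for illustration, and which condition reliably signals "the chart has been redrawn" on the real page is left open.

```python
import time

def wait_until(predicate, timeout=3.0, interval=0.1):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Toy predicate that becomes true on the third call, standing in for
# "the chart canvas has been redrawn"
calls = {'n': 0}
def chart_redrawn():
    calls['n'] += 1
    return calls['n'] >= 3

print(wait_until(chart_redrawn, timeout=3.0, interval=0.01))
```

With this pattern the 3 seconds becomes an upper bound per chart rather than a fixed cost.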

browser = webdriver.PhantomJS()
browser.implicitly_wait(3)

for target in target_list.values:
  year, month, day = [int(x) for x in target[0].split('-')]
  save_chart_by_selenium(browser, '0101', year, month, day)

browser.quit()
/usr/local/lib/python3.6/dist-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '

Building the list of paths and labels

Each sample is labeled by whether the price went up or down.

0: below 0

1: 0 or above

def get_label_2(val):
  if val < 0:
    return 0
  else:
    return 1
path_and_label = []
for target in target_list.values:
  year, month, day = [int(x) for x in target[0].split('-')]
  path, file_name = create_dir_and_file_name('0101', year, month, day)
  label = get_label_2(target[-1])
  path_and_label.append((path + file_name, label))
path_and_label[:10]
[('./stock_chart/0101/2013_1_4.png', 1),
 ('./stock_chart/0101/2013_1_7.png', 1),
 ('./stock_chart/0101/2013_1_8.png', 1),
 ('./stock_chart/0101/2013_1_9.png', 1),
 ('./stock_chart/0101/2013_1_10.png', 1),
 ('./stock_chart/0101/2013_1_11.png', 0),
 ('./stock_chart/0101/2013_1_15.png', 0),
 ('./stock_chart/0101/2013_1_16.png', 0),
 ('./stock_chart/0101/2013_1_17.png', 1),
 ('./stock_chart/0101/2013_1_18.png', 1)]
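
Before reading much into an accuracy figure, it can help to know how often the most common label occurs, because a model that always predicts that label already reaches that score. A minimal sketch follows; the helper name `majority_baseline` is made up, and the ten labels are just the ones printed above, not the whole dataset.

```python
from collections import Counter

def majority_baseline(labels):
    """Return (most common label, share of samples carrying that label)."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

# Labels of the first ten entries printed above
first_ten = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1]
print(majority_baseline(first_ten))
```

Running the same function over `[label for _, label in path_and_label]` would give the baseline for the full list.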

Compressing the generated data and writing a space-delimited file

The image data is compressed into a ZIP archive.

import shutil

shutil.make_archive('./stock_chart', 'zip', root_dir='./stock_chart')
'/content/stock_chart.zip'

path_and_label is written out as a space-delimited file.

import csv

with open('stock_chart_path_label.txt', 'w') as f:
  writer = csv.writer(f, lineterminator='\n', delimiter=' ')
  writer.writerows(path_and_label)
df = pd.read_csv('stock_chart_path_label.txt')
df.tail(10)
./stock_chart/0101/2013_1_4.png 1
1254 ./stock_chart/0101/2018_2_16.png 1
1255 ./stock_chart/0101/2018_2_19.png 1
1256 ./stock_chart/0101/2018_2_20.png 1
1257 ./stock_chart/0101/2018_2_21.png 1
1258 ./stock_chart/0101/2018_2_22.png 0
1259 ./stock_chart/0101/2018_2_23.png 0
1260 ./stock_chart/0101/2018_2_26.png 0
1261 ./stock_chart/0101/2018_2_27.png 0
1262 ./stock_chart/0101/2018_2_28.png 0
1263 ./stock_chart/0101/2018_3_1.png 0
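
Note that the file has no header row, which is why the first record shows up as the column header in the output above. A sketch of reading the file back with that in mind, using the standard `csv` module, could look like this. The helper name `read_path_label` is made up, and an in-memory string stands in for the real file. With pandas, passing `sep=' '` and `header=None` to `read_csv` should have a similar effect.

```python
import csv
import io

def read_path_label(f):
    """Read space-delimited 'path label' rows (no header) into (path, int) tuples."""
    reader = csv.reader(f, delimiter=' ')
    return [(path, int(label)) for path, label in reader]

# In-memory stand-in for stock_chart_path_label.txt
sample = io.StringIO(
    './stock_chart/0101/2013_1_4.png 1\n'
    './stock_chart/0101/2018_3_1.png 0\n'
)
rows = read_path_label(sample)
print(len(rows), rows[0])
```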

Saving to Google Drive

save_files = ["stock_chart_path_label.txt", "stock_chart.zip", "nikkei225.csv"]
for save_file in save_files:
  upload_file = drive.CreateFile()
  upload_file.SetContentFile(save_file)
  upload_file.Upload()

Checking a retrieved image

from PIL import Image

img = Image.open('stock_chart/0101/2017_3_1.png').convert('RGB')
img

output_65_0.png

For debugging

When image retrieval wasn't working, I checked what state the browser was in
by looking at the image written by browser.save_screenshot("test.png").

# from PIL import Image
# Image.open('test.png')

Predicting the price N days later from chart images with Chainer (CNN)

Installing Chainer

!pip uninstall chainer -y
!pip uninstall cupy-cuda80 -y
!pip uninstall chainercv -y
!apt -y install libcusparse8.0 libnvrtc8.0 libnvtoolsext1
!ln -snf /usr/lib/x86_64-linux-gnu/libnvrtc-builtins.so.8.0 /usr/lib/x86_64-linux-gnu/libnvrtc-builtins.so
!pip --no-cache-dir install 'chainer==4.0.0b4' 'cupy-cuda80==4.0.0b4' 
!pip install chainer -U
!pip install cupy-cuda80 -U
!pip --no-cache-dir install chainercv
import chainer
import cupy

chainer.print_runtime_info()
print('GPU availability:', chainer.cuda.available)
print('cuDNN availablility:', chainer.cuda.cudnn_enabled)
Chainer: 4.4.0
NumPy: 1.14.5
CuPy:
  CuPy Version          : 4.4.1
  CUDA Root             : None
  CUDA Build Version    : 8000
  CUDA Driver Version   : 9000
  CUDA Runtime Version  : 8000
  cuDNN Build Version   : 7102
  cuDNN Version         : 7102
  NCCL Build Version    : 2213
GPU availability: True
cuDNN availablility: True

Fetching the data from Google Drive

!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
id = '******************'
downloaded = drive.CreateFile({'id':id})
downloaded.GetContentFile('stock_chart_path_label.txt')
import pandas as pd
df = pd.read_table('stock_chart_path_label.txt')
df.tail(10)
id = '******************'
downloaded = drive.CreateFile({'id':id})
downloaded.GetContentFile('stock_chart.zip')
%%bash
unzip stock_chart.zip -d stock_chart
ls
import numpy as np
from PIL import Image
img = Image.open('stock_chart/0101/2013_2_27.png')
print("shape is ", np.array(img).shape)
img
shape is  (250, 611, 4)

output_80_1.png

Shaping the data into a form Chainer can use

from chainercv.transforms import scale
from chainercv.transforms import resize
from PIL import Image
# Transform applied to each sample
def transform(inputs):
  img , label = inputs

  # Convert to RGB (the PNGs are RGBA)
  img = img.astype('u1')
  img = Image.fromarray(img.transpose(1, 2, 0))
  img = img.convert('RGB')

  img = np.array(img, dtype='f').transpose(2, 0, 1)
  # Resize
  img = resize(img, (224, 224))
  #print(img.shape)
  # Scale the values into the 0 to 1 range
  img = img.astype(np.float32) / 255

  return img, label
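
To make the array layout in `transform` easier to follow, here is a NumPy-only sketch of the channel handling on an array with the same shape as the charts, (250, 611, 4). It simply slices off the alpha channel, which is my simplification and may differ from what PIL's `convert('RGB')` does for transparent pixels. It also skips the resize, and `to_chw_unit_range` is a made-up name.

```python
import numpy as np

def to_chw_unit_range(rgba_hwc):
    """Drop alpha, move channels first, and scale uint8 pixel values to [0, 1]."""
    rgb = rgba_hwc[:, :, :3]      # (H, W, 4) -> (H, W, 3)
    chw = rgb.transpose(2, 0, 1)  # (H, W, 3) -> (3, H, W), the layout Chainer expects
    return chw.astype(np.float32) / 255

# Random pixels standing in for one chart image
dummy = np.random.RandomState(0).randint(0, 256, size=(250, 611, 4)).astype(np.uint8)
out = to_chw_unit_range(dummy)
print(out.shape, out.dtype)
```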
import numpy as np
from chainer.datasets import LabeledImageDataset, TransformDataset

d = LabeledImageDataset("stock_chart_path_label.txt")

td = TransformDataset(d, transform)
from chainer import datasets

train, valid = datasets.split_dataset_random(td, int(len(d) * 0.8), seed=0)
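
For a feel of what the 80/20 split produces, here is a plain-Python sketch of a seeded random split over sample indices. The sample count 1265 is my inference from the label file shown earlier, not a number stated in the article, and the indices chosen here will not match the ones Chainer picks. `split_random` is a made-up helper.

```python
import random

def split_random(n_samples, train_ratio=0.8, seed=0):
    """Shuffle indices with a fixed seed and cut them into train and validation parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_ratio)
    return indices[:n_train], indices[n_train:]

train_idx, valid_idx = split_random(1265)
print(len(train_idx), len(valid_idx))
```

Because the seed is fixed, repeated runs give the same partition, which keeps accuracy numbers comparable between experiments.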

Fixing the random seed

import random
import numpy
import chainer

def reset_seed(seed=0):
    random.seed(seed)
    numpy.random.seed(seed)
    if chainer.cuda.available:
        chainer.cuda.cupy.random.seed(seed)

reset_seed(0)

Training

import chainer
import chainer.functions as F
import chainer.links as L
from chainer import training,serializers,Chain,datasets,sequential,optimizers,iterators
from chainer.training import extensions,Trainer
from chainer.dataset import concat_examples

from chainercv.links import VGG16
from chainercv.links import ResNet152,ResNet50,ResNet101
from chainer.links import Classifier
import numpy as np
batchsize = 32
max_epoch = 60
gpu_id = 0
e_shift = 20
e_shift_lr = 0.1
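
`e_shift` and `e_shift_lr` describe a step decay: the learning rate is multiplied by 0.1 every 20 epochs. The sketch below reproduces the values that appear in the `lr` column of the training log further down, assuming the starting rate of 0.001 used by the optimizer. `lr_at_epoch` is a made-up helper, not part of Chainer.

```python
def lr_at_epoch(epoch, base_lr=0.001, rate=0.1, shift_every=20):
    """Learning rate reported for a 1-based epoch under a step decay."""
    return base_lr * rate ** ((epoch - 1) // shift_every)

for epoch in (1, 20, 21, 40, 41, 60):
    print(epoch, lr_at_epoch(epoch))
```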

VGG16

class FinetuneNet(chainer.Chain):
  def __init__(self, extractor, n_units, n_class):
    super().__init__()
    w = chainer.initializers.HeNormal()
    with self.init_scope():
      self.extractor = extractor
      self.l1 = L.Linear(None, n_units, initialW=w)
      self.l2 = L.Linear(None, n_units, initialW=w)
      self.l3 = L.Linear(None, n_class, initialW=w)

  def __call__(self, x):
    h = x
    h = self.extractor(h)
    h.unchain()
    h = F.relu(self.l1(h))
    h = F.dropout(h)
    h = F.relu(self.l2(h))
    h = F.dropout(h)
    h = self.l3(h)
    return h
# extractor = ResNet152(pretrained_model='imagenet', arch='he')
extractor = VGG16(pretrained_model='imagenet')
extractor.pick = 'pool5'
model = Classifier(FinetuneNet(extractor, 4098, 2))
model.to_gpu(gpu_id)
Downloading ...
From: https://chainercv-models.preferred.jp/vgg16_imagenet_converted_2017_07_18.npz
To: /root/.chainer/dataset/_dl_cache/4f02fe8a8af16cdedf8200a6701dcb0a
  %   Total    Recv       Speed  Time left
 94  490MiB  462MiB   4610KiB/s    0:00:06




<chainer.links.model.classifier.Classifier at 0x7f89bd5330f0>
train_iter = iterators.MultiprocessIterator(train, batchsize)
valid_iter = iterators.MultiprocessIterator(valid, batchsize, False, False)

optimizer = optimizers.MomentumSGD(lr=0.001).setup(model)
optimizer.add_hook(chainer.optimizer.WeightDecay(0.0001))

updater = training.StandardUpdater(train_iter, optimizer, device=gpu_id)
trainer = Trainer(updater, stop_trigger=(max_epoch, 'epoch'))

chainer.cuda.set_max_workspace_size(1024 * 1024 * 1024)
chainer.global_config.autotune = True
chainer.global_config.type_check = False

trainer.extend(extensions.observe_lr())
trainer.extend(extensions.LogReport())
trainer.extend(extensions.Evaluator(valid_iter, model, device=gpu_id), name='val')
trainer.extend(extensions.ExponentialShift('lr', e_shift_lr), trigger=(e_shift, 'epoch'))
trainer.extend(extensions.PrintReport(['epoch', 'main/loss', 'main/accuracy', 'val/main/loss', 'val/main/accuracy', 'lr', 'elapsed_time']))
trainer.extend(extensions.snapshot(filename='snapshot_epoch-{.updater.epoch}'))
trainer.extend(extensions.PlotReport(['main/loss', 'val/main/loss'], x_key='epoch', file_name='loss.png'))
trainer.extend(extensions.PlotReport(['main/accuracy', 'val/main/accuracy'], x_key='epoch', file_name='accuracy.png'))
trainer.extend(extensions.ProgressBar())
trainer.extend(extensions.dump_graph('main/loss'))
trainer.run()
epoch       main/loss   main/accuracy  val/main/loss  val/main/accuracy  lr          elapsed_time
1           1.25676     0.553711       0.970616       0.436826           0.001       31.1777
2           1.43051     0.535156       0.960703       0.574892           0.001       157.184
3           1.54337     0.530242       1.12323        0.425108           0.001       280.327
4           0.970235    0.524414       0.679433       0.574892           0.001       404.73
5           0.761617    0.508789       0.702564       0.497845           0.001       529.202
6           0.717731    0.541331       0.673392       0.594423           0.001       654.141
7           0.69833     0.587891       0.668774       0.582705           0.001       778.987
8           0.710047    0.52621        0.684186       0.522495           0.001       904.168
9           0.68308     0.56543        0.674996       0.610048           0.001       1029.01
10          0.683418    0.557617       0.667301       0.594828           0.001       1153.93
11          0.688133    0.5625         0.666684       0.582705           0.001       1279.7
12          0.680461    0.569336       0.677656       0.554957           0.001       1405.49
13          0.669613    0.587891       0.672255       0.60264            0.001       1531.83
14          0.670181    0.608871       0.666808       0.578798           0.001       1656.16
15          0.67315     0.602539       0.672514       0.574892           0.001       1783.35
16          0.674236    0.581653       0.666224       0.590113           0.001       1911.16
17          0.669897    0.588867       0.669224       0.606546           0.001       2038.06
18          0.662643    0.604492       0.667773       0.578798           0.001       2164.56
19          0.65989     0.612903       0.66888        0.613147           0.001       2289.94
20          0.67777     0.581055       0.681438       0.561557           0.001       2417.66
21          0.671243    0.575195       0.670354       0.574892           0.0001      2544.12
22          0.665133    0.595766       0.668622       0.590517           0.0001      2669.51
23          0.660549    0.595703       0.667946       0.610857           0.0001      2794.1
24          0.658184    0.602823       0.667611       0.614763           0.0001      2919.19
25          0.656694    0.603516       0.667329       0.614359           0.0001      3045.04
26          0.653186    0.617188       0.666626       0.606546           0.0001      3169.9
27          0.650239    0.611895       0.665682       0.610453           0.0001      3296.03
28          0.652033    0.625          0.666332       0.618265           0.0001      3420.43
29          0.650613    0.597656       0.665503       0.614359           0.0001      3546.59
30          0.65221     0.61996        0.664578       0.618265           0.0001      3671.57
31          0.64734     0.623047       0.663116       0.613955           0.0001      3799.96
32          0.649706    0.618952       0.662693       0.626078           0.0001      3925.36
33          0.640512    0.62207        0.661959       0.626078           0.0001      4050.78
34          0.645955    0.621094       0.660486       0.622171           0.0001      4175.56
35          0.639895    0.622984       0.659186       0.63389            0.0001      4300.78
36          0.640335    0.636719       0.659116       0.634294           0.0001      4425.67
37          0.64521     0.637695       0.660035       0.622171           0.0001      4551.87
38          0.643841    0.616935       0.658681       0.622171           0.0001      4677.25
39          0.648018    0.615234       0.659794       0.633082           0.0001      4803.14
40          0.640266    0.65121        0.659143       0.634294           0.0001      4928.76
41          0.640533    0.619141       0.658272       0.6382             1e-05       5054.1
42          0.642407    0.635742       0.658007       0.634294           1e-05       5180.53
43          0.638421    0.637097       0.657881       0.6382             1e-05       5306.34
44          0.638355    0.615234       0.657795       0.634294           1e-05       5432.05
45          0.646939    0.647461       0.657879       0.63389            1e-05       5557.13
46          0.633851    0.638105       0.657871       0.63389            1e-05       5681.52
47          0.634775    0.65625        0.65795        0.62958            1e-05       5805.18
48          0.6431      0.63004        0.658035       0.62958            1e-05       5929.65
49          0.639157    0.638672       0.657888       0.63389            1e-05       6056.11
50          0.634541    0.646484       0.657764       0.6382             1e-05       6181.61
51          0.639131    0.643145       0.657604       0.63389            1e-05       6305.93
52          0.63112     0.661133       0.657518       0.6382             1e-05       6431.38
53          0.637556    0.638672       0.657599       0.63389            1e-05       6557.23
54          0.644531    0.635081       0.657612       0.63389            1e-05       6683.95
55          0.630982    0.661133       0.657465       0.63389            1e-05       6808.45
56          0.641891    0.639113       0.657525       0.629984           1e-05       6933.97
57          0.637279    0.654297       0.657503       0.629984           1e-05       7060.01
58          0.634336    0.637695       0.657457       0.625673           1e-05       7184.33
59          0.640313    0.634073       0.657545       0.621767           1e-05       7312.79
60          0.641264    0.628906       0.657413       0.63389            1e-05       7437.23

Prediction

Making a prediction

import matplotlib.pyplot as plt

def img_transform(file_path, model, show=True):
  img = Image.open(file_path)
  # Convert to RGB
  img = img.convert('RGB')

  if show:
    plt.imshow(img)

  img = np.array(img, dtype='f').transpose(2, 0, 1)
  # Resize
  img = resize(img, (224, 224))
  # Scale the values into the 0 to 1 range
  img = img.astype(np.float32) / 255

  # Send the data to the same device as the network
  img = model.xp.asarray(img)
  # Reshape to [minibatch, channel, height, width]
  img = img[None, ...]
  return img
from chainer.cuda import to_cpu

def stock_predict(x, model):
  with chainer.using_config('train', False), chainer.using_config('enable_backprop', False):
      y = model.predictor(x)

  y = to_cpu(y.array)

  print(chainer.functions.softmax(y))

  y = y.argmax(axis=1)[0]

  if y == 0:
    y = '下がった'  # "went down"
  else:
    y = '上がった'  # "went up"

  # Prints the predicted label ('予測ラベル' means "predicted label")
  print('予測ラベル:', y)
from selenium import webdriver

browser = webdriver.PhantomJS()
browser.implicitly_wait(3)

save_chart_by_selenium(browser, "0101", 2018, 7, 7)
/usr/local/lib/python3.6/dist-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '





'./stock_chart/0101/2018_7_7.png'
img = img_transform('./stock_chart/0101/2018_7_7.png', model);

output_118_0.png

stock_predict(img, model)
variable([[0.44009668 0.5599033 ]])
予測ラベル: 上がった
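
The two numbers printed above are softmax probabilities for the classes "went down" (index 0) and "went up" (index 1), and the predicted label is whichever is larger. A self-contained sketch of that last step follows. The logits are hypothetical values chosen for illustration, not outputs of the trained model.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

LABELS = ['down', 'up']  # index 0: change below zero, index 1: zero or above

logits = [-0.12, 0.12]  # hypothetical network outputs, not taken from the model
probs = softmax(logits)
predicted = LABELS[probs.index(max(probs))]
print(probs, predicted)
```

A probability this close to 0.5 means the model is only slightly leaning toward one class.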

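Summary

That's all.

I'd like to fetch the chart images for every listed stock each day and produce something like a top 10 of the ones most likely to rise.
Then, on top of that, pull in Twitter data and run sentiment analysis on it with Google's API...
Maybe that kind of thing could raise the accuracy.

Also, generating the chart images myself instead of scraping them looks more flexible,
so next time I plan to try something along those lines.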
