Scraping the SBI Securities Portfolio Page

Last updated at 2019-12-30, posted at 2019-12-29

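Introduction

I had been a loyal user of myTrade, but its support ends on 1/9. I looked at various other apps and found none that fit, so I decided to scrape the SBI portfolio page myself and manage the data in a Google Spreadsheet.
This page therefore introduces the following two programs:

- Scraping the SBI Securities portfolio page
- Writing the scraped data to a Google Spreadsheet

The data is written as one row per run: the date goes in the first column, and each holding's amount goes in the column whose header matches its name.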

Environment

OS: Mac
Language: Python 3.7
Third-party:
- selenium
- ChromeDriver
- Beautiful Soup
- pandas
- gspread
- oauth2client
- Google Chrome

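Procedure

Environment setup

Prerequisites:

Python is installed
pip is installed

1. Install the required modules

Install the required modules with pip (gspread and oauth2client are included because the code on this page imports them):

pip install selenium
pip install pandas lxml html5lib BeautifulSoup4
pip install gspread oauth2client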
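
After installing, a quick check that every module can actually be imported may save a confusing failure later. The following is a minimal sketch and not part of the original procedure: the helper name `missing_modules` is hypothetical, and the list assumes the import names used by the code on this page (BeautifulSoup4 is imported as `bs4`).

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed from the code on this page
REQUIRED = ['selenium', 'pandas', 'lxml', 'html5lib', 'bs4', 'gspread', 'oauth2client']

if __name__ == '__main__':
    missing = missing_modules(REQUIRED)
    if missing:
        print('Not installed:', ', '.join(missing))
    else:
        print('All required modules are importable.')
```

Any name it reports can be installed with pip before moving on.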

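Install ChromeDriver

Download the ChromeDriver that corresponds to your Google Chrome version and place it somewhere on your PATH. (Reference: adding a directory to PATH on Mac)
The ChromeDriver version you download should match the version of Google Chrome you are using.
If there is no exact match, use the closest one.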
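
The version comparison can be sketched as a small check. This is an illustration under assumptions, not an official tool: it assumes that comparing only the major version number is a good enough test of "matching", the function names are hypothetical, and the version strings are illustrative examples of the format printed by `chromedriver --version` and shown on Chrome's "About Google Chrome" page.

```python
import re


def major_version(version_text):
    """Extract the major version from text such as 'Google Chrome 79.0.3945.88'."""
    match = re.search(r'(\d+)\.\d+\.\d+\.\d+', version_text)
    if match is None:
        raise ValueError('no version number found in: ' + repr(version_text))
    return int(match.group(1))


def same_major_version(chrome_text, driver_text):
    """Return True when both version strings share the same major version."""
    return major_version(chrome_text) == major_version(driver_text)


# Illustrative version strings (assumed format)
print(same_major_version('Google Chrome 79.0.3945.88', 'ChromeDriver 79.0.3945.36'))
```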

2. Google setup
The page below is excellent, so follow it as written:
https://tanuhack.com/operate-spreadsheet/#Google_Cloud_Platform
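
The writing code on this page assumes the spreadsheet has been shared with the service account. A service-account key file downloaded from Google Cloud contains a `client_email` field, and that address is the one to share the sheet with. The sketch below reads it from the key file; the helper name `service_account_email` is hypothetical, and `zzzzz.json` in the comment is the placeholder file name used on this page.

```python
import json


def service_account_email(key_path):
    """Return the client_email stored in a service-account JSON key file."""
    with open(key_path, encoding='utf-8') as key_file:
        key = json.load(key_file)
    if key.get('type') != 'service_account':
        raise ValueError('not a service-account key file: ' + key_path)
    return key['client_email']


# Hypothetical use with the placeholder file name from this page:
#     print(service_account_email('zzzzz.json'))
```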

Code

1. Imports

import time
import datetime
import gspread
import json
import pandas
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
from oauth2client.service_account import ServiceAccountCredentials

2. The scraping part

class Result():
    """One holding: its display name and its amount."""
    def __init__(self, fund, amount):
        self.fund = fund
        self.amount = amount

def convert_to_list(data_frame, custody):
    # Keep only column 1 (fund name) and column 10 (amount), then drop row 0
    # (presumably the table's header row). Chaining the calls avoids modifying
    # a slice in place, which triggers pandas' SettingWithCopyWarning.
    data_frame = data_frame.iloc[:, [1, 10]].drop([0])
    data_frame.columns = ['funds', 'amount']

    results = []
    for fund, amount in zip(data_frame['funds'], data_frame['amount']):
        # Prefix the fund name with the account type, e.g. 'NISA:<fund>'
        results.append(Result(custody + ':' + fund, amount))

    return results

def get_stocks():
    # find_element_by_* was removed in Selenium 4.3;
    # find_element(By.<strategy>, value) works in both Selenium 3 and 4.
    from selenium.webdriver.common.by import By

    options = Options()
    # Headless mode (run Chrome without showing a window)
    options.add_argument('--headless')
    # Create the Chrome WebDriver object
    driver = webdriver.Chrome(options=options)

    try:
        # Open the SBI Securities top page
        driver.get('https://www.sbisec.co.jp/ETGate')

        # Enter the user ID and password (replace the placeholders)
        input_user_id = driver.find_element(By.NAME, 'user_id')
        input_user_id.send_keys('xxxx')

        input_user_password = driver.find_element(By.NAME, 'user_password')
        input_user_password.send_keys('yyyy')

        # Click the login button to log in.
        # The body seems to load asynchronously, so sleep briefly.
        driver.find_element(By.NAME, 'ACT_login').click()
        time.sleep(5)
        driver.find_element(By.LINK_TEXT, 'ポートフォリオ').click()

        # Get the HTML of the portfolio page
        html = driver.page_source
    finally:
        # Always close the browser, even if a step above fails
        driver.quit()

    # Parse with BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")

    # The holdings tables are identified by their layout attributes
    table = soup.find_all("table", border="0", cellspacing="1", cellpadding="4", bgcolor="#9fbf99", width="100%")

    # read_html returns one DataFrame per table:
    # [0] is the 特定 (specified) account, [1] is the NISA account
    data_frames = pandas.read_html(str(table))
    stocks = convert_to_list(data_frames[0], '特定')
    nisa = convert_to_list(data_frames[1], 'NISA')

    return stocks + nisa
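
The fixed `time.sleep(5)` in the code above waits the full five seconds even when the page is ready sooner, and it fails if loading takes longer. Polling for a condition is a common alternative, and Selenium provides `WebDriverWait` together with `expected_conditions` for this purpose. The sketch below is a standard-library stand-in that shows the same idea; it is not Selenium's implementation, and `wait_until` and its parameters are hypothetical names.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = predicate()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within {} seconds'.format(timeout))
        time.sleep(interval)


# Hypothetical use inside get_stocks(), in place of time.sleep(5):
#     wait_until(lambda: driver.find_elements('link text', 'ポートフォリオ'))
```

`find_elements` returns an empty list while the link is absent, so the lambda becomes truthy only once the link has appeared.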

3. Writing to the spreadsheet

def write(stocks):
    scope = ['https://spreadsheets.google.com/feeds','https://www.googleapis.com/auth/drive']

    # Set up the credentials.
    # Pass the file name of the downloaded JSON key (it holds the private key; keep it somewhere the Python file can read easily)
    credentials = ServiceAccountCredentials.from_json_keyfile_name('zzzzz.json', scope)

    # Log in to the Google API with the OAuth2 credentials.
    gc = gspread.authorize(credentials)

    # Store the key of the shared spreadsheet in SPREADSHEET_KEY.
    SPREADSHEET_KEY = 'hogehoge'

    # Open sheet 1 of the shared spreadsheet
    worksheet = gc.open_by_key(SPREADSHEET_KEY).sheet1
    headers = worksheet.row_values(1)
    dates = worksheet.col_values(1)
    # The new row goes directly below the last filled cell of column 1 (the date column)
    new_row_num = len(dates) + 1

    worksheet.update_cell(new_row_num, 1, datetime.datetime.today().strftime('%Y/%m/%d'))
    for stock in stocks:
        # Write each amount under the header that matches the holding's name
        for i, header in enumerate(headers):
            if header == stock.fund:
                worksheet.update_cell(new_row_num, i + 1, stock.amount)
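
The header matching in `write` can also be viewed as building one complete row before anything is sent to the sheet. The following is a minimal sketch of that idea, not the author's method: `Holding` is a hypothetical stand-in for `Result`, `build_row` is a hypothetical helper, and the first header is assumed to be the date column.

```python
from collections import namedtuple

# Hypothetical stand-in for the Result class (same two fields)
Holding = namedtuple('Holding', ['fund', 'amount'])


def build_row(headers, stocks, date_text):
    """Build one sheet row: the date first, then the amount for each header.

    Headers with no matching holding get an empty string.
    """
    amounts = {stock.fund: stock.amount for stock in stocks}
    row = [date_text]
    for header in headers[1:]:  # headers[0] is assumed to be the date column
        row.append(amounts.get(header, ''))
    return row


headers = ['date', '特定:Fund A', 'NISA:Fund B', '特定:Fund C']
stocks = [Holding('NISA:Fund B', 200), Holding('特定:Fund A', 100)]
print(build_row(headers, stocks, '2019/12/30'))
```

A row built this way could be written with a single call such as gspread's `worksheet.append_row(row)` instead of one `update_cell` call per holding, assuming the sheet has no stray data below the last dated row.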

4. Combine the code above

def main():
    # Scrape the portfolio page to get the data to write
    stocks = get_stocks()
    # Write the scraped data to the spreadsheet
    write(stocks)

if __name__ == "__main__":
    main()

Pages I referred to, with thanks:

SBI Securities scraping:
https://hato.yokohama/scraping_sbi_investment/
python + selenium:
https://qiita.com/memakura/items/20a02161fa7e18d8a693
Summary of scraping basics with Beautiful Soup:
https://qiita.com/U-MA/items/896c49d46585e32ff7b1
pandas:
- https://shinyorke.hatenablog.com/entry/nyumon-pandas
- https://pandas.pydata.org/pandas-docs/stable/index.html
[No more confusion] Summary of the initial setup for reading and writing spreadsheets with Python: https://tanuhack.com/operate-spreadsheet/#Google_Cloud_Platform
Summary of the initial setup for reading and writing spreadsheets with Python: https://tanuhack.com/library-gspread/
gspread official: https://gspread.readthedocs.io/en/latest/index.html
