
Fetching the three financial statements for US stocks with Python

Last updated at 2018-03-19, posted at 2018-03-18

Purpose and requirements

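Fetch the three financial statements (Income Statement / Balance Sheet / Cash Flow) for a specific US-listed stock, identified by its ticker symbol, and hold them as a pandas DataFrame (also writing them out to CSV and Excel files while we are at it).
The reporting period can be specified as Annual or Quarterly.
For annual data, fetch the latest TTM (Trailing Twelve Months) figures plus as much historical data as is available.
Values (except ratio-type items such as EPS) are in millions of USD (for ADRs, millions of the local currency).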
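As a rough illustration of the target layout, the sketch below reshapes a small wide table (one column per fiscal period) into long rows with `pandas.melt`. The ticker `XYZ`, the item names, and the figures are made-up placeholders, not real Morningstar data, and the script later in this article builds the same column layout with an explicit loop rather than `melt`.

```python
import pandas as pd

# Made-up placeholder figures, in millions; not real data.
wide = pd.DataFrame({
    'ITEMS': ['Revenue', 'Net income'],
    '2016-12': [100.0, 20.0],
    'TTM': [120.0, 25.0],
})

# One row per (item, period) pair.
long_df = wide.melt(id_vars='ITEMS', var_name='DATE', value_name='VALUE')

# Identifying columns; 'XYZ' is a hypothetical ticker.
long_df['TICKER'] = 'XYZ'
long_df['PERIOD'] = 'annual'
long_df['STMT_TYPE'] = 'income_statement'

# Same column order as the DataFrame described in this article.
long_df = long_df[['TICKER', 'PERIOD', 'STMT_TYPE', 'DATE', 'ITEMS', 'VALUE']]
print(long_df)
```

The long layout makes it easy to append statements for many tickers into one table and filter by `DATE`, `ITEMS`, or `STMT_TYPE` afterwards.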

Data source

I gratefully use the web API at http://financials.morningstar.com/.
For how to use the API, I referred to "A brief description on how to use Morningstar's API".

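Note: as a source of financial data I also tried Quandl, but it is fairly expensive, so I gave up on it. Scraping is too much of a hassle for me. As far as I know, nothing beats the Morningstar API. If you can recommend another data source, please let me know.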

Disclaimer:
This is in no way an official API. As such it is not supported by Morningstar the organization. Furthermore, I would advise to rate limit your downloads to be nice to Morningstar's servers. If not to be nice to Morningstar then to be kind to the other users of the unofficial API so as not to ruin a good thing. In short, please do not overdo it. It would be quite a blow if this went away.
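One way to follow that advice is to pause between requests and cap the number of retries. The sketch below is only an illustration under stated assumptions: `polite_fetch`, its parameters, and the stand-in `fake_fetch` are hypothetical names introduced here, and in real use `fetch` would be something like a small wrapper around `requests.get` that returns the response text.

```python
import time


def polite_fetch(fetch, url, max_tries=5, wait_sec=3.0, sleep=time.sleep):
    """Call fetch(url) until it returns non-empty text, pausing between tries.

    fetch is any callable that returns the response body as a string.
    Raises RuntimeError if every attempt returns an empty body.
    """
    for attempt in range(1, max_tries + 1):
        text = fetch(url)
        if text:
            return text
        if attempt < max_tries:
            sleep(wait_sec)  # wait before retrying
    raise RuntimeError(
        'no data from {0} after {1} tries'.format(url, max_tries))


# Stand-in fetcher for demonstration: empty twice, then a body.
_responses = iter(['', '', 'a,b\n1,2\n'])


def fake_fetch(url):
    return next(_responses)


waits = []  # records each pause instead of actually sleeping
body = polite_fetch(fake_fetch, 'http://example.invalid/data.csv',
                    wait_sec=0.0, sleep=waits.append)
print(len(waits), repr(body))
```

Passing `fetch` and `sleep` in as arguments keeps the retry logic testable without touching the network; the retry cap avoids looping forever if the endpoint stops responding.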

Environment

Python 3.6.4
Anaconda (conda 4.4.10): I think the required libraries all come preinstalled. Probably.
Windows 10

Source code

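The example source code below fetches annual data for the three financial statements of Facebook (ticker: FB) and writes it to CSV and Excel files. For how to use the get_stmt_data function, see its docstring and the `__main__` block. If you call get_stmt_data in a loop and load the results into a database, you can build your own personal financial analysis database, so please make use of it. It might also be fun to visualize the data with matplotlib or similar.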

getFinancialData.py

"""
Get financial data for US-listed stocks
"""

from io import StringIO
import requests
import pandas as pd
import time


def get_stmt_data(arg_ticker, arg_stmt_type, arg_period='annual'):
    '''Get the 3 major financial statements data + key_ratio from morningstar.com

    Args:
        arg_ticker (str):    the ticker symbol of the stock. 
                             Eg: 'FB' (Facebook)
        arg_stmt_type (str): the statement type. 
                             Must be one of the following.
                             - 'income_statement'
                             - 'balance_sheet'
                             - 'cash_flow'
                             - 'key_ratio' : Statistics for some important values.
                                             The arg_period cannot be 'quarterly'.
        arg_period (str):    the period of the statement. 
                             Must be 'annual' or 'quarterly'.
                             Default is 'annual'.

    Returns:
        pandas.core.frame.DataFrame: 
            the data of the statement
            columns:
                -TICKER
                -PERIOD
                -STMT_TYPE
                -DATE
                -ITEMS
                -VALUE
    '''

    dic_stmt_type = {'income_statement': 'is',
                     'cash_flow': 'cf',
                     'balance_sheet': 'bs',
                     'key_ratio' : 'kr'
                     }
    stmt_type = dic_stmt_type.get(arg_stmt_type, 'undefined')
    if stmt_type == 'undefined':
        raise ValueError(
            'the 2nd argument must be '
            'income_statement/cash_flow/balance_sheet/key_ratio')

    dic_period = {'annual': '12',
                  'quarterly': '3'
                  }
    period = dic_period.get(arg_period, 'undefined')
    if period == 'undefined':
        raise ValueError(
            'the 3rd argument must be annual/quarterly')

    if arg_stmt_type == 'key_ratio' and arg_period == 'quarterly':
        raise ValueError(
            'No quarterly key_ratio data')

    if arg_stmt_type == 'key_ratio':
        url = 'http://financials.morningstar.com/ajax/exportKR2CSV.html?t={0}'
        url = url.format(arg_ticker)
    else:
        url = 'http://financials.morningstar.com/ajax/ReportProcess4CSV.html?t={0}&reportType={1}&period={2}&dataType=A&order=asc&columnYear=10&number=3'
        url = url.format(arg_ticker, stmt_type, period)
    
    r = requests.get(url, verify=False, timeout=10)
    # For some reason an empty response often comes back,
    # so retry until a non-empty response is returned.
    # print('   url:{0}'.format(url))
    # print('   len(r.text)={0}'.format(len(r.text)))
    while len(r.text) == 0:
        # print('   trying again...wait a sec...')
        time.sleep(3)
        r = requests.get(url, verify=False, timeout=10)

    # Read the CSV with the header row specified -> df_tmp
    if arg_stmt_type == 'key_ratio':
        df_tmp = pd.read_csv(StringIO(r.text), header=2)
    else:
        df_tmp = pd.read_csv(StringIO(r.text), header=1)

    # Rename the first column to "ITEMS"
    df_tmp = df_tmp.rename(columns={df_tmp.columns[0]: 'ITEMS'})

    # Clean-up: income_statement contains duplicated item names,
    # so rename them to make each item unique
    if arg_stmt_type == 'income_statement':
        items = df_tmp.loc[:, 'ITEMS']
        for i, item in enumerate(items):
            if item == 'Earnings per share':
                df_tmp.loc[i+1, 'ITEMS'] = 'Basic EPS'
                df_tmp.loc[i+2, 'ITEMS'] = 'Diluted EPS'
            elif item == 'Weighted average shares outstanding':
                df_tmp.loc[i+1, 'ITEMS'] = 'Basic outstanding'
                df_tmp.loc[i+2, 'ITEMS'] = 'Diluted outstanding'
            else:
                continue

    df_stmt_data = pd.DataFrame()

    # Reshape the data into long format
    for column in df_tmp.columns:
        df_tmp_sliced = pd.DataFrame()
        if column == 'ITEMS':
            # Skip the first column (item names)
            continue
        elif arg_stmt_type == 'key_ratio' and str(df_tmp.loc[0, column]) == 'nan':
            # For key_ratio, skip years that contain no data
            continue
        else:
            df_tmp_sliced = df_tmp.loc[:, ["ITEMS", column]]
            df_tmp_sliced = df_tmp_sliced.rename(columns={column: 'VALUE'})
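            # Convert the VALUE column to float wherever possible
            for i, value in enumerate(df_tmp_sliced['VALUE']):
                if isinstance(value, str):
                    try:
                        # Strip thousands separators before converting
                        df_tmp_sliced.loc[i, 'VALUE'] = float(
                            value.replace(',', ''))
                    except ValueError:
                        # Leave non-numeric strings unchanged
                        pass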
            # Add the identifying columns
            df_tmp_sliced['TICKER'] = arg_ticker
            df_tmp_sliced['PERIOD'] = arg_period
            df_tmp_sliced['STMT_TYPE'] = arg_stmt_type
            df_tmp_sliced['DATE'] = column
            df_tmp_sliced = df_tmp_sliced.loc[:, [
                'TICKER', 'PERIOD', 'STMT_TYPE', 'DATE', 'ITEMS', 'VALUE']]
            # Append to df_stmt_data
            df_stmt_data = pd.concat([df_stmt_data, df_tmp_sliced])

    return df_stmt_data


if __name__ == '__main__':
    fb_income_statement = get_stmt_data('FB', 'income_statement', 'annual')
    fb_cash_flow = get_stmt_data('FB', 'cash_flow', 'annual')
    fb_balance_sheet = get_stmt_data('FB', 'balance_sheet', 'annual')
    fb_key_ratio = get_stmt_data('FB', 'key_ratio', 'annual')

    fb_financials = pd.concat(
        [fb_income_statement, fb_balance_sheet, fb_cash_flow, fb_key_ratio])

    # Renumber the index
    fb_financials = fb_financials.reset_index(drop=True)

    # import os
    # os.chdir('D:\DevStudy\python3')

    fb_financials.to_csv("fb_financials.csv")
    fb_financials.to_excel("fb_financials.xlsx")
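
To build the personal database suggested above, one option is to loop over tickers and statement types and append each result to a SQLite table with `DataFrame.to_sql`. The following is a minimal sketch and not part of the original script: `collect_to_sqlite`, the table name `financials`, the tickers `AAA` and `BBB`, and `stub_get_stmt_data` are hypothetical, and the stub stands in for `get_stmt_data` so that the example runs without network access.

```python
import sqlite3
import time

import pandas as pd


def collect_to_sqlite(conn, tickers, stmt_types, fetch, pause_sec=1.0):
    """Fetch every (ticker, statement) pair and append it to 'financials'."""
    for ticker in tickers:
        for stmt_type in stmt_types:
            df = fetch(ticker, stmt_type, 'annual')
            df.to_sql('financials', conn, if_exists='append', index=False)
            time.sleep(pause_sec)  # pause so the data source is not hammered


def stub_get_stmt_data(ticker, stmt_type, period):
    """Hypothetical stand-in that returns one placeholder row."""
    return pd.DataFrame({
        'TICKER': [ticker], 'PERIOD': [period], 'STMT_TYPE': [stmt_type],
        'DATE': ['TTM'], 'ITEMS': ['Revenue'], 'VALUE': [1.0],
    })


conn = sqlite3.connect(':memory:')  # use a file path for a persistent DB
collect_to_sqlite(conn, ['AAA', 'BBB'], ['income_statement', 'cash_flow'],
                  stub_get_stmt_data, pause_sec=0.0)
row_count = conn.execute('SELECT COUNT(*) FROM financials').fetchone()[0]
print(row_count)
```

In real use, `get_stmt_data` would be passed as `fetch`, with a non-zero `pause_sec` in keeping with the disclaimer about rate limiting.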
