Get the contents of a table tag from HTML.
As an example, fetch the Nikkei 225 (Nikkei Stock Average) price history from the following page.
https://stocks.finance.yahoo.co.jp/stocks/history/?code=998407.O
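The selector parameter was obtained with Chrome Developer Tools (for example, by right-clicking the table element in the Elements panel and choosing Copy > Copy selector).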
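To make the selector easier to read, here is a minimal sketch that runs it against a tiny HTML fragment instead of the live page. The fragment is a hypothetical, heavily simplified stand-in for the real markup (only the id and class names come from the selector itself). It illustrates that `>` matches direct children only and that `div.padT12.marB10.clearFix` requires all three classes on the same div.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the real page markup (assumption).
SAMPLE_HTML = """
<div id="main">
  <div class="padT12 marB10 clearFix">
    <table>
      <tr><th>Date</th><th>Value</th></tr>
      <tr><td>2019年8月23日</td><td>20,579.98</td></tr>
    </table>
  </div>
  <div class="other">
    <table><tr><td>unrelated</td></tr></table>
  </div>
</div>
"""

SELECTOR = "#main > div.padT12.marB10.clearFix > table"

doc = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select_one returns the first matching element, or None if nothing matches.
table = doc.select_one(SELECTOR)
cells = [td.get_text() for td in table.find_all("td")]
print(cells)

# The table is not a direct child of #main, so this selector matches nothing.
print(doc.select_one("#main > table"))
```

Because `select_one` returns None when the page layout changes and the selector no longer matches, it may be worth checking the result before reading rows from it.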
C#
Uses AngleSharp.
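using System;
using System.Globalization;
using System.Net.Http;
using AngleSharp.Html.Parser;

public class Program
{
    public static void Main(string[] args)
    {
        string url = "https://stocks.finance.yahoo.co.jp/stocks/history/?code=998407.O";
        // Download the page HTML (blocking on the task keeps the sample synchronous).
        HttpClient client = new HttpClient();
        string source = client.GetStringAsync(url).Result;

        // Parse the HTML and locate the price table with the CSS selector.
        var parser = new HtmlParser();
        var doc = parser.ParseDocument(source);
        var table = doc.QuerySelector("#main > div.padT12.marB10.clearFix > table");
        var trs = table.GetElementsByTagName("tr");
        foreach (var tr in trs)
        {
            var tds = tr.GetElementsByTagName("td");
            // Rows without td cells (such as a header row) are skipped.
            if (tds.Length == 0) continue;

            // The first cell is the date; parse it explicitly so the result does not depend on the current culture.
            var ymd = DateTime.ParseExact(tds[0].TextContent.Trim(), "yyyy年M月d日", CultureInfo.InvariantCulture)
                .ToString("yyyy/MM/dd", CultureInfo.InvariantCulture);
            Console.Write("{0}", ymd);
            for (int i = 1; i < tds.Length; i++)
            {
                // The remaining cells are numbers with thousands separators.
                var num = double.Parse(tds[i].TextContent, NumberStyles.Any, CultureInfo.InvariantCulture);
                Console.Write(",{0}", num.ToString(CultureInfo.InvariantCulture));
            }
            Console.WriteLine();
        }
    }
}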
Output:
2019/08/23,20579.98,20719.31,20579.98,20710.91
2019/08/22,20706.07,20731.19,20584.29,20628.01
2019/08/21,20489.97,20626.05,20482.62,20618.57
2019/08/20,20605.35,20684.06,20582.01,20677.22
2019/08/19,20590.47,20633.9,20502.66,20563.16
2019/08/16,20323.97,20465.71,20300.35,20418.81
2019/08/15,20324.25,20419.88,20184.85,20405.65
2019/08/14,20669.99,20697.42,20581.17,20655.13
2019/08/13,20432.68,20503.38,20369.27,20455.44
2019/08/09,20758.15,20782.06,20676.92,20684.82
2019/08/08,20529.29,20682.24,20462.98,20593.35
2019/08/07,20548.07,20570.19,20406.52,20516.56
2019/08/06,20325.52,20607.83,20110.76,20585.31
2019/08/05,20909.98,20941.83,20514.19,20720.29
2019/08/02,21211.06,21211.06,20960.09,21087.16
2019/08/01,21361.58,21556.69,21288.9,21540.99
2019/07/31,21526.38,21589.11,21476.07,21521.53
2019/07/30,21681.82,21792.98,21665.86,21709.31
2019/07/29,21627.55,21652.95,21518.7,21616.8
2019/07/26,21700.2,21709.74,21590.66,21658.15
Python
Uses BeautifulSoup.
import datetime

import requests
from bs4 import BeautifulSoup

url = "https://stocks.finance.yahoo.co.jp/stocks/history/?code=998407.O"
# Download the page HTML.
source = requests.get(url)
doc = BeautifulSoup(source.text, 'html.parser')
# Locate the price table with the CSS selector.
table = doc.select_one("#main > div.padT12.marB10.clearFix > table")
trs = table.find_all('tr')
for tr in trs:
    tds = tr.find_all('td')
    # Rows without td cells (such as a header row) are skipped.
    if len(tds) == 0:
        continue
    # The first cell is the date; convert it to yyyy/mm/dd.
    ymd = datetime.datetime.strptime(tds[0].get_text(), '%Y年%m月%d日').strftime('%Y/%m/%d')
    values = [ymd]
    # The remaining cells are numbers with thousands separators.
    for i in range(1, len(tds)):
        num = float(tds[i].get_text().replace(',', ''))
        values.append(num)
    print(','.join(map(str, values)))
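The conversion step inside the loop can also be tried on its own, without any network access. The following is a minimal sketch using only the standard library. The helper names (`normalize_date`, `normalize_number`, `row_to_csv`) are hypothetical, and the sample cell strings are assumed from the date format and the output shown earlier, not copied from the page.

```python
import datetime


def normalize_date(text):
    """Convert a date such as '2019年8月23日' to '2019/08/23'."""
    parsed = datetime.datetime.strptime(text.strip(), "%Y年%m月%d日")
    return parsed.strftime("%Y/%m/%d")


def normalize_number(text):
    """Convert a number with thousands separators, such as '20,579.98', to float."""
    return float(text.strip().replace(",", ""))


def row_to_csv(cells):
    """Turn the text of one row's cells (date first, then numbers) into a CSV line."""
    values = [normalize_date(cells[0])]
    values += [normalize_number(cell) for cell in cells[1:]]
    return ",".join(map(str, values))


# Assumed cell texts for one row (hypothetical sample input).
sample_row = ["2019年8月23日", "20,579.98", "20,719.31", "20,579.98", "20,710.91"]
print(row_to_csv(sample_row))  # 2019/08/23,20579.98,20719.31,20579.98,20710.91
```

Keeping the conversion in small functions like this makes it possible to check the date and number handling separately from the download and the selector.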