Web scraping

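Extract the contents of a table tag from HTML.
As an example, fetch the list of historical Nikkei 225 (Nikkei Stock Average) values.
https://stocks.finance.yahoo.co.jp/stocks/history/?code=998407.O
The selector argument was obtained with Chrome Developer Tools.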
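
To illustrate how a selector copied from the developer tools picks out one table, here is a minimal sketch using BeautifulSoup. The markup in `SAMPLE_HTML` is a hypothetical, simplified stand-in for the real page (only the `id` and class names are taken from the selector used in this article), so it runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup that imitates the structure the selector expects.
SAMPLE_HTML = """
<div id="main">
  <div class="padT12 marB10 clearFix">
    <table>
      <tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close</th></tr>
      <tr><td>2019年8月23日</td><td>20,579.98</td><td>20,719.31</td><td>20,579.98</td><td>20,710.91</td></tr>
    </table>
  </div>
  <div class="other"><table><tr><td>unrelated</td></tr></table></div>
</div>
"""

# "#main" matches the id, ".padT12.marB10.clearFix" requires all three classes,
# and ">" restricts each step to direct children.
SELECTOR = "#main > div.padT12.marB10.clearFix > table"

doc = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = doc.select_one(SELECTOR)

# select_one returns None when nothing matches, so check before using the result.
if table is None:
    raise SystemExit("selector matched nothing; the page layout may have changed")

first_data_row = table.find_all("tr")[1]
cells = [td.get_text() for td in first_data_row.find_all("td")]
print(cells)
```

Because the selector depends on the page's class names, it is likely to stop matching whenever the site changes its layout, which is why the sketch checks for `None`.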

C#

Uses AngleSharp.

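using System;
using System.Globalization;
using System.Net.Http;
using AngleSharp.Html.Parser;

public static class Program
{
    public static void Main(string[] args)
    {
        string url = "https://stocks.finance.yahoo.co.jp/stocks/history/?code=998407.O";
        HttpClient client = new HttpClient();
        // Blocking on .Result is acceptable in a small console program.
        string source = client.GetStringAsync(url).Result;

        var parser = new HtmlParser();
        var doc = parser.ParseDocument(source);
        var table = doc.QuerySelector("#main > div.padT12.marB10.clearFix > table");
        if (table == null)
        {
            Console.Error.WriteLine("Table not found; the page layout may have changed.");
            return;
        }

        // Dates on the page look like "2019年8月23日", so parse them with the Japanese culture.
        var japanese = new CultureInfo("ja-JP");
        foreach (var tr in table.GetElementsByTagName("tr"))
        {
            var tds = tr.GetElementsByTagName("td");
            if (tds.Length == 0) continue; // skip rows without <td> cells (e.g. header rows)
            var ymd = DateTime.Parse(tds[0].TextContent, japanese)
                .ToString("yyyy/MM/dd", CultureInfo.InvariantCulture);
            Console.Write("{0}", ymd);
            for (int i = 1; i < tds.Length; i++)
            {
                // NumberStyles.Any accepts thousands separators such as "20,579.98".
                var num = double.Parse(tds[i].TextContent, NumberStyles.Any, CultureInfo.InvariantCulture);
                Console.Write(",{0}", num);
            }
            Console.WriteLine();
        }
    }
}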

Output:

2019/08/23,20579.98,20719.31,20579.98,20710.91
2019/08/22,20706.07,20731.19,20584.29,20628.01
2019/08/21,20489.97,20626.05,20482.62,20618.57
2019/08/20,20605.35,20684.06,20582.01,20677.22
2019/08/19,20590.47,20633.9,20502.66,20563.16
2019/08/16,20323.97,20465.71,20300.35,20418.81
2019/08/15,20324.25,20419.88,20184.85,20405.65
2019/08/14,20669.99,20697.42,20581.17,20655.13
2019/08/13,20432.68,20503.38,20369.27,20455.44
2019/08/09,20758.15,20782.06,20676.92,20684.82
2019/08/08,20529.29,20682.24,20462.98,20593.35
2019/08/07,20548.07,20570.19,20406.52,20516.56
2019/08/06,20325.52,20607.83,20110.76,20585.31
2019/08/05,20909.98,20941.83,20514.19,20720.29
2019/08/02,21211.06,21211.06,20960.09,21087.16
2019/08/01,21361.58,21556.69,21288.9,21540.99
2019/07/31,21526.38,21589.11,21476.07,21521.53
2019/07/30,21681.82,21792.98,21665.86,21709.31
2019/07/29,21627.55,21652.95,21518.7,21616.8
2019/07/26,21700.2,21709.74,21590.66,21658.15

Python

Uses BeautifulSoup.

import datetime

import requests
from bs4 import BeautifulSoup

url = "https://stocks.finance.yahoo.co.jp/stocks/history/?code=998407.O"
source = requests.get(url)
source.raise_for_status()  # stop early on an HTTP error
doc = BeautifulSoup(source.text, 'html.parser')
table = doc.select_one("#main > div.padT12.marB10.clearFix > table")
# find_all is the current method name; findAll is a legacy alias.
for tr in table.find_all('tr'):
    tds = tr.find_all('td')
    if len(tds) == 0:
        continue  # skip rows without <td> cells (e.g. header rows)
    # Dates look like "2019年8月23日"; convert them to yyyy/mm/dd.
    ymd = datetime.datetime.strptime(tds[0].get_text(), '%Y年%m月%d日').strftime('%Y/%m/%d')
    values = [ymd]
    for td in tds[1:]:
        # Remove thousands separators before converting to float.
        values.append(float(td.get_text().replace(',', '')))
    print(','.join(map(str, values)))
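
The per-row conversion in the script above can also be pulled out into a small function, which makes it possible to try the date and number handling without fetching the page. This is only a sketch: `parse_row` is a hypothetical helper name, and the sample cells are illustrative text in the format the script expects (a `YYYY年M月D日` date followed by numbers with thousands separators).

```python
import datetime


def parse_row(cells):
    """Convert cell texts such as ['2019年8月23日', '20,579.98', ...]
    into ['2019/08/23', 20579.98, ...]."""
    # The literal 年/月/日 characters in the format match the Japanese date text.
    ymd = datetime.datetime.strptime(cells[0].strip(), "%Y年%m月%d日").strftime("%Y/%m/%d")
    # Strip thousands separators so float() can parse the value.
    numbers = [float(c.replace(",", "")) for c in cells[1:]]
    return [ymd] + numbers


# Illustrative input: the text of the <td> cells of one table row.
sample = ["2019年8月23日", "20,579.98", "20,719.31", "20,579.98", "20,710.91"]
row = parse_row(sample)
print(",".join(map(str, row)))  # prints 2019/08/23,20579.98,20719.31,20579.98,20710.91
```

Note that converting to `float` drops trailing zeros, so a cell such as `20,633.90` is printed as `20633.9`.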
