
[Memo] Scraping the market capitalization ranking tables (PDF) from JPX with Selenium

Posted at 2020-07-05

Introduction

This article is a personal memo.
If you scrape the site, do so at your own risk.

Code

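# Import libraries
import time
import urllib.request
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Open the URL (Selenium 4 style: the driver path is passed through Service)
url = 'https://www.jpx.co.jp/markets/statistics-equities/misc/02-02.html'
path = r'xxx\chromedriver_win32\chromedriver.exe'  # raw string keeps the backslashes literal
driver = webdriver.Chrome(service=Service(path))
driver.get(url)
driver.implicitly_wait(5)  # seconds
time.sleep(3)  # seconds

# Loop over the table cells, get the required PDF links, and download each file
try:
    for j in range(2, 12):  # table rows, one per year
        try:
            # The year label is in the first cell of the row
            year_element = driver.find_element(By.XPATH, '//*[@id="readArea"]/div[5]/div[1]/table/tbody/tr[{}]/td[1]'.format(j))
            # Columns 2 to 12 are saved as months 1 to 11; widen the range if the table has a td[13]
            for i in range(2, 13):
                pdf_element = driver.find_element(By.XPATH, '//*[@id="readArea"]/div[5]/div/table[1]/tbody/tr[{}]/td[{}]/a'.format(j, i))
                pdf_url = pdf_element.get_attribute('href')
                print(pdf_url)
                urllib.request.urlretrieve(pdf_url, "時価額順位表_{}{}月.pdf".format(year_element.text, i - 1))
                print('Files downloaded: ' + str(i - 1))

                time.sleep(5)  # pause between downloads
        except Exception as ex:
            # A missing cell or a failed download skips the rest of this row
            print(ex)
finally:
    # Close the browser even if the loop stops early
    driver.quit()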
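
The loop above does three things at once: it picks table cells, builds file names, and downloads. As a minimal sketch, assuming the same XPath pattern, row and column ranges, and file-name format as the code above, the first two steps can be pulled out into pure functions that can be checked without a browser. The names `pdf_link_xpath`, `build_filename`, and `iter_cells` are hypothetical helpers, and the year text `"2020年"` is only an assumed example of what the year cell might contain.

```python
from itertools import product

# XPath pattern taken from the article's code; row and column are filled in per cell.
PDF_LINK_XPATH = '//*[@id="readArea"]/div[5]/div/table[1]/tbody/tr[{row}]/td[{col}]/a'


def pdf_link_xpath(row: int, col: int) -> str:
    """Return the XPath of the PDF link in the given table row and column."""
    return PDF_LINK_XPATH.format(row=row, col=col)


def build_filename(year_text: str, col: int) -> str:
    """Build the output file name; column 2 is treated as month 1, as in the article."""
    return "時価額順位表_{}{}月.pdf".format(year_text, col - 1)


def iter_cells(rows=range(2, 12), cols=range(2, 13)):
    """Yield (row, col) pairs in the same order as the article's nested loops."""
    return product(rows, cols)


cells = list(iter_cells())
print(len(cells))                     # number of cells the loop would visit
print(pdf_link_xpath(*cells[0]))      # XPath for the first cell
print(build_filename("2020年", 2))    # "2020年" is an assumed year label
```

Listing the cells this way also shows that the default ranges cover columns 2 to 12 only, so each row yields eleven files.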

That's all.
