More than 3 years have passed since last update.

[Memo] Scraping the market capitalization ranking tables (PDF) from JPX with Selenium

Introduction

  • This article is a personal memo.
  • Scrape at your own risk.

Code

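# Import libraries
import time
import urllib.request
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Open the URL
url = 'https://www.jpx.co.jp/markets/statistics-equities/misc/02-02.html'
path = r'xxx\chromedriver_win32\chromedriver.exe'  # raw string keeps the backslashes literal
# Selenium 4 style: the chromedriver path is passed through a Service object
driver = webdriver.Chrome(service=Service(path))
driver.get(url)
driver.implicitly_wait(5)  # seconds
time.sleep(3)  # seconds

# Loop over the table cells, pick up each PDF link, and download it
try:
    for j in range(2, 12):  # table rows 2..11; the first cell of each row holds the year
        try:
            for i in range(2, 13):  # table columns 2..12; column i is treated as month i-1
                year_element = driver.find_element(By.XPATH, '//*[@id="readArea"]/div[5]/div[1]/table/tbody/tr[{}]/td[1]'.format(j))
                pdf_element = driver.find_element(By.XPATH, '//*[@id="readArea"]/div[5]/div/table[1]/tbody/tr[{}]/td[{}]/a'.format(j, i))
                pdf_url = pdf_element.get_attribute('href')
                print(pdf_url)
                urllib.request.urlretrieve(pdf_url, "時価額順位表_{}{}月.pdf".format(year_element.text, i - 1))
                print('Files downloaded: ' + str(i - 1))

                time.sleep(5)  # pause between downloads
        except Exception as ex:
            # A missing cell or link ends this row; print the error and move to the next row
            print(ex)
finally:
    # Close the browser even if something above fails
    driver.quit()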
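
The loop above depends on fixed XPath positions, so it is easy to break when the page layout shifts. As a possible alternative, here is a minimal sketch that pulls the PDF links out of the table HTML with the standard library only. It assumes the table markup is available as a string (for example from `driver.page_source`), that the first `td` of each row holds the year, and that column n holds month n-1, as in the loop above. `PdfTableParser`, `extract_pdf_links`, and `build_filename` are hypothetical names introduced here, and the sample HTML and the `example.com` URL are made up for illustration; they are not the real JPX markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfTableParser(HTMLParser):
    """Collect (year_text, month, pdf_url) tuples from table rows."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.results = []             # (year_text, month, absolute_url)
        self._col = 0                 # 1-based index of the current <td> in the row
        self._in_first_cell = False   # True while inside the year cell
        self._year = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            # New row: reset the column counter and the year text
            self._col = 0
            self._year = ''
        elif tag == 'td':
            self._col += 1
            self._in_first_cell = (self._col == 1)
        elif tag == 'a' and self._col >= 2:
            href = dict(attrs).get('href')
            if href and href.lower().endswith('.pdf'):
                # Column n is treated as month n-1 (assumption carried over from the loop)
                month = self._col - 1
                self.results.append(
                    (self._year.strip(), month, urljoin(self.base_url, href))
                )

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_first_cell = False

    def handle_data(self, data):
        if self._in_first_cell:
            self._year += data


def extract_pdf_links(html, base_url):
    """Return every PDF link found in the table HTML."""
    parser = PdfTableParser(base_url)
    parser.feed(html)
    return parser.results


def build_filename(year_text, month):
    """Build the same file name pattern used in the download loop."""
    return "時価額順位表_{}{}月.pdf".format(year_text, month)


# Made-up sample markup, only to show the shape of the result
sample_html = """
<table><tbody>
<tr><th>Year</th><th>Jan</th><th>Feb</th></tr>
<tr><td>2020</td><td><a href="/a/2020_01.pdf">PDF</a></td><td><a href="/a/2020_02.pdf">PDF</a></td></tr>
<tr><td>2021</td><td><a href="/a/2021_01.pdf">PDF</a></td><td></td></tr>
</tbody></table>
"""

for year_text, month, pdf_url in extract_pdf_links(sample_html, 'https://example.com/'):
    print(build_filename(year_text, month), pdf_url)
```

Each resulting URL could then be passed to `urllib.request.urlretrieve` with a pause between requests, as in the original loop. Empty cells simply produce no entry, so no exception handling is needed for months without a PDF.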

That's all.
