
I want to get NPB team data with Python, part 3 (team names and PDF paths)

Python3

Last updated at 2018-05-19, posted at 2018-05-10

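I managed to pull together what I was working on yesterday.
But I still haven't achieved what I actually want to do.
This is still a work in progress.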

getPDF.py

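import bs4
import requests

# Team names exactly as they appear in the link text on the page
TEAMS = {'ロッテ', 'ソフトバンク', '西武', '楽天', 'オリックス', '日本ハム',
         '広島', '阪神', 'DeNA', '巨人', '中日', 'ヤクルト'}

response = requests.get('http://jpbpa.net/register/')
response.raise_for_status()
# Parse the page with html.parser and keep only the <a> tags
soup = bs4.BeautifulSoup(response.text, "html.parser")
for elem in soup.select('a'):
    # team = link text (team name), path = href (path to the PDF)
    team = elem.getText()
    path = elem.get('href')
    # Skip anchors without an href (get() returns None) and non-team links
    if path is None or team not in TEAMS:
        continue
    # Turn the relative path into an absolute URL
    repath = path.replace('..', 'http://jpbpa.net')
    print('{}({})'.format(team, repath))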
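
The `replace('..', 'http://jpbpa.net')` step relies on every href starting with `..`. A slightly more general alternative, sketched below, is to resolve each href against the page URL with the standard library's `urllib.parse.urljoin`. The helper name `resolve_pdf_url` is hypothetical, and the sample href is an assumption inferred from the script's `replace` call and its output rather than copied from the page.

```python
from urllib.parse import urljoin

# Page the links were scraped from (same URL as in getPDF.py)
BASE_URL = 'http://jpbpa.net/register/'


def resolve_pdf_url(href, base_url=BASE_URL):
    """Resolve a possibly relative href against the page URL (hypothetical helper)."""
    return urljoin(base_url, href)


if __name__ == '__main__':
    # Assumed shape of an href on the page: a path relative to /register/
    print(resolve_pdf_url('../up_pdf/1523843962-483491.pdf'))
```

Unlike a plain string replace, `urljoin` also leaves hrefs that are already absolute URLs unchanged.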

Writing this output to a text file gives the following.

result.txt

広島(http://jpbpa.net/up_pdf/1523843962-483491.pdf)
ソフトバンク(http://jpbpa.net/up_pdf/1524204565-932446.pdf)
阪神(http://jpbpa.net/up_pdf/1523843967-655138.pdf)
西武(http://jpbpa.net/up_pdf/1523844823-945400.pdf)
DeNA(http://jpbpa.net/up_pdf/1523843918-062109.pdf)
楽天(http://jpbpa.net/up_pdf/1523843952-195082.pdf)
巨人(http://jpbpa.net/up_pdf/1523844190-573673.pdf)
オリックス(http://jpbpa.net/up_pdf/1523843924-372362.pdf)
中日(http://jpbpa.net/up_pdf/1523844654-036371.pdf)
日本ハム(http://jpbpa.net/up_pdf/1524204572-137100.pdf)
ヤクルト(http://jpbpa.net/up_pdf/1523843936-367593.pdf)
ロッテ(http://jpbpa.net/up_pdf/1523843946-624410.pdf)
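
The post does not show how the output ended up in result.txt; redirecting the script's standard output would be enough. As one possible alternative, here is a minimal sketch that writes the pairs from Python directly. The function names `format_line` and `write_results` are hypothetical, and the sketch assumes the (team, URL) pairs have been collected into a list instead of being printed.

```python
def format_line(team, url):
    """Format one entry the same way getPDF.py prints it."""
    return '{}({})'.format(team, url)


def write_results(pairs, filename='result.txt'):
    """Write (team, url) pairs to a UTF-8 text file, one entry per line."""
    with open(filename, 'w', encoding='utf-8') as f:
        for team, url in pairs:
            f.write(format_line(team, url) + '\n')


if __name__ == '__main__':
    # Sample pair taken from the listing above; a real run would collect
    # the pairs inside the scraping loop instead of printing them.
    write_results([('広島', 'http://jpbpa.net/up_pdf/1523843962-483491.pdf')])
```

Passing `encoding='utf-8'` explicitly keeps the Japanese team names intact regardless of the platform's default encoding.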
