
Scraping Professional Baseball Player Profiles with Python's Beautiful Soup

Last updated at 2018-01-03. Posted at 2018-01-03.

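Nippon Professional Baseball (NPB) has a website (npb.jp) where clicking a player's name shows that player's profile and stats, so I tried a quick-and-dirty scrape of it with Python's Beautiful Soup.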

For now, the scraping target is roughly this: the team roster page and the profile page linked from each player on it.

Install Beautiful Soup

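Beautiful Soup is a library that parses HTML, the markup language of web pages, into a structure that is easy to search.
If you have not installed it yet, you can do so with the following command.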

$ pip install beautifulsoup4

Import the libraries

from urllib.request import urlopen
from bs4 import BeautifulSoup

First, get the player links from the roster page

# URL of the page to scrape (Hanshin Tigers roster)
url = 'http://npb.jp/bis/teams/rst_t.html'
# Download the raw HTML and parse it with Python's built-in HTML parser
htmlData = urlopen(url).read()
htmlParsed = BeautifulSoup(htmlData, 'html.parser')
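
To see what a CSS selector lookup returns without hitting the network, here is a minimal offline sketch. The HTML fragment, the player names, and the `extract_player_links` helper are made up for illustration, and the real NPB markup may differ. Only the `.rosterRegister a` selector and the roster URL come from this article. The sketch also uses `urljoin` from the standard library to resolve relative links, which is an alternative to concatenating the host by hand.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration; the real page markup may differ.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="rosterRegister"><a href="/bis/players/00000001.html">Player A</a></td>
  </tr>
  <tr>
    <td class="rosterRegister"><a href="/bis/players/00000002.html">Player B</a></td>
  </tr>
</table>
"""

BASE_URL = 'http://npb.jp/bis/teams/rst_t.html'


def extract_player_links(html, base_url):
    """Return (link text, absolute URL) pairs for every roster link in html."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for a in soup.select('.rosterRegister a'):
        href = a.get('href')
        if href is None:
            # Skip anchors that have no href attribute
            continue
        # urljoin resolves a relative href against the page it came from
        links.append((a.get_text(strip=True), urljoin(base_url, href)))
    return links


player_links = extract_player_links(SAMPLE_HTML, BASE_URL)
for name, link in player_links:
    print(name, link)
```

Keeping the parsing in a function that takes an HTML string makes it easy to try selectors against a saved copy of the page before running them against the live site.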

Fetch and scrape each player's profile

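# Each roster entry contains a link to that player's profile page
for div in htmlParsed.select('.rosterRegister a'):
    # The href is treated as site-relative, so the host is prepended
    player_url = 'http://npb.jp' + div.get("href")
    player_htmlData = urlopen(player_url).read()
    htmlParsed2 = BeautifulSoup(player_htmlData, 'html.parser')
    # Player name as shown in the roster link
    print(div.string)
    # Print each profile field found on the player's page
    for detail in htmlParsed2.select('.registerDetail'):
        print(detail.string)
    print('--------------------')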
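
One thing to watch for with the `.string` attribute used in the loop above: Beautiful Soup returns `None` when a tag has more than one child, for example text split by a `<br>`. The sketch below shows the difference from `get_text()` on a made-up fragment. The markup is hypothetical and only reuses the `.registerDetail` class name from this article, so treat it as an illustration rather than a description of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical profile markup; the second cell contains a nested <br> tag.
SAMPLE_PROFILE = """
<table>
  <tr><td class="registerDetail">Sample Name</td></tr>
  <tr><td class="registerDetail">Line one<br>Line two</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_PROFILE, 'html.parser')
cells = soup.select('.registerDetail')

# .string only works when the tag has exactly one child
strings = [cell.string for cell in cells]

# get_text() joins all text inside the tag, whatever the nesting
texts = [cell.get_text(' ', strip=True) for cell in cells]

print(strings)
print(texts)
```

If the loop ever prints `None` for a field, switching that `print` to `get_text()` is one possible fix.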

The final script looks like this

# Import the libraries
from urllib.request import urlopen
from bs4 import BeautifulSoup

# URL of the page to scrape (Hanshin Tigers roster)
url = 'http://npb.jp/bis/teams/rst_t.html'
# Download the raw HTML and parse it with Python's built-in HTML parser
htmlData = urlopen(url).read()
htmlParsed = BeautifulSoup(htmlData, 'html.parser')

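# Each roster entry contains a link to that player's profile page
for div in htmlParsed.select('.rosterRegister a'):
    # The href is treated as site-relative, so the host is prepended
    player_url = 'http://npb.jp' + div.get("href")
    player_htmlData = urlopen(player_url).read()
    htmlParsed2 = BeautifulSoup(player_htmlData, 'html.parser')
    # Player name as shown in the roster link
    print(div.string)
    # Print each profile field found on the player's page
    for detail in htmlParsed2.select('.registerDetail'):
        print(detail.string)
    print('--------------------')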
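
The script above requests every profile page back to back and stops at the first network error. As a possible refinement, not something the original script does, the sketch below pauses between requests and skips pages that fail. The `fetch_all` helper, its parameters, and the `example.invalid` URLs are hypothetical names introduced for illustration. The `fetch` argument exists only so the loop can be tried with a fake downloader instead of the live site.

```python
import time
from urllib.error import URLError
from urllib.request import urlopen


def fetch_all(urls, fetch=None, delay_seconds=1.0):
    """Fetch each URL in turn, pausing between requests and skipping failures."""
    if fetch is None:
        # Default downloader: a plain urlopen call with a timeout
        def fetch(u):
            return urlopen(u, timeout=10).read()
    pages = {}
    for i, u in enumerate(urls):
        if i > 0:
            # Wait before every request except the first one
            time.sleep(delay_seconds)
        try:
            pages[u] = fetch(u)
        except URLError as exc:
            # HTTPError is a subclass of URLError, so HTTP failures land here too
            print('skipped', u, exc)
    return pages


# Offline demo with a fake downloader (no network access needed)
def fake_fetch(u):
    if 'bad' in u:
        raise URLError('simulated failure')
    return b'<html></html>'


demo_pages = fetch_all(
    ['http://example.invalid/a', 'http://example.invalid/bad'],
    fetch=fake_fetch,
    delay_seconds=0,
)
print(list(demo_pages))
```

With the default downloader, the returned dictionary maps each URL to its raw bytes, which can then be passed to `BeautifulSoup` as in the script above.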

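The output looks like this. For each player it prints the name, its reading in hiragana, a line with birth date, height, weight, and throwing and batting hand, then the career history and the draft result.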

榎田　大樹
えのきだ・だいき
1986年8月7日生　　身長181cm　　体重92kg　　左投 左打
小林西高 - 福岡大 - 東京ガス
2010年ドラフト1位
--------------------
能見　篤史
のうみ・あつし
1979年5月28日生　　身長180cm　　体重72kg　　左投 左打
鳥取城北高 - 大阪ガス
2004年ドラフト自由枠
--------------------
横山　雄哉
よこやま・ゆうや
1994年2月21日生　　身長183cm　　体重83kg　　左投 左打
山形中央高 - 新日鉄住金鹿島
2014年ドラフト1位
--------------------
(The rest is omitted because it is long.)
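
If structured data is more useful than printed lines, the third line of each record can be split into fields with a regular expression. This is a minimal sketch that assumes every player's line follows the same layout as the three samples above, which the article does not confirm. The `parse_profile_line` helper and the dictionary keys are hypothetical names chosen for illustration.

```python
import re
from datetime import date

# Layout assumed from the sample output: birth date, height, weight, hands
PROFILE_PATTERN = re.compile(
    r'(\d{4})年(\d{1,2})月(\d{1,2})日生\s+'
    r'身長(\d+)cm\s+'
    r'体重(\d+)kg\s+'
    r'(\S)投\s*(\S)打'
)


def parse_profile_line(line):
    """Return a dict of profile fields, or None if the line does not match."""
    match = PROFILE_PATTERN.search(line)
    if match is None:
        return None
    year, month, day, height, weight, throws, bats = match.groups()
    return {
        'born': date(int(year), int(month), int(day)),
        'height_cm': int(height),
        'weight_kg': int(weight),
        'throws': throws,  # e.g. 左 (left) or 右 (right)
        'bats': bats,
    }


sample_line = '1986年8月7日生　　身長181cm　　体重92kg　　左投 左打'
parsed = parse_profile_line(sample_line)
print(parsed)
```

In Python 3, `\s` matches the full-width spaces that separate the fields in the output, so the same pattern handles both full-width and ordinary spaces.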
