
Learning log (Day 2) #Scraping with BeautifulSoup

Last updated at 2020-03-11, posted at 2020-03-09

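What I studied

Scraping with BeautifulSoup

Scraping with BeautifulSoup

BeautifulSoup is a library for parsing HTML and XML and extracting information from them. It has no download capability of its own, so it is used together with urllib.

The basic usage of BeautifulSoup is as follows.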

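# Import the library
from bs4 import BeautifulSoup

html1 = """
<html><body>
    <h1>Scraping</h1>
    <p>Parsing a web page</p>
    <p>Extracting an arbitrary part</p>
</body></html>
"""

# Parse the HTML
soup = BeautifulSoup(html1, 'html.parser')

# Extract specific elements
h1 = soup.html.body.h1
p1 = soup.html.body.p
# The first next_sibling is the whitespace text between the two <p> tags,
# so a second next_sibling is needed to reach the second <p>
p2 = p1.next_sibling.next_sibling

print(h1.string)
print(p1.string)
print(p2.string)

Output

Scraping
Parsing a web page
Extracting an arbitrary part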
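
Chaining next_sibling twice works here, but it depends on the exact whitespace between the tags. As a minimal sketch of an alternative, assuming the same document structure as the sample above (the sample strings are written in English here), find_next_sibling() and find_all() select elements by tag name instead:

```python
from bs4 import BeautifulSoup

html = """
<html><body>
    <h1>Scraping</h1>
    <p>Parsing a web page</p>
    <p>Extracting an arbitrary part</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

first_p = soup.body.p

# The node right after the first <p> is the whitespace between the tags,
# not the second <p>.
whitespace = first_p.next_sibling

# find_next_sibling() skips text nodes and returns the next matching tag.
second_p = first_p.find_next_sibling("p")

# find_all() collects every matching tag in document order.
paragraphs = [p.string for p in soup.find_all("p")]

print(repr(whitespace))
print(second_p.string)
print(paragraphs)
```

With this approach the result does not change if the HTML is reformatted, for example if the line breaks between the tags are removed.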

Scraping with BeautifulSoup and urllib together

# Import the libraries
import urllib.request as req
from bs4 import BeautifulSoup

# Postal code lookup API that returns XML
url = "https://api.aoikujira.com/zip/xml/1500042"

# Download the data
res = req.urlopen(url)

# Parse the data fetched with urlopen()
soup = BeautifulSoup(res, 'html.parser')

# Extract the prefecture (ken), city/ward (shi), and town (cho)
ken = soup.find("ken").string
shi = soup.find("shi").string
cho = soup.find("cho").string

print(ken, shi, cho)
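
One thing to watch for in the code above is that find() returns None when no matching tag exists, so accessing .string on the result raises an AttributeError. The following is a rough sketch of a more defensive version that runs without network access. The function name extract_address and the sample XML are hypothetical. Only the tag names ken, shi, and cho come from the code above, and the real API response may be shaped differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the API response. Only the tag names
# ken, shi and cho come from the original code; everything else is assumed.
SAMPLE_XML = """
<address>
    <ken>Tokyo</ken>
    <shi>Shibuya-ku</shi>
    <cho>Udagawa-cho</cho>
</address>
"""


def extract_address(markup):
    """Return (ken, shi, cho) as strings, using None for any missing tag."""
    soup = BeautifulSoup(markup, "html.parser")
    values = []
    for name in ("ken", "shi", "cho"):
        tag = soup.find(name)
        # find() returns None when no tag matches, so guard before get_text().
        values.append(tag.get_text(strip=True) if tag is not None else None)
    return tuple(values)


full = extract_address(SAMPLE_XML)
partial = extract_address("<address><ken>Tokyo</ken></address>")

print(full)
print(partial)
```

Because BeautifulSoup accepts either a string or a file-like object as markup, the response object returned by req.urlopen(url) could be passed to the same function in place of the sample string.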

References

Here is the GitHub repository published for the book I used as a reference.
増補改訂Pythonによるスクレイピング&機械学習開発テクニック
