
Scraping with Python + Beautiful Soup

Posted at 2021-10-27

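Installing Requests

requests is a Python library for HTTP communication. It is useful for scraping tasks such as fetching information from websites and collecting images.
Used together with Beautiful Soup (described below), it lets you parse a website and extract only the information you need.

▼ Installation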

pip install requests

Installing Beautiful Soup

Beautiful Soup is a library for extracting data from HTML and XML.
You can also specify a tag to extract exactly the data you want.

▼ Installation

pip install beautifulsoup4
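
To illustrate the tag-based extraction mentioned above, here is a minimal sketch that parses an inline HTML string instead of a live site. The HTML snippet and the variable names (SAMPLE_HTML, title_text, heading, links) are made up for this example and are not from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet used only for illustration (not from a real site).
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1 id="main">Hello</h1>
    <ul>
      <li class="item"><a href="/a">First</a></li>
      <li class="item"><a href="/b">Second</a></li>
    </ul>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Text inside the <title> tag
title_text = soup.title.string

# First <h1> tag, looked up by tag name and id attribute
heading = soup.find("h1", id="main").get_text()

# href of every <a> inside <li class="item">, selected with a CSS selector
links = [a["href"] for a in soup.select("li.item a")]

print(title_text)  # Sample Page
print(heading)     # Hello
print(links)       # ['/a', '/b']
```

find() returns only the first match, while select() returns a list of all matches, so pick whichever fits the shape of the data you are after.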

Source code

This post covers only the basic syntax.
I will write up the details once I have actually built something with it.

▼ Basic usage

get.py

# Import the requests module
import requests

# Import the BeautifulSoup class
from bs4 import BeautifulSoup

# Fetch the HTML of the URL to scrape (requests.get returns a Response object)
response = requests.get("https://www.yahoo.co.jp/")

# Create a BeautifulSoup object from the HTML in the response body
soup = BeautifulSoup(response.content, "html.parser")

# Get the HTML <title> tag
print(soup.title)

Output

D:\prj_python>python get.py
<title>Yahoo! JAPAN</title>
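
The basic usage above prints the whole tag and does not check whether the request succeeded. As a rough sketch of a slightly more defensive variant, the function below adds a timeout and a status check and returns only the title text. The names fetch_title, session, and timeout are assumptions made for this example, not part of the original post.

```python
import requests
from bs4 import BeautifulSoup


def fetch_title(url, session=None, timeout=10):
    """Return the text of the page's <title> tag, or None if there is none.

    fetch_title and its parameters are hypothetical names for this sketch.
    """
    # Use the caller's session if one is given; otherwise use the requests module API.
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)

    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()

    soup = BeautifulSoup(response.content, "html.parser")
    if soup.title is None:
        return None
    return soup.title.get_text(strip=True)
```

Calling print(fetch_title("https://www.yahoo.co.jp/")) would print the title text without the surrounding tag. Accepting an optional session makes it possible to reuse a requests.Session across calls, or to pass in a stand-in object when testing without network access.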
