
[Python] Web Scraping with BeautifulSoup4 and Requests

Posted at 2017-12-17. Last updated at 2017-12-17.

Basic usage

Code

As a simple example, we fetch the page at http://google.com and print its title.

sample.py

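import requests
from bs4 import BeautifulSoup

URL = 'http://google.com'
# Send a custom User-Agent request header
headers = {"User-Agent": "hoge"}

# Fetch the page; give up if the server does not respond within 1 second
resp = requests.get(URL, timeout=1, headers=headers)
r_text = resp.text

# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(r_text, 'html.parser')
soup_titles = soup.find_all('title')

# Print the text of each title tag
for t in soup_titles:
    print(t.get_text())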

Output

Google

Explanation

1

import requests
from bs4 import BeautifulSoup

Here we use requests to fetch data over HTTP and BeautifulSoup to parse the tags, so we import the two modules above. If they are not installed, install them with pip as follows.

$ pip install requests
$ pip install beautifulsoup4

2

URL = 'http://google.com'
headers = {"User-Agent": "hoge"}
resp = requests.get(URL, timeout=1, headers=headers)
r_text = resp.text

Set URL to the URL to fetch from.
Set headers to the request headers.
Fetch the data with requests.get; resp.text then holds the response body as a string.
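
The sample above assumes the request succeeds. As a minimal sketch of how failures could be handled, the helper below wraps the same requests.get call. The name fetch_text and the default timeout of 5 seconds are assumptions made for illustration; they are not part of the original sample.

```python
import requests


def fetch_text(url, headers=None, timeout=5):
    """Return the response body as text, or None if the request fails.

    fetch_text is a hypothetical helper name used only for this sketch.
    """
    try:
        resp = requests.get(url, timeout=timeout, headers=headers)
        # Raise requests.HTTPError for 4xx/5xx status codes
        resp.raise_for_status()
    except requests.RequestException as exc:
        # RequestException is the base class for timeouts, connection
        # errors, invalid URLs, and HTTP errors raised by requests
        print(f"request failed: {exc}")
        return None
    return resp.text
```

It could be called as fetch_text('http://google.com', headers={"User-Agent": "hoge"}, timeout=1), and the caller then checks for None before parsing.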

3

soup = BeautifulSoup(r_text, 'html.parser')

Create a BeautifulSoup object from the fetched text.
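
To try this step without any network access, the same constructor can be given a literal HTML string. The following is a small sketch; the HTML content is made up for illustration.

```python
from bs4 import BeautifulSoup

# A made-up HTML document standing in for resp.text
html = (
    "<html><head><title>Sample page</title></head>"
    "<body><p>Hello</p></body></html>"
)

soup = BeautifulSoup(html, 'html.parser')

# Tag names can be used as attributes to reach the first matching tag
title_text = soup.title.string
paragraph_text = soup.p.get_text()

print(title_text)
print(paragraph_text)
```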

4

soup_titles = soup.find_all('title')

Get all title tags as a list-like result set.
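
For comparison, here is a short sketch of how find_all differs from find, using a made-up HTML string. find_all returns every match as a list-like object, while find returns only the first match, or None when nothing matches.

```python
from bs4 import BeautifulSoup

# A made-up HTML document used only for this illustration
html = (
    "<html><head><title>Sample page</title></head>"
    "<body><a href='/a'>A</a><a href='/b'>B</a></body></html>"
)
soup = BeautifulSoup(html, 'html.parser')

titles = soup.find_all('title')   # list-like, one element here
links = soup.find_all('a')        # list-like, two elements here
first_title = soup.find('title')  # a single tag
missing = soup.find('h1')         # None, because there is no h1 tag

hrefs = [a['href'] for a in links]
print(len(titles), len(links), hrefs, missing)
```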

5

for t in soup_titles:
    print(t.get_text())

Take each tag element from the result set and print its text.
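
When a tag contains nested tags or extra whitespace, get_text also accepts a separator and a strip flag. The sketch below uses a made-up HTML fragment to show the effect.

```python
from bs4 import BeautifulSoup

# A made-up fragment with nested tags and surrounding whitespace
html = "<ul><li> <b>Python</b> 3 </li><li>Requests</li></ul>"
soup = BeautifulSoup(html, 'html.parser')

# strip=True trims each piece of text and drops empty pieces;
# the first argument is the separator placed between the pieces
items = [li.get_text(' ', strip=True) for li in soup.find_all('li')]

for item in items:
    print(item)
```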
