

[Python] Web scraping with BeautifulSoup4 and Requests

Posted at 2017-12-17

#Basic usage
##Code
This simple example fetches http://google.com and prints the page title.

sample.py
import requests
from bs4 import BeautifulSoup

URL = 'http://google.com'
headers = {"User-Agent": "hoge"}

resp = requests.get(URL, timeout=1, headers=headers)
r_text = resp.text

soup = BeautifulSoup(r_text, 'html.parser')
soup_titles = soup.find_all('title')

for t in soup_titles:
    print(t.get_text())
Output
Google

##Explanation
###1

import requests
from bs4 import BeautifulSoup

This example uses requests to fetch data over HTTP and BeautifulSoup to parse the tags, so both are imported. If they are not installed, install them with pip as shown below.

$ pip install requests
$ pip install beautifulsoup4

###2

URL = 'http://google.com'
headers = {"User-Agent": "hoge"}
resp = requests.get(URL, timeout=1, headers=headers)
r_text = resp.text

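URL holds the URL to fetch.
headers holds the request headers (here only User-Agent, with a placeholder value).
requests.get fetches the data; timeout=1 stops waiting if the server does not respond within 1 second.
resp.text is the response body decoded as text.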
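
The snippet above does not check whether the request succeeded. One possible way to handle failures is sketched below, assuming you want `None` back when anything goes wrong. The helper name `fetch_html` and the 5-second default timeout are illustrative choices, not part of the original code.

```python
import requests


def fetch_html(url, timeout=5):
    """Return the response body as text, or None if the request fails.

    fetch_html is a hypothetical helper written for this illustration.
    """
    headers = {"User-Agent": "hoge"}  # same placeholder value as in sample.py
    try:
        resp = requests.get(url, timeout=timeout, headers=headers)
        # Turn 4xx/5xx status codes into an exception
        resp.raise_for_status()
    except requests.exceptions.RequestException as err:
        # Base class for timeouts, connection errors, invalid URLs and HTTP errors
        print(f"request failed: {err}")
        return None
    return resp.text
```

Calling `fetch_html('http://google.com')` should return the page HTML when the request succeeds. A malformed URL such as `'not-a-url'` is rejected by requests before anything is sent, so the helper returns `None`.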

###3

soup = BeautifulSoup(r_text, 'html.parser')

Create a BeautifulSoup object from the fetched text. 'html.parser' selects the HTML parser from Python's standard library.
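
BeautifulSoup accepts any HTML string, so the parsing step can be tried without a network request. A small sketch, where the HTML literal is made up to stand in for `resp.text`:

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for resp.text
r_text = "<html><head><title>Sample page</title></head><body><p>Hello</p></body></html>"

# 'html.parser' ships with Python, so no extra install is needed
soup = BeautifulSoup(r_text, "html.parser")

print(type(soup).__name__)  # BeautifulSoup
print(soup.title)           # <title>Sample page</title>
print(soup.p.get_text())    # Hello
```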

###4

soup_titles = soup.find_all('title')

Get all title tags. find_all returns a list-like result set of the matching tags.
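
An HTML document normally has a single title, so `find` (first match, or `None`) may be a convenient alternative to `find_all`. The sketch below compares the two on a made-up HTML string:

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration
html = (
    "<html><head><title>Sample page</title></head>"
    "<body><a href='/a'>A</a><a href='/b'>B</a></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

titles = soup.find_all("title")   # list-like result set, empty if nothing matches
first_title = soup.find("title")  # first matching tag, or None
links = soup.find_all("a")        # every <a> tag in document order

print(len(titles))                 # 1
print(first_title.get_text())      # Sample page
print([a["href"] for a in links])  # ['/a', '/b']
print(soup.find("h1"))             # None
```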

###5

for t in soup_titles:
    print(t.get_text())

Iterate over the result set and print the text content of each tag.
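
`get_text()` returns the text of a tag together with the text of its descendants. A short sketch on made-up HTML, also showing the optional `separator` and `strip` arguments:

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration
html = "<div><h1> Title </h1><p>First <b>bold</b> text</p></div>"
soup = BeautifulSoup(html, "html.parser")

p = soup.find("p")
# Text of nested tags is included
print(p.get_text())  # First bold text

div = soup.find("div")
# strip=True trims whitespace from each text piece; separator joins the pieces
print(div.get_text(separator="|", strip=True))  # Title|First|bold|text
```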
