More than 5 years have passed since last update.

Notes from trying web scraping with Python


This time I'm trying my hand at web scraping.
These are my notes on what I looked up.

Reference: https://tonari-it.com/python-html-get-text-attr/

The basic pattern for web scraping

# Install requests and beautifulsoup4 beforehand:
# pip install requests
# pip install beautifulsoup4

import requests
import bs4

res = requests.get('https://tonari-it.com')
res.raise_for_status()  # raise an exception on an HTTP error status (4xx/5xx)
soup = bs4.BeautifulSoup(res.text, "html.parser")
print(soup.title)  # prints the page's <title> element
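Since `res.text` is just the page's HTML as a string, the parsing step can be tried offline. The sketch below assumes a small hypothetical HTML string in place of a real response and shows the difference between `soup.title` (the whole element) and `soup.title.string` (only its text).

```python
import bs4

# Hypothetical HTML standing in for res.text, so this runs without a network request
html = """
<html>
  <head><title>Sample page</title></head>
  <body><p>Hello</p></body>
</html>
"""

soup = bs4.BeautifulSoup(html, "html.parser")
print(soup.title)         # the whole element: <title>Sample page</title>
print(soup.title.string)  # only the text inside it: Sample page
```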

Getting text and attributes

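import requests
import bs4

res = requests.get('https://tonari-it.com')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, "html.parser")
# select() takes a CSS selector: <a> tags inside <h2> tags inside the element with id="list"
elems = soup.select('#list h2 a')
for elem in elems:
    # getText() returns the link text; get('href') returns the attribute value (None if absent)
    print('{} ({})'.format(elem.getText(), elem.get('href')))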
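To see exactly what the selector `'#list h2 a'` matches, here is a minimal offline sketch. The HTML is hypothetical markup made up for illustration: only links inside an `<h2>` under `id="list"` are picked up, so the heading without a link and the link outside the list are skipped. `get_text()` is the newer spelling of `getText()`.

```python
import bs4

# Hypothetical markup: article links sit in <h2> tags under id="list"
html = """
<div id="list">
  <h2><a href="/post-1">First post</a></h2>
  <h2><a href="/post-2">Second post</a></h2>
  <h2>No link here</h2>
</div>
<h2><a href="/outside">Outside the list</a></h2>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

links = []
for elem in soup.select('#list h2 a'):  # <a> inside <h2> inside id="list"
    text = elem.get_text()              # same result as getText()
    href = elem.get('href')             # None if the attribute is missing
    links.append((text, href))
    print('{} ({})'.format(text, href))
```

Using `elem.get('href')` instead of `elem['href']` is a deliberate choice: the bracket form raises `KeyError` when the attribute is absent, while `get()` just returns `None`.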

Pages protected by Basic authentication

Just pass the credentials to requests.get as the auth argument, like this:

res = requests.get('URL of the page you want to scrape', auth=('ID', 'PASS'))
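To check what the `auth` tuple actually does without contacting a server, one option is to build the request with `requests.Request(...).prepare()` and inspect the headers. The URL and credentials below are hypothetical placeholders. The sketch shows that Basic authentication is just `ID:PASS` Base64-encoded into an `Authorization` header, which is why it should only be used over HTTPS.

```python
import base64
import requests

# Hypothetical URL and credentials; replace with real ones
url = 'https://example.com/protected'

# Build the request without sending it; the auth tuple is turned into a header
prepared = requests.Request('GET', url, auth=('ID', 'PASS')).prepare()
print(prepared.headers['Authorization'])

# Basic auth is simply "user:password" encoded in Base64 (not encrypted)
expected = 'Basic ' + base64.b64encode(b'ID:PASS').decode('ascii')
print(prepared.headers['Authorization'] == expected)
```

Passing `auth=requests.auth.HTTPBasicAuth('ID', 'PASS')` is equivalent to the tuple form; the tuple is just shorthand for it.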

More on web scraping

The article below covers the topic in depth:
https://vaaaaaanquish.hatenablog.com/entry/2017/06/25/202924#requests
