A starting point for web scraping in Python. Notes #Python3

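Create a virtual environment first. (pyvenv was deprecated in Python 3.6 and removed in 3.8; on newer versions, python3 -m venv env does the same thing.)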

$ pyvenv-3.5 env

Activate the virtual environment.

$ source env/bin/activate
(env) $

Install requests and BeautifulSoup4, then start Python.

(env) $ pip install requests
(env) $ pip install BeautifulSoup4
(env) $ python

After that, the flow looks like this.

>>> import requests
>>> res = requests.get('URL you want to scrape')
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(res.text, 'html.parser')
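>>> soup.select_one('a tag name or CSS selector').text  # change this part as needed

For the line marked "# change this part as needed", see the list below. Swap select_one() for find() or one of the other methods, depending on what you want to extract.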
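The same flow can also be written as a small script instead of typing it into the REPL. The following is only a minimal sketch: the helper names fetch_html and first_text, the 10-second timeout, and the sample HTML are assumptions made for illustration, not part of the original memo. The demonstration runs against an inline string so it works without network access.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text (hypothetical helper)."""
    res = requests.get(url, timeout=timeout)
    res.raise_for_status()  # raise an exception on 4xx/5xx responses
    return res.text


def first_text(html, selector):
    """Return the stripped text of the first element matching a CSS selector, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else None


# Offline demonstration on an inline document (made-up sample HTML).
sample_html = '<html><body><h1 id="title"> Hello </h1><p>body text</p></body></html>'
print(first_text(sample_html, '#title'))   # Hello
print(first_text(sample_html, 'p'))        # body text
print(first_text(sample_html, 'table'))    # None
```

For a real page, first_text(fetch_html(url), selector) chains the two steps. Returning None when nothing matches avoids the AttributeError that .text would raise on a missing element.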

★ Frequently used methods of the BeautifulSoup object

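BeautifulSoup.find() -> searches for a tag and returns the first match
BeautifulSoup.find_all() -> searches for a tag and returns a list of all matches
BeautifulSoup.find_previous() -> returns the nearest matching tag before the current one
BeautifulSoup.find_next() -> returns the nearest matching tag after the current one
BeautifulSoup.find_parent() -> returns the nearest matching parent tag
BeautifulSoup.select() -> searches by CSS selector and returns a list of matching tags
BeautifulSoup.select_one() -> searches by CSS selector and returns the first match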
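To see what each method returns, it may help to try them on a tiny inline document. This is a rough sketch only; the HTML snippet, the id "menu", and the class "item" are made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up sample document for illustration.
html = """
<ul id="menu">
  <li class="item">one</li>
  <li class="item">two</li>
  <li class="item">three</li>
</ul>
"""
soup = BeautifulSoup(html, 'html.parser')

first = soup.find('li')            # first matching tag
all_items = soup.find_all('li')    # list of every matching tag
second = all_items[1]              # the <li> containing "two"

print(first.text)                                           # one
print([li.text for li in all_items])                        # ['one', 'two', 'three']
print(second.find_previous('li').text)                      # one
print(second.find_next('li').text)                          # three
print(second.find_parent('ul')['id'])                       # menu
print([li.text for li in soup.select('#menu > li.item')])   # ['one', 'two', 'three']
print(soup.select_one('li.item').text)                      # one
```

Note that find_previous(), find_next() and find_parent() are called on a tag that was already found, since they search relative to that position in the document.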

↑
This list of methods is quoted from here.

Example:
soup.find('li')
soup.find_all('li')
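One practical difference between the two calls shows up when nothing matches: find() gives back None, while find_all() gives back an empty list. A small sketch, using a made-up HTML string:

```python
from bs4 import BeautifulSoup

# Made-up sample document for illustration.
soup = BeautifulSoup('<ul><li>a</li><li>b</li></ul>', 'html.parser')

missing_one = soup.find('table')       # no match: None
missing_all = soup.find_all('table')   # no match: empty list
print(missing_one)                     # None
print(missing_all)                     # []

texts = [li.get_text() for li in soup.find_all('li')]
print(texts)                           # ['a', 'b']

# Guard a find() result before reading from it, since None has no .text.
table = soup.find('table')
table_text = table.get_text() if table is not None else ''
```

Looping over a find_all() result is therefore safe even when the page has no matching tags, but a find() result should be checked first.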

For anything else, searching for "Python scraping" turns up plenty of material, so I'll leave it out here.