

Selenium, PhantomJS & BeautifulSoup4

Posted at 2016-09-01

Installing the required packages

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04 LTS"

$ sudo aptitude install phantomjs xvfb
$ pip install selenium pyvirtualdisplay
from selenium import webdriver
from pyvirtualdisplay import Display
display = Display(visible=0, size=(800, 600))
display.start()
# <Display cmd_param=['Xvfb', '-br', '-nolisten', 'tcp', '-screen', ' - snip -

driver = webdriver.PhantomJS()
driver.get("http://www.example.com")
type(driver.page_source)
# <class 'str'>

driver.page_source
# '<!DOCTYPE html><html itemscope="" itemtype="http://schema.org/Web - snip -

from bs4 import BeautifulSoup
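soup = BeautifulSoup(driver.page_source, "html.parser")
# Tag.get() returns None when an attribute is missing; "class" is a list when present
i = [{"href": x.get("href"), "text": x.string, "class": x.get("class")} for x in soup.find_all("a")]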
print(i)
# [{'class': None, 'text': 'MENU', 'href': 'javascript:;'}, {'class': None, 'text': 'トップページ', 'href': '/'}, {'class': None, 'text': 'プラットフォーム', 'href': '/pf/'},  - snip -
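The extracted list mixes relative paths such as '/pf/' with placeholders such as 'javascript:;'. A minimal sketch of one way to clean it up before following the links, using only the standard library. The helper name `absolutize_links` and the sample data are assumptions made for illustration and are not part of the original session.

```python
from urllib.parse import urljoin


def absolutize_links(links, base_url):
    """Return copies of the link dicts with absolute hrefs, skipping non-navigable ones."""
    result = []
    for link in links:
        href = link.get("href")
        # Skip anchors without an href and javascript: pseudo-links
        if not href or href.lower().startswith("javascript:"):
            continue
        # Resolve relative paths against the page URL
        result.append({**link, "href": urljoin(base_url, href)})
    return result


# Sample entries shaped like the output above (hypothetical data)
sample = [
    {"class": None, "text": "MENU", "href": "javascript:;"},
    {"class": None, "text": "トップページ", "href": "/"},
    {"class": None, "text": "プラットフォーム", "href": "/pf/"},
]
links = absolutize_links(sample, "http://www.example.com")
for link in links:
    print(link["href"])
```

In the session above, the list `i` and the URL passed to `driver.get()` would take the place of `sample` and the base URL.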

As of September 2016 the bug below is still open, so if you use PhantomJS on Ubuntu 16.04 it is better to install it the usual way rather than from the distribution package.
https://bugs.launchpad.net/ubuntu/+source/phantomjs/+bug/1578444
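If both a manually installed PhantomJS and the packaged one are present, it may help to choose the binary explicitly. The following is only a sketch: the helper `find_phantomjs` and the candidate path `/usr/local/bin/phantomjs` are assumptions, so adjust them to wherever the binary was actually placed.

```python
import os
import shutil


def find_phantomjs(candidates=("/usr/local/bin/phantomjs",)):
    """Return the first executable PhantomJS path, preferring manual installs."""
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    # Fall back to whatever is on PATH (this may be the distribution package)
    return shutil.which("phantomjs")


path = find_phantomjs()
print(path)  # None when PhantomJS is not installed
```

With a Selenium release that still ships the PhantomJS driver, the result can be passed as `webdriver.PhantomJS(executable_path=path)`.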
