
A First Try at Scraping with Headless Chrome on Windows 10

Posted at 2018-01-11

I wish web browsers would do more of the work for us...

Environment

Windows 10
Google Chrome 63.0.3239.132 (ChromeDriver 2.35)
Python 3.6.3 (via Anaconda)
selenium 3.8.0
beautifulsoup4 4.6.0

Setup

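Install Anaconda, then run pip freeze in Anaconda Prompt to confirm that selenium is installed.
Download ChromeDriver from here.
Unzip it and put it in the same folder as the script. (There is surely a better place to put it, but I don't know how, so I left it there.)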
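As a rough sketch of one alternative to keeping the driver next to the script, the helper below looks in the script's folder first and then falls back to the PATH. The function name find_chromedriver and the default file name chromedriver.exe are assumptions made for illustration, not something the original setup uses.

```python
import os
import shutil


def find_chromedriver(script_dir, name="chromedriver.exe"):
    """Return a path to the ChromeDriver binary, or None if it is not found.

    Hypothetical helper: checks the script's folder first, then the PATH.
    """
    local = os.path.join(script_dir, name)
    if os.path.isfile(local):
        return local
    # shutil.which searches the directories listed in the PATH variable.
    return shutil.which(name)
```

With the Selenium 3.x series used here, the returned path could be passed as webdriver.Chrome(executable_path=path, chrome_options=options), which would let the driver live in any folder.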

Code

There is a Stack Overflow page that covers exactly this, so for now I copied its code and worked BeautifulSoup into it.
https://stackoverflow.com/questions/45364102/how-do-i-use-headless-chrome-in-chrome-60-on-windows-10

test.py

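from selenium import webdriver
from bs4 import BeautifulSoup

# Start Chrome without a visible window.
options = webdriver.ChromeOptions()
options.add_argument("headless")
driver = webdriver.Chrome(chrome_options=options)

# Load the page, hand the rendered HTML to BeautifulSoup, then close the browser.
driver.get('https://news.yahoo.co.jp/')
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()

# Print the text of every <p class="ttl"> element (the news titles).
titles = soup.find_all('p', class_='ttl')
for title in titles:
    print(title.text)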

Result

The news titles came through, so I'd like to build on this from here.
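As a starting point for building on this, the parsing step can be tried on its own without launching a browser. The sketch below runs the same find_all lookup against a made-up HTML string standing in for driver.page_source; the sample markup and the extract_titles helper are illustrative assumptions, not Yahoo's actual page structure. It uses Python's built-in html.parser instead of lxml so it has no extra dependency.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for driver.page_source; not Yahoo's real markup.
SAMPLE_HTML = """
<html><body>
  <p class="ttl">First headline</p>
  <p class="ttl"> Second headline </p>
  <p class="other">Not a headline</p>
</body></html>
"""


def extract_titles(html, class_name="ttl"):
    """Return the stripped text of every <p> with the given CSS class."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.find_all("p", class_=class_name)]


titles = extract_titles(SAMPLE_HTML)
for title in titles:
    print(title)
```

Keeping the parsing in its own function makes it easier to adjust the selector when the page layout changes, since it can be checked against saved HTML.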
