
Notes on playing with headless Chrome through Capybara: a spider in Ruby

Posted at 2018-03-27

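It still doesn't feel quite natural to me, so I'm working through it by trial and error.
I wonder whether Python is the stronger choice for scrapers.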

Headless Chrome reportedly causes fewer problems than PhantomJS, so I'm trying it with Chrome.

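```ruby
#!/usr/bin/env ruby
require 'capybara'
require 'capybara/dsl'
require 'selenium-webdriver'

# Chrome command-line switches: run headless, disable the GPU, fix the viewport size
chrome_option_arg = ['headless', 'disable-gpu', 'window-size=1680,1050']

# Register a Selenium-backed driver that launches Chrome with the switches above
Capybara.register_driver(:headless_chrome) do |app|
  options = Selenium::WebDriver::Chrome::Options.new(args: chrome_option_arg)
  Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)
end

Capybara.configure do |config|
  config.run_server = false              # scraping external sites, so no local Rack app
  config.default_max_wait_time = 10      # seconds to wait for elements to appear
  config.default_driver = :headless_chrome
  config.javascript_driver = :headless_chrome  # must name a registered driver
end

session = Capybara.current_session
# include Capybara::DSL  # alternative: call visit, first, etc. without a receiver

session.visit('http://xxx')
session.first(:xpath, "//h3[@class='xxx-title']/a").click

# Take a screenshot of every open window (e.g. a tab opened by the click)
session.windows.each do |window|
  session.switch_to_window(window)
  session.save_screenshot
end
```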
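The `@class='xxx-title'` test in the XPath passed to `first` only matches when the class attribute is exactly `xxx-title`. An `h3` that carries an extra class is skipped without any error. The sketch below checks this offline with Ruby's REXML, so no browser is needed. The HTML fragment and its hrefs are hypothetical stand-ins for the elided `http://xxx` page. The class-token expression is a common XPath 1.0 idiom and was not part of the original script.

```ruby
require 'rexml/document'

# Hypothetical page fragment standing in for the elided 'http://xxx' site
html = <<~XML
  <html><body>
    <h3 class="xxx-title"><a href="/first">First</a></h3>
    <h3 class="xxx-title extra"><a href="/second">Second</a></h3>
    <h3 class="xxx-title"><a href="/third">Third</a></h3>
  </body></html>
XML

doc = REXML::Document.new(html)

# Same expression as in the script: the whole class attribute must equal 'xxx-title'
exact = REXML::XPath.match(doc, "//h3[@class='xxx-title']/a")
                    .map { |a| a.attributes['href'] }

# Class-token match: also hits h3 elements that carry additional classes
token = REXML::XPath.match(
  doc,
  "//h3[contains(concat(' ', normalize-space(@class), ' '), ' xxx-title ')]/a"
).map { |a| a.attributes['href'] }

puts exact.inspect
puts token.inspect
```

Capybara's `first(:xpath, ...)` would act on the first element of the corresponding match, so with the exact expression the second heading can never be reached.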
