
Scraping Google Trends

Posted at 2022-11-27

Prerequisites

Target site (as of 2022/11/26)
macOS 12.3.1
ruby 2.7.4p191
capybara (3.35.3)
selenium-webdriver (3.142.7)
chromedriver 107.0.5304.62
nokogiri (1.12.3)

Daily search trends

An RSS feed is available, so use that.
It provides little more than the titles, so if you need anything beyond that, fetch the page through a browser.

require 'rss'

url = "https://trends.google.co.jp/trends/trendingsearches/daily/rss?geo=JP"
rss = RSS::Parser.parse(url)

rss.items.each do |item|
  puts item.title
  puts item.link
  puts '-' * 10
end

# =>
#
# フランス
# https://trends.google.co.jp/trends/trendingsearches/daily?geo=JP#%E3%83%95%E3%83%A9%E3%83%B3%E3%82%B9
# ----------
# アルゼンチンメキシコ
# https://trends.google.co.jp/trends/trendingsearches/daily?geo=JP#%E3%82%A2%E3%83%AB%E3%82%BC%E3%83%B3%E3%83%81%E3%83%B3%E3%83%A1%E3%82%AD%E3%82%B7%E3%82%B3
# ----------
# ジャパンカップ予想
# https://trends.google.co.jp/trends/trendingsearches/daily?geo=JP#%E3%82%B8%E3%83%A3%E3%83%91%E3%83%B3%E3%82%AB%E3%83%83%E3%83%97%E4%BA%88%E6%83%B3
# ----------
# ...
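
As the output above shows, each item's link carries the keyword as a percent-encoded URL fragment. If the keyword is needed separately from the title, it can be recovered from the link with the standard library alone. This is a minimal sketch: `keyword_from_link` is a hypothetical helper name that is not part of the original script, and it assumes the fragment format stays as in the output above.

```ruby
require 'uri'

# Hypothetical helper: recover the trend keyword from an item link.
# Assumes the keyword is stored percent-encoded in the URL fragment.
def keyword_from_link(link)
  fragment = URI.parse(link).fragment
  return nil if fragment.nil? || fragment.empty?

  URI.decode_www_form_component(fragment)
end

link = 'https://trends.google.co.jp/trends/trendingsearches/daily?geo=JP#%E3%83%95%E3%83%A9%E3%83%B3%E3%82%B9'
puts keyword_from_link(link)
# => フランス
```

Inside the `rss.items.each` loop this would be called as `keyword_from_link(item.link)`.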

Real-time search trends

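There is no RSS feed for this one, so fetch it yourself.
- The page runs JavaScript, so go through a browser
  - This time: capybara + selenium-webdriver + chromedriver
    - $ brew install chromedriver
Nokogiri is not required, but I use it here.
- Because I am not used to Capybara's syntax
I wanted to click each feed item and pull out other information (related keywords, etc.), but the click fails.
- Selenium::WebDriver::Error::ElementClickInterceptedError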
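
ElementClickInterceptedError is raised when some other element would receive the click instead of the target. One workaround that is sometimes tried is to fall back to a click dispatched from JavaScript. The sketch below has not been verified against the Google Trends page. `click_with_js_fallback` and its `intercepted:` parameter are hypothetical names, and it assumes selenium-webdriver and capybara are already loaded by the caller.

```ruby
# Hypothetical helper: try a normal click first, then fall back to a
# JavaScript click if the driver reports that the click was intercepted.
# The default for `intercepted` is only evaluated at call time, so
# selenium-webdriver must be loaded before calling without that argument.
def click_with_js_fallback(session, element,
                           intercepted: Selenium::WebDriver::Error::ElementClickInterceptedError)
  element.click
  :native
rescue intercepted
  # Capybara passes the element to the script as arguments[0].
  session.execute_script('arguments[0].click()', element)
  :javascript
end
```

Inside a class that includes `Capybara::DSL`, this would be called as `click_with_js_fallback(page, page.all('.details-top a')[0])`. The return value only reports which path was taken.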

require 'selenium-webdriver'
require 'capybara'
require 'capybara/dsl'
require 'nokogiri' # used below to parse page.html

# https://www.rubydoc.info/gems/capybara/Capybara.configure
Capybara.configure do |config|
  config.run_server = false

  # https://github.com/teamcapybara/capybara#selenium
  config.default_driver = :selenium_chrome_headless
  config.javascript_driver = :selenium_chrome_headless

  config.app_host   = 'https://trends.google.co.jp'
  config.default_max_wait_time = 30
end

class PageFetcher
  include Capybara::DSL

  def fetch
    visit('/trends/trendingsearches/realtime?geo=JP&category=all')

    # This raises Selenium::WebDriver::Error::ElementClickInterceptedError
    # link = page.all('.details-top a')[0]
    # link.click

    page
  end
end

page = PageFetcher.new.fetch
# => #<Capybara::Session>

doc = Nokogiri::HTML.parse(page.html)
# => #<Nokogiri::HTML4::Document>

doc.css('.details-top').each do |ele|
  puts ele.css('a').text.split(/\R/).map(&:strip).reject(&:empty?).join(', ')
  puts '-' * 10
end && nil

# =>
#
# 全日本実業団対抗女子駅伝競走大会, 新谷仁美, 廣中璃梨佳, 資生堂, 一山麻緒
# ----------
# 関口 宏, サンデーモーニング, ジャパン・ニュース・ネットワーク, 松原耕二
# ----------
# 武豊, ジャパンカップ, クロフネ, 日本中央競馬会, リアルスティール, 競走馬の血統, G1
# ----------
# ...
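
The chain on the `puts` line splits the concatenated link text on line breaks, trims each piece, and drops the blanks. Pulled out as a small function, the same steps return an array per feed item, which is easier to reuse than printed text. `keywords_from_text` is a hypothetical name, and the sample string only imitates the whitespace-heavy text that `ele.css('a').text` returns.

```ruby
# Hypothetical helper: turn the raw text of a feed item's links into
# a clean list of keywords (same steps as the one-liner above).
def keywords_from_text(text)
  text.split(/\R/).map(&:strip).reject(&:empty?)
end

# Made-up sample input, shaped like text extracted from nested <a> tags.
sample = "\n  武豊\n  \n  ジャパンカップ\n\n  G1\n"
puts keywords_from_text(sample).join(', ')
# => 武豊, ジャパンカップ, G1
```

In the loop above it would be called as `keywords_from_text(ele.css('a').text)`.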
