
More than 5 years have passed since last update.

Scraping train service status from Google Search

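For when you want to check from code whether the trains are running normally.
This is a test that scrapes the result Google Search shows for the query "<line name> 運行情報" (運行情報 means "service status").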
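
As a minimal sketch of how such a query URL can be assembled, the snippet below uses `URI.encode_www_form` from the standard library instead of the `CGI.escape` call in the script. The helper name `search_url` is hypothetical; the query parameters are the ones the script sends.

```ruby
require 'uri'

# Hypothetical helper (not part of the original script): builds the
# search URL for one line name, using the same parameters as the script.
def search_url(line_name)
  query = URI.encode_www_form(
    num: 1,           # ask for a single result
    lr: 'lang_ja',    # restrict results to Japanese
    oe: 'utf-8',      # output encoding
    q: "#{line_name} 運行情報"
  )
  "http://www.google.co.jp/search?#{query}"
end

puts search_url('JR山手線')
```

Passing the parameters as a hash keeps the escaping in one place, so a line name containing `&` or spaces cannot break the query string.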

google-train.rb
# -*- encoding: UTF-8 -*-
require 'cgi'
require 'open-uri'
require 'nokogiri'
require 'date'
require 'json'

def get(word)
  base = "http://www.google.co.jp/search?num=1&lr=lang_ja&oe=utf-8&q="
  url = "#{base}#{CGI.escape(word+" 運行情報")}"
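  # Kernel#open no longer opens URLs as of Ruby 3.0, so call URI.open (provided by open-uri)
  html = URI.open(url, 'User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.52 Safari/536.5').read
  doc = Nokogiri::HTML(html)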
  # The service status is rendered as a table
  table_data = doc.css('.obcontainer').css('table td')
  # Collect the text of each table cell into an array, one entry per row
  info_a = table_data.map { |data| data.text }
  # The last row is the "last updated" time
  begin
    unixtime = DateTime.strptime(info_a[-1], "%m月%d日%H時%M分更新").to_time.to_i
  rescue
    # Line names for which Google shows no service status raise an error here,
    # and unixtime stays nil
    puts "#{word} cannot be looked up on Google"
  end
  text = info_a[0..-2].join("")

  normal = false
  type = nil
  if info_a.size == 2 && info_a[0].sub(word, "") == "に遅れの情報はありません。"
    # When there is no problem, the first row reads
    # "(line name)に遅れの情報はありません。" ("no delays are reported on (line name)")
    normal = true
  else
    # When there is a problem, the kind of disruption is given in 【】 at the start of the first row
    info_a[0] =~ /^【(.*)】/u
    type = $1
  end
  { unixtime: unixtime, text: text, is_normal: normal, type: type }
end

lines = JSON.parse(File.read("lines.json"))["response"]["line"]
lines.each do |word|
  result = get(word)
  p result if result[:is_normal] == false
  sleep 1
end
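
The normal/disrupted decision inside `get` can also be tried without touching the network. The following is a rough sketch, assuming the cell texts have already been extracted; the helper name `classify` and both sample row sets are made up for illustration and are not real Google output.

```ruby
# Hypothetical helper (not in the original script): applies the same
# rule as get() to an array of already-extracted cell texts.
# The last element of rows is expected to be the "last updated" line.
def classify(word, rows)
  normal = rows.size == 2 &&
           rows[0].sub(word, '') == 'に遅れの情報はありません。'
  # For a disruption, the kind is the text between 【 and 】 in the first row
  type = normal ? nil : rows[0].to_s[/^【(.*)】/, 1]
  { is_normal: normal, type: type }
end

# Made-up sample rows, for illustration only
ok_rows      = ['JR山手線に遅れの情報はありません。', '5月12日10時30分更新']
delayed_rows = ['【列車遅延】一部の列車に遅れが出ています。', '5月12日10時30分更新']

p classify('JR山手線', ok_rows)
p classify('JR山手線', delayed_rows)
```

Keeping this rule separate from the HTTP request makes it easy to re-check when the wording or markup of the Google result changes.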

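Below is a list of line names in Tokyo, based on the data from
http://express.heartrails.com/api.html#line
with the names corrected where Google Search showed no service status for the original name.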

lines.json
{"response":{"line":["JR中央線","JR中央本線","JR五日市線","JR京浜東北線","JR京葉線","JR八高線","JR南武線","JR埼京線","JR宇都宮線","JR山手線","JR常磐線","JR東海道本線","JR横浜線","JR横須賀線","JR武蔵野線","JR湘南新宿ライン","JR総武線","JR青梅線","JR高崎線","つくばエクスプレス","上越新幹線","京成押上線","京成本線","京成金町線","京急空港線","京急本線","京王線","京王新線","京王井の頭線","京王相模原線","京王高尾線","京王動物園線","京王競馬場線","北総線","小田急多摩線","小田急小田原線","ゆりかもめ線","りんかい線","東京メトロ丸ノ内線","東京メトロ千代田線","東京メトロ半蔵門線","東京メトロ南北線","東京メトロ日比谷線","東京メトロ有楽町線","東京メトロ東西線","東京メトロ銀座線","東京メトロ副都心線","東京モノレール線","東北新幹線","東急世田谷線","東急多摩川線","東急大井町線","東急東横線","東急池上線","東急田園都市線","東急目黒線","東武亀戸線","東武伊勢崎線","東武大師線","東武東上線","東海道新幹線","西武国分寺線","西武多摩川線","西武多摩湖線","西武山口線","西武拝島線","西武新宿線","西武有楽町線","西武池袋線","西武西武園線","西武豊島線","都営三田線","都営大江戸線","都営新宿線","都営浅草線","都電荒川線","長野新幹線","日暮里・舎人ライナー"]}}
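
Since this file is edited by hand, a small loader that tolerates duplicates or a missing key may be useful. This is only a sketch: the helper name `line_names` is hypothetical, and the inline sample stands in for the real `lines.json`.

```ruby
require 'json'

# Hypothetical helper (not in the original script): extracts the line
# names from JSON text shaped like lines.json, dropping blank entries
# and duplicates. Returns an empty array if the keys are missing.
def line_names(json_text)
  names = JSON.parse(json_text).dig('response', 'line') || []
  names.map(&:strip).reject(&:empty?).uniq
end

# Shortened stand-in for lines.json, with one duplicate entry
sample = '{"response":{"line":["JR山手線","JR中央線","JR山手線"]}}'
p line_names(sample)
```

With the real file, the call would be `line_names(File.read("lines.json"))`.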