
Scraping train service status from Google Search


For when you want to check from code whether the trains are running properly.

A test that scrapes the result Google shows when you search for "<line name> 運行情報" (運行情報 means "service status").
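To make the query concrete, here is a minimal sketch of how the search URL for one line name can be put together. The helper name `search_url` and the constant `BASE` are made up for this illustration; the base URL and the " 運行情報" suffix are the ones used in the script.

```ruby
require 'cgi'

# Base URL as used in the script; BASE and search_url are hypothetical names
BASE = "http://www.google.co.jp/search?num=1&lr=lang_ja&oe=utf-8&q="

def search_url(line_name)
  # CGI.escape percent-encodes the UTF-8 bytes and turns the space into "+"
  "#{BASE}#{CGI.escape("#{line_name} 運行情報")}"
end

puts search_url("JR山手線")
```

Escaping matters here because the line names are non-ASCII and the query contains a space.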


google-train.rb

# -*- encoding: UTF-8 -*-

require 'cgi'
require 'open-uri'
require 'nokogiri'
require 'date'
require 'json'

def get(word)
  base = "http://www.google.co.jp/search?num=1&lr=lang_ja&oe=utf-8&q="
  url = "#{base}#{CGI.escape(word + " 運行情報")}"
  user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.52 Safari/536.5'
  # Kernel#open no longer opens URLs as of Ruby 3.0, so call URI.open
  html = URI.open(url, 'User-Agent' => user_agent).read
  doc = Nokogiri::HTML(html)
  # The service status is rendered as a table
  table_data = doc.css('.obcontainer').css('table td')
  # Collect the text of each table cell into an array
  info_a = table_data.map(&:text)
  # The last cell holds the last-updated time
  begin
    # The string has no year or time zone, so strptime assumes the current year and UTC
    unixtime = DateTime.strptime(info_a[-1], "%m月%d日%H時%M分更新").to_time.to_i
  rescue StandardError
    # Line names for which Google shows no service status raise an error here
    puts "#{word} cannot be looked up on Google"
  end
  text = info_a[0..-2].join

  normal = false
  type = nil
  if info_a.size == 2 && info_a[0].sub(word, "") == "に遅れの情報はありません。"
    # When there is no problem, the first cell reads
    # "(line name)に遅れの情報はありません。" (no delays reported)
    normal = true
  else
    # When there is a problem, the kind of disruption is given in 【】 at the start of the first cell
    type = info_a[0].to_s[/^【(.*)】/, 1]
  end
  { unixtime: unixtime, text: text, is_normal: normal, type: type }
end

lines = JSON.parse(File.read("lines.json"))["response"]["line"]
lines.each do |word|
  result = get(word)
  # Print only the lines that are not running normally
  p result unless result[:is_normal]
  sleep 1 # pause between requests
end
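The normal/disrupted decision can be tried without hitting Google. The sketch below applies the same rule to plain arrays of cell text. `classify` is a hypothetical helper, and the sample strings are made up for illustration rather than captured from real search results.

```ruby
# Hypothetical helper: interprets the cell texts taken from the result table.
# The last cell is assumed to be the "updated at" line, as in the script.
def classify(cells, line_name)
  body = cells[0..-2]
  first = body.first.to_s
  if cells.size == 2 && first.sub(line_name, "") == "に遅れの情報はありません。"
    { is_normal: true, type: nil, text: body.join }
  else
    # The disruption type is whatever sits inside 【】 at the start of the first cell
    { is_normal: false, type: first[/^【(.*)】/, 1], text: body.join }
  end
end

# Made-up sample inputs
ok = classify(["JR山手線に遅れの情報はありません。", "06月01日12時00分更新"], "JR山手線")
delayed = classify(["【遅延】", "人身事故の影響で遅れています。", "06月01日12時05分更新"], "JR中央線")
p ok
p delayed
```

An empty array (no table found) falls into the second branch with `type` set to `nil`, which matches how the script treats line names Google does not recognize.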


http://express.heartrails.com/api.html#line

A list of line names in Tokyo based on the data from the API above, with the names corrected where Google search returned no service status.
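The script expects the list as a JSON object with a `response.line` array of names. As a small sketch of reading that shape defensively, the hypothetical helper `load_line_names` below parses a shortened stand-in for lines.json and drops blank or duplicate entries before any requests are made.

```ruby
require 'json'

# Hypothetical helper: extracts the line names from JSON shaped like lines.json
def load_line_names(json_text)
  # dig returns nil instead of raising when a key is missing
  names = JSON.parse(json_text).dig("response", "line") || []
  # Drop surrounding whitespace, blank entries and duplicates
  names.map(&:strip).reject(&:empty?).uniq
end

# Shortened stand-in for lines.json (same shape, only a few entries)
sample = '{"response":{"line":["JR山手線","京王線","京王線"," 都営三田線"]}}'
puts load_line_names(sample)
```

With the real file, the argument would be `File.read("lines.json")`.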


lines.json

{"response":{"line":["JR中央線","JR中央本線","JR五日市線","JR京浜東北線","JR京葉線","JR八高線","JR南武線","JR埼京線","JR宇都宮線","JR山手線","JR常磐線","JR東海道本線","JR横浜線","JR横須賀線","JR武蔵野線","JR湘南新宿ライン","JR総武線","JR青梅線","JR高崎線","つくばエクスプレス","上越新幹線","京成押上線","京成本線","京成金町線","京急空港線","京急本線","京王線","京王新線","京王井の頭線","京王相模原線","京王高尾線","京王動物園線","京王競馬場線","北総線","小田急多摩線","小田急小田原線","ゆりかもめ線","りんかい線","東京メトロ丸ノ内線","東京メトロ千代田線","東京メトロ半蔵門線","東京メトロ南北線","東京メトロ日比谷線","東京メトロ有楽町線","東京メトロ東西線","東京メトロ銀座線","東京メトロ副都心線","東京モノレール線","東北新幹線","東急世田谷線","東急多摩川線","東急大井町線","東急東横線","東急池上線","東急田園都市線","東急目黒線","東武亀戸線","東武伊勢崎線","東武大師線","東武東上線","東海道新幹線","西武国分寺線","西武多摩川線","西武多摩湖線","西武山口線","西武拝島線","西武新宿線","西武有楽町線","西武池袋線","西武西武園線","西武豊島線","都営三田線","都営大江戸線","都営新宿線","都営浅草線","都電荒川線","長野新幹線","日暮里・舎人ライナー"]}}