
Extracting Loto 6 winning numbers from the Mizuho Bank website


Summary

Use F# as a scripting language (an .fsx script) to extract the Loto 6 winning numbers from the Mizuho Bank website.

Environment

If mono and paket are not installed, install them with brew install.

If Firefox is not installed, install it with brew cask install.

$ sw_vers 
ProductName:    Mac OS X
ProductVersion: 10.13.4
BuildVersion:   17E199

$ mono --version
Mono JIT compiler version 5.4.1.6 (tarball Mon Dec 11 14:59:42 GMT 2017)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
    TLS:           normal
    SIGSEGV:       altstack
    Notification:  kqueue
    Architecture:  amd64
    Disabled:      none
    Misc:          softdebug 
    LLVM:          supported, not enabled.
    GC:            sgen (concurrent by default)

$ paket --version
Paket version 5.155.0

$ brew cask info firefox
firefox: 59.0.2
https://www.mozilla.org/firefox/
/usr/local/Caskroom/firefox/59.0.2 (64B)
From: https://github.com/caskroom/homebrew-cask/blob/master/Casks/firefox.rb
==> Name
Mozilla Firefox
==> Languages
cs, de, en-GB, en, es-AR, es-CL, es-ES, fi, fr, gl, in, it, ja, ko, nl, pl, pt, pt-BR, ru, tr, uk, zh-TW, zh
==> Artifacts
Firefox.app (App)

Site structure

This month's winning numbers are on the index page. Past winning numbers are linked from the backnumber page.

index                https://www.mizuhobank.co.jp/retail/takarakuji/loto/loto6/index.html
└── backnumber       https://www.mizuhobank.co.jp/retail/takarakuji/loto/backnumber/index.html
    ├── ChartA       https://www.mizuhobank.co.jp/retail/takarakuji/loto/loto6/index.html?year=2018&month=2
    ├── ChartB_new   https://www.mizuhobank.co.jp/retail/takarakuji/loto/backnumber/detail.html?fromto=641_660&type=loto6
    └── ChartB_old   https://www.mizuhobank.co.jp/retail/takarakuji/loto/backnumber/loto60001.html

The CSS selectors that locate the winning numbers differ from page to page, so check them beforehand.

index
    Draw number:     table.typeTK > thead > tr > th.alnCenter.bgf7f7f7
    Draw date:       table.typeTK > tbody > tr > td[colspan='6'].alnCenter
    Winning numbers: table.typeTK > tbody > tr > td.alnCenter.extension > strong

ChartA
    Draw number:     table.typeTK > thead > tr > th.alnCenter.bgf7f7f7
    Draw date:       table.typeTK > tbody > tr > td[colspan='6'].alnCenter
    Winning numbers: table.typeTK > tbody > tr > td.alnCenter.extension

ChartB_new
    Draw number:     div.spTableScroll > table.typeTK > tbody > tr > th.bgf7f7f7
    Draw date:       div.spTableScroll > table.typeTK > tbody > tr > td.alnRight
    Winning numbers: div.spTableScroll > table.typeTK > tbody > tr > td[class='']

ChartB_old
    Draw number:     div.spTableScroll > table.typeTK > tbody > tr > th.bgf7f7f7
    Draw date:       div.spTableScroll > table.typeTK > tbody > tr > td.alnRight
    Winning numbers: div.spTableScroll > table.typeTK > tbody > tr > td:not(.alnRight)

Downloading the HTML

The Loto 6 winning numbers do not appear in the HTML until JavaScript has run, so we download the HTML through a real browser.

Downloading the libraries

# Create the foo folder
$ mkdir foo
$ cd foo/

# Download the libraries
$ paket init
$ vim paket.dependencies

    source https://www.nuget.org/api/v2
    nuget fsharp.data == 3.0.0-beta3
    nuget Selenium.webdriver
    nuget Selenium.Support

$ paket install

Writing the code (ignoring error handling for now)

// File name is foo.fsx


#r "./packages/FSharp.Data/lib/net45/FSharp.Data.dll"
open FSharp.Data

#r "./packages/Selenium.WebDriver/lib/net45/WebDriver.dll"
#r "./packages/Selenium.Support/lib/net45/WebDriver.Support.dll"
open OpenQA.Selenium
open OpenQA.Selenium.Firefox
open OpenQA.Selenium.Support.UI

open System

let url          = @"https://www.mizuhobank.co.jp/retail/takarakuji/loto/loto6/index.html?year=2018&month=2"
let kaisuuCSS    = @"table.typeTK > thead > tr > th.alnCenter.bgf7f7f7"
let kaisaibiCSS  = @"table.typeTK > tbody > tr > td[colspan='6'].alnCenter"
let hitNumberCSS = @"table.typeTK > tbody > tr > td.alnCenter.extension"

type Fox () =

    // Run headless so that no browser window is shown
    let opt = new FirefoxOptions()
    do  opt.AddArgument("--headless")
    let driver = new FirefoxDriver( opt )
    // Set the wait timeout to 10 seconds for now
    let wait = WebDriverWait(driver, TimeSpan.FromSeconds(10.))

    // Load the page, wait for its content, and return the HTML
    member this.HtmlWithJS(url) =

        // Navigate to the given URL
        driver.Url <- url

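        // Wait (up to 10 seconds) until every element matched by the selectors has non-empty text
        // If that does not happen in time, the wait fails with a timeout error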
        wait.Until( fun (driver:IWebDriver) ->
            [
                driver.FindElements( By.CssSelector( kaisuuCSS    ))
                driver.FindElements( By.CssSelector( kaisaibiCSS  ))
                driver.FindElements( By.CssSelector( hitNumberCSS ))
            ]
            |> Seq.concat
            |> Seq.forall ( fun (x:IWebElement) -> x.Text <> String.Empty )
            ) |> ignore

        // Return the HTML
        driver.PageSource

    member this.Quit() =
        driver.Quit()


// Launch the Firefox browser
let f = Fox()

// Download the HTML of the Mizuho Bank page that lists the Loto 6 winning numbers
f.HtmlWithJS url

// Extract the Loto 6 winning numbers from the HTML
|> HtmlDocument.Parse
|> fun doc ->
    let index   = doc.CssSelect( kaisuuCSS    ) |> List.map ( fun n -> n.InnerText() |> fun s -> String.filter Char.IsDigit s )
    let date    = doc.CssSelect( kaisaibiCSS  ) |> List.map ( fun n -> n.InnerText() |> fun s -> s.Replace("年","/").Replace("月","/").Replace("日",""))
    let numbers = doc.CssSelect( hitNumberCSS ) |> List.map ( fun n -> n.InnerText() ) |> List.chunkBySize 7 |> List.map ( List.truncate 6 )
    (index, date, numbers)
    |||> List.map3 ( fun a b c -> [a] @ [b] @ c )
|> List.iter( fun l -> printfn "%A" l )

f.Quit()

Running it

$ fsharpi foo.fsx

// (Firefox prints various log output here...)

["1255"; "2018/2/26"; "02"; "14"; "15"; "27"; "40"; "43"]
["1254"; "2018/2/22"; "07"; "08"; "11"; "15"; "36"; "39"]
["1253"; "2018/2/19"; "01"; "12"; "14"; "24"; "33"; "37"]
["1252"; "2018/2/15"; "04"; "15"; "18"; "19"; "22"; "29"]
["1251"; "2018/2/12"; "21"; "28"; "31"; "33"; "34"; "40"]
["1250"; "2018/2/8"; "03"; "10"; "18"; "22"; "23"; "40"]
["1249"; "2018/2/5"; "05"; "08"; "15"; "20"; "25"; "27"]
["1248"; "2018/2/1"; "02"; "06"; "14"; "28"; "34"; "37"]
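The post-processing step of foo.fsx can be tried without a browser. The sketch below feeds it made-up inner texts: the "第1255回" and "2018年2月26日" formats are assumptions inferred from the digit filter and the 年/月/日 replacements in the script, and "B1"/"B2" are placeholders for the seventh cell of each row (presumably the bonus number), which `List.truncate 6` discards.

```fsharp
open System

// Made-up inner texts in the shape the script expects (assumed formats).
let kaisuuTexts   = [ "第1255回"; "第1254回" ]
let kaisaibiTexts = [ "2018年2月26日"; "2018年2月22日" ]

// Seven cells per draw; "B1" and "B2" are placeholders for the 7th cell.
let cellTexts =
    [ "02"; "14"; "15"; "27"; "40"; "43"; "B1" ]
    @ [ "07"; "08"; "11"; "15"; "36"; "39"; "B2" ]

// Keep only the digits of the draw number.
let index = kaisuuTexts |> List.map (String.filter Char.IsDigit)

// Turn "2018年2月26日" into "2018/2/26".
let date =
    kaisaibiTexts
    |> List.map (fun s -> s.Replace("年", "/").Replace("月", "/").Replace("日", ""))

// Group the cells into rows of 7, then keep the first 6 of each row.
let numbers = cellTexts |> List.chunkBySize 7 |> List.map (List.truncate 6)

// Zip the three lists into one flat row per draw.
let rows = List.map3 (fun a b c -> a :: b :: c) index date numbers

rows |> List.iter (printfn "%A")
```

Each printed row has the same shape as the output above: draw number, date, then six numbers.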

Adding error handling

Quit Firefox when a timeout or any other error occurs.

"Stale Element Reference Exception" occurs frequently, so it needs handling as well.

See the following for example code:

https://github.com/callmekohei/lotofs/blob/master/src/mizuho.fsx#L163

About the Stale Element Reference Exception:

https://docs.seleniumhq.org/exceptions/stale_element_reference.jsp
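As a rough illustration of the two points above, here is a minimal sketch. It is not the code in the linked repository. The function name `fetchHtml` is hypothetical. It assumes that ignoring `StaleElementReferenceException` inside the wait (via `IgnoreExceptionTypes`) and retrying on the next poll is acceptable, and it uses `try ... finally` so that Firefox is quit on every code path.

```fsharp
#r "./packages/Selenium.WebDriver/lib/net45/WebDriver.dll"
#r "./packages/Selenium.Support/lib/net45/WebDriver.Support.dll"
open System
open OpenQA.Selenium
open OpenQA.Selenium.Firefox
open OpenQA.Selenium.Support.UI

// Hypothetical helper: returns Some html on success, None on timeout.
let fetchHtml (url: string) (selectors: string list) : string option =
    let opt = FirefoxOptions()
    opt.AddArgument("--headless")
    let driver = new FirefoxDriver(opt)
    try
        try
            let wait = WebDriverWait(driver, TimeSpan.FromSeconds(10.))
            // If an element goes stale while polling, swallow the exception
            // and let the wait try again on its next poll.
            wait.IgnoreExceptionTypes(typeof<StaleElementReferenceException>)
            driver.Url <- url
            wait.Until(fun (d: IWebDriver) ->
                let elements =
                    selectors
                    |> List.collect (fun css ->
                        d.FindElements(By.CssSelector(css)) |> List.ofSeq)
                // Require at least one match, all with non-empty text.
                (not (List.isEmpty elements))
                && (elements |> List.forall (fun (e: IWebElement) -> e.Text <> String.Empty))
                ) |> ignore
            Some driver.PageSource
        with :? WebDriverTimeoutException as e ->
            eprintfn "%s" e.Message
            None
    finally
        // Runs on success, on timeout, and on any other exception.
        driver.Quit()

// Usage with the ChartA URL and selectors from foo.fsx
let html =
    fetchHtml
        @"https://www.mizuhobank.co.jp/retail/takarakuji/loto/loto6/index.html?year=2018&month=2"
        [ @"table.typeTK > thead > tr > th.alnCenter.bgf7f7f7"
          @"table.typeTK > tbody > tr > td[colspan='6'].alnCenter"
          @"table.typeTK > tbody > tr > td.alnCenter.extension" ]
```

Starting a new browser per call keeps the cleanup simple but is slow. When fetching many pages, the same `try ... finally` can wrap a loop that reuses one driver.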

Observing how wait.Until behaves

Create an HTML file

<html>
    <body>
        <table>
            <tr>
                <td>foo</td>
                <td></td>
                <td class=abc>baz</td>
            </tr>
        </table>
    </body>
</html>

<style>
table {
    border-collapse: collapse;
}
td {
    border: solid 1px;
    padding: 0.5em;
}
</style>

Write the code

// File name is bar.fsx

#r "./packages/Selenium.WebDriver/lib/net45/WebDriver.dll"
#r "./packages/Selenium.Support/lib/net45/WebDriver.Support.dll"
open OpenQA.Selenium
open OpenQA.Selenium.Firefox
open OpenQA.Selenium.Support.UI
open System


type Fox () =

    let opt = new FirefoxOptions()
    do  opt.AddArgument("--headless")
    let driver = new FirefoxDriver( opt )
    let wait = WebDriverWait(driver, TimeSpan.FromSeconds(10.))
    let mutable cnt = 0

    member this.Html(url) =
        try
            driver.Url <- url
            wait.Until( fun (driver:IWebDriver) ->
                stdout.WriteLine(cnt)
                cnt <- cnt + 1
                [
                    driver.FindElements( By.CssSelector( "html > body > table > tbody > tr > td:not(.abc)" ))
                    driver.FindElements( By.CssSelector( "html > body > table > tbody > tr > td.abc" ))
                ]
                |> Seq.concat
                |> Seq.forall ( fun (x:IWebElement) -> x.Text <> String.Empty )
                ) |> ignore
            driver.PageSource
        with e ->
            driver.Quit()
            stdout.WriteLine(e.Message)
            String.Empty

    member this.Quit() =
        driver.Quit()


let f = Fox()
@"file:///Users/callmekohei/Desktop/foo.html"
|> f.Html
|> ignore
stdout.WriteLine("foo bar baz")
f.Quit()

Result

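The counter shows that wait.Until polled 20 times during the 10-second wait, which matches WebDriverWait's default polling interval of 500 ms. The condition never becomes true because the empty <td></td> has no text, so the wait ends with a timeout.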

// (various Firefox log output appears here)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Timed out after 10 seconds
foo bar baz
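To make the polling behaviour concrete, here is a simplified, dependency-free model of a polling wait. It is not Selenium's implementation. `pollUntil` is a hypothetical helper that evaluates a condition, sleeps for the polling interval, and gives up once the timeout has elapsed.

```fsharp
open System
open System.Diagnostics
open System.Threading

// Simplified model of a polling wait (NOT Selenium's implementation).
// Returns whether the condition became true and how often it was evaluated.
let pollUntil (timeout: TimeSpan) (interval: TimeSpan) (condition: unit -> bool) : bool * int =
    let clock = Stopwatch.StartNew()
    let rec loop calls =
        if condition () then (true, calls + 1)
        elif clock.Elapsed >= timeout then (false, calls + 1)
        else
            Thread.Sleep(interval)
            loop (calls + 1)
    loop 0

// A condition that never holds, like the empty <td></td> in foo.html.
// A short timeout and interval keep the demo fast.
let ok, calls =
    pollUntil (TimeSpan.FromSeconds(1.0)) (TimeSpan.FromMilliseconds(100.0)) (fun () -> false)

printfn "succeeded=%b, polls=%d" ok calls
```

With a 10-second timeout and a 500 ms interval, a loop like this evaluates the condition roughly 20 times before giving up, which is what the counter above shows.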

:not in CSS selectors

FSharp.Data's CssSelect does not support :not (as of 2018/4/11).

To get the effect of :not, write the selector as follows:

// css selector in Selenium
@"div.spTableScroll > table.typeTK > tbody > tr > td:not(.alnRight)"

// css selector in FSharp.Data
@"div.spTableScroll > table.typeTK > tbody > tr > td.''"
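Another option, sketched here as an assumption rather than something taken from the article, is to select every `td` and drop the unwanted cells in F# afterwards. The sketch uses the `HasClass` extension method on `HtmlNode` from FSharp.Data. The HTML fragment and its cell values are made up to resemble the ChartB_old table.

```fsharp
#r "./packages/FSharp.Data/lib/net45/FSharp.Data.dll"
open FSharp.Data

// Made-up fragment in the shape of the ChartB_old table (placeholder values).
let html = """
<div class="spTableScroll"><table class="typeTK"><tbody>
<tr><th class="bgf7f7f7">draw</th><td class="alnRight">date</td><td>01</td><td>02</td></tr>
</tbody></table></div>"""

let numbers =
    HtmlDocument.Parse(html).CssSelect("div.spTableScroll > table.typeTK > tbody > tr > td")
    // Emulate td:not(.alnRight) by filtering after the selection.
    |> List.filter (fun n -> not (n.HasClass("alnRight")))
    |> List.map (fun n -> n.InnerText())

printfn "%A" numbers
```

Filtering in code is more verbose than a selector, but it does not depend on which pseudo-classes the selector engine supports.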

References

WebDriverWait Class (see the left column of the page): https://seleniumhq.github.io/selenium/docs/api/dotnet/

https://qiita.com/Azunyan1111/items/b161b998790b1db2ff7a

http://blog.okazuki.jp/entry/2015/01/28/205817

http://hutyao.hatenablog.com/entry/20110908/1315490306

https://qiita.com/scivola/items/08a717eaba9c349fd6ba

callmekohei