
Fetching HTML with Haskell

Last updated at 2019-11-21. Posted at 2019-11-21.

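I suddenly felt like trying HTTP communication in Haskell, so I did. There is no deeper reason than that. I am a beginner who cannot do much in Haskell yet, so this doubles as study.

As a first step, fetch some HTML.

Installing http-conduit

First I needed a package for HTTP communication. http-conduit looked like a good choice, so I installed it.

This was actually my first time installing a Haskell package, and it turns out there is a handy tool called cabal.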

cabal update
cabal install http-conduit

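Whenever I use a package installer I tend to say "it will work without running update, right?", but this time the install failed until I ran update. Do run update. It took longer than I expected, though still only a few minutes.
Installing the package itself also took a while, because it pulls in quite a lot of dependencies. It took a good 10 minutes (maybe my connection was poor?).

Fetching the HTML

Now for the real thing.

The function used here is getResponseBody.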

*Main> :t getResponseBody
getResponseBody :: Response a -> a

My first reaction was "Response?". It appears to be a value that holds all the data returned for a request. To obtain one, httpLBS looks like the way to go.

*Main> :t httpLBS
httpLBS
  :: Control.Monad.IO.Class.MonadIO m =>
     Request -> m (Response Data.ByteString.Lazy.Internal.ByteString)
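
As a rough illustration of what a Response carries besides the body, the sketch below reads the status code and a header through accessors exported by Network.HTTP.Simple. The URL and the choice of the Content-Type header are only assumptions for the example, and the program needs network access to run.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Simple
import qualified Data.ByteString.Lazy as L

main :: IO ()
main = do
    -- The URL is only an example target
    res <- httpLBS (parseRequest_ "https://example.com/")
    -- HTTP status code as an Int
    print (getResponseStatusCode res)
    -- All values of the Content-Type header (a list, possibly empty)
    print (getResponseHeader "Content-Type" res)
    -- Size of the body in bytes
    print (L.length (getResponseBody res))
```

OverloadedStrings is enabled here so that the header name can be written as a plain string literal.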

So it seems I need to get a Request. If I can go from String to Request, everything is solved.

*Main> :t parseRequest
parseRequest
  :: Control.Monad.Catch.MonadThrow m => String -> m Request
*Main> :t parseRequest_
parseRequest_ :: String -> Request

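The latter, parseRequest_, looks like the better fit this time, since it returns a plain Request with no monadic wrapper around it.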
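
For comparison, here is a minimal sketch of how the first variant, parseRequest, could be used when the URL might be malformed. Choosing Maybe as the MonadThrow instance turns a parse failure into Nothing instead of an exception. The helper name isValidUrl is hypothetical and exists only for this example.

```haskell
import Network.HTTP.Simple (Request, parseRequest)
import Data.Maybe (isJust)

-- Hypothetical helper: True when the string parses into a Request.
-- In the Maybe monad, a parse failure becomes Nothing.
isValidUrl :: String -> Bool
isValidUrl url = isJust (parseRequest url :: Maybe Request)

main :: IO ()
main = do
    print (isValidUrl "https://example.com/")
    print (isValidUrl "not-a-url")
```

parseRequest_ skips this check and throws on bad input, which is acceptable for a quick experiment like this one.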

So the flow from a String to the response body looks like this.

String --[parseRequest_]--> Request --[httpLBS]--> Response --[getResponseBody]--> BODY
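
The pipeline above could also be packed into a single function. This is only a sketch: the name fetchBody is hypothetical, the URL in main is an example, and running it requires network access.

```haskell
import Network.HTTP.Simple
import qualified Data.ByteString.Lazy as L

-- Hypothetical helper: String -> Request -> Response -> body.
-- parseRequest_ throws on a malformed URL; httpLBS throws on network errors.
fetchBody :: String -> IO L.ByteString
fetchBody url = getResponseBody <$> httpLBS (parseRequest_ url)

main :: IO ()
main = do
    body <- fetchBody "https://example.com/"
    -- Print only the body size to keep the output short
    print (L.length body)
```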

Final version

example.hs

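import Network.HTTP.Simple

main :: IO ()
main = do
    -- Read the URL from standard input
    input <- getLine
    -- Build the Request and perform it; the body is a lazy ByteString
    res <- httpLBS (parseRequest_ input)
    -- print uses show, so the body appears as an escaped string literal
    print (getResponseBody res)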

It just reads a URL from standard input and prints the HTML.

Test

GHCi, version 8.2.2: http://www.haskell.org/ghc/  :? for help
Prelude> :l example.hs
Ok, one module loaded.
Prelude Main> main
https://example.com/
"<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset=\"utf-8\" />\n    <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n    <style type=\"text/css\">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n"

The HTML was fetched successfully.
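
The output above is escaped (\n, \") because print goes through show. If the raw HTML is preferred, one option is to write the bytes directly with putStrLn from Data.ByteString.Lazy.Char8. The following is a sketch of that variant, not part of the original program; it needs network access to run.

```haskell
import Network.HTTP.Simple
import qualified Data.ByteString.Lazy.Char8 as L8

main :: IO ()
main = do
    -- Read the URL from standard input, as in the original program
    input <- getLine
    res <- httpLBS (parseRequest_ input)
    -- putStrLn writes the bytes as-is, so newlines become real line breaks
    L8.putStrLn (getResponseBody res)
```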
