
Elixir Advent Calendar 2023

Day 14

【Livebook】Trying out scraping with HTTPoison and Floki (example: fetching AtCoder's sample answers)


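Introduction

I wanted to fetch AtCoder's sample inputs and outputs from Livebook, so I looked things up and gave it a try.

Modules used

We fetch the page with HTTPoison and parse it with Floki.

Add both to the Setup cell: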

Mix.install([
  {:httpoison, "~> 2.2.1"},
  {:floki, "~> 0.35.2"}
])

Fetching the page

doc = HTTPoison.get!("https://atcoder.jp/contests/abc334/tasks/abc334_a")
The actual response looks like this:
%HTTPoison.Response{
  status_code: 200,
  body: "\n\n\n\n\n\n\n\n<!DOCTYPE html>\n<html>\n<head>\n\t<title>A - Christmas Present</title>\n\t<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n\t<meta http-equiv=\"Content-Language\" content=\"en\">\n\t<meta name=\"viewport\" content=\"width=device-width,initial-scale=1.0\">\n\t<meta name=\"format-detection\" content=\"telephone=no\">\n\t<meta name=\"google-site-verification\" content=\"nXGC_JxO0yoP1qBzMnYD_xgufO6leSLw1kyNo2HZltM\" />\n\n\t\n\t<script async src=\"https://www.googletagmanager.com/gtag/js?id=G-RC512FD18N\"></script>\n\t<script>\n\t\twindow.dataLayer = window.dataLayer || [];\n\t\tfunction gtag(){dataLayer.push(arguments);}\n\t\tgtag('js', new Date());\n\n\t\tgtag('config', 'G-RC512FD18N');\n\t</script>\n\n\t\n\t<meta name=\"description\" content=\"AtCoder is a programming contest site for anyone from beginners to experts. We hold weekly programming contests online.\">\n\t<meta name=\"author\" content=\"AtCoder Inc.\">\n\n\t<meta property=\"og:site_name\" content=\"AtCoder\">\n\t\n\t<meta property=\"og:title\" content=\"A - Christmas Present\" />\n\t<meta property=\"og:description\" content=\"AtCoder is a programming contest site for anyone from beginners to experts. 
We hold weekly programming contests online.\" />\n\t<meta property=\"og:type\" content=\"website\" />\n\t<meta property=\"og:url\" content=\"https://atcoder.jp/contests/abc334/tasks/abc334_a\" />\n\t<meta property=\"og:image\" content=\"https://img.atcoder.jp/assets/atcoder.png\" />\n\t<meta name=\"twitter:card\" content=\"summary\" />\n\t<meta name=\"twitter:site\" content=\"@atcoder\" />\n\t\n\t<meta property=\"twitter:title\" content=\"A - Christmas Present\" />\n\n\t<link href=\"//fonts.googleapis.com/css?family=Lato:400,700\" rel=\"stylesheet\" type=\"text/css\">\n\t<link rel=\"stylesheet\" type=\"text/css\" href=\"//img.atcoder.jp/public/4e015d0/css/bootstrap.min.css\">\n\t<link rel=\"stylesheet\" type=\"text/css\" href=\"//img.atcoder.jp/public/4e015d0/css/base.css\">\n\t<link rel=\"shortcut icon\" type=\"image/png\" href=\"//img.atcoder.jp/assets/favicon.png\">\n\t<link rel=\"apple-touch-icon\" href=\"//img.atcoder.jp/assets/atcoder.png\">\n\t<script src=\"//img.atcoder.jp/public/4e015d0/js/lib/jquery-1.9.1.min.js\"></script>\n\t<script src=\"//img.atcoder.jp/public/4e015d0/js/lib/bootstrap.min.js\"></script>\n\t<script src=\"//img.atcoder.jp/public/4e015d0/js/cdn/js.cookie.min.js\"></script>\n\t<script src=\"//img.atcoder.jp/public/4e015d0/js/cdn/moment.min.js\"></script>\n\t<script src=\"//img.atcoder.jp/public/4e015d0/js/cdn/moment_js-ja.js\"></script>\n\t<script>\n\t\tvar LANG = \"en\";\n\t\tvar userScreenName = \"\";\n\t\tvar csrfToken = \"YPW1jeabL6zDWgHrwqdUNcriDFK4QT2crbr2Ag853ro=\"\n\t</script>\n\t<script src=\"//img.atcoder.jp/public/4e015d0/js/utils.js\"></script>\n\t\n\t\n\t\t<script src=\"//img.atcoder.jp/public/4e015d0/js/contest.js\"></script>\n\t\t<link href=\"//img.atcoder.jp/public/4e015d0/css/contest.css\" rel=\"stylesheet\" />\n\t\t<script>\n\t\t\tvar contestScreenName = \"abc334\";\n\t\t\tvar remainingText = \"Remaining Time\";\n\t\t\tvar countDownText = \"Contest begins in\";\n\t\t\tvar startTime = 
moment(\"2023-12-23T21:00:00+09:00\");\n\t\t\tvar endTime = moment(\"2023-12-23T22:40:00+09:00\");\n\t\t</script>\n\t\t<style>    #contest-statement .lang-ja  hr {\r\n        margin: 50px 0px;\r\n        border: none;\r\n        border-top: 10px dotted #e50050;\r\n    }\r\n    #contest-statement h4 {\r\n        margin-top: 20px;\r\n    }\r\n    .icon {\r\n        height: 1em;\r\n    }\r\n    .snow {\r\n        /*雪の色*/\r\n        color: #206AF4aa;\r\n        /*初期位置*/\r\n        position: fixed;\r\n        top: -5%;\r\n    }\r\n    .snow1st {\r\n        /*雪の大きさ*/\r\n        font-size: 25px;\r\n        animation: s1 6s linear infinite;\r\n    }\r\n    /*2つめの雪アニメーション*/\r\n    .snow2nd{\r\n        /*雪の大きさ*/\r\n        font-size: 20px;\r\n        animation: s2 11s linear infinite;\r\n    }\r\n    .snow3rd {\r\n        /*雪の大きさ*/\r\n        font-size: 30px;\r\n        animation: s3 20s linear infinite;\r\n    }\r\n    \r\n    @keyframes s1 {\r\n        0% {\r\n            left: 50% \r\n        }\r\n        100% {\r\n            top: 150%;\r\n            left: 20% \r\n        }\r\n    }\r\n    @keyframes s2 {\r\n        0% {\r\n            left: 50%\r\n        }\r\n        100% {\r\n            top: 150%;\r\n            left: 80%\r\n        }\r\n    }\r\n    @keyframes s3 {\r\n        0" <> ...,
  headers: [
    {"Content-Type", "text/html; charset=utf-8"},
    {"Content-Length", "18131"},
    {"Connection", "keep-alive"},
    {"Date", "Sat, 30 Dec 2023 01:30:30 GMT"},
    {"Server", "nginx"},
    {"Vary", "Accept-Encoding"},
    {"Cache-Control", "no-cache, no-store, must-revalidate, private"},
    {"Expires", "Fri, 1 Jan 2010 00:00:00 GMT"},
    {"Pragma", "no-cache"},
    {"Set-Cookie", "REVEL_FLASH=; Path=/; HttpOnly; Secure"},
    {"Set-Cookie",
     "REVEL_SESSION=b259743fbff075a1c999a58381d19d4746a5f741-%00csrf_token%3AYPW1jeabL6zDWgHrwqdUNcriDFK4QT2crbr2Ag853ro%3D%00%00_TS%3A1719451830%00; Path=/; Expires=Thu, 27 Jun 2024 01:30:30 GMT; Max-Age=15552000; HttpOnly; Secure"},
    {"X-Content-Type-Options", "nosniff"},
    {"X-Xss-Protection", "1; mode=block"},
    {"X-Cache", "Miss from cloudfront"},
    {"Via", "1.1 52837da9827dd735cd471158bffac49a.cloudfront.net (CloudFront)"},
    {"X-Amz-Cf-Pop", "NRT12-C3"},
    {"X-Amz-Cf-Id", "FvT_VfG4z4-uGNTSbLsYsgdMfFQn-Pa1b3jwS4quuTC89wm4dD2RvA=="},
    {"X-Frame-Options", "SAMEORIGIN"},
    {"Referrer-Policy", "strict-origin-when-cross-origin"},
    {"Strict-Transport-Security", "max-age=31536000"}
  ],
  request_url: "https://atcoder.jp/contests/abc334/tasks/abc334_a",
  request: %HTTPoison.Request{
    method: :get,
    url: "https://atcoder.jp/contests/abc334/tasks/abc334_a",
    headers: [],
    body: "",
    params: %{},
    options: []
  }
}
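
HTTPoison.get!/1 raises on transport failures, while a non-200 response still comes back as an ordinary struct like the one above. Below is a minimal sketch of handling both cases with the non-raising HTTPoison.get/1, which returns {:ok, response} or {:error, error}. The FetchResult module and its error tuples are hypothetical names for illustration. To stay runnable without a network connection, the sample calls pass plain maps carrying the same fields as the structs.

```elixir
defmodule FetchResult do
  # Hypothetical helper, not part of HTTPoison.
  # It matches only on the fields shown in the response above, so it accepts
  # %HTTPoison.Response{} / %HTTPoison.Error{} as well as plain maps.
  def unwrap({:ok, %{status_code: 200, body: body}}), do: {:ok, body}
  def unwrap({:ok, %{status_code: code}}), do: {:error, {:unexpected_status, code}}
  def unwrap({:error, %{reason: reason}}), do: {:error, {:request_failed, reason}}
end

# In the notebook this would be called as:
#   FetchResult.unwrap(HTTPoison.get("https://atcoder.jp/contests/abc334/tasks/abc334_a"))
# Plain maps stand in for the structs here.
ok = FetchResult.unwrap({:ok, %{status_code: 200, body: "<html></html>"}})
not_found = FetchResult.unwrap({:ok, %{status_code: 404, body: ""}})
failed = FetchResult.unwrap({:error, %{reason: :timeout}})
```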

Parsing the content

doc.body is the body of the server's response. We analyze it with Floki, starting by passing it to parse_document().

{:ok, html} = Floki.parse_document(doc.body)
html
Contents of html:
[
  {"html", [],
   [
     {"head", [],
      [
        {"title", [], ["A - Christmas Present"]},
        {"meta", [{"http-equiv", "Content-Type"}, {"content", "text/html; charset=utf-8"}], []},
        {"meta", [{"http-equiv", "Content-Language"}, {"content", "en"}], []},
        {"meta", [{"name", "viewport"}, {"content", "width=device-width,initial-scale=1.0"}], []},
        {"meta", [{"name", "format-detection"}, {"content", "telephone=no"}], []},
        {"meta",
         [
           {"name", "google-site-verification"},
           {"content", "nXGC_JxO0yoP1qBzMnYD_xgufO6leSLw1kyNo2HZltM"}
         ], []},
        {"script",
         [{"async", "async"}, {"src", "https://www.googletagmanager.com/gtag/js?id=G-RC512FD18N"}],
         [""]},
        {"script", [],
         ["\n\t\twindow.dataLayer = window.dataLayer || [];\n\t\tfunction gtag(){dataLayer.push(arguments);}\n\t\tgtag('js', new Date());\n\n\t\tgtag('config', 'G-RC512FD18N');\n\t"]},
        {"meta",
         [
           {"name", "description"},
           {"content",
            "AtCoder is a programming contest site for anyone from beginners to experts. We hold weekly programming contests online."}
         ], []},
        {"meta", [{"name", "author"}, {"content", "AtCoder Inc."}], []},
        {"meta", [{"property", "og:site_name"}, {"content", "AtCoder"}], []},
        {"meta", [{"property", "og:title"}, {"content", "A - Christmas Present"}], []},
        {"meta",
         [
           {"property", "og:description"},
           {"content",
            "AtCoder is a programming contest site for anyone from beginners to experts. We hold weekly programming contests online."}
         ], []},
        {"meta", [{"property", "og:type"}, {"content", "website"}], []},
        {"meta",
         [{"property", "og:url"}, {"content", "https://atcoder.jp/contests/abc334/tasks/abc334_a"}],
         []},
        {"meta",
         [{"property", "og:image"}, {"content", "https://img.atcoder.jp/assets/atcoder.png"}], []},
        {"meta", [{"name", "twitter:card"}, {"content", "summary"}], []},
        {"meta", [{"name", "twitter:site"}, {"content", "@atcoder"}], []},
        {"meta", [{"property", "twitter:title"}, {"content", "A - Christmas Present"}], []},
        {"link",
         [
           {"href", "//fonts.googleapis.com/css?family=Lato:400,700"},
           {"rel", "stylesheet"},
           {"type", "text/css"}
         ], []},
        {"link",
         [
           {"rel", "stylesheet"},
           {"type", "text/css"},
           {"href", "//img.atcoder.jp/public/4e015d0/css/bootstrap.min.css"}
         ], []},
        {"link",
         [
           {"rel", "stylesheet"},
           {"type", "text/css"},
           {"href", "//img.atcoder.jp/public/4e015d0/css/base.css"}
         ], []},
        {"link",
         [
           {"rel", "shortcut icon"},
           {"type", "image/png"},
           {"href", "//img.atcoder.jp/assets/favicon.png"}
         ], []},
        {"link", [{"rel", "apple-touch-icon"}, {"href", "//img.atcoder.jp/assets/atcoder.png"}], []},
        {"script", [{"src", "//img.atcoder.jp/public/4e015d0/js/lib/jquery-1.9.1.min.js"}], [""]},
        {"script", [{"src", "//img.atcoder.jp/public/4e015d0/js/lib/bootstrap.min.js"}], [""]},
        {"script", [{"src", "//img.atcoder.jp/public/4e015d0/js/cdn/js.cookie.min.js"}], [""]},
        {"script", [{"src", "//img.atcoder.jp/public/4e015d0/js/cdn/moment.min.js"}], [""]},
        {"script", [{"src", "//img.atcoder.jp/public/4e015d0/js/cdn/moment_js-ja.js"}], [""]},
        {"script", [],
         ["\n\t\tvar LANG = \"en\";\n\t\tvar userScreenName = \"\";\n\t\tvar csrfToken = \"YPW1jeabL6zDWgHrwqdUNcriDFK4QT2crbr2Ag853ro=\"\n\t"]},
        {"script", [{"src", "//img.atcoder.jp/public/4e015d0/js/utils.js"}], [""]},
        {"script", [{"src", "//img.atcoder.jp/public/4e015d0/js/contest.js"}], [""]},
        {"link",
         [{"href", "//img.atcoder.jp/public/4e015d0/css/contest.css"}, {"rel", "stylesheet"}], []},
        {"script", [],
         ["\n\t\t\tvar contestScreenName = \"abc334\";\n\t\t\tvar remainingText = \"Remaining Time\";\n\t\t\tvar countDownText = \"Contest begins in\";\n\t\t\tvar startTime = moment(\"2023-12-23T21:00:00+09:00\");\n\t\t\tvar endTime = moment(\"2023-12-23T22:40:00+09:00\");\n\t\t"]},
        {"style", [],
         ["    #contest-statement .lang-ja  hr {\r\n        margin: 50px 0px;\r\n        border: none;\r\n        border-top: 10px dotted #e50050;\r\n    }\r\n    #contest-statement h4 {\r\n        margin-top: 20px;\r\n    }\r\n    .icon {\r\n        height: 1em;\r\n    }\r\n    .snow {\r\n        /*雪の色*/\r\n        color: #206AF4aa;\r\n        /*初期位置*/\r\n        position: fixed;\r\n        top: -5%;\r\n    }\r\n    .snow1st {\r\n        /*雪の大きさ*/\r\n        font-size: 25px;\r\n        animation: s1 6s linear infinite;\r\n    }\r\n    /*2つめの雪アニメーション*/\r\n    .snow2nd{\r\n        /*雪の大きさ*/\r\n        font-size: 20px;\r\n        animation: s2 11s linear infinite;\r\n    }\r\n    .snow3rd {\r\n        /*雪の大きさ*/\r\n        font-size: 30px;\r\n        animation: s3 20s linear infinite;\r\n    }\r\n    \r\n    @keyframes s1 {\r\n        0% {\r\n            left: 50% \r\n        }\r\n        100% {\r\n            top: 150%;\r\n            left: 20% \r\n        }\r\n    }\r\n    @keyframes s2 {\r\n        0% {\r\n            left: 50%\r\n        }\r\n        100% {\r\n            top: 150%;\r\n            left: 80%\r\n        }\r\n    }\r\n    @keyframes s3 {\r\n        0% {\r\n            left: 5%\r\n        }\r\n        100% {\r\n            top: 150%;\r\n            left: 25%\r\n        }\r\n    }"]},
        {"link",
         [
           {"href", "//img.atcoder.jp/public/4e015d0/css/cdn/select2.min.css"},
           {"rel", "stylesheet"}
         ], []},
        {"link",
         [
           {"href", "//img.atcoder.jp/public/4e015d0/css/cdn/select2-bootstrap.min.css"},
           {"rel", ...}
         ], []},
        {"script", [{"src", ...}], [""]},
        {"script", [{...}], [...]},
        {"script", [...], ...},
        {"script", ...},
        {...},
        ...
      ]},
     {"body", [],
      [
        {"script", [{"type", "text/javascript"}],
         ["\n\tvar __pParams = __pParams || [];\n\t__pParams.push({client_id: '468', c_1: 'atcodercontest', c_2: 'ClientSite'});\n"]},
        {"script",
         [
           {"type", "text/javascript"},
           {"src", "https://cdn.d2-apps.net/js/tr.js"},
           {"async", "async"}
         ], [""]},
        {"div",
         [
           {"id", "modal-contest-start"},
           {"class", "modal fade"},
           {"tabindex", "-1"},
           {"role", "dialog"}
         ],
         [
           {"div", [{"class", "modal-dialog"}, {"role", "document"}],
            [
              {"div", [{"class", "modal-content"}],
               [
                 {"div", [{"class", "modal-header"}],
                  [
                    {"button",
                     [
                       {"type", "button"},
                       {"class", "close"},
                       {"data-dismiss", "modal"},
                       {"aria-label", "Close"}
                     ], [{"span", [{"aria-hidden", "true"}], ["×"]}]},
                    {"h4", [{"class", "modal-title"}], ["Contest started"]}
                  ]},
                 {"div", [{"class", "modal-body"}],
                  [
                    {"p", [],
                     ["UNIQUE VISION Programming Contest 2023 Christmas (AtCoder Beginner Contest 334) has begun."]}
                  ]},
                 {"div", [{"class", "modal-footer"}],
                  [
                    {"button",
                     [{"type", "button"}, {"class", "btn btn-default"}, {"data-dismiss", "modal"}],
                     ["Close"]}
                  ]}
               ]}
            ]}
         ]},
        {"div",
         [
           {"id", "modal-contest-end"},
           {"class", "modal fade"},
           {"tabindex", "-1"},
           {"role", "dialog"}
         ],
         [
           {"div", [{"class", "modal-dialog"}, {"role", "document"}],
            [
              {"div", [{"class", "modal-content"}],
               [
                 {"div", [{"class", "modal-header"}],
                  [
                    {"button",
                     [
                       {"type", "button"},
                       {"class", "close"},
                       {"data-dismiss", "modal"},
                       {"aria-label", "Close"}
                     ], [{"span", [{"aria-hidden", "true"}], ["×"]}]},
                    {"h4", [{"class", "modal-title"}], ["Contest is over"]}
                  ]},
                 {"div", [{"class", "modal-body"}],
                  [
                    {"p", [],
                     ["UNIQUE VISION Programming Contest 2023 Christmas (AtCoder Beginner Contest 334) has ended."]}
                  ]},
                 {"div", [{"class", "modal-footer"}],
                  [
                    {"button",
                     [{"type", "button"}, {"class", "btn btn-default"}, {"data-dismiss", "modal"}],
                     ["Close"]}
                  ]}
               ]}
            ]}
         ]},
        {"div", [{"id", "main-div"}, {"class", "float-container"}],
         [
           {"nav", [{"class", "navbar navbar-inverse navbar-fixed-top"}],
            [
              {"div", [{"class", "container-fluid"}],
               [
                 {"div", [{"class", "navbar-header"}],
                  [
                    {"button",
                     [
                       {"type", "button"},
                       {"class", "navbar-toggle collapsed"},
                       {"data-toggle", "collapse"},
                       {"data-target", "#navbar-collapse"},
                       {"aria-expanded", "false"}
                     ],
                     [
                       {"span", [{"class", "icon-bar"}], []},
                       {"span", [{"class", "icon-bar"}], []},
                       {"span", [{"class", "icon-bar"}], []}
                     ]},
                    {"a", [{"class", "navbar-brand"}, {"href", "/home"}], []}
                  ]},
                 {"div", [{"class", "collapse navbar-collapse"}, {"id", "navbar-collapse"}],
                  [
                    {"ul", [{"class", "nav navbar-nav"}],
                     [
                       {"li", [],
                        [
                          {"a", [{"class", "contest-title"}, {"href", "/contests/abc334"}],
                           ["UNIQUE VISION Programming Contest 2023 Christmas (AtCoder Beginner Contest 334)"]}
                        ]}
                     ]},
                    {"ul", [{"class", "nav navbar-nav navbar-right"}],
                     [
                       {"li", [{"class", "dropdown"}],
                        [
                          {"a",
                           [
                             {"class", "dropdown-toggle"},
                             {"data-toggle", "dropdown"},
                             {"href", "#"},
                             {"role", "button"},
                             {"aria-haspopup", "true"},
                             {"aria-expanded", "false"}
                           ],
                           [
                             {"img", [{"src", "//img.atcoder.jp/assets/top/img/flag-lang/en.png"}],
                              []},
                             " English ",
                             {"span", [{"class", ...}], []}
                           ]},
                          {"ul", [{"class", "dropdown-menu"}],
                           [{"li", [], [{"a", ...}]}, {"li", [], [{...}]}]}
                        ]},
                       {"li", [],
                        [
                          {"a",
                           [
                             {"href",
                              "/register?continue=https%3A%2F%2Fatcoder.jp%2Fcontests%2Fabc334%2Ftasks%2Fabc334_a"}
                           ], ["Sign Up"]}
                        ]},
                       {"li", [],
                        [
                          {"a",
                           [
                             {"href",
                              "/login?continue=https%3A%2F%2Fatcoder.jp%2Fcontests%2Fabc334%2Ftasks%2Fabc334_a"}
                           ], ["Sign In"]}
                        ]}
                     ]}
                  ]}
               ]}
            ]},
           {"form",
            [
              {"method", "POST"},
              {"name", "form_logout"},
              {"action",
               "/logout?continue=https%3A%2F%2Fatcoder.jp%2Fcontests%2Fabc334%2Ftasks%2Fabc334_a"}
            ],
            [
              {"input",
               [
                 {"type", "hidden"},
                 {"name", "csrf_token"},
                 {"value", "YPW1jeabL6zDWgHrwqdUNcriDFK4QT2crbr2Ag853ro="}
               ], []}
            ]},
           {"div",
            [{"id", "main-container"}, {"class", "container"}, {"style", "padding-top:50px;"}],
            [
              {"div", [{"class", "row"}],
               [
                 {"div", [{"id", "contest-nav-tabs"}, {"class", "col-sm-12 mb-2 cnvtb-fixed"}],
                  [
                    {"div", [],
                     [
                       {"small", [{"class", "contest-duration"}],
                        [
                          "\n\t\t\t\n\t\t\t\tContest Duration:\n\t\t\t\t",
                          {"a",
                           [
                             {"href",
                              "http://www.timeanddate.com/worldclock/fixedtime.html?iso=20231223T2100&p1=248"},
                             {"target", "blank"}
                           ],
                           [
                             {"time", [{"class", "fixtime fixtime-full"}],
                              ["2023-12-23 21:00:00+0900"]}
                           ]},
                          " - ",
                          {"a",
                           [
                             {"href",
                              "http://www.timeanddate.com/worldclock/fixedtime.html?iso=20231223T2240&p1=248"},
                             {"target", "blank"}
                           ], [{"time", [{...}], [...]}]},
                          " (local time)\n\t\t\t\t(100 minutes)\n\t\t\t\n\t\t"
                        ]},
                       {"small", [{"class", "back-to-home pull-right"}],
                        [{"a", [{"href", "/home"}], ["Back to Home"]}]}
                     ]},
                    {"ul", [{"class", "nav nav-tabs"}],
                     [
                       {"li", [],
                        [
                          {"a", [{"href", "/contests/abc334"}],
                           [
                             {"span", [{"class", "glyphicon glyphicon-home"}, {"aria-hidden", ...}],
                              []},
                             " Top"
                           ]}
                        ]},
                       {"li", [{"class", "active"}],
                        [
                          {"a", [{"href", "/contests/abc334/tasks"}],
                           [{"span", [{"class", ...}, {...}], []}, " Tasks"]}
                        ]},
                       {"li", [],
                        [
                          {"a", [{"href", "/contests/abc334/clarifications"}],
                           [{"span", [{...}, ...], []}, " Clarifications ", {"span", ...}]}
                        ]},
                       {"li", [],
                        [
                          {"a",
                           [
                             {"class", "dropdown-toggle"},
                             {"data-toggle", "dropdown"},
                             {"href", ...},
                             {...},
                             ...
                           ], [{"span", [...], ...}, " Results", {...}]},
                          {"ul", [{"class", "dropdown-menu"}], [{"li", ...}]}
                        ]},
                       {"li", [],
                        [
                          {"a", [{"href", "/contests/abc334/standings"}],
                           [{"span", ...}, " Standings"]}
                        ]},
                       {"li", [], [{"a", [{"href", ...}], [{...}, ...]}]},
                       {"li", [], [{"a", [{...}], [...]}]},
                       {"li", [], [{"a", [...], ...}]},
                       {"li", [{"class", "pull-right"}], [{"a", ...}]}
                     ]}
                  ]},
                 {"div", [{"class", "col-sm-12"}],
                  [
                    {"span", [{"class", "h2"}],
                     [
                       "\n\t\t\tA - Christmas Present\n\t\t\t",
                       {"a",
                        [
                          {"class", "btn btn-default btn-sm"},
                          {"href", "/contests/abc334/tasks/abc334_a/editorial"}
                        ], ["Editorial"]}
                     ]},
                    {"span", [{"id", "task-lang-btn"}, {"class", "pull-right"}],
                     [
                       {"span", [{"data-lang", "ja"}],
                        [{"img", [{"src", "//img.atcoder.jp/assets/top/img/flag-lang/ja.png"}], []}]},
                       " / ",
                       {"span", [{"data-lang", "en"}],
                        [{"img", [{"src", "//img.atcoder.jp/assets/top/img/flag-lang/en.png"}], []}]}
                     ]},
                    {"script", [],
                     ["\n\t\t\t$(function() {\n\t\t\t\tvar ts = $('#task-statement span.lang');\n\t\t\t\tif (ts.children('span').size() <= 1) {\n\t\t\t\t\t$('#task-lang-btn').hide();\n\t\t\t\t\tts.children('span').show();\n\t\t\t\t\treturn;\n\t\t\t\t}\n\t\t\t\tvar REMEMBER_LB = 5;\n\t\t\t\tvar LS_KEY = 'task_lang';\n\t\t\t\tvar taskLang = getLS(LS_KEY) || '';\n\t\t\t\tvar changeTimes = 0;\n\t\t\t\tif (taskLang == 'ja' || taskLang == 'en') {\n\t\t\t\t\tchangeTimes = REMEMBER_LB;\n\t\t\t\t} else {\n\t\t\t\t\tvar changeTimes = parseInt(taskLang, 10);\n\t\t\t\t\tif (isNaN(changeTimes)) {\n\t\t\t\t\t\tchangeTimes = 0;\n\t\t\t\t\t\tdelLS(LS_KEY);\n\t\t\t\t\t}\n\t\t\t\t\tchangeTimes++;\n\t\t\t\t\ttaskLang = LANG;\n\t\t\t\t}\n\t\t\t\tts.children('span.lang-' + taskLang).show();\n\n\t\t\t\t$('#task-lang-btn span').click(function() {\n\t\t\t\t\tvar l = $(this).data('lang');\n\t\t\t\t\tts.children('span').hide();\n\t\t\t\t\tts.children('span.lang-' + l).show();\n\t\t\t\t\tif (changeTimes < REMEMBER_LB) setLS(LS_KEY, changeTimes);\n\t\t\t\t\telse setLS(LS_KEY, l);\n\t\t\t\t});\n\t\t\t});\n\t\t"]},
                    {"hr", [], []},
                    {"p", [], ["\n\t\t\tTime Limit: 2 sec / Memory Limit: 1024 MB\n\t\t\t\n\t\t"]},
                    {"div", [{"id", "task-statement"}],
                     [
                       {"span", [{"class", "lang"}],
                        [{"span", [{"class", ...}], [{...}, ...]}, {"span", [{...}], [...]}]}
                     ]}
                  ]}
               ]},
              {"hr", [], []},
              {"div",
               [
                 {"class", "a2a_kit a2a_kit_size_20 a2a_default_style pull-right"},
                 {"data-a2a-url", "https://atcoder.jp/contests/abc334/tasks/abc334_a?lang=en"},
                 {"data-a2a-title", "A - Christmas Present"}
               ],
               [
                 {"a", [{"class", "a2a_button_facebook"}], []},
                 {"a", [{"class", "a2a_button_twitter"}], []},
                 {"a", [{"class", "a2a_button_telegram"}], []},
                 {"a", [{"class", "a2a_dd"}, {"href", "https://www.addtoany.com/share"}], []}
               ]},
              {"script", [{"async", "async"}, {"src", "//static.addtoany.com/menu/page.js"}], [""]}
            ]},
           {"hr", [], []}
         ]},
        {"div", [{"class", "container"}, {"style", "margin-bottom: 80px;"}],
         [
           {"footer", [{"class", "footer"}],
            [
              {"ul", [],
               [
                 {"li", [], [{"a", [{"href", "/contests/abc334/rules"}], ["Rule"]}]},
                 {"li", [], [{"a", [{"href", "/contests/abc334/glossary"}], ["Glossary"]}]}
               ]},
              {"ul", [],
               [
                 {"li", [], [{"a", [{"href", "/tos"}], ["Terms of service"]}]},
                 {"li", [], [{"a", [{"href", "/privacy"}], ["Privacy Policy"]}]},
                 {"li", [], [{"a", [{"href", "/personal"}], ["Information Protection Policy"]}]},
                 {"li", [], [{"a", [{"href", "/company"}], ["Company"]}]},
                 {"li", [], [{"a", [{"href", "/faq"}], ["FAQ"]}]},
                 {"li", [], [{"a", [{"href", "/contact"}], ["Contact"]}]}
               ]},
              {"div", [{"class", "text-center"}],
               [
                 {"small", [{"id", "copyright"}],
                  [
                    "Copyright Since 2012 ©",
                    {"a", [{"href", "http://atcoder.co.jp"}], ["AtCoder Inc."]},
                    " All rights reserved."
                  ]}
               ]}
            ]}
         ]},
        {"p", [{"id", "fixed-server-timer"}, {"class", "contest-timer"}], []},
        {"div", [{"id", "scroll-page-top"}, {"style", "display:none;"}],
         [
           {"span", [{"class", "glyphicon glyphicon-arrow-up"}, {"aria-hidden", "true"}], []},
           " Page Top"
         ]}
      ]}
   ]}
]
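
As the dump shows, each element is a {tag, attributes, children} tuple and text nodes are plain strings. The sketch below walks such a tree in plain Elixir to collect every element with a given tag name. It is only meant to illustrate the data shape. TreeWalk is a hypothetical module, not Floki's implementation, and the small tree is a hand-trimmed excerpt of the output above.

```elixir
defmodule TreeWalk do
  # Hypothetical helper for illustration; not part of Floki.
  # Collects every element with the given tag name, in document order.
  def find_tag(nodes, tag) when is_list(nodes) do
    Enum.flat_map(nodes, &find_tag(&1, tag))
  end

  def find_tag({name, _attrs, children} = node, tag) when is_binary(name) do
    nested = find_tag(children, tag)
    if name == tag, do: [node | nested], else: nested
  end

  # Text nodes (strings) and any other node kinds contain no elements.
  def find_tag(_other, _tag), do: []
end

tree = [
  {"html", [],
   [
     {"head", [], [{"title", [], ["A - Christmas Present"]}]},
     {"body", [], [{"hr", [], []}]}
   ]}
]

titles = TreeWalk.find_tag(tree, "title")
```

In the notebook itself, Floki.find/2 with a CSS selector does this job and much more.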

Where the values we want are

The values we want are inside the "section" elements whose h3 tag reads "入力例" (Sample Input).


section_list = Floki.find(html, "section")

section_list also contains sections other than the sample inputs and outputs,
so here is an excerpt of just the relevant sections.

  {"section", [], [{"h3", [], ["入力例 1"]}, {"pre", [], ["300 100\n"]}]},
  {"section", [],
   [
     {"h3", [], ["出力例 1"]},
     {"pre", [], ["Bat\n"]},
     {"p", [],
      ["バットの方がグローブより値段が高いので、サンタさんは高橋君にバットをプレゼントします。"]}
   ]},

Using Enum, we filter the sections and reduce them to just the values.
Putting everything together gives the following.

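# Fetch the task page and parse it into Floki's tree.
doc = HTTPoison.get!("https://atcoder.jp/contests/abc334/tasks/abc334_a")
{:ok, html} = Floki.parse_document(doc.body)
section_list = Floki.find(html, "section")

# Keep the sections whose first child (the h3) starts with "入力例",
# then take the text of the second child (the pre).
pre_in =
  section_list
  |> Enum.filter(fn {_, _, [heading | _]} -> String.starts_with?(Floki.text(heading), "入力例") end)
  |> Enum.map(fn {_, _, [_heading | rest]} -> Floki.text(hd(rest)) end)

# Same for the sections whose h3 starts with "出力例" (Sample Output).
pre_out =
  section_list
  |> Enum.filter(fn {_, _, [heading | _]} -> String.starts_with?(Floki.text(heading), "出力例") end)
  |> Enum.map(fn {_, _, [_heading | rest]} -> Floki.text(hd(rest)) end)

{pre_in, pre_out}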

This gives us the inputs and outputs:

{["300 100\n", "334 343\n"], ["Bat\n", "Glove\n"]}
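
The pipeline above assumes that the h3 is always the first child of a section and the pre is the second, and it raises if a section has no children. As a hedged alternative, the sketch below looks the h3 and pre up by tag name and skips sections that lack either. SampleCases is a hypothetical module written in plain Elixir over the tuples that Floki.find/2 returns. The first two sections are copied from the excerpt above, and the empty third one is made up to show the skipping.

```elixir
defmodule SampleCases do
  # Hypothetical helper for illustration; works on {tag, attrs, children} tuples.
  # Returns the <pre> text of every section whose <h3> text starts with `prefix`.
  def pre_texts(sections, prefix) do
    sections
    |> Enum.filter(fn {_tag, _attrs, children} ->
      case find_child(children, "h3") do
        {"h3", _, [title | _]} when is_binary(title) -> String.starts_with?(title, prefix)
        _ -> false
      end
    end)
    |> Enum.flat_map(fn {_tag, _attrs, children} ->
      case find_child(children, "pre") do
        {"pre", _, [text | _]} when is_binary(text) -> [text]
        _ -> []
      end
    end)
  end

  # First direct child element with the given tag name, or nil.
  defp find_child(children, tag) do
    Enum.find(children, fn child -> match?({^tag, _, _}, child) end)
  end
end

sections = [
  {"section", [], [{"h3", [], ["入力例 1"]}, {"pre", [], ["300 100\n"]}]},
  {"section", [],
   [
     {"h3", [], ["出力例 1"]},
     {"pre", [], ["Bat\n"]},
     {"p", [], ["バットの方がグローブより値段が高いので、サンタさんは高橋君にバットをプレゼントします。"]}
   ]},
  # Made-up empty section: skipped instead of raising.
  {"section", [], []}
]

result = {SampleCases.pre_texts(sections, "入力例"), SampleCases.pre_texts(sections, "出力例")}
```

In the notebook, section_list from Floki.find(html, "section") could be passed in place of the hand-written sections list.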

This was my first time using Floki, and its feature set is simple.
Since the results can be processed freely by combining Enum functions, it is fun to work with.
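
As one example of combining Enum functions on the result, the sketch below pairs each sample input with its expected output and numbers the pairs. The two lists are the values obtained above. The map keys (no, input, expected) are names chosen for illustration only.

```elixir
# Values taken from the result shown earlier in the article.
pre_in = ["300 100\n", "334 343\n"]
pre_out = ["Bat\n", "Glove\n"]

cases =
  pre_in
  |> Enum.zip(pre_out)
  |> Enum.with_index(1)
  |> Enum.map(fn {{input, expected}, n} ->
    # Drop the trailing newline so the values are easy to compare.
    %{no: n, input: String.trim_trailing(input), expected: String.trim_trailing(expected)}
  end)
```

Each element of cases is then a single map that could be fed to a solver function and checked against its expected value.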
