Rust: HTML parsing and scraping

Posted at 2018-04-19

scraper

An excellent scraping library

Declaration

extern crate scraper;
use scraper::{Selector, Html};

Preparing the HTML

let html = r#"<html>...</html>"#;

Parsing the HTML

let document = Html::parse_document(html);

Preparing the CSS selector

let css = "head";

Parsing the CSS selector

let selector = Selector::parse(css).unwrap();

Scraping

for node in document.select(&selector) {
    // process each matched node here
}

node

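・Get the element data (tag name and attributes); to get the markup as a string, use node.html() or node.inner_html()

node.value()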

・Get an attribute

node.value().attr("content").unwrap()

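// Assumes `selector` matches <meta> elements, e.g. Selector::parse("meta")
for node in document.select(&selector) {
    // Fall back to an empty string when the element has no `content` attribute
    let content = node.value().attr("content").unwrap_or("");
    // Byte offset of "charset=" within the attribute value, if present
    let i = content.find("charset=");
    let charset = match i {
        // Skip the 8 bytes of "charset=" and keep the rest
        Some(i) => &content[i + 8..],
        _ => "",
    };
    println!("{}", charset);
}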
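
The charset extraction in the loop can also be pulled out into a small standard-library-only function, which makes it easier to test. This is a sketch under assumptions rather than the original code: the name `extract_charset` is hypothetical, it returns `None` instead of an empty string when `charset=` is missing, and unlike the loop above it stops at the next `;` so that trailing parameters are not included. The match is case-sensitive.

```rust
/// Hypothetical helper: returns the value following "charset=" in a
/// Content-Type style string, or None if "charset=" does not occur.
fn extract_charset(content: &str) -> Option<&str> {
    const KEY: &str = "charset=";
    // find() gives the byte offset where KEY starts; `?` returns None if absent.
    let start = content.find(KEY)? + KEY.len();
    let rest = &content[start..];
    // Stop at the next ';' so that later parameters are excluded.
    let end = rest.find(';').unwrap_or(rest.len());
    Some(rest[..end].trim())
}
```

Using `KEY.len()` instead of the literal `8` keeps the offset correct if the key is ever changed.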

Fetching the HTML

Please see here.

Encoding

Please see here.
