Scraping with Node.js (cheerio)

Posted at 2019-01-13, last updated at 2019-01-13

Purpose of this article

We will fetch the titles of 20 movies that are currently showing from 映画.com (eiga.com).

Development environment

macOS Mojave v10.14.2

Usage

According to the official cheerio page, the library has the following three features.

❤ Familiar syntax: Cheerio implements a subset of core jQuery. Cheerio removes all the DOM inconsistencies and browser cruft from the jQuery library, revealing its truly gorgeous API.

ϟ Blazingly fast: Cheerio works with a very simple, consistent DOM model. As a result parsing, manipulating, and rendering are incredibly efficient. Preliminary end-to-end benchmarks suggest that cheerio is about 8x faster than JSDOM.

❁ Incredibly flexible: Cheerio wraps around @FB55's forgiving htmlparser2. Cheerio can parse nearly any HTML or XML document.

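In short, it appears that cheerio:

1. lets you write code in a jQuery-like style

2. runs fast: according to preliminary benchmarks, about 8x faster than JSDOM, another library

3. can parse nearly any HTML/XML document

With that in mind, let's install cheerio and start scraping.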

Setup

Install cheerio, a library for scraping, together with request, which is used to fetch the page.

$ npm init -y
$ npm i request cheerio

Demo

Running the code below collects the fetched movie titles into an array and prints it.

getmovies.js

const request = require('request')
const cheerio = require('cheerio')
const url = 'https://eiga.com/now/all/rank/' // 映画.com ranking page
const titles_arr = []

request(url, (e, response, body) => {
    if (e) {
        console.error(e)
        return // stop here: body is not usable when the request failed
    }
    try {
        const $ = cheerio.load(body)              // load the response body
        $('h3', '.m_unit').each((i, elem) => {    // run for each h3 element inside elements with class 'm_unit'
            titles_arr[i] = $(elem).text()        // store the title text in the array
        })
        console.log(titles_arr)
    } catch (e) {
        console.error(e)
    }
})
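
The goal stated at the top is 20 titles, but the script keeps whatever text each matched h3 contains. As a rough sketch, the array could be tidied before printing. The helper name cleanTitles and the sample input are hypothetical and not part of the original script, and it is only an assumption that the scraped text may carry extra whitespace or that the page may return more than 20 matches.

```javascript
// Hypothetical helper (not part of the original script): tidy up scraped titles.
function cleanTitles(rawTitles, limit = 20) {
    return rawTitles
        .map((title) => String(title).trim())   // strip surrounding whitespace and newlines
        .filter((title) => title.length > 0)    // drop entries that were blank
        .slice(0, limit)                        // keep at most `limit` titles
}

// Made-up sample input standing in for the titles_arr built by the scraper
const sample = ['  Movie A\n', '', 'Movie B', '\t', 'Movie C ']
console.log(cleanTitles(sample, 2)) // [ 'Movie A', 'Movie B' ]
```

In getmovies.js this would be called as console.log(cleanTitles(titles_arr)) in place of console.log(titles_arr), using the default limit of 20.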

Conclusion

With that, we were able to scrape the text we wanted.
