More than 5 years have passed since last update.

How to build a scraping server with puppeteer using Now

Now (version 2) is able to run puppeteer without Docker.
Serverless Chrome via Puppeteer with Now 2.0

In this article, I build an API that returns the results of scraping with puppeteer.

Note

As of January 29, 2019, deploying with yarn fails (there may not be an issue filed for this yet).
When you try this, use npm and deploy with package-lock.json present.

Full source

Here is a rough version of the source.

package.json

Use puppeteer-core and chrome-aws-lambda instead of puppeteer.

{
  "name": "",
  "version": "1.0.0",
  "main": "server.js",
  "license": "MIT",
  "scripts": {
    "start": "node server.js"
  },
  "dependencies": {
    "body-parser": "^1.18.3",
    "chrome-aws-lambda": "^1.11.2",
    "express": "^4.16.4",
    "express-validator": "^5.3.1",
    "puppeteer-core": "^1.11.0"
  }
}

now.json

Set maxLambdaSize under config in the builds entry.

{
  "version": 2,
  "name": "<app name>",
  "alias": "<if needed>",
  "builds": [
    {
      "src": "server.js",
      "use": "@now/node-server",
      "config": { "maxLambdaSize": "40mb" }
    }
  ],
  "routes": [{ "src": "/(.+)", "dest": "/server.js" }]
}

server.js

// Express
const express = require('express');
const bodyParser = require('body-parser');
const app = express();

// Express-validator
const { check, validationResult } = require('express-validator/check');
const { matchedData } = require('express-validator/filter');

const { getNewBooksList } = require('./chromium');

// Support parsing of application/x-www-form-urlencoded post data
app.use(bodyParser.urlencoded({ extended: true }));

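app.get(
  '/',
  [check('type').exists(), check('ndc').exists()],
  async (req, res) => {
    const errors = validationResult(req);
    if (!errors.isEmpty())
      return res.status(422).json({ errors: errors.mapped() });

    // Read only the fields that passed validation
    const { type, ndc } = matchedData(req);

    try {
      const booksList = await getNewBooksList(type, ndc);
      res.json(booksList);
    } catch (err) {
      // Express 4 does not catch rejections from async handlers,
      // so report the failure explicitly
      console.error(err);
      res.status(500).json({ error: 'Scraping failed' });
    }
  }
);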

app.listen(3000, () => console.log('App listening on port 3000!'));

module.exports.app = app;
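
As a usage sketch, the endpoint above expects type and ndc as query parameters and responds with HTTP 422 when either is missing. The helper below only builds such a request URL. The function name buildRequestUrl and the sample values are hypothetical, and the base URL assumes the server is running locally on port 3000.

```javascript
// Hypothetical helper: builds the URL for a GET request to the API.
function buildRequestUrl(baseUrl, type, ndc) {
  const url = new URL(baseUrl);
  // Both parameters are required by the validators in server.js
  url.searchParams.set('type', type);
  url.searchParams.set('ndc', ndc);
  return url.toString();
}

// Sample values are placeholders, not values taken from the article
console.log(buildRequestUrl('http://localhost:3000', 'new', '007'));
```

Using URLSearchParams rather than string concatenation means the values are percent-encoded automatically if they contain special characters.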

chromium.js

Note that the options passed to puppeteer.launch differ from those used with regular puppeteer.

const chrome = require('chrome-aws-lambda');
const puppeteer = require('puppeteer-core');

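async function getNewBooksList(bookType, bookNdc) {
  // Use the Chromium binary and flags provided by chrome-aws-lambda
  const browser = await puppeteer.launch({
    args: chrome.args,
    executablePath: await chrome.executablePath,
    headless: chrome.headless
  });

  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 800 });

    // Placeholder result so the function returns a defined value
    const booksList = [];

    // Insert your own scraping logic here and fill booksList

    return booksList;
  } finally {
    // Close the browser even if scraping throws
    await browser.close();
  }
}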

module.exports = { getNewBooksList };
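
The scraping step itself is left to the reader in getNewBooksList. As a minimal sketch of what it might look like, the helper below navigates to a page and collects the text of every element matching a selector, using page.goto and page.$$eval from the Puppeteer API. The name extractTexts, the URL, and the selector are all hypothetical; the real target site and markup are not given in the article.

```javascript
// Hypothetical helper: opens a page and collects the text of matching elements.
// `page` is a Puppeteer Page such as the one created in getNewBooksList.
async function extractTexts(page, url, selector) {
  // Wait until network activity settles so late-rendered content is present
  await page.goto(url, { waitUntil: 'networkidle2' });

  // The callback runs inside the browser and receives the matched elements
  return page.$$eval(selector, nodes =>
    nodes.map(node => node.textContent.trim())
  );
}
```

Inside getNewBooksList this could be called as const booksList = await extractTexts(page, url, selector), with the URL and selector chosen for the site being scraped.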

Afterword

The official sample is here:
now-examples/puppeteer-screenshot/

Now lets you run this for free, so it is an option worth considering when you want to deploy a simple scraping API.
