Where I got stuck trying to parse multiple iframes with Puppeteer

Environment

  • MacBook Pro, macOS High Sierra (v10.13.3)
  • Node.js v9.6.1
  • TypeScript v2.7.2
  • Puppeteer v1.1.1

Overview

  • To access the iframes inside a page with Puppeteer, use Page#frames.

Code

test.ts
import {launch} from 'puppeteer';

(async () => {
    const browser = await launch();
    const page = await browser.newPage();

    try {
        await page.goto('file:///tmp/test.html');

        const frames = page.frames();
        // frames[0] is the main frame (the page itself), so the iframes inside the page start at index 1
        for (let i = 1; i < frames.length; i++) {
            const frame = frames[i];
            console.log(await frame.evaluate(content => content.innerHTML, await frame.$('p')));
        }
    } catch (e) {
        console.error(e);
    }

    await page.close();
    await browser.close();
})();
/tmp/test.html
<!doctype html>
<html lang="ja">
<head>
  <meta charset="UTF-8">
  <title>Document</title>
</head>
<body>
  <iframe src="inner-frame.html"></iframe>
  <iframe src="secondary-frame.html"></iframe>
</body>
</html>
/tmp/inner-frame.html
<!DOCTYPE html>
<html lang="ja">
<head>
  <meta charset="UTF-8">
  <title>InnerFrame</title>
</head>
<body>
  <p>Inner Frame!</p>
</body>
</html>
/tmp/secondary-frame.html
<!DOCTYPE html>
<html lang="ja">
<head>
  <meta charset="UTF-8">
  <title>SecondaryFrame</title>
</head>
<body>
  <p>Secondary Frame!</p>
</body>
</html>

Result

$ node test.js
Inner Frame!
Secondary Frame!
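
The loop in test.ts skips the main frame by starting at index 1. As a hedged alternative, the sketch below picks out the iframes by checking `parentFrame()`, which Puppeteer documents as returning null for the main frame. `FrameLike` and `childFrames` are hypothetical names introduced only for this illustration; they are not part of Puppeteer.

```typescript
// Minimal structural view of the one Frame method this sketch relies on.
// FrameLike is a hypothetical helper type, not a Puppeteer export.
interface FrameLike {
    parentFrame(): FrameLike | null;
}

// Keep only the frames that have a parent, i.e. everything except the main frame.
// Nested iframes also have a parent, so they are kept as well.
function childFrames<T extends FrameLike>(frames: T[]): T[] {
    return frames.filter(frame => frame.parentFrame() !== null);
}
```

Assuming Puppeteer's Frame objects satisfy this shape, the loop could then be written as `for (const frame of childFrames(page.frames())) { ... }`, which does not depend on the order of the array returned by `page.frames()`.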

(Aside) Trial-and-error log

How it started

test.ts
import {launch} from 'puppeteer';

(async () => {
    const browser = await launch();
    const page = await browser.newPage();

    try {
        await page.goto('file:///tmp/test.html');
        const frames = await page.$$('iframe');
        for (let i = 0; i < frames.length; i++) {
            const url = await page.evaluate(content => content.src, frames[i]);
            await page.goto(url);
            console.log(await page.evaluate(content => content.innerHTML, await page.$('p')));
        }
    } catch (e) {
        console.error(e);
    }

    await page.close();
    await browser.close();
})();

Result

$ node test.js
Inner Frame!
Error: JSHandles can be evaluated only in the context they were created!
    at ExecutionContext.convertArgument (~~~/node_modules/puppeteer/lib/ExecutionContext.js:95:17)
    at Array.map (<anonymous>)
    at ExecutionContext.evaluateHandle (~~~/node_modules/puppeteer/lib/ExecutionContext.js:70:23)
    at ExecutionContext.evaluate (~~~/node_modules/puppeteer/lib/ExecutionContext.js:46:31)
    at Frame.evaluate (~~~/node_modules/puppeteer/lib/FrameManager.js:299:20)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:160:7)

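:thinking: Some kind of error is showing up…

The first iteration works, but page.goto() then navigates the page away from test.html. The element handles obtained earlier with page.$$('iframe') belong to the execution context of the old document, so passing frames[1] to page.evaluate() on the second iteration fails.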

First revision

test.ts
import {launch} from 'puppeteer';

(async () => {
    const browser = await launch();
    const page = await browser.newPage();

    try {
        await page.goto('file:///tmp/test.html');

        const frames = await page.$$('iframe');
        const urls = [];
        for (let i = 0; i < frames.length; i++) {
            urls.push(await page.evaluate(content => content.src, frames[i]));
        }
        for (let i = 0; i < urls.length; i++) {
            await page.goto(urls[i]);
            console.log(await page.evaluate(content => content.innerHTML, await page.$('p')));
        }
    } catch (e) {
        console.error(e);
    }

    await page.close();
    await browser.close();
})();

Result

$ node test.js
Inner Frame!
Secondary Frame!

:thinking: < Looping twice isn't great either…
So this approach was rejected.
