
[Python] Notes on saving web pages as PDF with Playwright

Posted at 2022-03-09

This memo describes how to use Python + Playwright to access multiple web pages and save each of them as a PDF.
- Tested on Ubuntu 20.04 on WSL2.
- This is a Playwright rewrite of something I previously tried with Selenium.

Preparation

Installing the libraries

pip install playwright
python3 -m playwright install
playwright install-deps

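Processing flow

1. Prepare a list of target web page URLs (urls.txt), one URL per line, like the following.

   https://hogehoge.com/hoge.html
   https://fugafuga.com/fuga.html
   ...

2. Access a URL listed in that file.

3. Save the accessed page as a PDF.

4. Repeat steps 2 and 3 for each remaining URL.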
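Step 1 can be sketched on its own as below. This is only a minimal sketch: the helper names `parse_urls` and `load_urls` are hypothetical, and skipping lines that start with `#` is a convention assumed here, not something the original urls.txt format defines.

```python
from pathlib import Path
from typing import List


def parse_urls(text: str) -> List[str]:
    """Return the URLs found in text, one per line."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "#" comment lines (an assumed convention)
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls


def load_urls(path: str) -> List[str]:
    """Read a URL list file such as urls.txt and return its URLs."""
    return parse_urls(Path(path).read_text(encoding="utf-8"))
```

Stripping each line matters because a trailing newline would otherwise be passed along as part of the URL.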

Code

test.py

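from playwright.sync_api import sync_playwright

# Load the list of target URLs, dropping trailing newlines and blank lines
with open('urls.txt', mode='rt', encoding='utf-8') as f:
    urls = [line.strip() for line in f if line.strip()]

# Visit each URL in turn and save the page as a PDF
with sync_playwright() as p:
    # launch() starts headless Chromium by default, which PDF export requires
    browser = p.chromium.launch()
    for i, url in enumerate(urls):
        context = browser.new_context()
        page = context.new_page()
        page.goto(url)
        # Files are numbered from 0: 0.pdf, 1.pdf, ...
        page.pdf(path=f"{i}.pdf")
        context.close()  # release the context before moving to the next URL
    browser.close()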
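The per-URL part of the loop could also be pulled into a helper, for example to set the paper size and keep background graphics. The following is a sketch under assumptions: `save_page_as_pdf` is a hypothetical name, and `format="A4"` and `print_background=True` are options I chose for illustration, not settings from the script above.

```python
def save_page_as_pdf(browser, url, pdf_path):
    """Open url in a fresh browser context and write the page to pdf_path."""
    context = browser.new_context()
    try:
        page = context.new_page()
        page.goto(url)
        # format and print_background are optional page.pdf() arguments;
        # the values here are illustrative assumptions
        page.pdf(path=pdf_path, format="A4", print_background=True)
    finally:
        # Close the context even if navigation or PDF export fails
        context.close()
```

Inside the `with sync_playwright() as p:` block, it would be called as `save_page_as_pdf(browser, url, f"{i}.pdf")` in place of the four lines that create the context, open the page, and export the PDF.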

Verification

Run
```
python3 test.py
```
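Result
- PDFs are saved with file names such as 0.pdf and 1.pdf (numbering starts at 0) in the directory the script is run from, which is the same folder as test.py when run as above.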
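Numbered file names make it hard to tell which PDF came from which URL. One possible variation, not part of the original script, is to build the name from the URL as well. `pdf_name_for` is a hypothetical helper, and the character whitelist is an assumption made for this sketch.

```python
import re
from urllib.parse import urlparse


def pdf_name_for(index: int, url: str) -> str:
    """Build a PDF file name from the loop index plus the URL's host and path."""
    parts = urlparse(url)
    raw = f"{parts.netloc}{parts.path}"
    # Replace every run of characters other than letters, digits, "." and "-" with "_"
    safe = re.sub(r"[^A-Za-z0-9.-]+", "_", raw).strip("_")
    return f"{index}_{safe}.pdf"


print(pdf_name_for(0, "https://hogehoge.com/hoge.html"))  # 0_hogehoge.com_hoge.html.pdf
```

Keeping the index as a prefix means two URLs that reduce to the same cleaned-up name still get distinct files.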

References

Playwright
