
Playwright notes

Posted at 2024-10-25

Official site

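What's good about Playwright

  • No WebDriver is required
    • No more trouble with browser and driver version mismatches
  • In most cases there is no need to add explicit waits
    • Some pages and operations still require them, though
  • It has a code generation feature
    • The generated code is often not usable as-is, but it can serve as a starting point when you begin writing code
  • You can operate web pages from Python's interactive mode
    • Handy for checking operations one small step at a time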
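
For the cases above where an explicit wait is still needed, a minimal sketch might look like the following. The helper name wait_until_ready, the "#result" selector, and the timeout are hypothetical placeholders; page is assumed to be a Page from playwright.sync_api.

```python
def wait_until_ready(page, selector="#result", timeout_ms=10_000):
    """Block until the page has loaded and `selector` is visible.

    `page` is assumed to be a playwright.sync_api.Page. The selector and
    timeout defaults are placeholders for illustration only.
    """
    # Wait for the load event of the current navigation
    page.wait_for_load_state("load")
    # Wait until the element matching the selector becomes visible
    page.locator(selector).wait_for(state="visible", timeout=timeout_ms)
    # Return a locator so the caller can keep working with the element
    return page.locator(selector)
```

Because the helper only receives a page, it can be called the same way from a script or from the interactive mode described later.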

What's not so good about Playwright

  • Less information is available than for Selenium
    • That said, the official documentation is enough to sort out most things

Using a browser already installed on the PC

The official documentation has a step that runs playwright install to install browsers for Playwright, but Playwright can be used without running playwright install

Where the browsers are saved if you do install them

%USERPROFILE%\AppData\Local\ms-playwright on Windows
~/Library/Caches/ms-playwright on macOS
~/.cache/ms-playwright on Linux
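
To check from code whether browsers were ever installed there, the three locations above can be resolved per platform. This is only a sketch: the function name default_browsers_dir is hypothetical, and it does not account for a custom install location.

```python
import sys
from pathlib import Path


def default_browsers_dir(platform: str = sys.platform) -> Path:
    """Return the default ms-playwright directory for the given platform."""
    home = Path.home()
    if platform.startswith("win"):
        # %USERPROFILE%\AppData\Local\ms-playwright
        return home / "AppData" / "Local" / "ms-playwright"
    if platform == "darwin":
        # ~/Library/Caches/ms-playwright
        return home / "Library" / "Caches" / "ms-playwright"
    # ~/.cache/ms-playwright
    return home / ".cache" / "ms-playwright"


if __name__ == "__main__":
    path = default_browsers_dir()
    print(path, "exists" if path.exists() else "does not exist")
```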

How to use an installed browser

Specify the browser to use with the channel argument when launching

sync
browser = playwright.chromium.launch(channel="chrome")
async
browser = await playwright.chromium.launch(channel="chrome")

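Running in Python's interactive mode

It is also possible to use Python's interactive mode only to start and stop Playwright, and to do everything else by operating the browser directly
This is handy when, for example, you only want to check element operations from code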

sync
# Start interactive mode
python

# Start Playwright
from playwright.sync_api import sync_playwright
playwright = sync_playwright().start()
browser = playwright.chromium.launch(channel="chrome", headless=False)
page = browser.new_page()

# Navigate to a page
# You can also navigate by operating the browser manually, then operate the destination page from code
page.goto("https://playwright.dev/")
page.screenshot(path="example.png")

# Stop Playwright
browser.close()
playwright.stop()
async
# Start interactive mode with async support
python -m asyncio

# Start Playwright
from playwright.async_api import async_playwright
playwright = await async_playwright().start()
browser = await playwright.chromium.launch(channel="chrome", headless=False)
page = await browser.new_page()

# Navigate to a page
# You can also navigate by operating the browser manually, then operate the destination page from code
await page.goto("https://playwright.dev/")
await page.screenshot(path="example.png")

# Stop Playwright
await browser.close()
await playwright.stop()

When you want to wrap it in a class

Start Playwright the same way as in interactive mode
Keep the SyncPlaywright obtained from the sync_playwright().start() call as a class member

sync
from playwright.sync_api import sync_playwright

class Crawler:
    def __init__(self):
        self.playwright = None
        self.browser = None
        self.page = None

    # Start Playwright
    def start(self):
        self.playwright = sync_playwright().start()
        self.browser = self.playwright.chromium.launch(channel="chrome", headless=False)
        self.page = self.browser.new_page()

    # Stop Playwright
    def close(self):
        self.page.close()
        self.browser.close()
        self.playwright.stop()

crawler = Crawler()
crawler.start()
# Open a page and take a screenshot
crawler.page.goto("https://playwright.dev/")
crawler.page.screenshot(path="example.png")
crawler.close()
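
In the usage above, close() is skipped if goto() or screenshot() raises. One possible way to guarantee cleanup is a small wrapper like the sketch below. The name running is hypothetical, and it assumes only that the object passed in has start() and close() methods, as the Crawler class above does.

```python
from contextlib import contextmanager


@contextmanager
def running(crawler):
    """Start `crawler`, hand it to the with-block, and always close it."""
    crawler.start()
    try:
        yield crawler
    finally:
        # Runs even when the with-block raises
        crawler.close()


# Usage with the Crawler class above:
# with running(Crawler()) as crawler:
#     crawler.page.goto("https://playwright.dev/")
#     crawler.page.screenshot(path="example.png")
```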

Parallel execution

async
import asyncio

from playwright.async_api import async_playwright


class Crawler:
    def __init__(self):
        self.playwright = None
        self.browser = None

    # Start Playwright
    async def start(self):
        self.playwright = await async_playwright().start()
        self.browser = await self.playwright.chromium.launch(
            channel="chrome", headless=False
        )

    # Open a page and take a screenshot
    async def save_screenshot(self, url, screenshot_path):
        page = await self.browser.new_page()
        await page.goto(url)
        await page.screenshot(path=screenshot_path)
        await page.close()

    # Stop Playwright
    async def close(self):
        await self.browser.close()
        await self.playwright.stop()


async def main():
    urls = [
        ("https://playwright.dev/", "playwright.png"),
        ("https://www.python.org/", "python.png"),
    ]

    crawler = Crawler()
    await crawler.start()

    # Run in parallel (asyncio.TaskGroup requires Python 3.11 or later)
    async with asyncio.TaskGroup() as tg:
        for url, path in urls:
            tg.create_task(crawler.save_screenshot(url, path))

    await crawler.close()


asyncio.run(main())
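
With many URLs, the code above opens every page at once. If that turns out to be too heavy, one option is to cap concurrency with a semaphore, as in this sketch. The helper name run_limited and the default limit are assumptions made for illustration.

```python
import asyncio


async def run_limited(coroutines, limit=2):
    """Await all coroutines, running at most `limit` of them at a time.

    Results are returned in the same order as the input.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(coro):
        # A coroutine object does not start running until it is awaited,
        # so it only begins once a semaphore slot is free
        async with semaphore:
            return await coro

    return await asyncio.gather(*(guarded(c) for c in coroutines))


# Usage inside main() above, in place of the TaskGroup block:
# await run_limited(
#     [crawler.save_screenshot(url, path) for url, path in urls], limit=2
# )
```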

Difference between a normal launch with the with statement and interactive mode

Normal launch
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://playwright.dev")
    print(page.title())
    browser.close()

What sync_playwright() creates is a PlaywrightContextManager
On entering the with statement, the context manager's __enter__() runs and returns a SyncPlaywright
On leaving the with statement, __exit__() runs and performs the disconnection

Interactive mode
from playwright.sync_api import sync_playwright
playwright = sync_playwright().start()
# Use playwright.chromium, playwright.firefox or playwright.webkit
# Pass headless=False to launch() to see the browser UI
browser = playwright.chromium.launch()
page = browser.new_page()
page.goto("https://playwright.dev/")
page.screenshot(path="example.png")
browser.close()
playwright.stop()

What sync_playwright().start() returns is a SyncPlaywright
__enter__() runs inside start() and returns the SyncPlaywright
From then on, the SyncPlaywright can be used until stop() is called explicitly
Calling stop() runs __exit__(), which performs the disconnection
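
The relationship between the two styles can be mimicked with a toy context manager. This is a simplified sketch of the behavior described above, not Playwright's actual source; ToyContextManager and ToySession are hypothetical names.

```python
class ToySession:
    """Stands in for the object that start() or the with statement returns."""

    def __init__(self):
        self.connected = True
        self.stop = None  # filled in by the context manager


class ToyContextManager:
    def __enter__(self):
        self.session = ToySession()
        # stop() is wired to __exit__(), so both styles share one teardown path
        self.session.stop = self.__exit__
        return self.session

    def start(self):
        # Interactive style: enter the context without a with statement
        return self.__enter__()

    def __exit__(self, *exc_info):
        # Disconnection happens here in both styles
        self.session.connected = False


# with style: __exit__() runs automatically when the block ends
with ToyContextManager() as session:
    pass

# start()/stop() style: the caller decides when __exit__() runs
session = ToyContextManager().start()
session.stop()
```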

When you want to build an executable

Nuitka

Use --include-package-data=playwright

windows
nuitka ^
--standalone ^
--include-package-data=playwright ^
src.py
mac
python -m nuitka \
--macos-create-app-bundle \
--standalone \
--include-package-data=playwright \
src.py

With ver 2.1 and later:
The Playwright part does not run in the Windows executable
It does run in the macOS executable

cx_Freeze

Use build_exe_options = {"includes": ["playwright"]}

setup.py
from cx_Freeze import setup, Executable

build_exe_options = {
    "includes": ["playwright"],
}

setup(
    name="Specify the executable name",
    version="0.1",
    options={"build_exe": build_exe_options},
    executables=[Executable("src.py", base="gui")],
)

Using a session

For example, when logging in with multi-step authentication on every run is tedious

from playwright.sync_api import sync_playwright

# Save the session
def save_session():
    with sync_playwright() as p:
        browser = p.chromium.launch(channel="chrome", headless=False)
        context = browser.new_context()
        page = context.new_page()

        # === Navigate to the target site, log in, etc. ===

        # Save the session after logging in
        context.storage_state(path="session.json")
        browser.close()

# Use the session
def load_session():
    with sync_playwright() as p:
        browser = p.chromium.launch(channel="chrome", headless=False)
        # Use the saved session
        context = browser.new_context(storage_state="session.json")
        page = context.new_page()

        # === Do whatever processing is needed ===

        browser.close()
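
To decide between save_session() and load_session(), the saved file can be inspected first. The sketch below relies on the storage state being JSON with a "cookies" list whose entries carry an "expires" timestamp (-1 for session cookies). The function name has_unexpired_cookies is hypothetical, and a True result does not guarantee that the site still accepts the session.

```python
import json
import time
from pathlib import Path


def has_unexpired_cookies(path="session.json", now=None):
    """Return True if the saved storage state has a cookie that is not expired."""
    file = Path(path)
    if not file.exists():
        return False
    state = json.loads(file.read_text(encoding="utf-8"))
    now = time.time() if now is None else now
    for cookie in state.get("cookies", []):
        expires = cookie.get("expires", -1)
        # -1 marks a session cookie, which has no expiry time
        if expires == -1 or expires > now:
            return True
    return False


# Usage with the functions above:
# if not has_unexpired_cookies():
#     save_session()
# load_session()
```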

Using an already running Chrome

For example, when both manual and automated operation are needed

import subprocess
from contextlib import contextmanager
from pathlib import Path

from playwright.sync_api import sync_playwright

# Launch Chrome
@contextmanager
def open_chrome():
    url = "https://playwright.dev/"

    chrome_path = Path(r"C:\Program Files\Google\Chrome\Application\chrome.exe")
    user_dir_path = Path.cwd().joinpath("user_dir")

    # Pass the command as a list so that paths containing spaces are handled safely
    command = [
        str(chrome_path),
        url,
        "--remote-debugging-port=9222",
        "--no-first-run",
        "--no-default-browser-check",
        f"--user-data-dir={user_dir_path}",
    ]
    chrome_process = subprocess.Popen(command)

    try:
        yield
    finally:
        # Terminate Chrome
        chrome_process.terminate()


with open_chrome():
    # Connect to Chrome
    with sync_playwright() as p:
        # Connect to the already running Chrome
        browser = p.chromium.connect_over_cdp("http://localhost:9222")
        default_context = browser.contexts[0]
        page = default_context.pages[0]
        print(page.url)

        browser.close()  # Release the resources managed by Playwright
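
Chrome may need a moment after launch before the debugging port accepts connections, so connecting immediately can fail. A possible guard is to poll the /json/version endpoint that Chrome's remote debugging interface serves over HTTP before connecting. The helper name wait_for_cdp and its timeout values are assumptions in this sketch.

```python
import time
import urllib.request


def wait_for_cdp(endpoint="http://localhost:9222", timeout=10.0, interval=0.2):
    """Poll the debugging endpoint until it answers or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            url = f"{endpoint}/json/version"
            with urllib.request.urlopen(url, timeout=interval + 1) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused or timed out: Chrome is not ready yet
            pass
        time.sleep(interval)
    return False


# Usage before connecting in the code above:
# if not wait_for_cdp("http://localhost:9222"):
#     raise RuntimeError("Chrome did not open the debugging port in time")
```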

Chrome started from cmd also works

"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222 --user-data-dir=%CD%\user_dir
