Trying out automated testing with an LLM and Playwright

Posted at 2025-09-21

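Background

There are already plenty of articles about using LLMs for automated E2E testing, but I had never tried it myself, so I wanted to run a small experiment.

Code review and testing in particular should have reasonably well-defined criteria, and I would also like to take subjective human judgment out of them, so I see no downside to bringing in Playwright as a supporting tool first.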

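Testing locally

Preparation

The app under test is a very simple TODO app.

As for the implementation flow, first comes gen-tests.mjs, which converts plain-text descriptions into a test script: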

#!/usr/bin/env node
import fs from 'node:fs/promises';
import path from 'node:path';
import process from 'node:process';
import { fileURLToPath } from 'node:url';
import OpenAI from 'openai';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

async function main() {
  const rootDir = path.resolve(__dirname, '..');
  const descriptionsPath = path.join(rootDir, 'test-descriptions.md');
  const outputDir = path.join(rootDir, 'e2e');
  await fs.mkdir(outputDir, { recursive: true });

  const apiKey = process.env.OPENAI_API_KEY;
  if (!apiKey) {
    console.error('OPENAI_API_KEY is not set.');
    process.exit(1);
  }

  let descriptions;
  try {
    descriptions = await fs.readFile(descriptionsPath, 'utf8');
  } catch (err) {
    console.error(`Could not read ${descriptionsPath}. Create it with your test case descriptions.`);
    process.exit(1);
  }

  const client = new OpenAI({ apiKey });

  const systemPrompt = `You are a coding assistant that writes Playwright tests using @playwright/test.
Follow these rules:
- Output ONLY a single valid TypeScript file content, no markdown code fences.
- Import { test, expect } from '@playwright/test'.
- Use test.describe('Generated', ...) (do NOT use Mocha's describe).
- Use baseURL via page.goto('/') and avoid hard-coded hostnames.
- Prefer robust locators: getByRole/getByLabel/getByPlaceholder over CSS selectors.
- For this app specifically:
  - The text field has placeholder 'Add a task...'.
  - The submit button's accessible name is 'Add'.
  - Todo items are rendered as list items; use getByRole('listitem').
  - The delete button's accessible name is 'Delete'.
  - Each checkbox is labelled by the todo text; completed state adds 'line-through' class on the label.
  - Avoid selectors like input[name="todo"] or button.delete or any nonexistent class names.
- Name the file 'generated.spec.ts'.`;

  const userPrompt = `Convert the following natural-language test scenarios into Playwright tests. Keep them minimal but robust.\n\nSCENARIOS:\n${descriptions}`;

  console.log('Generating Playwright tests from descriptions...');

  const response = await client.chat.completions.create({
    model: process.env.OPENAI_MODEL || 'gpt-4o-mini',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userPrompt }
    ],
    temperature: 0.2,
  });

  const content = response.choices?.[0]?.message?.content ?? '';
  if (!content.trim()) {
    console.error('No content generated from OpenAI.');
    process.exit(1);
  }

  const outputFile = path.join(outputDir, 'generated.spec.ts');
  await fs.writeFile(outputFile, content, 'utf8');
  console.log(`Wrote ${outputFile}`);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
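The script above writes the model's reply to disk as-is, even though the system prompt asks for no Markdown fences, an import from '@playwright/test', and no hard-coded hostnames. Models do not always follow such rules, so a small check before writing the file may help. The following is only a sketch: stripCodeFences and checkGeneratedSpec are hypothetical helper names that are not part of the original script, and the checks are simple string heuristics rather than a real parser.

```typescript
// Hypothetical helpers (not in the original gen-tests.mjs).
// Three backticks, built at runtime so this listing itself stays fence-safe.
const FENCE = '`'.repeat(3);

/** Remove one wrapping Markdown code fence if the model added it anyway. */
function stripCodeFences(raw: string): string {
  const trimmed = raw.trim();
  const lines = trimmed.split(/\r?\n/);
  const first = lines[0];
  const last = lines[lines.length - 1];
  // Only unwrap when the first and last lines are clearly fence lines.
  if (lines.length < 3 || !first.startsWith(FENCE) || last.trim() !== FENCE) {
    return trimmed;
  }
  return lines.slice(1, -1).join('\n').trim();
}

/** Return rule violations; an empty array means no problem was detected. */
function checkGeneratedSpec(source: string): string[] {
  const problems: string[] = [];
  if (!/from\s+['"]@playwright\/test['"]/.test(source)) {
    problems.push("missing import from '@playwright/test'");
  }
  if (source.includes(FENCE)) {
    problems.push('contains a Markdown code fence');
  }
  if (/https?:\/\//.test(source)) {
    problems.push("contains a hard-coded URL; rely on baseURL and page.goto('/')");
  }
  return problems;
}
```

One possible use is to call stripCodeFences on the reply, run checkGeneratedSpec on the result, and exit with an error instead of writing generated.spec.ts when the list is not empty.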

Next is playwright.config.ts, which tells Playwright how to start the app and sets the run parameters:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: 'e2e',
  fullyParallel: true,
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 2 : 0,
  workers: process.env.CI ? 2 : undefined,
  reporter: [['html', { open: 'never' }]],
  use: {
    baseURL: process.env.PLAYWRIGHT_BASE_URL || 'http://localhost:3000',
    trace: 'on-first-retry',
    headless: true,
  },
  webServer: {
    command: process.env.CI ? 'npm run start' : 'npm run dev -- --port 3000',
    port: 3000,
    reuseExistingServer: !process.env.CI,
    timeout: 120 * 1000,
  },
});

Then write the test cases in test-descriptions.md:

# E2E Test Scenarios

- Add a todo item and verify it appears in the list.
- Toggle a todo item to completed and verify its style/state.
- Delete a todo item and ensure it is removed.
- Persist todos across reloads using localStorage.

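Running

After installing Playwright, running npm run gen:tests locally generates the test script.

From there, npx playwright test runs the tests locally. The results are saved to playwright-report/index.html, and npx playwright show-report opens them for review.

Of course, you can also hand the log of the failing part to the AI and have it fix the problem. Doing so resolved one of the errors this time.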
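The "send the failure log to the AI" step can be sketched as a small prompt builder. This is an illustration under assumptions, not the procedure actually used here: buildFixPrompt, FixRequest, and the maxLogChars cap are hypothetical names, and keeping only the tail of the log assumes the most relevant error text appears near the end.

```typescript
// Hypothetical helper for asking the model to repair a failing generated test.
interface FixRequest {
  specSource: string;   // current contents of e2e/generated.spec.ts
  failureLog: string;   // console output or error excerpt from the failed run
  maxLogChars?: number; // assumed cap so the prompt stays small
}

function buildFixPrompt({ specSource, failureLog, maxLogChars = 4000 }: FixRequest): string {
  // Keep only the tail of a long log (assumption: the error is near the end).
  const log = failureLog.length > maxLogChars
    ? failureLog.slice(-maxLogChars)
    : failureLog;

  return [
    'The following Playwright test file failed. Fix only what is needed to make it pass.',
    'Return the full corrected TypeScript file with no markdown code fences.',
    '',
    'TEST FILE:',
    specSource,
    '',
    'FAILURE LOG:',
    log,
  ].join('\n');
}
```

The returned string could be sent as the user message in the same client.chat.completions.create call that gen-tests.mjs already uses, with the reply written back to e2e/generated.spec.ts.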

Integrating with GitHub Actions workflows

To build this into a GitHub Actions workflow, write a yml file:

name: E2E - Generate and Run Playwright

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  e2e:
    runs-on: ubuntu-latest
    container: mcr.microsoft.com/playwright:v1.47.0-jammy
    timeout-minutes: 30
    env:
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      OPENAI_MODEL: gpt-4o-mini
      PLAYWRIGHT_BASE_URL: http://127.0.0.1:3000

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install dependencies
        working-directory: todo-app
        run: npm ci

      - name: Build app
        working-directory: todo-app
        run: npm run build

      - name: Generate tests from descriptions using OpenAI
        working-directory: todo-app
        run: npm run gen:tests

      - name: Run Playwright tests
        working-directory: todo-app
        run: npx playwright test --reporter=html

      - name: Upload Playwright HTML report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: todo-app/playwright-report
          retention-days: 7

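Extras

In addition, if you add the following to the use section of the Playwright config file,

video: process.env.CI ? 'on-first-retry' : 'on'

you can also watch videos of the test runs. Since the operations are automated, things like text input finish in an instant, but a video should still be much easier to follow than text.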

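Thoughts

If the debugging process above is also handed over to AI, the dream of full automation starts to come into view.

However, as the app grows, generating the scripts takes longer, so it will probably also be necessary to prepare scripts for shared parts such as login in advance and to write unit tests page by page.

Even so, how far the resources needed to generate the scripts on every run (gpt-4o-mini will definitely not cut it for medium to large apps) and to execute the tests will grow is something that still needs to be verified.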
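One way to approach page-by-page generation is to split test-descriptions.md into per-page groups and send each group to the model separately. The sketch below assumes a convention that the current file does not use: each page gets a "## PageName" heading, with its scenarios as bullets underneath. groupScenariosByPage is a hypothetical name.

```typescript
// Hypothetical helper: group bullet scenarios under "## Page" headings.
// Bullets that appear before any "## " heading go into a "default" group.
function groupScenariosByPage(markdown: string): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  let current = 'default';

  for (const line of markdown.split(/\r?\n/)) {
    const heading = line.match(/^##\s+(.+)$/);
    if (heading) {
      current = heading[1].trim();
      continue;
    }
    const bullet = line.match(/^\s*-\s+(.+)$/);
    if (bullet) {
      const list = groups.get(current) ?? [];
      list.push(bullet[1].trim());
      groups.set(current, list);
    }
  }
  return groups;
}
```

Each group could then become its own prompt and its own spec file, which keeps every request small and lets a hand-written login helper stay outside the generated code.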
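One possible way to limit the generation cost, not something verified in this article, is to regenerate only when the scenario text or the model changes. The sketch below derives a key from both with Node's built-in crypto module. generationKey and needsRegeneration are hypothetical names, and where the previous key is stored (a file next to the spec, a CI cache, and so on) is left open.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper: a stable key for one (model, descriptions) pair.
function generationKey(descriptions: string, model: string): string {
  return createHash('sha256')
    .update(model)
    .update('\n')
    .update(descriptions)
    .digest('hex');
}

// Regenerate when there is no previous key or when the inputs have changed.
function needsRegeneration(
  descriptions: string,
  model: string,
  previousKey: string | null,
): boolean {
  return previousKey !== generationKey(descriptions, model);
}
```

With this in place, gen-tests.mjs could skip the API call when needsRegeneration returns false and reuse the existing e2e/generated.spec.ts.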
