Last verified April 2026
> playwright + ai
/ the stack that actually ships
Playwright + AI is the highest-velocity path in 2026 for teams who want AI-assisted testing without buying a new vendor. The full stack: GitHub Copilot (or Claude) for authoring, Playwright MCP for browser-observed codegen, Playwright Healer for auto-repair, TestDino MCP for centralised reporting. We describe what works, what breaks, and when to pay for QA Wolf instead.
> anatomy of the stack
Authoring layer: GitHub Copilot (or Cursor, Claude Code)
The LLM assists while you write Playwright test code in your editor. Copilot suggests test structures, fills in locators, and completes assertion patterns based on your app's type signatures and existing test files. This is the minimum viable AI-in-Playwright stack -- every team using Copilot already has this. It is Level 1 on our agentic capability ladder.
$10-19/user/month (Copilot)
Generation layer: Playwright MCP
Playwright MCP lets the LLM drive a real Chromium browser. Instead of guessing at locators, the LLM navigates to your app, clicks elements, observes the real DOM, and generates Playwright code from actual browser state. This produces more reliable locators than pure editor-based generation. The output is standard Playwright code you own.
Free (requires Copilot or Claude subscription)
Repair layer: Playwright Healer
Playwright Healer intercepts locator failures in existing tests and attempts LLM-driven repair. When a test fails because a selector broke after a UI change, Healer inspects the current DOM, finds the element by semantic meaning rather than exact selector, patches the test file, and re-runs. This reduces the maintenance burden of a large Playwright suite significantly.
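The "find the element by semantic meaning" step can be pictured as a fallback search over a DOM snapshot: when the recorded selector no longer matches, look for an element with the same role and accessible name. A minimal illustrative sketch — this is not Healer's actual API, and a real repair pass uses an LLM over far richer context:

```typescript
// Illustrative semantic locator repair (not Healer's real API).
// Given a snapshot of interactive elements, find a replacement for a
// selector that no longer matches, using role + accessible name.
interface SnapshotElement {
  selector: string; // current selector in the live DOM
  role: string;     // ARIA role observed in the browser
  name: string;     // accessible name (label or text)
}

function repairLocator(
  broken: { role: string; name: string }, // semantics from the last passing run
  snapshot: SnapshotElement[]
): string | null {
  // Exact semantic match first: same role, same accessible name.
  const exact = snapshot.find(
    (el) => el.role === broken.role && el.name === broken.name
  );
  if (exact) return exact.selector;

  // Fall back to a fuzzy name match within the same role.
  const fuzzy = snapshot.find(
    (el) =>
      el.role === broken.role &&
      el.name.toLowerCase().includes(broken.name.toLowerCase())
  );
  return fuzzy ? fuzzy.selector : null;
}

// A redesign moved the submit button and changed its classes:
const dom: SnapshotElement[] = [
  { selector: 'header >> css=.nav-cta', role: 'link', name: 'Sign up' },
  { selector: 'form >> css=.btn-v2', role: 'button', name: 'Log in' },
];
const patched = repairLocator({ role: 'button', name: 'Log in' }, dom);
```

The key design point is that the repair keys on what the element *means* to a user, which usually survives a refactor, rather than on where it sits in the DOM, which usually does not.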
Open-source (compute cost of LLM calls)
Reporting layer: TestDino MCP
TestDino MCP adds centralised failure classification and reporting. It connects to your Playwright test results, classifies failures as selector-issues (route to Healer), genuine regressions (route to developer), or flakes (route to retry logic), and provides a structured dashboard. This is the observability layer on top of the generation and repair layers.
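The three-way routing described above can be approximated with simple heuristics over Playwright reporter output. A toy illustration — TestDino's real classification is richer, and both the thresholds and the error-message patterns below are assumptions:

```typescript
// Illustrative failure triage, loosely modelled on the routing described
// above. The heuristics and thresholds are assumptions, not TestDino's logic.
type Route = 'healer' | 'developer' | 'retry';

interface FailureRecord {
  errorMessage: string;          // from the Playwright JSON reporter
  failedRunsOutOfLast10: number; // recent failure count for this test
}

function classifyFailure(f: FailureRecord): Route {
  // Selector-style errors (timeouts resolving a locator) go to auto-repair.
  if (/locator|selector|strict mode violation/i.test(f.errorMessage)) {
    return 'healer';
  }
  // Intermittent failures look like flakes: route to retry logic.
  if (f.failedRunsOutOfLast10 <= 3) {
    return 'retry';
  }
  // Consistent, non-selector failures are treated as genuine regressions.
  return 'developer';
}
```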
TestDino subscription (pricing custom)
> setup walkthrough (playwright mcp)
Step 1: Install prerequisites
node --version                    # need 18+
npx playwright install chromium
Step 2: Install Playwright MCP
npx @playwright/mcp@latest --help
Step 3: Configure MCP in VS Code (Copilot)
// .vscode/mcp.json
{
"servers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp@latest"]
}
}
}

Step 4: Prompt Copilot to generate a test
// In Copilot Chat:
// "Navigate to localhost:3000/login, enter test@test.com
// and password123, click login, then verify the dashboard
// page loads and contains the user's name. Generate a
// Playwright test for this flow."
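The output for a prompt like this is plain Playwright code that you own and can edit freely. A representative sketch of what MCP-backed generation tends to produce — the role-based locators come from observing the live DOM, but the exact labels, button text, and dashboard heading here are assumptions about the example app, so verify them against your real markup:

```typescript
import { test, expect } from '@playwright/test';

// Sketch of MCP-generated output for the login prompt above.
// Locator names are assumptions about the example app -- verify against your DOM.
test('login flow reaches the dashboard', async ({ page }) => {
  await page.goto('http://localhost:3000/login');
  await page.getByLabel('Email').fill('test@test.com');
  await page.getByLabel('Password').fill('password123');
  await page.getByRole('button', { name: 'Log in' }).click();
  // Verify navigation and that the dashboard greets the user.
  await expect(page).toHaveURL(/dashboard/);
  await expect(page.getByRole('heading', { name: /welcome/i })).toBeVisible();
});
```

Note the `getByLabel` and `getByRole` locators: because the agent clicked through the real page, it can anchor on accessible names rather than guessed CSS classes.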
Step 5: Review and commit
# Review generated locators against your actual DOM
# Run: npx playwright test
# Fix any locator mismatches before committing
Full documentation: Microsoft Learn - Playwright MCP
> failure modes
The LLM generates a selector for an element it infers should exist from context, but the selector matches nothing in the real DOM. An action on a missing element at least fails with a timeout; the silent case is negative assertions, which pass vacuously against a selector that never matched anything, so a generated test can go green while catching nothing. Always run `npx playwright test --debug` on generated tests to verify every locator actually resolves to an element.
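One cheap guard against ghost selectors is to assert visibility before acting, so a hallucinated locator fails loudly at the first step instead of slipping past as a vacuous pass. A sketch — the URL and button name are assumptions:

```typescript
import { test, expect } from '@playwright/test';

test('guarded click fails fast on ghost selectors', async ({ page }) => {
  await page.goto('http://localhost:3000'); // assumed app URL
  const submit = page.getByRole('button', { name: 'Submit' }); // assumed name
  // If the LLM hallucinated this element, the explicit check fails with a
  // clear timeout message naming the locator, rather than catching nothing.
  await expect(submit).toBeVisible();
  await submit.click();
});
```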
Very large SPAs with 200+ components exceed the LLM context window. The MCP DOM snapshot is truncated, causing the agent to miss elements below the fold or inside complex component trees. Workaround: scope each MCP session to a single page or flow, not the entire application.
If a UI redesign moves an element to a structurally different DOM location (not just a class change), Healer may patch the locator with a semantically-close-but-wrong element. Always review Healer patches in staging before merging to main.
Playwright MCP does not natively handle OAuth redirects, multi-tab sessions, or browser-native dialogs without additional configuration. For auth-heavy apps, you need a storageState fixture and a pre-auth MCP session.
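The storageState workaround looks like this in practice: authenticate once in a setup project, persist the session to disk, and have every subsequent test start pre-authenticated. This is Playwright's standard auth pattern; the file paths and project names below are conventions, not requirements:

```typescript
// playwright.config.ts (fragment): run an auth setup project once, then reuse
// the saved session everywhere. Paths and project names are conventions.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    // auth.setup.ts performs the login and calls
    // page.context().storageState({ path: 'playwright/.auth/user.json' })
    { name: 'setup', testMatch: /auth\.setup\.ts/ },
    {
      name: 'chromium',
      use: { storageState: 'playwright/.auth/user.json' }, // saved session
      dependencies: ['setup'],
    },
  ],
});
```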
> who this works for
Best fit
+ Teams already on GitHub Copilot or Claude
+ Playwright-first engineering orgs
+ Teams who want zero new vendor lock-in
+ Startups under 50 engineers
+ Platform teams building internal testing tooling
Consider QA Wolf instead if
- Suite is over 500 Playwright tests and flake rate is above 5%
- Team cannot spare engineer time for test supervision
- Need 24/7 monitoring with human escalation
- Budget supports $50k+/year managed service
- Compliance requires formal QA signoff
> faq