Independent research site. Not affiliated with any vendor named. Benchmarks captured April 2026 on stated repos. Pricing changes frequently -- verify at the source. Affiliate disclosure.

Last verified April 2026

> llm test automation
/ agentic, properly defined

The phrase “agentic testing” is being used loosely by every vendor in the space. We define it precisely with a five-level capability ladder, map current tools to each level, and explain what Level 5 would actually require. The TAM-Eval paper from SANER 2026 is the closest academic benchmark for the full capability spectrum.

> the five-level capability ladder

L0

Traditional automation

Baseline

Scripts written by humans, maintained by humans. Selenium, Pytest, JUnit with hand-authored test cases. AI has no role in authoring or repair. All maintenance burden is on the engineering team.

Selenium · Playwright (unassisted) · Pytest (unassisted) · JUnit (unassisted)
L1

LLM-assisted authoring

Common

The LLM helps a human write test code. The human prompts, the LLM suggests, the human reviews and commits. The LLM does not run tests, observe failures, or repair anything. Net effect: faster test authoring with the same maintenance burden.

GitHub Copilot (baseline) · Cursor · Claude Code
L2

Prompt-to-test generation

Current frontier for most teams

The LLM generates test scripts from a high-level prompt or by observing a real browser session. The human describes the scenario; the LLM produces runnable code. Human review is still required before committing. Playwright MCP is the canonical L2 implementation: the LLM drives a real Chromium browser and generates Playwright code from what it observes.

Playwright MCP + Copilot · Playwright MCP + Claude · testRigor (NLP-to-test) · Diffblue Cover (source-to-JUnit)
L3

Autonomous plan-run-heal

Available today (costly)

The agent plans a test, runs it, observes failures, repairs selectors or logic, and re-runs without human input during the run. A human may set the initial goal and review the final output, but the agent handles the execution loop. This is where QA Wolf and Momentic operate as of Q1 2026.

QA Wolf · Momentic · testRigor (with Vision AI mode)
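The L3 loop described above can be sketched as a bounded run-repair cycle. Everything here is hypothetical scaffolding: `run_test` and `repair` stand in for a real test executor and an LLM repair step.

```python
# Sketch of an L3 plan-run-heal loop: run, observe failure, repair, re-run.
# `run_test` and `repair` are stand-ins for a real executor and LLM repair.

def run_and_heal(test, run_test, repair, max_attempts=3):
    """Return (passed, attempts_used). No human input inside the loop."""
    for attempt in range(1, max_attempts + 1):
        failure = run_test(test)          # None on pass, else failure details
        if failure is None:
            return True, attempt
        test = repair(test, failure)      # e.g. patch a broken selector
    return False, max_attempts

# Toy executor: the test "passes" once its selector has been fixed.
def fake_run(test):
    return None if test["selector"] == "#new-id" else "locator not found"

def fake_repair(test, failure):
    return {**test, "selector": "#new-id"}

passed, attempts = run_and_heal({"selector": "#old-id"}, fake_run, fake_repair)
# → passed is True after the second run (one repair)
```

The bounded attempt count is the important design choice: without it, an agent repairing against a genuinely broken feature would loop forever instead of surfacing a real bug.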
L4

Cross-suite strategy design

Emerging

The agent analyses an entire codebase or product surface area and designs a test strategy: which scenarios to cover, which framework to use, how to balance unit vs integration vs E2E coverage, where the highest-risk code paths are. No commercial tool is fully at Level 4 as of April 2026, but Mabl's latest release includes a limited suite-design recommendation feature.

Mabl (partial, suite recommendation) · TAM-Eval research (academic)
L5

Autonomous release-spanning maintenance

Research only

The agent maintains an entire test strategy across software releases autonomously: detecting when new features require new tests, retiring stale tests after feature removals, rebalancing coverage for newly-high-risk areas, and generating tests for regression risks identified from production telemetry. This is a research-level capability as of April 2026.

Open research problem · TAM-Eval (SANER 2026) is the nearest benchmark

> which tool is at which level

Tool | Capability level | Note
QA Wolf | L3 | Full agentic Playwright output; autonomous run-repair cycle.
Momentic | L3 | Goal-to-test, autonomous, with a proprietary test format.
testRigor | L2-L3 | L2 plain-English generation; L3 with Vision AI mode enabled.
Mabl | L2-L4 | L2 codeless authoring, L3 auto-healing, L4 partial suite design.
Copilot + Playwright MCP | L2 | Prompt-driven, browser-observed generation; human review required.
Diffblue Cover | L2 | RL-driven source-to-JUnit generation; no E2E or run-repair loop.
Qodo | L1-L2 | L1 in IDE suggestion mode; L2 in full generation mode.
Testim | L2 | Codeless recorder with an LLM heal layer; no autonomous planning.
Meticulous | L2 | Trace capture is automated, but replay analysis requires human review.

> playwright mcp deep-dive

Playwright MCP is a Model Context Protocol server published by Microsoft that lets any MCP-compatible LLM (GitHub Copilot, Claude, GPT-4o) control a real Chromium browser. The LLM can navigate pages, click elements, fill forms, observe the resulting DOM and network state, and generate Playwright test code based on what it sees.

This is the most practical Level 2 bridge available today. It costs only your Copilot or Claude subscription. The output is real Playwright code you own and can run independently in CI. Setup takes roughly 30 minutes following the Microsoft Learn walkthrough. The key advantage over vendor E2E tools: zero lock-in, zero per-run fees, and the resulting tests are portable Playwright files, not proprietary YAML.
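For reference, registering the server with an MCP client usually amounts to a few lines of configuration. The shape below is a sketch: the exact file location and key names vary by client (Copilot, Claude Desktop, VS Code each differ), and the package name is as published by Microsoft at time of writing -- verify against the current walkthrough.

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```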

The Playwright Healer agent extends the MCP pattern for maintenance: when an existing Playwright test fails with a locator error after a UI change, Healer inspects the current DOM, identifies the most likely new selector, patches the test, and re-runs. It does not generate new tests -- it repairs existing ones. The TestDino MCP server adds centralised failure classification and reporting on top of both generation (MCP) and repair (Healer) layers.
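Healer's core decision -- which element in the current DOM most likely corresponds to the broken locator -- can be pictured as attribute-overlap scoring. This is a sketch of the idea only, not Healer's actual algorithm; the attribute dictionaries are invented for illustration.

```python
# Sketch of locator healing: score candidate elements in the current DOM
# against the attributes of the element the broken selector used to match.
# Illustrative only; this is not Healer's actual algorithm.

def score(old_attrs, candidate_attrs):
    """Fraction of the old element's attributes the candidate preserves."""
    if not old_attrs:
        return 0.0
    kept = sum(1 for k, v in old_attrs.items() if candidate_attrs.get(k) == v)
    return kept / len(old_attrs)

def heal(old_attrs, candidates):
    """Pick the candidate selector with the highest attribute overlap."""
    best = max(candidates, key=lambda c: score(old_attrs, c["attrs"]))
    return best["selector"]

old = {"role": "button", "text": "Sign in", "data-test": "login"}
dom = [
    {"selector": "#nav-help",  "attrs": {"role": "link", "text": "Help"}},
    {"selector": "#login-btn", "attrs": {"role": "button", "text": "Sign in"}},
]
print(heal(old, dom))   # → "#login-btn"
```

The same scoring view explains the "semantically close but wrong" failure mode below: when a redesign produces two candidates with similar overlap, the top-scoring one is not guaranteed to be the element the test was actually about.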

When MCP breaks

  • Context window limits on very large SPAs: the DOM snapshot of a 200-component React app can exceed the context window of most LLMs, causing the agent to miss elements.
  • Hallucinated selectors: on pages with unconventional DOM structures, the LLM sometimes generates selectors for elements that do not exist; the resulting test runs, always passes, and catches nothing.
  • Healer misfires on intentional UI changes: if a redesign moves an element to a new DOM location, Healer may patch the old locator with a semantically close but wrong one.
  • Auth and sessions: MCP does not natively handle OAuth flows or multi-tab sessions without additional configuration.
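The first failure mode is usually mitigated by pruning the snapshot before it reaches the model. A crude sketch of budget-based pruning, assuming a flat list of serialized nodes and a rough chars-per-token estimate (both assumptions, not MCP's actual snapshot format):

```python
# Crude sketch of snapshot pruning: keep interactive elements first and
# drop the rest until the serialized DOM fits a token budget.
# The node format and 4-chars-per-token estimate are assumptions.

INTERACTIVE = {"a", "button", "input", "select", "textarea"}

def prune_snapshot(nodes, token_budget, chars_per_token=4):
    """Return a subset of nodes whose serialized size fits the budget."""
    # Interactive elements matter most for test generation; rank them first.
    ranked = sorted(nodes, key=lambda n: n["tag"] not in INTERACTIVE)
    budget = token_budget * chars_per_token
    kept, used = [], 0
    for node in ranked:
        size = len(node["html"])
        if used + size > budget:
            continue
        kept.append(node)
        used += size
    return kept

nodes = [
    {"tag": "div",    "html": "<div>" + "x" * 500 + "</div>"},
    {"tag": "button", "html": '<button id="buy">Buy</button>'},
    {"tag": "input",  "html": '<input name="qty">'},
]
small = prune_snapshot(nodes, token_budget=20)
# The button and input survive; the oversized div is dropped.
```

Real implementations use accessibility-tree snapshots rather than raw HTML for the same reason: they carry the interactive structure at a fraction of the token cost.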

> faq

What is agentic testing?
Agentic testing is LLM-driven test design, autonomous execution, and self-repair without human supervision during a run. We define it on a five-level capability ladder: Level 0 is traditional scripts; Level 1 is LLM-assisted authoring (Copilot); Level 2 is prompt-to-test generation (Playwright MCP); Level 3 is autonomous plan-run-heal without mid-run human input (QA Wolf, Momentic); Level 4 is cross-suite strategy design; Level 5 is fully autonomous test maintenance across releases (still a research problem).
What is Playwright MCP and why does it matter?
Playwright MCP is a Model Context Protocol server published by Microsoft that lets LLMs (GitHub Copilot, Claude, GPT-4o) drive a real Chromium browser during test generation. The LLM can click, type, observe DOM state, and generate Playwright test code based on actual browser interactions. This is the most practical Level 2 bridge available in April 2026 -- it costs only a Copilot subscription and produces real Playwright code you own.
What is the difference between Level 2 and Level 3 agentic testing?
Level 2 (prompt-to-test generation) requires a human to describe the test scenario, review the output, and trigger the run. The LLM generates; the human validates. Level 3 (autonomous run-and-heal) requires no mid-run human input. The agent plans a test, runs it, observes failures, repairs selectors or logic, and re-runs -- all without a human in the loop. QA Wolf and Momentic operate at Level 3. The distinction matters because Level 2 still requires QA staff to supervise every generation and run, while Level 3 removes that supervision from the execution loop.
What is the Playwright Healer and how does it work?
Playwright Healer is an experimental agent layer that sits above a Playwright test suite and automatically repairs failing locators when UI changes break them. When a test fails with a locator error, Healer uses an LLM to inspect the current DOM, identify the most likely new selector for the target element, patch the test, and re-run. It is distinct from Playwright MCP (which generates new tests) -- Healer maintains existing tests.
What would a Level 5 agentic testing system look like?
A Level 5 system would autonomously maintain an entire test strategy across software releases: detecting when new features require new tests, retiring stale tests, rebalancing the suite for coverage gaps, and generating new tests for regression risks identified from production error logs. TAM-Eval (SANER 2026) is the nearest academic benchmark for this capability. No commercial tool is at Level 5 as of April 2026.