Question 1

What is an AI tester?

Accepted Answer

In 2026 the phrase most often refers to a category of software that uses machine learning or large language models to generate, execute, or maintain tests. It does not commonly refer to a human role; the closest human role is test engineer or quality engineer.

Question 2

Can AI replace manual testers?

Accepted Answer

Industry surveys, including Capgemini's World Quality Report 2025-26, describe AI as augmenting rather than replacing manual testers: 89% of organisations are piloting or deploying generative AI in quality engineering, but only 15% have scaled it enterprise-wide. Most teams in 2026 use AI for test generation and bug triage, but retain manual exploratory testing for new features and high-risk paths.

Question 3

How does AI test generation work?

Accepted Answer

Two paradigms compete. Reinforcement-learning search (Diffblue Cover, JVM only) explores candidate inputs and produces JUnit tests with a high mutation score. LLM prompting (Qodo, GitHub Copilot, Tabnine) gives a language model the code under test and asks it to write tests; the output covers more languages but varies in quality.

Question 4

What is the best AI testing tool?

Accepted Answer

There is no general answer. The right tool depends on the test category (unit, end-to-end, visual), the codebase (JVM, .NET, Node, Python), and the team's existing test stack. The category overview maps tools to jobs.

Question 5

How much does AI testing cost?

Accepted Answer

Pricing models vary across the category: per-user, per-test-run, per-snapshot, and custom enterprise. Direct comparison requires normalising to a common unit, such as cost per month or cost per test run at the team's expected usage. For specific vendor pricing, see the pricing comparison page; each row links to the vendor's published pricing page.
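As a rough illustration of normalising to a common unit, the sketch below converts three pricing models into a monthly cost and a cost per test run for one usage profile. Every number is a placeholder chosen for the arithmetic, not a real vendor price, and the names (Usage, monthly_cost, cost_per_test_run) are hypothetical. Custom enterprise pricing is left out because it has no published unit price to normalise.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Expected monthly usage for one team (placeholder values below)."""
    users: int
    test_runs: int
    snapshots: int


def monthly_cost(model: str, unit_price: float, usage: Usage) -> float:
    """Multiply the unit price by the number of billable units per month."""
    units = {
        "per_user": usage.users,
        "per_test_run": usage.test_runs,
        "per_snapshot": usage.snapshots,
    }
    return unit_price * units[model]


def cost_per_test_run(model: str, unit_price: float, usage: Usage) -> float:
    """Normalise any pricing model to a cost per test run."""
    return monthly_cost(model, unit_price, usage) / usage.test_runs


# Hypothetical usage profile and hypothetical unit prices.
usage = Usage(users=10, test_runs=20_000, snapshots=50_000)
plans = {
    "per_user": 40.0,
    "per_test_run": 0.01,
    "per_snapshot": 0.005,
}

for model, price in plans.items():
    print(model, monthly_cost(model, price, usage), cost_per_test_run(model, price, usage))
```

The ranking depends entirely on the usage profile, so the comparison is worth repeating with the team's own figures.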

Question 6

What is mutation score?

Accepted Answer

Mutation testing introduces small synthetic changes (mutants) into source code and re-runs the test suite against each one. Mutation score is the proportion of mutants the suite catches, that is, mutants for which at least one test fails. It measures assertion strength, unlike line coverage, which only measures execution. The MuTAP paper applies the methodology to LLM-generated tests.
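A minimal sketch of the calculation follows, assuming hand-written mutants in place of the ones a mutation tool would generate. The function under test (is_adult), the mutants, and both suites are hypothetical. Both suites execute the same code, but only the one with meaningful assertions kills any mutants.

```python
from typing import Callable, List


def is_adult(age: int) -> bool:
    """Function under test (hypothetical)."""
    return age >= 18


# Hand-written mutants standing in for tool-generated ones.
def mutant_boundary(age: int) -> bool:
    return age > 18   # '>=' changed to '>'


def mutant_negated(age: int) -> bool:
    return age < 18   # comparison inverted


def mutant_constant(age: int) -> bool:
    return age >= 21  # constant changed


MUTANTS: List[Callable[[int], bool]] = [mutant_boundary, mutant_negated, mutant_constant]


def weak_suite(fn: Callable[[int], bool]) -> bool:
    """Executes the code but only checks the return type."""
    return isinstance(fn(30), bool)


def strong_suite(fn: Callable[[int], bool]) -> bool:
    """Checks the boundary and both sides of it."""
    return fn(18) is True and fn(17) is False and fn(30) is True


def mutation_score(suite: Callable[[Callable[[int], bool]], bool],
                   mutants: List[Callable[[int], bool]]) -> float:
    """Proportion of mutants for which the suite fails (the mutant is killed)."""
    killed = sum(1 for mutant in mutants if not suite(mutant))
    return killed / len(mutants)


print(mutation_score(weak_suite, MUTANTS))
print(mutation_score(strong_suite, MUTANTS))
```

Both suites pass against the original is_adult, which is why coverage alone cannot tell them apart.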

Question 7

What is self-healing test automation?

Accepted Answer

Tests or test runners that recover when a primary locator (CSS selector, XPath) stops resolving: they fall back to alternative identifiers (text, role, accessibility label, multi-attribute fingerprint). Mabl, Testim, Functionize, and Healenium occupy the category.
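The fallback idea can be sketched without a browser. The example below models a page as a list of attribute dictionaries and tries an ordered list of locators until one matches. It is an illustration of the general technique only, not how any named vendor implements healing; find_with_fallback and all attribute values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Element = Dict[str, str]
Locator = Tuple[str, str]  # (attribute name, expected value)


def find_with_fallback(
    page: List[Element], locators: List[Locator]
) -> Tuple[Optional[Element], Optional[str]]:
    """Try each locator in priority order.

    Returns the first matching element and the attribute that matched,
    or (None, None) when nothing resolves.
    """
    for attribute, value in locators:
        for element in page:
            if element.get(attribute) == value:
                return element, attribute
    return None, None


# Identifiers recorded when the test was authored (hypothetical values).
SUBMIT_LOCATORS: List[Locator] = [
    ("id", "submit-btn"),      # primary locator
    ("text", "Place order"),   # fallback: visible text
    ("role", "button"),        # fallback: role, least specific
]

# The page after a refactor renamed the button's id.
page_after_refactor: List[Element] = [
    {"id": "nav-home", "role": "link", "text": "Home"},
    {"id": "checkout-cta", "role": "button", "text": "Place order"},
]

element, healed_by = find_with_fallback(page_after_refactor, SUBMIT_LOCATORS)
print(element, healed_by)
```

The less specific a fallback is, the greater the chance that it resolves to the wrong element, so a healed run is worth surfacing for review rather than passing silently.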

Question 8

What is agentic testing?

Accepted Answer

End-to-end test automation in which an LLM agent reads a goal or natural-language scenario and drives a real browser. The agent decides actions at run time. QA Wolf, testRigor, and Momentic occupy the category, with different choices about whether the test artefact is portable Playwright code or vendor-managed metadata.

Question 9

Does GitHub Copilot write tests?

Accepted Answer

Yes. GitHub Copilot supports test generation through in-editor suggestions, chat prompts, and agent-mode test sessions. Output is plain test code in the project's chosen framework. The documented failure modes are hallucinated assertions and tests written against APIs that do not exist.

Question 10

Is testRigor better than Selenium?

Accepted Answer

They occupy different categories. testRigor ingests plain-English steps and resolves them at run time; Selenium executes scripted automation through WebDriver. Many teams use both: Selenium or Playwright as the runner, testRigor as the authoring layer.

Question 11

Which AI testing tool for Java?

Accepted Answer

Diffblue Cover is the principal RL-based unit-test generator for JVM languages. For LLM-based generation in Java, Qodo Cover, GitHub Copilot, and JetBrains AI Assistant are all options. End-to-end Java testing typically uses Playwright for Java or Selenium with an AI augmentation layer.

Question 12

Which AI testing tool for Playwright?

Accepted Answer

GitHub Copilot generates Playwright code from inside the editor. Microsoft's Playwright MCP server lets Claude or Cursor drive a real browser through Playwright. QA Wolf and Reflect generate Playwright code as a managed service. See the Playwright AI page.

Question 13

What is Playwright MCP?

Accepted Answer

Microsoft's Model Context Protocol server for Playwright, exposing browser automation as MCP tools an LLM client can call. Open source, available on GitHub at microsoft/playwright-mcp. Lets any MCP-compatible client drive a real browser without writing custom integration code.

Question 14

Is AI testing better than traditional automation?

Accepted Answer

AI is generally better at test generation and at maintenance (self-healing), but remains unproven for full-stack reasoning across complex flows. Most production teams in 2026 run a hybrid: AI for the parts AI is good at, scripted automation for the rest.

Question 15

What mutation score should AI-generated tests achieve?

Accepted Answer

There is no industry-standard threshold. Diffblue's 2025 vendor study (linked from the unit-test-generation page) reports specific mutation scores for Cover and several LLM-based generators on Apache Tika and similar repositories; readers should consult that study directly for current numbers. A higher mutation score is better, but the absolute threshold a team should target depends on the codebase's risk profile and historical bug-escape rate.

Question 16

How accurate are vendor-published benchmarks?

Accepted Answer

Vendor benchmarks should be read with the understanding that the vendor chose both the test methodology and the comparison set. Diffblue's 2025 study, for example, is rigorous on mutation score but measures only the vendors and repositories Diffblue selected. Peer-reviewed work (the MuTAP paper) and open benchmarks (SWE-Bench, HELM) are less subject to selection bias.

Question 17

What is the oracle problem in AI testing?

Accepted Answer

The challenge of deciding what the correct behaviour of a system under test should be. Generators can produce many candidate tests, but without a clear oracle, tests may pass on incorrect behaviour. Mutation testing partially addresses this by measuring whether tests catch synthetic bugs.
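A small sketch of how a test can pass on incorrect behaviour: the generator below records whatever the code currently returns and asserts on it, so a deliberate bug becomes the expected value. The function apply_discount, its bug, and the intended specification (a percentage discount) are all hypothetical, made up for the illustration.

```python
from typing import Callable, Tuple


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function with a deliberate bug.

    Intended behaviour: price * (1 - percent / 100).
    Actual behaviour: subtracts the percentage as an absolute amount.
    """
    return price - percent


def generate_characterisation_test(
    fn: Callable[..., float], args: Tuple[float, ...]
) -> Callable[[], bool]:
    """Build a test the way a generator without an oracle might:
    record the current output and assert that it stays the same."""
    expected = fn(*args)

    def test() -> bool:
        return fn(*args) == expected

    return test


def spec_test() -> bool:
    """Test written from the specification: 10% off 200 is 180."""
    return apply_discount(200.0, 10.0) == 180.0


generated_test = generate_characterisation_test(apply_discount, (200.0, 10.0))
print("generated test passes:", generated_test())
print("spec test passes:", spec_test())
```

The generated test is still useful as a regression guard, but only the specification-based test encodes what the correct answer should be.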

Question 18

Can I export tests from a vendor-managed AI testing tool?

Accepted Answer

It depends on the vendor. Tools that emit Playwright or Selenium code (QA Wolf, certain Reflect configurations) are portable: tests run independently of the vendor relationship. Tools that store tests as proprietary YAML or LLM-prompt blobs (testRigor, Momentic, Functionize) are not equivalently portable. Check vendor documentation before signing.

Question 19

What is visual regression testing?

Accepted Answer

A test category that captures a baseline image of a UI state and flags subsequent renders that differ. Modern tools use AI-tuned thresholds to suppress trivial differences. Applitools, Percy, Chromatic, and Meticulous occupy related sub-categories.
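The baseline comparison can be sketched with plain lists of grayscale pixel values. This is a simplified illustration: the fixed tolerances below stand in for the AI-tuned thresholds that commercial tools apply, and diff_ratio, is_regression, and the default values are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale values, 0-255


def diff_ratio(baseline: Image, candidate: Image, pixel_tolerance: int = 8) -> float:
    """Fraction of pixels whose difference exceeds the per-pixel tolerance."""
    same_shape = len(baseline) == len(candidate) and all(
        len(a) == len(b) for a, b in zip(baseline, candidate)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in baseline)
    changed = sum(
        1
        for row_a, row_b in zip(baseline, candidate)
        for a, b in zip(row_a, row_b)
        if abs(a - b) > pixel_tolerance
    )
    return changed / total


def is_regression(baseline: Image, candidate: Image,
                  max_ratio: float = 0.01, pixel_tolerance: int = 8) -> bool:
    """Flag the render when more than max_ratio of pixels changed."""
    return diff_ratio(baseline, candidate, pixel_tolerance) > max_ratio


baseline = [[200] * 10 for _ in range(10)]

# Slight uniform shift, such as anti-aliasing noise: below the pixel tolerance.
noisy = [[203] * 10 for _ in range(10)]

# Five pixels in the first row turned black: a visible change.
broken = [row[:] for row in baseline]
broken[0][:5] = [0] * 5

print(is_regression(baseline, noisy))
print(is_regression(baseline, broken))
```

Two thresholds are used on purpose: the per-pixel tolerance suppresses trivial rendering noise, while the ratio decides how much of the image may change before the render is flagged.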

Question 20

How do I evaluate an AI testing tool before buying?

Accepted Answer

Run the tool against a representative sample of the team's actual codebase or application, not against vendor demo material. Measure mutation score where applicable, observe flake rate over a calendar week, and ask whether the test artefact remains portable if the vendor relationship ends. Vendor trial periods are the standard mechanism for this evaluation.
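Flake rate has no single agreed definition. The sketch below assumes one workable choice: a test is flaky if it both passed and failed on the same commit during the observation window, and flake rate is the share of tests that did so. The run records and the names flaky_tests and flake_rate are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Run = Tuple[str, str, bool]  # (test name, commit id, passed)


def flaky_tests(runs: Iterable[Run]) -> Set[str]:
    """Tests that produced both a pass and a fail on the same commit."""
    outcomes: Dict[Tuple[str, str], Set[bool]] = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)
    return {test for (test, _commit), seen in outcomes.items() if len(seen) == 2}


def flake_rate(runs: Iterable[Run]) -> float:
    """Share of distinct tests that were flaky at least once."""
    runs = list(runs)
    all_tests = {test for test, _commit, _passed in runs}
    if not all_tests:
        return 0.0
    return len(flaky_tests(runs)) / len(all_tests)


# Hypothetical results collected over one week.
week_of_runs = [
    ("login", "abc", True),
    ("login", "abc", False),    # same commit, different outcome: flaky
    ("checkout", "abc", True),
    ("checkout", "abc", True),
    ("search", "abc", False),
    ("search", "def", True),    # outcome changed with the code: not flaky
]

print(flaky_tests(week_of_runs))
print(flake_rate(week_of_runs))
```

Keying on the commit separates genuine flakiness from failures that a code change fixed or introduced.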

Question 21

Where can I read peer-reviewed research on AI testing?

Accepted Answer

The MuTAP paper (arXiv:2308.16557, published in Information and Software Technology, 2024) on LLM-augmented mutation testing is one of the most-cited evaluations of LLM-based test generation. The SANER and ICSE conferences publish ongoing research; Stanford HELM aggregates LLM benchmarks, including code scenarios.

Question 22

Does this site recommend specific tools?

Accepted Answer

No. The site is a vendor-neutral reference. Where readers need a recommendation, the methodology page explains why the site does not produce one and links to the published benchmarks (Diffblue 2025, MuTAP, SWE-Bench, HELM) that can support a defensible decision.

Common questions about AI testing tools.