$ testeragents
Category reference | Last verified April 2026

AI accessibility testing: axe, Evinced, and the WCAG ceiling.

Automated accessibility tools can test around 30 to 40 percent of WCAG Success Criteria. AI augmentation lifts this number modestly by handling some visual and contextual checks that pure rule-based scanners miss. The ceiling is structural: most WCAG criteria require human judgement and cannot be automated regardless of model sophistication. This page surveys the tools honestly, with reference to the published WCAG specification (w3.org/TR/WCAG22) and the practitioner literature.

The WCAG ceiling explained

WCAG 2.2 has 86 Success Criteria across three conformance levels (A, AA, AAA). Some criteria are deterministically machine-testable (an image has an alt attribute, the contrast ratio of text on a solid background meets the threshold, the page has a non-empty title). Others require human judgement (the alt text is meaningful, the heading hierarchy reflects the content structure, the keyboard focus order matches the visual order in a way that preserves meaning).
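The contrast check shows why some criteria are fully automatable. WCAG defines the contrast ratio as (L1 + 0.05) / (L2 + 0.05), where L1 and L2 are the relative luminances of the lighter and darker colours, and Success Criterion 1.4.3 sets a minimum of 4.5:1 for normal text and 3:1 for large text at level AA. The sketch below implements that formula for 6-digit hex sRGB colours. It assumes text on a solid background (text over images or gradients is exactly where rule-based checks give up), and the function names are illustrative rather than taken from any tool.

```typescript
// Parse a 6-digit hex colour such as "#767676" into 0 to 255 channel values.
function parseHex(hex: string): [number, number, number] {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!m) throw new Error(`Expected a 6-digit hex colour, got "${hex}"`);
  const n = parseInt(m[1], 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// Linearise one 8-bit sRGB channel, following the WCAG relative-luminance definition.
// WCAG texts have used 0.03928 and 0.04045 as the threshold; no 8-bit value falls
// between the two, so the choice does not change the result.
function linearise(channel: number): number {
  const c = channel / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const [r, g, b] = parseHex(hex).map(linearise);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Ratio of the lighter to the darker luminance, so argument order does not matter.
function contrastRatio(fg: string, bg: string): number {
  const a = relativeLuminance(fg);
  const b = relativeLuminance(bg);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// SC 1.4.3 (Level AA): at least 4.5:1 for normal text, 3:1 for large text.
function passesAA(fg: string, bg: string, largeText = false): boolean {
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}
```

For example, black on white gives the maximum ratio of 21:1; #767676 on white passes for normal text at about 4.54:1, while #777777 on white falls just short at about 4.48:1.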

The widely cited industry estimate is that around 30 to 40 percent of WCAG Success Criteria are machine-testable. (Deque's research, which counts issues by volume rather than by criterion, reports a higher automated share, because the machine-testable criteria cover many of the most common issues.) AI augmentation lifts the criterion-level figure by perhaps 5 to 10 percentage points by handling some visual and contextual cases that pure rule-based scanners cannot evaluate (is this alt text meaningful, does text over a background image have sufficient contrast, does this animation create accessibility risk). The ceiling is not 100 percent and will not be.

The base layer: axe-core and Microsoft Accessibility Insights

axe-core (github.com/dequelabs/axe-core) is the open-source accessibility engine maintained by Deque. It is embedded in browser extensions (axe DevTools), CI tools (axe-core CLI), and many other commercial accessibility tools. For most teams starting an accessibility programme, axe-core is the right starting point: free, well-maintained, embedded everywhere.

Microsoft Accessibility Insights (accessibilityinsights.io) wraps axe-core with a guided assessment workflow. The tool helps a tester walk through manual checks alongside the automated scan, producing a more complete WCAG evaluation. Free, open source, suitable for in-house programmes.

AI-augmented vendors

Evinced (evinced.com) extends rule-based scanning with AI-augmented detection of issues that pure rules miss: visual order versus DOM order conflicts, ambiguous interaction patterns, screen-reader narrative quality issues. The published positioning is that the AI layer catches 50 percent more issues than rule-based scanning alone; buyers should validate this claim on their own application.
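Validating a vendor's uplift claim is mostly bookkeeping, sketched here under stated assumptions: run the rule-based scanner and the AI-augmented tool over the same pages, have a person mark each finding as a true or false positive, normalise findings to a shared key, and compare only the confirmed ones. The `Finding` shape and key format are hypothetical; neither tool exports exactly this, and mapping each tool's output onto a shared key is the manual part.

```typescript
// Hypothetical normalised finding; neither tool exports exactly this shape.
interface Finding {
  key: string;        // e.g. "focus-order|/checkout|#pay-button" (illustrative format)
  confirmed: boolean; // set by a human triager: true positive or not
}

interface ComparisonReport {
  baselineConfirmed: number;  // confirmed issues from the rule-based scan
  augmentedConfirmed: number; // confirmed issues from the AI-augmented scan
  onlyInAugmented: number;    // confirmed issues the baseline missed
  onlyInBaseline: number;     // confirmed issues the augmented scan missed
  upliftPercent: number;      // onlyInAugmented as a share of the baseline
}

// Deduplicate by key and keep only findings a human confirmed.
function confirmedKeys(findings: Finding[]): Set<string> {
  return new Set(findings.filter((f) => f.confirmed).map((f) => f.key));
}

// Count keys in `source` that do not appear in `other`.
function countNotIn(source: Set<string>, other: Set<string>): number {
  let n = 0;
  source.forEach((key) => {
    if (!other.has(key)) n++;
  });
  return n;
}

function compareScans(baseline: Finding[], augmented: Finding[]): ComparisonReport {
  const base = confirmedKeys(baseline);
  const aug = confirmedKeys(augmented);
  const onlyInAugmented = countNotIn(aug, base);
  return {
    baselineConfirmed: base.size,
    augmentedConfirmed: aug.size,
    onlyInAugmented,
    onlyInBaseline: countNotIn(base, aug),
    // Undefined (NaN) when the baseline found nothing confirmed.
    upliftPercent: base.size === 0 ? NaN : (100 * onlyInAugmented) / base.size,
  };
}
```

A claim of "50 percent more issues" would show up as an upliftPercent near 50 on your own pages. Unconfirmed findings are dropped first so that false positives do not inflate either side, and onlyInBaseline shows what the augmented tool missed.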

Stark (getstark.co) is design-tool focused (Figma, Sketch) with AI-augmented accessibility checks at design time. Catching issues before code is written is structurally cheaper than catching them after; Stark's positioning is shift-left accessibility.

Siteimprove and Level Access are the enterprise platforms with AI augmentation layered onto longer-running rule-based scanning, audit-trail features, and conformance reporting suitable for legal and procurement contexts.

The right shape of an accessibility programme

Automated scanning catches one slice. Manual auditing catches another slice. User testing with disabled users catches a third slice. A real accessibility programme combines all three; relying on automation alone produces a programme that covers only 30 to 50 percent of WCAG Success Criteria and lets the team believe it has "solved" accessibility.

Automated scanning in CI. Every PR runs axe-core (or equivalent) against the changed pages. Failures block merge. This catches regressions cheaply and continuously.
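The merge gate itself is small once the scan has run. A minimal sketch, assuming the CI job has already run axe-core against each changed page (for example through one of Deque's browser-automation integrations) and collected the results. The types mirror only the subset of axe-core's result fields the gate reads (`violations`, `id`, `impact`, `help`, `nodes[].target`), and `gateFailures` with its `minImpact` threshold is an illustrative helper, not part of axe-core.

```typescript
// Minimal subset of the axe-core results shape used by the gate. Real axe output has
// more fields, and node targets for iframes or shadow DOM can be nested arrays.
type Impact = "minor" | "moderate" | "serious" | "critical";

interface AxeNode {
  target: string[];
}

interface AxeViolation {
  id: string;
  impact?: Impact | null;
  help: string;
  nodes: AxeNode[];
}

interface PageScan {
  url: string;
  violations: AxeViolation[];
}

const SEVERITY: Record<Impact, number> = { minor: 0, moderate: 1, serious: 2, critical: 3 };

// Returns one line per failing element; an empty array means the merge may proceed.
// The default minImpact of "minor" blocks on every violation.
function gateFailures(scans: PageScan[], minImpact: Impact = "minor"): string[] {
  const failures: string[] = [];
  for (const scan of scans) {
    for (const v of scan.violations) {
      // A missing impact is treated as blocking so the gate fails safe.
      const blocking = v.impact == null || SEVERITY[v.impact] >= SEVERITY[minImpact];
      if (!blocking) continue;
      for (const node of v.nodes) {
        failures.push(`${scan.url}: [${v.impact ?? "unknown"}] ${v.id}: ${v.help} at ${node.target.join(" ")}`);
      }
    }
  }
  return failures;
}

// In a Node-based CI step, print the failures and exit non-zero, for example:
//   const failures = gateFailures(scans);
//   failures.forEach((line) => console.error(line));
//   process.exitCode = failures.length > 0 ? 1 : 0;
```

Raising minImpact to "serious" is one way to phase a gate in on a legacy codebase without blocking every merge on day one; the trade-off is that lower-impact regressions slip through until the threshold comes back down.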

Manual auditing per release. Before each release, a trained accessibility tester walks through the changed flows. The tester uses tools like Accessibility Insights and screen readers (NVDA, JAWS, VoiceOver) to evaluate the criteria that automation cannot reach. This is the conformance signal.
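The automated layer can also hand work to the auditor. axe-core reports checks it cannot decide (for example, colour contrast for text over a background image) in an `incomplete` array of "needs review" results, separate from `violations`. A minimal sketch of a per-release review queue, assuming those results are kept from the CI scan: it combines the undecided checks with a fixed list of criteria the team always reviews by hand. The types mirror only a subset of axe-core's result fields; `buildReviewQueue` and the `ALWAYS_MANUAL` list are hypothetical.

```typescript
// Subset of the axe-core results shape: checks the engine could not decide
// are reported in `incomplete` ("needs review") rather than passes or violations.
interface AxeCheck {
  id: string;
  help: string;
  nodes: { target: string[] }[];
}

interface PageResults {
  url: string;
  incomplete: AxeCheck[];
}

interface ReviewItem {
  url: string;
  task: string;
  source: "axe-needs-review" | "always-manual";
}

// Hypothetical team list: criteria a person checks on every changed flow,
// regardless of what the scanner reported.
const ALWAYS_MANUAL: string[] = [
  "1.1.1 Non-text Content: is the alt text meaningful in context?",
  "1.3.1 Info and Relationships: does the heading hierarchy reflect the content?",
  "2.1.1 Keyboard: can the flow be completed with the keyboard alone?",
  "2.4.3 Focus Order: does the focus order preserve meaning and operability?",
];

function buildReviewQueue(changedPages: PageResults[]): ReviewItem[] {
  const queue: ReviewItem[] = [];
  for (const page of changedPages) {
    // Undecided automated checks go first: the scanner has already located the elements.
    for (const check of page.incomplete) {
      queue.push({
        url: page.url,
        task: `${check.id}: ${check.help} (${check.nodes.length} element(s))`,
        source: "axe-needs-review",
      });
    }
    for (const criterion of ALWAYS_MANUAL) {
      queue.push({ url: page.url, task: criterion, source: "always-manual" });
    }
  }
  return queue;
}
```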

User testing with disabled users. Periodically, recruit users with disabilities to walk through the product. This is the ground-truth signal that automation and audits cannot replace.

AI tools have a clear role in the first layer (faster, more accurate scanning) and a useful supporting role in the second layer (AI-suggested manual-check focus areas). They do not replace the third layer.

AI-generated alt text and content

AI-generated alt text is genuinely useful as a first draft. The failure mode is silent: technically correct descriptions of an image that miss the contextual meaning. A hero image with the alt text "a person smiling in front of a laptop" is technically correct but conveys nothing about why the image is on the page or what it should mean to a screen-reader user.

The right discipline: AI generates the first draft; a content author reviews and adjusts it for contextual meaning; the published alt text is both accurate and meaningful in context. Teams that auto-publish AI-generated alt text without review ship technically correct alt text that is functionally useless.
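One way to make the review step hard to skip is to model it in the content pipeline's types, so that an unreviewed AI draft cannot reach the publish step. A minimal sketch, assuming alt text is stored as structured data alongside each image; the type and function names are hypothetical. The `reviewerHints` helper only flags a few mechanical smells (a filename, an "image of" prefix that screen readers make redundant); it cannot judge contextual meaning, which is the point of the human step.

```typescript
// An alt text is either an unreviewed AI draft or text a content author has approved.
type AltText =
  | { status: "ai-draft"; text: string; model: string }
  | { status: "reviewed"; text: string; reviewer: string; aiAssisted: boolean };

// Mechanical smells only; passing these says nothing about contextual meaning.
function reviewerHints(text: string): string[] {
  const hints: string[] = [];
  const t = text.trim().toLowerCase();
  if (t.length === 0) hints.push("empty: confirm the image is decorative");
  if (/\.(jpe?g|png|gif|webp|svg)$/.test(t)) hints.push("looks like a filename");
  if (/^(image|picture|photo|graphic) of\b/.test(t)) hints.push('redundant "image of" prefix');
  return hints;
}

// The author may keep the draft text or replace it; either way the result is marked reviewed.
function approve(alt: AltText, reviewer: string, editedText?: string): AltText {
  return {
    status: "reviewed",
    text: editedText ?? alt.text,
    reviewer,
    aiAssisted: alt.status === "ai-draft" ? true : alt.aiAssisted,
  };
}

// The only path to publication: unreviewed drafts are rejected.
function textForPublication(alt: AltText): string {
  if (alt.status !== "reviewed") {
    throw new Error(`Unreviewed AI draft (${alt.model}): "${alt.text}"`);
  }
  return alt.text;
}
```

Keeping an aiAssisted flag after approval preserves an audit trail of which published descriptions started as machine drafts, without letting that flag bypass review.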

Legal context

In the United States, ADA Title III applies to public accommodations, and it is increasingly enforced against inaccessible websites and apps. WCAG 2.1 AA is the de facto standard for compliance claims; WCAG 2.2 AA is the current published version. In the EU, the European Accessibility Act has applied since June 2025. Conformance claims based on AI tool output alone are legally weaker than claims supported by manual audits and documented programmes; legal teams typically want both.

For teams in regulated contexts (government contracting, regulated industries), VPATs (Voluntary Product Accessibility Templates) are often required. VPATs are produced by humans evaluating against WCAG criteria; AI tools accelerate evidence collection but do not produce the VPAT.
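A sketch of that division of labour, assuming a team drafts its report as structured data before transferring it into the VPAT template: automated tools may attach evidence to each criterion, but a row only counts as evaluated once a named person has recorded a conformance level. The level type uses four of the conformance terms from the ITI VPAT 2.x template; the row shape and function names are hypothetical.

```typescript
// Four of the conformance terms used in the ITI VPAT 2.x template.
type ConformanceLevel = "Supports" | "Partially Supports" | "Does Not Support" | "Not Applicable";

interface Evidence {
  source: "automated" | "manual-audit" | "user-testing";
  note: string;
}

// One row of a draft report. Tools may append evidence; only a named human sets the level.
interface ReportRow {
  criterion: string; // e.g. "1.4.3 Contrast (Minimum) (Level AA)"
  evidence: Evidence[];
  level?: ConformanceLevel;
  evaluator?: string;
  remarks?: string;
}

function addAutomatedEvidence(row: ReportRow, note: string): ReportRow {
  return { ...row, evidence: [...row.evidence, { source: "automated", note }] };
}

function recordJudgement(row: ReportRow, evaluator: string, level: ConformanceLevel, remarks: string): ReportRow {
  return { ...row, evaluator, level, remarks };
}

// Criteria still lacking a human judgement; the report is not ready while this is non-empty.
function awaitingEvaluation(rows: ReportRow[]): string[] {
  return rows
    .filter((r) => r.level === undefined || r.evaluator === undefined)
    .map((r) => r.criterion);
}
```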

Frequently asked questions

What proportion of WCAG issues can automated tools catch?
Industry estimates have long put the share of machine-testable WCAG Success Criteria at around 30 to 40 percent. AI-augmented tools nudge this number upward by handling some cognitive and visual issues that pure rule-based scanners miss, but the underlying limit (many WCAG criteria require human judgement) remains.
Is axe-core still the standard?
Yes. axe-core (Deque) is the most widely embedded accessibility engine, used directly and through wrappers in many other tools. Microsoft Accessibility Insights uses axe under the hood, as do many browser-extension accessibility tools. The standard is durable.
Can AI write better alt text than a content author?
Sometimes, but the failure mode is silent. AI-generated alt text is often technically correct (describes the image) but misses the contextual meaning that a content author would convey. For a hero image, the alt text should convey what the image means in context, not what it depicts. AI is a useful first draft, not a replacement for human review on customer-facing content.
Do I still need manual accessibility audits?
Yes. Automated tools (AI-augmented or not) cannot evaluate cognitive load, keyboard navigation flow, screen-reader narrative quality, or many WCAG Success Criteria that require human judgement. Manual audits remain the canonical signal for AA or AAA conformance claims.
What about VPATs?
Voluntary Product Accessibility Templates (VPATs) are produced by humans evaluating against the conformance criteria. AI tools accelerate evidence collection but do not produce the VPAT. Conformance claims based on AI tool output alone are weaker than claims supported by manual audits.
