The WebAIM Million
An annual accessibility analysis of the top 1,000,000 home pages
In February 2019 and February 2020, WebAIM conducted an accessibility evaluation of the home pages for the top 1,000,000 web sites and over 100,000 additional interior site pages. The evaluation was conducted using the WAVE stand-alone API (with additional tools to collect site technology parameters). While this research focuses only on automatically detectable issues, the results paint a rather dismal picture of the current state of web accessibility for individuals with disabilities.
Results below are the most recent available—currently from February 2020—with comparisons to, and trends from, earlier analyses.
The million home page list was derived from various sources, primarily the Majestic Millions list. Due to different methods of determining "top" sites, this list was supplemented with additional top pages from the Open PageRank Initiative and Alexa Top Sites. Sites without home pages or pages that returned errors (404, etc.) were not included.
In addition to home pages, 122,768 randomly-selected pages from within the top 1300 web sites (up to 100 each) were analyzed to determine differences and correlations between accessibility of home pages and other site pages.
The WAVE accessibility engine was used to analyze the rendered DOM of all pages (after scripting and styles were applied). WAVE uses heuristics and logic to detect end-user accessibility barriers and Web Content Accessibility Guidelines (WCAG) conformance failures. All automated tools, including WAVE, have limitations: only 25% to 35% of possible conformance failures can be automatically detected. Absence of detectable errors does not indicate that a site is accessible or compliant. Still, the data presented in this report provide a meaningful representation of the state of web inaccessibility.
Errors, Page Complexity, and Error Density
"Errors" are WAVE-detected accessibility barriers having notable end user impact, and which are likely WCAG 2 Level A/AA conformance failures. Across the one million home pages, 60,909,278 distinct accessibility errors were detected—an average of 60.9 errors per page. The number of errors increased 2.1% between February 2019 and February 2020.
Home page complexity increased 10.4% in 12 months, from an average of 782 elements per page to 864. With an average of 60.9 errors spread across 864 elements, about 7% of all home page elements had a detectable accessibility error. Users with disabilities would expect to encounter detectable errors on 1 in every 14 home page elements with which they engage.
Interior pages had 53 errors on average and 1040 elements on average—13% fewer errors, but 20% more page elements than home pages. One in every 19.6 elements of interior pages would be expected to introduce an accessibility issue.
Error density (number of errors divided by number of page elements) is provided in the site lookup, but is an unreliable metric of site accessibility. A significant increase in the number of page elements (<span> elements, for example) may result in a lower error density (suggesting better accessibility), when in fact many new accessibility errors may have also been introduced. This report focuses on average number of detectable errors (end user barriers) present as opposed to error densities (how diluted those errors are within page elements).
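The density figures quoted above, and the dilution problem, can be checked with a few lines of arithmetic. The following Python sketch uses only the averages given in this report. The function name `error_density` and the "before/after" page are hypothetical, chosen purely for illustration, and are not part of the WAVE tooling.

```python
def error_density(errors: float, elements: float) -> float:
    """Return the number of detectable errors per page element."""
    if elements <= 0:
        raise ValueError("elements must be positive")
    return errors / elements


# Averages quoted in this report.
home = error_density(60.9, 864)      # home pages, February 2020
interior = error_density(53, 1040)   # interior pages

print(f"home pages: {home:.1%} of elements, 1 in {1 / home:.1f}")
print(f"interior pages: {interior:.1%} of elements, 1 in {1 / interior:.1f}")

# Hypothetical page: 5 new errors arrive together with 300 extra <span> elements.
before = error_density(60, 800)
after = error_density(65, 1100)

# Density falls even though the number of end-user barriers went up.
print(f"density before: {before:.3f}, after: {after:.3f}")
```

In the hypothetical case the page gained five errors, yet its density dropped, which is why the report tracks error counts rather than densities.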
98.1% of home pages had detectable WCAG 2 failures! This was up from 97.8% in February 2019. These are only automatically detectable errors that align with WCAG conformance failures with a high level of reliability. Because automatically detectable errors constitute a small portion of all possible WCAG failures, this means that the actual WCAG 2 A/AA conformance level was probably much lower.
|WCAG Failure Type||% of home pages in February 2020||% of home pages in February 2019|
|Low contrast text||86.3%||85.3%|
|Missing alternative text for images||66.0%||68.0%|
|Missing form input labels||53.8%||52.8%|
|Missing document language||28.0%||33.1%|
The vast majority of barriers recorded stem from these six categories. Addressing just these few concepts would significantly improve accessibility across the web.
Of the 122,768 random pages from within the top 1300 sites, 97.8% had WCAG failures. Home pages and interior pages scored about the same.
- 85.4% of pages had low contrast text.
- 61.9% had missing alternative text.
- 63.4% had empty links.
- 56.1% had missing form input labels.
- 36.7% had empty buttons.
- 26.3% were missing document language.
The prevalence of empty links, missing labels, and empty buttons was slightly higher on interior pages than on home pages, whereas contrast and alternative text errors were less common on interior pages.
Low contrast text, below the WCAG 2 AA thresholds, was found on 86.3% of home pages. This was the most commonly-detected accessibility issue. On average, home pages had 36 distinct instances of low-contrast text. Interior pages scored slightly better, at 30.3 instances per page.
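For readers unfamiliar with how contrast is measured, the following Python sketch implements the contrast ratio formula defined in WCAG 2 and applies the Level AA thresholds (4.5:1 for normal text, 3:1 for large text). It is a minimal illustration, not the WAVE implementation; the function names are assumptions made for this example, and it ignores real-world complications such as transparency, gradients, and background images.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel (0-255) to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB color, as defined in WCAG 2."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) to 21.0."""
    lum_a, lum_b = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg, bg, large_text: bool = False) -> bool:
    """Check a color pair against the WCAG 2 Level AA contrast threshold."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(fg, bg) >= threshold


white = (255, 255, 255)
print(round(contrast_ratio((0, 0, 0), white), 2))      # black on white
print(passes_aa((0x76, 0x76, 0x76), white))            # mid-gray, just passes
print(passes_aa((0x77, 0x77, 0x77), white))            # slightly lighter gray, fails
```

The two grays differ by a single step per channel, which shows how narrow the margin around the 4.5:1 threshold can be.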
Images and Alternative Text
There were 38,426,701 images in the sample, or 38.4 images per home page on average. The number of images increased 4.7% since 2019. 31.3% of all home page images (12 per page on average) had missing alternative text (not counting alt=""). Over half of the images missing alternative text (16.8% of all images, or 6.5 per page on average) were linked images.
9.3% of images that were assigned alternative text had questionable or repetitive alternative text, such as alt="image", "graphic", "blank", a file name, etc., or alternative text identical to adjacent text or adjacent alternative text.
Interior pages had fewer images (24.7) on average and slightly fewer images that were missing alternative text (29.6%).
These data show that one can expect nearly half of the images on the web to have missing, questionable, or repetitive alternative text.
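The categories used above (missing, empty, questionable, repetitive) can be illustrated with a small classifier. The Python sketch below is a rough approximation under stated assumptions: the list of generic words, the file-name pattern, and the function name `classify_alt` are all hypothetical, and the actual heuristics used by WAVE are not described in this report.

```python
import re

# Assumed examples of generic alternative text, taken from the paragraph above.
GENERIC_ALT = {"image", "graphic", "blank"}

# Assumed pattern for alternative text that is just an image file name.
FILE_NAME = re.compile(r"^\S+\.(?:png|jpe?g|gif|svg|webp|bmp)$", re.IGNORECASE)


def classify_alt(alt, adjacent_text=""):
    """Classify an image's alt attribute value (None means no alt attribute)."""
    if alt is None:
        return "missing"
    text = alt.strip()
    if text == "":
        # alt="" marks an image as decorative; the report does not count it as missing.
        return "empty"
    lowered = text.lower()
    if lowered in GENERIC_ALT or FILE_NAME.match(text):
        return "questionable"
    if adjacent_text and lowered == adjacent_text.strip().lower():
        return "repetitive"
    return "ok"


for alt, nearby in [(None, ""), ("", ""), ("IMG_0042.jpg", ""),
                    ("Read more", "Read more"), ("WebAIM logo", "")]:
    print(repr(alt), "->", classify_alt(alt, nearby))
```

A real evaluation would also need to consider whether an image is linked, since a linked image with missing alternative text leaves the link itself without a name.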
Despite being removed from HTML5 nearly a decade ago, 1,885 home pages had a longdesc attribute present, a decrease from 2,218 in 2019. 44% of the 21,504 longdesc attributes encountered had invalid values, such as an empty value, an invalid URL, an image file name, etc.
56% of the 3.4 million form inputs identified were unlabeled (that is, not labeled via <label>, aria-label, or aria-labelledby), down from 59% in 2019. Pages with at least one unlabeled form control averaged 43 more detectable errors than pages without any label errors.
Nearly 21 million headings were detected (21 on average per page), an increase of 10.7% over one year. The number (and prevalence) of heading levels were:
- <h1>: 1.7 million
- <h2>: 6.5 million
- <h3>: 7.3 million
- <h4>: 3.5 million
- <h5>: 1.3 million
20.5% of home pages had more than one <h1>. There were 1,002,946 instances of skipped heading levels (e.g., jumping from <h2> to <h4>), meaning roughly 1 in every 20 headings was improperly structured. Skipped headings were present on 39.1% of all pages, and 12.4% had no headings present at all.
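A skipped heading level is simple to detect once the heading levels are listed in document order. The Python sketch below uses the standard library's `html.parser` on static markup. It is an illustration only: the class and function names are hypothetical, WAVE analyzes the rendered DOM rather than raw HTML, and the sketch assumes that the first heading on a page is never counted as a skip.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the levels (1-6) of <h1>..<h6> elements in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "H2" arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def count_skipped_levels(levels):
    """Count headings that are more than one level deeper than the previous heading."""
    skips = 0
    previous = None
    for level in levels:
        if previous is not None and level > previous + 1:
            skips += 1
        previous = level
    return skips


collector = HeadingCollector()
collector.feed("<h1>Site</h1><h2>News</h2><h4>Story</h4><h1>Footer</h1>")

print(collector.levels)                        # [1, 2, 4, 1]
print(count_skipped_levels(collector.levels))  # 1 (the jump from h2 to h4)
print(collector.levels.count(1) > 1)           # True: more than one <h1>
```

Moving back up the hierarchy (for example, from level 4 to level 1) is not treated as a skip, since closing out a nested section is a valid structure.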
Interior pages averaged 15.5 headings per page. The prevalence of multiple <h1> headings and skipped heading levels was similar to home pages, though only 6.1% of interior pages had no headings at all.
68.9% of home pages had at least one region (or ARIA landmark) defined, a notable increase from 62.4% in 2019. 74% of interior pages had at least one region. 27.8% of home pages had a <main> element or main landmark, up from 23.5% in 2019. 17.5% of home pages had a "search" landmark, up from 15.9% in 2019.
97% of pages with a main region/landmark had only one instance (which is a correct implementation). Pages with a navigation region/landmark present averaged 2.2 of them per page.
Pages with regions/landmarks averaged 14 fewer errors than pages without them.
64.6% of the one million home pages used ARIA (excluding ARIA landmark roles), an increase from 60.1% in 2019. 74.7% of interior pages used ARIA (excluding landmark roles).
Home pages with ARIA present averaged 60% more errors than those without! One would expect to encounter an additional 26.2 potential barriers on home pages with ARIA. Interestingly, this same disparity did not exist on interior pages, where ARIA did not correlate to errors, even though interior pages had 71% more ARIA markup than home pages.
Home pages averaged 5.6 aria-labelledby attributes and 3.7 aria-describedby attributes per page. 8.9% of these attributes had broken references (meaning the element referenced by the ARIA attribute did not exist on the page).
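A broken reference of this kind can be found by comparing the ids named in aria-labelledby and aria-describedby with the ids actually present in the page. The following Python sketch does this for static markup using the standard library. The class and function names are hypothetical, and it reports each missing id individually, which may differ from how the figures in this report were counted.

```python
from html.parser import HTMLParser


class AriaReferenceChecker(HTMLParser):
    """Record every id in the page and every id referenced by ARIA attributes."""

    REF_ATTRS = ("aria-labelledby", "aria-describedby")

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.references = []  # list of (attribute name, referenced id)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue  # attribute present without a value
            if name == "id":
                self.ids.add(value)
            elif name in self.REF_ATTRS:
                # Both attributes accept a space-separated list of ids.
                self.references.extend((name, ref) for ref in value.split())


def broken_references(html):
    """Return (attribute, id) pairs whose target id is absent from the markup."""
    checker = AriaReferenceChecker()
    checker.feed(html)
    checker.close()
    return [(attr, ref) for attr, ref in checker.references
            if ref not in checker.ids]


sample = (
    '<span id="name-label">Name</span>'
    '<input aria-labelledby="name-label" aria-describedby="name-hint">'
)
print(broken_references(sample))  # [('aria-describedby', 'name-hint')]
```

References are resolved only after the whole document has been parsed, so a label that appears later in the markup than the element referencing it is still found.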
73,049 home pages (1 in 14) had an ARIA menu (role="menu"), but an alarming 53.7% of those introduced accessibility barriers due to the lack of necessary ARIA menu markup and interactions.
Home pages also averaged 6.6 aria-describedby attributes, and 1.3 role="button" attributes per page. Over 10 million (10 per page) instances of tabindex=1 were present.
ARIA correlated to higher detectable errors. This does not necessarily mean that ARIA introduced these errors (these pages are more complex), but pages typically had more errors when ARIA was present. The correlation grows stronger as ARIA usage increases, and it is worsening over time.
10.8% of home pages had a "skip" link present. However, 11.1% of these links were broken—either they were hidden in a way that made them inaccessible or the link target was not present in the page.
80% of home pages had a valid HTML5 doctype—a notable increase from 74.1% in 2019. Pages with a valid HTML5 doctype had 46% more page elements (average of 922 vs. 630) and 21% more errors (average of 63 vs. 52.2) than those with other doctypes. 922 unique doctypes (most of these, obviously, being invalid) were encountered in the million page sample.
We're grateful for the support of webshrinker.com in providing us the site category data. Their support makes this valuable information available.
The home pages were categorized based on content into IAB Content Taxonomy categories. The table below shows the number of home pages in each category (some sites may be in more than one category), average number of errors in that category, and the percent difference in errors for that category and the average of 60.9 errors for the entire million page sample. The categories are ordered from least errors to most errors.
|Category||# of home pages||Avg. # of errors (% difference from overall average)|
|Law, Government, and Politics||13,675||51.1 (−16.1%)|
|Technology and Computing||59,883||53.4 (−12.3%)|
|Health & Fitness||22,071||55.9 (−8.2%)|
|Personal Finance||14,256||57.7 (−5.2%)|
|Home and Garden||11,692||62.9 (+3.3%)|
|Family and Parenting||2,261||63.9 (+5.0%)|
|Food and Drink||12,624||66.1 (+8.6%)|
|Religion and Spirituality||6,124||67.1 (+10.1%)|
|Hobbies and Interests||34,429||73.3 (+20.4%)|
|Arts and Entertainment||13,852||73.6 (+20.9%)|
|Style and Fashion||8,912||77.3 (+26.8%)|
|Real Estate||6,418||77.3 (+26.9%)|
|Adult Content||8,811||94.8 (+55.7%)|
There were notable differences in accessibility errors for sites in different categories. News/Weather/Information sites had over twice as many errors as Law, Government, and Politics sites.
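The percent differences in the category table are relative to the overall average of 60.9 errors per home page. The short Python sketch below shows the calculation for three rows of the table; the function name `percent_difference` is an assumption made for this example. Because the published percentages were presumably computed from unrounded averages, recomputing them from the rounded figures can differ by about 0.1 percentage point for some rows.

```python
OVERALL_AVERAGE = 60.9  # average errors per home page across the full sample


def percent_difference(category_average, overall=OVERALL_AVERAGE):
    """Percent difference between a category's average errors and the overall average."""
    return (category_average - overall) / overall * 100


rows = [
    ("Law, Government, and Politics", 51.1),
    ("Technology and Computing", 53.4),
    ("Adult Content", 94.8),
]
for name, average in rows:
    print(f"{name}: {percent_difference(average):+.1f}%")
```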
770 unique top-level domains (.com, .tv, .fashion, etc.) were represented in the million pages analyzed. Home pages with .com (530,215), .org (72,040), and .net (39,635) were the most common. 5,903 distinct .edu home pages were analyzed.
The table below shows the most common TLDs with page counts, average number of errors, and difference between those errors and the average of 60.9 errors for the entire million page sample.
|Top-level Domain||# of home pages||# of errors|
This shows notable differences between TLDs. Home pages with .gov (27.1 errors), .edu (34.9 errors), and .us (42.5 errors), which are all affiliated with U.S.-based entities, had the lowest number of average accessibility errors of all common (n>2000) TLDs.
71.4% of pages specified a valid document language. This table shows the specified page language for the most common languages, number of pages in the sample, average number of errors, and difference in errors from the overall average.
|Specified Language||# of home pages||# of errors|
As with TLDs, there are significant differences in accessibility of pages in various languages, with Farsi, Chinese, Russian, Portuguese, and Italian pages being much worse than the average.
Data regarding over 1000 different types of technologies present on the one million home pages were collected and analyzed. Technologies detected on more than 5,000 home pages (0.5% of the sample) are listed below, and are ordered from "best" to "worst". Note that correspondence of additional errors with a technology cannot automatically be attributed to that technology.
Content Management Systems
|CMS||# (and %) of home pages||Avg. # of errors||% difference|
There was a wide diversity in the impact that the CMS choice appeared to have on accessibility.
|Framework||# (and %) of home pages||Avg. # of errors||% difference|
Except for MooTools, TweenMax, and Knockout.js, the adoption of any of these frameworks corresponded with more accessibility errors than the average home page. This does not necessarily mean that the frameworks caused these errors, but it does indicate that home pages with these frameworks had more errors than average. Note: Angular is not listed because it was not present on at least 0.5% of home pages in 2020, but home pages with Angular averaged 53.7 errors (11.8% lower than the average).
|Library||# (and %) of home pages||Avg. # of errors||% difference|
|jQuery Migrate||321,495 (32.1%)||63.4||+4.1%|
The vast majority of the top one million home pages utilize jQuery. Home pages with jQuery averaged 20.3 more errors than those without. Pages with jQuery were much more complex (936 elements on average) than other pages (597 elements on average).
|Web Framework||# (and %) of home pages||Avg. # of errors||% difference|
|ZURB Foundation||26,243 (2.6%)||61.9||+1.6%|
Home pages in the sample that utilize the popular Bootstrap framework had 7 more accessibility errors on average than those that did not. We can't know from these data if Bootstrap introduced these errors, but there was a correspondence of increased errors when Bootstrap was present.
|Ad Network||# (and %) of home pages||Avg. # of errors||% difference|
|Google AdSense||136,482 (13.6%)||97.9||+60.8%|
Pages that utilized any of these popular ad systems had more errors on average than those that did not. The data suggest that ads were among the strongest harbingers of accessibility errors. Home pages that utilize the very common Google AdSense system had 37 more errors on average—nearly double—than other pages!
Other common technologies also correlated to more errors. Pages with ReCAPTCHA had 12.6 more errors on average than those without. Pages with Google Maps averaged 11.2 more errors, and those with PHP averaged 8.1 more errors.
Here are several other fun facts regarding this research:
- With 3 analyses of over one million pages each, the WebAIM Million database is approaching one billion data points.
- It took 49 days of cumulative computer processing time to download and process all 1,122,768 pages in the sample. This was shared among several AWS instances that ran continuously for 5 days.
- Despite it being 2020, 9,443 home pages had <marquee> and 542 home pages had blinking content.
- 1,876,456 tables were observed, down 15% from 2,213,402 in 2019. Only 128,054 (6.8%) of the tables had valid data table markup.
- The most errors detected on a single home page was 24,444!
Significant work remains to be done to make the web accessible to everyone. Unfortunately, however, the rate of WCAG non-conformance and the number of errors present are slowly increasing over time. WebAIM hopes that this report will help influence improved accessibility.
The 2019 WebAIM Million report is available for historical purposes.