The WebAIM Million
An annual accessibility analysis of the top 1,000,000 home pages
In February of 2019, 2020, and 2021 WebAIM conducted an accessibility evaluation of the home pages for the top 1,000,000 web sites. The evaluation was conducted using the WAVE stand-alone API (with additional tools to collect site technology and sector parameters). The results provide an overview of and insight into the current state of web accessibility for individuals with disabilities and trends over time.
Results below are the most recent available—currently from February 2021—with comparisons to, and trends from, earlier analyses.
The million home page list was derived from various sources, primarily the Majestic Millions list. Due to different methods of determining "top" sites, this list was supplemented with additional top pages from the Open PageRank Initiative and Alexa Top Sites. Sites without home pages, pages that returned errors (404, etc.), or pages with fewer than 10 HTML elements were not included.
The WAVE accessibility engine was used to analyze the rendered DOM of all pages after scripting and styles were applied. WAVE uses heuristics and logic to detect end-user accessibility barriers and Web Content Accessibility Guidelines (WCAG) conformance failures. All automated tools, including WAVE, have limitations—not all conformance failures can be automatically detected. Absence of detectable errors does not indicate that a site is accessible or compliant. Still, the data presented in this report provide a meaningful representation of the state of web inaccessibility.
Errors, Page Complexity, and Error Density
Across the one million home pages, 51,379,694 distinct accessibility errors were detected—an average of 51.4 errors per page. The number of errors decreased by 15.6% between February 2020 (60.9 errors) and February 2021 (51.4 errors)! "Errors" are WAVE-detected accessibility barriers having notable end user impact, and which have a very high likelihood of being WCAG 2 Level A/AA conformance failures.
Home page complexity increased slightly in 12 months, from an average of 864 elements per page to 887 elements. 5.8% of all home page elements had a detectable accessibility error. Users with disabilities would expect to encounter detectable errors on 1 in every 17 home page elements with which they engage.
Error density (number of errors divided by number of page elements) is provided in the site lookup, but is an unreliable metric of site accessibility. A large number of page elements (<span> elements, for example) may result in a lower error density (suggesting better accessibility), when in fact many new accessibility errors may have also been introduced. This report focuses on the average number of detectable errors present (likely end user barriers) as opposed to error densities (how diluted those errors are within page elements).
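The dilution effect can be illustrated with a little arithmetic. The sketch below uses the February 2021 averages reported above; the "before" and "after" pages are hypothetical numbers chosen only to show how density can fall while the error count rises.

```python
def error_density(errors: float, elements: float) -> float:
    """Return detectable errors per page element (errors / elements)."""
    if elements <= 0:
        raise ValueError("a page needs at least one element")
    return errors / elements


# Averages reported for February 2021: 51.4 errors across 887 elements.
average = error_density(51.4, 887)
print(f"average density: {average:.1%}")

# Hypothetical page: 40 errors spread across 800 elements.
before = error_density(40, 800)

# Hypothetical redesign: 400 extra wrapper elements and 10 new errors.
after = error_density(50, 1200)

# Density improved even though the page now has more barriers.
print(f"before: {before:.1%}, after: {after:.1%}")
```

In this made-up case the density drops while the number of barriers a user may encounter goes up, which is why the raw error count is the more useful figure.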
97.4% of home pages had detectable WCAG 2 failures! This was down slightly from 98.1% in February 2020. These are only automatically detectable errors that align with WCAG conformance failures with a high level of reliability. Because automatically detectable errors constitute a small portion of all possible WCAG failures, the share of home pages that fully conform to WCAG 2 A/AA was certainly much lower than the remaining 2.6%.
|WCAG Failure Type||% of home pages in February 2021||% of home pages in February 2020||% of home pages in February 2019|
|Low contrast text||86.4%||86.3%||85.3%|
|Missing alternative text for images||60.6%||66.0%||68.0%|
|Missing form input labels||54.4%||53.8%||52.8%|
|Missing document language||28.9%||28.0%||33.1%|
The vast majority of barriers recorded stem from these six categories. Addressing just these few types of issues would significantly improve accessibility across the web.
Over the course of three years, home pages with low contrast text, missing input labels, and empty buttons have generally increased, whereas home pages with missing alternative text, empty links, and missing document language have generally decreased.
Low Contrast Text
Low contrast text, below the WCAG 2 AA thresholds, was found on 86.4% of home pages. This was the most commonly detected accessibility issue. On average, home pages had 31 distinct instances of low-contrast text.
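For readers unfamiliar with the thresholds, the following is a minimal sketch of the WCAG 2 contrast ratio calculation (relative luminance of the lighter color plus 0.05, divided by that of the darker color plus 0.05). The function names are illustrative, and this is not WAVE's implementation, which also has to resolve computed styles, backgrounds, and text size.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value (WCAG 2 formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple) -> float:
    """Relative luminance of an (R, G, B) color with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple, bg: tuple) -> float:
    """Contrast ratio between two colors, from 1:1 up to 21:1."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: tuple, bg: tuple, large_text: bool = False) -> bool:
    """WCAG 2 AA requires 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


WHITE = (255, 255, 255)
print(passes_aa((118, 118, 118), WHITE))  # gray #767676 on white
print(passes_aa((119, 119, 119), WHITE))  # gray #777777 on white
```

The two grays differ by a single step per channel, yet fall on opposite sides of the 4.5:1 threshold, which hints at how easily a design can slip below it.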
Images and Alternative Text
There were 37,948,510 images in the sample, or 37.9 images per home page on average. The number of images decreased slightly since 2020. 26% of all home page images (10 per page on average) had missing alternative text (not counting alt=""). Nearly half of the images missing alternative text were linked images, resulting in links that were not descriptive.
9.6% of images that were assigned alternative text had questionable or repetitive alternative text—such as alt="image", "graphic", "blank", a file name, etc., or alternative text identical to adjacent text or alternative text.
These data show that one can expect over one third of the images on the web to have missing, questionable, or repetitive alternative text. This number, however, is decreasing as the prevalence of alternative text is slowly increasing over time.
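The distinction between missing, empty, and questionable alternative text can be sketched with Python's standard HTML parser. This is an illustrative heuristic only: the word list comes from the examples above, the file-name pattern is an assumption, and it does not attempt the adjacent-text comparison that the analysis describes.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Values named in the report as questionable alternative text.
SUSPICIOUS_ALT = {"image", "graphic", "blank"}
# Assumed pattern for alt text that is just an image file name.
FILE_NAME = re.compile(r"\.(png|jpe?g|gif|svg|webp)$", re.IGNORECASE)


def classify_alt(attrs: dict) -> str:
    """Classify one <img> by its alt attribute."""
    if "alt" not in attrs:
        return "missing"
    alt = (attrs["alt"] or "").strip()
    if alt == "":
        return "empty"  # alt="" marks a decorative image, not an error
    if alt.lower() in SUSPICIOUS_ALT or FILE_NAME.search(alt):
        return "questionable"
    return "present"


class AltTextAudit(HTMLParser):
    """Count <img> elements by alt text category."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.counts[classify_alt(dict(attrs))] += 1


def audit_alt_text(html: str) -> Counter:
    parser = AltTextAudit()
    parser.feed(html)
    parser.close()
    return parser.counts


sample = '<img src="a.png"><img src="b.png" alt=""><img src="c.png" alt="image">'
print(audit_alt_text(sample))
```

Note that this parses static markup, whereas the analysis examined the rendered DOM after scripting, so results on real pages would differ.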
Despite being removed from HTML5 over a decade ago, 2,009 home pages had a longdesc attribute present, an increase from 1,885 in 2020. Half of the 16,778 longdesc attributes encountered had invalid values, such as an empty value, an invalid URL, an image file name, etc.
45% of the 4.4 million form inputs identified were not properly labeled (for example, via aria-labelledby). This is a significant improvement from 55% in 2020 and 59% in 2019. Despite this improvement, nearly half of form inputs do not have associated label text.
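A rough way to find unlabeled inputs in static markup is sketched below. The accepted labeling mechanisms here (a label element with a matching for attribute, aria-label, or aria-labelledby) are an assumption made for illustration, as is the list of input types that need no label; labels that wrap their input are not handled, and this is not the logic WAVE uses.

```python
from html.parser import HTMLParser

# Assumed input types that do not need a visible or programmatic label.
NO_LABEL_NEEDED = {"hidden", "submit", "reset", "button", "image"}


class LabelAudit(HTMLParser):
    """Collect form controls and the ids that <label for="..."> points to."""

    def __init__(self):
        super().__init__()
        self.label_targets = set()
        self.controls = []  # one attribute dict per form control

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label" and attrs.get("for"):
            self.label_targets.add(attrs["for"])
        elif tag in ("input", "select", "textarea"):
            input_type = (attrs.get("type") or "text").lower()
            if input_type not in NO_LABEL_NEEDED:
                self.controls.append(attrs)

    def unlabeled(self):
        """Return controls with no label association of any supported kind."""
        return [
            c for c in self.controls
            if not (
                c.get("aria-label")
                or c.get("aria-labelledby")
                or c.get("id") in self.label_targets
            )
        ]


def find_unlabeled(html: str) -> list:
    audit = LabelAudit()
    audit.feed(html)
    audit.close()
    return audit.unlabeled()


form = '<label for="q">Search</label><input id="q"><input name="email">'
print(find_unlabeled(form))
```

Labels are resolved only after the whole document has been parsed, because a label may appear after the control it refers to.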
Nearly 21.5 million headings were detected (over 21 on average per page). The number (and prevalence) of heading levels were:
- <h1>: 1.6 million
- <h2>: 6.5 million
- <h3>: 7.5 million
- <h4>: 3.6 million
- <h5>: 1.5 million
- <h6>: 0.7 million
18.4% of home pages had more than one <h1>, a decrease from 20.5% in 2020. The prevalence of <h6> headings all increased.
There were 1,017,026 instances of skipped heading levels (e.g., jumping to <h4> without a preceding <h3>) and 1 in every 21 headings was improperly structured. Skipped headings were present on 38.4% of all pages (down from 39.1% in 2020), and 10.6% of pages had no headings present at all (down from 12.4% in 2020).
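A skipped heading level can be detected by walking the headings in document order and flagging any heading that is more than one level deeper than the one before it. The sketch below is one such check; whether a page that starts below the first level counts as a skip is a judgment call, and here it is assumed not to.

```python
from html.parser import HTMLParser


class HeadingLevels(HTMLParser):
    """Record the level (1-6) of each heading in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # Matches h1..h6 only; "hr", "head", and "html" are ignored.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_levels(html: str) -> list:
    parser = HeadingLevels()
    parser.feed(html)
    parser.close()
    return parser.levels


def skipped_levels(levels: list) -> list:
    """Return (previous, current) pairs where a heading jumps more than one level deeper."""
    return [
        (prev, cur)
        for prev, cur in zip(levels, levels[1:])
        if cur > prev + 1
    ]


page = "<h1>Title</h1><h2>Section</h2><h4>Detail</h4><h2>Next</h2>"
print(skipped_levels(heading_levels(page)))
```

Moving back up the hierarchy by several levels (from the fourth level to the second in the sample) is not flagged, since closing several nested sections at once is a valid structure.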
These data all suggest that headings are being used more frequently and more appropriately over time. This is important because headings are the primary mechanism used by screen reader users to navigate content.
69.1% of home pages had at least one region (or ARIA landmark) defined, an increase from 68.9% in 2020 and 62.4% in 2019. A <main> element or main landmark was present on 30.1% of home pages, up from 27.8% in 2020 and 23.5% in 2019. 17.6% of home pages had a "search" landmark.
96% of home pages with a main region/landmark had only one instance (which is a correct implementation). Pages with a navigation region/landmark present averaged 2.3 of them per page.
47,883,732 ARIA attributes were detected—nearly 48 per page on average! ARIA code usage increased 25% in just one year.
68.1% of the one million home pages used ARIA (excluding ARIA landmark roles)—an increase from 64.6% in 2020 and 60.1% in 2019. Home pages with ARIA present averaged 41% more detectable errors than those without ARIA, meaning one would expect to encounter an additional 24 potential barriers on home pages with ARIA present. This number, however, decreased from 60% in 2020.
ARIA correlated with more detectable errors. The more ARIA attributes that were present, the more detectable accessibility errors could be expected. This does not necessarily mean that ARIA introduced these errors (these pages are more complex), but pages typically had more errors when ARIA was present.
Home pages averaged 7.6 aria-labelledby attributes (an increase from 5.6 in 2020). 1 out of every 15 aria-describedby attributes had broken references (meaning the element referenced by the ARIA attribute did not exist on the page).
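A broken reference of this kind can be found by collecting every id on the page and comparing it against the space-separated id lists in the ARIA attributes. The following is a minimal sketch over static markup, with illustrative class and function names; it is not how WAVE performs the check.

```python
from html.parser import HTMLParser


class AriaReferenceAudit(HTMLParser):
    """Collect element ids and the ids referenced by ARIA relationship attributes."""

    REF_ATTRS = ("aria-labelledby", "aria-describedby")

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.references = []  # (attribute name, referenced id)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        for name in self.REF_ATTRS:
            # Each attribute holds a space-separated list of ids.
            for ref in (attrs.get(name) or "").split():
                self.references.append((name, ref))

    def broken(self):
        """Return references whose target id does not exist in the document."""
        return [(name, ref) for name, ref in self.references if ref not in self.ids]


def broken_aria_references(html: str) -> list:
    audit = AriaReferenceAudit()
    audit.feed(html)
    audit.close()
    return audit.broken()


markup = '<h2 id="t">Title</h2><div aria-labelledby="t missing"></div>'
print(broken_aria_references(markup))
```

References are checked only after parsing finishes, since the referenced element may appear later in the document than the attribute that points to it.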
7% of home pages (1 in 14) had an ARIA menu (role="menu"), but an alarming 60.1% of ARIA menus (an increase from 53.7% in 2020) introduced accessibility barriers due to the lack of necessary ARIA menu markup and interactions.
Home pages also averaged 8.9 aria-hidden="true" attributes (up from 6.6 in 2020) and 2.2 role="button" attributes (up from 1.3) per page. Over 14.3 million (14.3 per page on average) instances of tabindex=1 were present (up 43% from 2020).
10.3% of home pages had a "skip" link present (a slight decrease from 10.8% in 2020). However, 11.1% of "skip" links were broken—either they were hidden in a way that made them inaccessible or the link target was not present in the page.
79.1% of home pages had a valid HTML5 doctype—a notable increase from 74.1% in 2019, but a small decrease from 80% in 2020. Pages with a valid HTML5 doctype had nearly double the page elements (average of 987 vs. 508) and 35% more errors (average of 54.3 vs. 40.2) than those with other doctypes. 707 unique doctypes (most of these, obviously, being invalid) were encountered in the million-page sample.
We're grateful for the support of webshrinker.com in providing us the site category data. Their support makes this valuable information available.
The home pages were categorized based on content into IAB Content Taxonomy categories. The table below shows the number of home pages in each category (some sites may be in more than one category), average number of errors in that category, and the percent difference in errors for that category and the average of 51.4 errors for the entire million-page sample. In other words, the percentage difference is how much better or worse that category is than the average home page.
|Category||# of home pages||Avg. # of errors (% difference from average)|
|Food and Drink||30,722||46.8 (−8.8%)|
|Law, Government, and Politics||23,451||46.9 (−8.7%)|
|Technology and Computing||154,378||47.4 (−7.8%)|
|Health & Fitness||53,886||48.5 (−5.6%)|
|Personal Finance||41,740||50.6 (−1.5%)|
|Social Media & Society||22,875||50.7 (−1.3%)|
|Religion and Spirituality||9,766||51.4 (0.1%)|
|Home and Garden||31,705||51.5 (0.2%)|
|Family and Parenting||6,460||53.0 (3.1%)|
|Arts and Entertainment||40,505||57.3 (11.4%)|
|Hobbies and Interests||84,718||61.7 (20.1%)|
|Style and Fashion||20,086||63.2 (23.1%)|
|Real Estate||17,577||71.4 (38.9%)|
|Adult Content||19,079||83.1 (61.7%)|
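The percentage in each row is the relative difference between the category average and the overall average of 51.4 errors. A short sketch of that calculation follows; the function name is illustrative. Because it starts from the rounded figures printed in the table, a few rows may differ from the published percentages by a tenth of a point, presumably because the published values were computed from unrounded averages.

```python
def percent_difference(category_avg: float, overall_avg: float = 51.4) -> float:
    """Relative difference of a category average from the overall average, in percent."""
    return (category_avg - overall_avg) / overall_avg * 100


# Rounded averages taken from the table above.
for name, avg in [("Real Estate", 71.4), ("Adult Content", 83.1)]:
    print(f"{name}: {percent_difference(avg):+.1f}%")
```

A negative result means the category averaged fewer detectable errors than the million-page sample as a whole.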
There were notable differences in accessibility errors for sites in different categories. Home pages in the Food & Drink category were most improved since 2020 with errors reduced from 66.1 to 46.8 errors on average. This improvement may be at least partially attributed to the significant increase in litigation regarding web accessibility in this sector. Shopping sites, which were also highly subject to accessibility complaints and lawsuits, were greatly improved from 90.5 errors to 75.2 errors on average, yet this category remains among the least accessible.
793 unique top-level domains (.com, .tv, .fashion, etc.) were represented in the million pages analyzed. Home pages with .com (501,741), .org (73,959), and .net (44,813) were the most common.
The table below shows the most common TLDs with page counts, average number of errors, and difference between those errors and the average of 51.4 errors for the entire million-page sample.
|Top-level Domain||# of home pages||Avg. # of errors (% difference from average)|
This shows notable differences between TLDs. Home pages with .gov (22.5 errors), .edu (30.0 errors), and .us (35.7 errors), which are all affiliated with U.S.-based entities, had among the lowest number of average accessibility errors of all common (n>2000) TLDs.
72% of pages specified a document language. Pages without a language defined had significantly fewer errors on average than pages with a language defined.
This table shows the specified page language for the most common languages, number of pages in the sample, average number of errors, and difference in errors from the overall average.
|Specified Language||# of home pages||Avg. # of errors (% difference from average)|
|No language specified||280,097||44.5 (−13.4%)|
As with TLDs, there are significant differences in accessibility of pages in various languages, with Russian, Chinese, and Spanish pages being much worse than the average. Russian and Chinese pages had nearly twice as many detectable errors as pages in English.
Data regarding over 1000 different types of technologies present on the one million home pages were collected and analyzed. Technologies detected on more than 5,000 home pages are listed below, ordered from "best" to "worst". Note that correspondence of additional errors with a technology cannot automatically be attributed to that technology.
Content Management Systems
|CMS||# (and %) of home pages||Avg. # of errors||% difference|
There was a wide diversity in the impact that the CMS choice appeared to have on accessibility.
|Framework||# (and %) of home pages||Avg. # of errors||% difference|
Except for React (which saw notable decreases in detectable errors over the last year), the adoption of any of these frameworks corresponded with more accessibility errors than the average home page. This does not necessarily mean that the frameworks caused these errors, but home pages with these frameworks had more errors than average.
|Library||# (and %) of home pages||Avg. # of errors||% difference|
|jQuery Migrate||203,527 (20.4%)||58.5||+13.9%|
|Web Framework||# (and %) of home pages||Avg. # of errors||% difference|
|Ruby on Rails||5,144 (0.5%)||40.7||−20.7%|
|ZURB Foundation||16,654 (1.7%)||57.4||+11.8%|
Home pages in the sample that utilize the popular Bootstrap framework had 8.5 more accessibility errors on average than those that did not. We can't know from these data if Bootstrap introduced these errors, but there was a correspondence of increased errors when Bootstrap was present.
|Platform||# (and %) of home pages||Avg. # of errors||% difference|
The most popular ecommerce systems all corresponded with increased accessibility errors.
|Ad Network||# (and %) of home pages||Avg. # of errors||% difference|
|Google AdSense||85,505 (8.6%)||78.7||+53.2%|
Pages that utilized any of these popular ad systems had more errors on average than those that did not. The data suggest that ads were among the strongest harbingers of accessibility errors. Home pages that utilize the common Google AdSense system had 27 more errors on average than other pages.
Other common technologies also correlated with more errors. 7.9% of pages had ReCAPTCHA and averaged 10.8 more errors than those without. 39.5% of pages had Google Fonts and averaged 6.1 more errors, 3.4% of pages had Google Maps and averaged 11.6 more errors, and 47% of pages had PHP and averaged 5.9 more errors.
Here are some fun facts regarding this research:
- With 4 analyses of over one million pages each, the WebAIM Million database is approaching two billion data points.
- Despite it being 2021, 14,501 home pages had <marquee> and 341 home pages had blinking content.
- 1,533,402 tables were observed, down from 1,876,456 in 2020. Only 140,793 (9.2%) of the tables had valid data table markup.
- The most errors detected on a single home page was 25,361!
While 2021 saw small decreases in the number of detectable accessibility errors and WCAG conformance failures, significant work remains to be done to make the web accessible to everyone. WebAIM hopes that this report will help influence improved accessibility.