The WebAIM Million
An accessibility analysis of the top 1,000,000 home pages
In February 2019, WebAIM conducted an accessibility evaluation of the home pages for the top 1,000,000 web sites using the WAVE stand-alone API (with additional tools to collect site technology parameters). While this research focuses only on automatically detectable issues, the results paint a rather dismal picture of the current state of web accessibility for individuals with disabilities.
A re-analysis of these million pages was conducted in August 2019 to identify changes over time.
The "top" million web sites were gleaned primarily using the Majestic Million list of domains with the most referring subnets. Because not all domains have home pages, the list of domains was supplemented with the top 250,000 domains from the Open PageRank Initiative that were not already in the Majestic Million list.
Home pages that returned errors (404, etc.) were not included. Pages with fewer than 10 HTML elements were also rejected—these tended to be placeholder or empty documents rather than home pages that convey content.
Home pages from 730 unique top-level domains were analyzed, with .com (521,316), .org (76,489), and .net (39,757) being the most common. 6,010 distinct .edu home pages were analyzed.
The WAVE accessibility engine was used to analyze the rendered home pages (i.e., the DOM of all pages after scripting and styles were applied). The WAVE engine uses heuristics and logic to detect patterns in web page content that align with end user accessibility issues and Web Content Accessibility Guidelines (WCAG) conformance failures. All automated tools, including WAVE, are limited in their detection of accessibility issues; only around 25% of possible conformance failures can be automatically detected. Absence of detectable errors does not indicate that a site is accessible or compliant. Despite these limitations, the data presented in this report provide a meaningful representation of the state of web inaccessibility.
Why Only Home Pages?
We chose to focus only on home pages as a metric for web accessibility in general. Home pages are very often the most accessed pages on a web site and are the gateway to the rest of a web site's content. Home pages not only tend to receive the most attention from developers, but research indicates a correlation between issues detected on a home page and other site pages. Future research may explore additional pages beyond the home page.
Errors and Error Density
Errors are accessibility issues that are automatically detectable via WAVE, have notable end user impact, and are likely WCAG 2 conformance failures. 59,653,607 distinct accessibility errors were detected across the 1 million home pages—an average of 59.6 errors per page.
Error density (number of errors divided by number of page elements) for all home pages was collected. 782,481,056 distinct HTML elements were analyzed, meaning there was an average of 782 elements per home page. This results in approximately 7.6% of all home page elements having a detectable accessibility error. Users with disabilities would expect to encounter detectable errors on 1 in every 13 elements with which they engage.
Error density is an interesting metric and is provided in the site lookup. However, a significant increase in page elements (<span>s, for example) may result in a lower error density (suggesting better accessibility), when in fact many new accessibility errors may have also been introduced. We have thus chosen to focus in this report on the average number of detectable errors (end user barriers) present as opposed to error densities (how diluted those errors are within page elements).
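The density arithmetic, and the way extra elements can dilute it, can be sketched in a few lines. This is only an illustration: the function name `error_density` and the "before/after" page figures are hypothetical, while the totals are the ones reported above.

```python
def error_density(errors: int, elements: int) -> float:
    """Share of page elements that carry a detectable error."""
    return errors / elements


# Totals reported for the full sample.
report_density = error_density(59_653_607, 782_481_056)  # roughly 0.076
one_in_n = 1 / report_density                            # roughly 1 in 13

# Hypothetical page: a redesign adds 10 errors but doubles the element count.
before = error_density(60, 800)
after = error_density(70, 1600)

print(f"sample density: {report_density:.1%}, about 1 in {one_in_n:.0f} elements")
print(f"hypothetical page: {before:.2%} before, {after:.2%} after")
```

In the hypothetical case the density falls even though the page has more errors, which is the reason the report leans on error counts instead.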
There was no significant change in error counts or error density based on popularity rank. The home pages for the most popular domains had only slightly more errors and more elements than home pages for the least popular sites in the sample.
97.8% of home pages had detectable WCAG 2 failures! These are only automatically detectable errors that align with WCAG conformance failures with a high level of reliability. Because automatically detectable errors constitute a small portion of all possible WCAG failures, this means that the actual WCAG 2 A/AA conformance level for the home pages for the most commonly accessed web sites is very low, perhaps below 1%.
|WCAG Failure Type||# of home pages||% of home pages|
|Low contrast text||852,868||85.3%|
|Missing alternative text for images||679,964||68%|
|Missing form input labels||528,482||52.8%|
|Missing document language||329,612||33.1%|
While failures are prevalent, the types of common errors are relatively few. Simply addressing these few types of issues would have a significant positive impact on web accessibility.
Low contrast text, below the WCAG 2 AA thresholds, was the most common accessibility issue detected. The vast majority (85.3%) of home pages analyzed had detectable WCAG contrast failures. Contrast errors were only detected on elements that contain text. On average, home pages had 36 distinct instances of text with insufficient contrast. 4.6% of all home page HTML elements (this is all elements, not just visible elements with text) analyzed had insufficient contrast.
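For readers unfamiliar with the thresholds, the WCAG 2 contrast check can be sketched as follows. This follows the relative luminance and contrast ratio definitions in WCAG 2 (4.5:1 for normal text and 3:1 for large text at level AA); it is not the WAVE implementation, and the helper names are our own. It assumes six-digit hex colors with no transparency.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color written as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1 (none) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG 2 AA contrast threshold."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

As a usage example, `passes_aa("#767676", "#ffffff")` is true while the slightly lighter `passes_aa("#777777", "#ffffff")` is false for normal-size text, which shows how narrow the margin can be for gray text on white.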
Images and Alternative Text
There were 36,713,043 images in the sample, or 36.7 images per home page on average. 33.6% of all images (12.3 per page on average) had missing alternative text (not counting alt=""). 18.5% of all images (6.7 per page on average) were linked images with missing or empty alternative text, resulting in both an alternative text issue and a link lacking any description. 16% of pages had images and no alt attributes at all.
16.8% of images that were assigned alternative text had questionable (such as alt="image", "graphic", "blank", a file name, etc.) or repetitive alternative text (alternative text identical to adjacent text or an adjacent image's alternative text).
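The distinction between missing, empty, and questionable alternative text can be illustrated with a small audit built on Python's standard `html.parser`. This is a rough sketch, not the WAVE heuristics: the word list, the file-name pattern, and the class name `AltTextAudit` are assumptions, and it does not attempt to detect repetitive alternative text.

```python
import re
from html.parser import HTMLParser

# Assumed examples of uninformative values; a real tool would use a longer list.
GENERIC_ALT = {"image", "graphic", "blank"}
FILE_NAME = re.compile(r"\.(png|jpe?g|gif|svg|webp|bmp)$", re.IGNORECASE)


class AltTextAudit(HTMLParser):
    """Counts <img> elements by the state of their alt attribute."""

    def __init__(self):
        super().__init__()
        self.counts = {"missing": 0, "empty": 0, "questionable": 0, "present": 0}

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if "alt" not in attributes:
            self.counts["missing"] += 1       # no alt attribute at all
            return
        alt = (attributes["alt"] or "").strip()
        if not alt:
            self.counts["empty"] += 1         # alt="" (may be intentional)
        elif alt.lower() in GENERIC_ALT or FILE_NAME.search(alt):
            self.counts["questionable"] += 1  # generic word or file name
        else:
            self.counts["present"] += 1


def audit_alt_text(html: str) -> dict:
    audit = AltTextAudit()
    audit.feed(html)
    audit.close()
    return audit.counts
```

Note that an image counted as "present" here may still have inaccurate alternative text; only a human review can confirm equivalence.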
If we assume that this million page sample is indicative of accessibility of broader web pages, these data indicate that around half of images encountered by users with disabilities would definitively have inappropriate alternative text. This, however, presumes that all other images were actually given equivalent alternative text, which is certainly not the case. As an example, 4.5 million non-linked images (12.2% of all images) had been given alt=""; it is likely that many of these images should have been assigned alternative text.
2,218 pages (0.2% of the sample) had a longdesc attribute present. However, 49.7% of the 12,051 longdesc attributes encountered had invalid values, such as an empty value, an invalid URL, an image file name, etc.
59% of the 3.4 million form inputs identified were unlabeled (that is, they had no label provided through a mechanism such as aria-labelledby). The presence of unlabeled form controls was a strong indicator of broader errors: pages with at least one missing form label averaged nearly 30 more errors than those without any label errors.
There were 18,910,980 headings detected. These break down to 1.7 million <h1>s (9.1%), 5.9 million <h2>s (31.4%), 6.5 million <h3>s (34.5%), 3.2 million <h4>s (16.7%), 1.1 million <h5>s (5.7%), and 0.5 million <h6>s (the remaining 2.6%).
There were 908,784 instances of skipped heading levels (e.g., jumping to <h4> from a heading more than one level above it); one in every 20 headings was improperly structured. Skipped headings were present on 362,659 home pages (36.3% of all pages). 148,573 home pages (14.9%) had no headings present at all.
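A skipped heading level can be detected by walking the headings in document order and flagging any step down the hierarchy of more than one level. The sketch below uses Python's standard `html.parser`; the rule it applies is our reading of "skipped level" and may differ from the exact logic WAVE uses, and the names `HeadingCollector` and `count_skipped_levels` are illustrative.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records the level (1-6) of each heading in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_levels(html: str) -> list:
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.levels


def count_skipped_levels(levels: list) -> int:
    """Counts headings that sit more than one level below the previous heading."""
    return sum(1 for prev, cur in zip(levels, levels[1:]) if cur > prev + 1)
```

Moving back up the hierarchy (for example from <h4> to <h2>) is not counted, since starting a new section at a higher level is a normal structure.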
62.4% of home pages had at least one region defined. This includes pages with ARIA landmarks (e.g., a navigation region defined with the HTML <nav> element and/or ARIA role="navigation"). Pages with <nav> (51.0%) and <header> (50.4%) were most common. 23.5% of home pages had a main element or landmark present, 19.1% had an aside/complementary region present, and 15.9% a search landmark.
Pages with at least one region averaged 7.6 regions. Pages with a <main> region defined, however, averaged notably more: 10.5 regions per page.
96.9% of pages with <main> have only one instance of <main>. Pages with a <nav> element present averaged 2.1 of them per page, and pages with a <header> averaged 3.2 of them per page.
The presence of a <main> element was an indicator of better accessibility: those pages averaged 3 fewer errors than pages lacking a main region.
60.1% of the 1 million home pages had ARIA present. 22.3 million page elements with ARIA attributes were detected. The number of ARIA attributes outpaced both the number of images present and the number of headings present. Home pages that included ARIA had an average of 38.3 ARIA attributes each. 19% of the ARIA attributes were aria-describedby. NOTE: These figures do not include ARIA landmark roles.
Home pages with ARIA present averaged 26.7 more detectable errors than pages without ARIA! An increase in the number of ARIA attributes also had a moderate correlation with increased errors. In other words, the more ARIA in use, the more detectable errors. This does not necessarily mean that ARIA introduced these errors (it's likely these pages are simply more complex), but pages typically have more errors when ARIA is present, and even more so with higher ARIA usage.
9.6% of home pages had a "skip" link present. However, 14.3% of these pages had skip links that were broken—either they were hidden in a way that made them inaccessible or the target for the skip link was not present in the page.
Pages with at least one non-broken "skip" link present averaged 10.4 fewer errors than those without a "skip" link. This was one of the strongest indicators of better accessibility.
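One of the two ways a "skip" link can be broken, a target that does not exist in the page, is straightforward to check. The sketch below is an assumption-laden illustration rather than the WAVE logic: it treats any in-page link whose text contains "skip" as a skip link, and it cannot detect links that are hidden in an inaccessible way, since that depends on CSS.

```python
from html.parser import HTMLParser


class SkipLinkAudit(HTMLParser):
    """Collects in-page link targets and the ids available as targets."""

    def __init__(self):
        super().__init__()
        self.ids = set()     # id values (and legacy <a name>) found in the page
        self.links = []      # (target id, link text) for each href="#..." link
        self._target = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.ids.add(attributes["id"])
        if tag == "a":
            if attributes.get("name"):
                self.ids.add(attributes["name"])
            href = attributes.get("href") or ""
            if href.startswith("#") and len(href) > 1:
                self._target = href[1:]
                self._text = []

    def handle_data(self, data):
        if self._target is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._target is not None:
            self.links.append((self._target, "".join(self._text)))
            self._target = None


def broken_skip_targets(html: str) -> list:
    """Returns targets of 'skip' links that do not exist in the page."""
    audit = SkipLinkAudit()
    audit.feed(html)
    audit.close()
    return [
        target
        for target, text in audit.links
        if "skip" in text.lower() and target not in audit.ids
    ]
```

Targets are resolved only after the whole document has been parsed, because a skip link normally appears before the element it points to.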
74.1% of home pages had a valid HTML5 doctype. Pages with a valid HTML5 doctype had significantly more page elements (average of 844 vs. 605) and errors (average of 61.9 vs. 53.3) than those with other doctypes. 1,130 unique doctypes (most of these, obviously, being invalid) were encountered in the sample.
Pages from various top-level domains (TLDs) were analyzed for accessibility differences. Pages with .com (n=521,316) or .net (n=39,757) had just a few more errors on average than pages from other domains. Pages with .org (n=76,489), on the other hand, fared significantly better (47.4 errors on average) than those from other domains (60.6 errors).
Pages from the following highly common top-level domains (ordered by number of home pages in that TLD) had notably fewer errors than their counterparts:
- .de (Germany)
- .uk (United Kingdom)
- .jp (Japan)
- .nl (Netherlands)
- .edu (U.S.-based education institutions)
- .au (Australia)
- .ca (Canada)
Pages from the following highly common top-level domains had notably more errors than their counterparts:
- .ru (Russia)
- .cn (China)
- .pl (Poland)
- .br (Brazil)
- .it (Italy)
- .es (Spain)
Home pages with .edu (37.1 errors), .us (36.6 errors), and .gov (30.5 errors), which are all affiliated with U.S.-based entities, had the lowest number of average accessibility errors of all common (n>2000) domains.
Data regarding 1,195 different types of technologies used on the one million home pages were collected and analyzed. Technologies that were detected on more than 5,000 home pages (.5% of the sample) are listed below. The categorized tables below show the technology name, the number of home pages with that technology present, the average number of errors present on those pages, and the percent difference in number of average errors detected on pages with that technology present vs. those without. Technologies are ordered from "best" to "worst".
As an example, the first table indicates that home pages on the Squarespace CMS had 45.4% fewer errors (almost half as many) than pages that didn't utilize that technology, pages with WordPress exhibited little difference in accessibility errors, and pages on Blogger had 237% more errors (over 3 times as many) than other pages. It is important to note that the correspondence of additional errors with a technology cannot automatically be attributed to that technology.
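The relationship between the percent difference column and the "times as many" phrasing can be made concrete with a short calculation. The function names are illustrative, and we assume the percentage is computed relative to the average for pages without the technology; the Google AdSense figures used as a check are the ones reported further down in this report.

```python
def percent_difference(avg_with: float, avg_without: float) -> float:
    """Percent increase (positive) or decrease (negative) in average errors."""
    return (avg_with - avg_without) / avg_without * 100


def as_multiple(percent_diff: float) -> float:
    """Converts a percent difference into 'N times as many' errors."""
    return 1 + percent_diff / 100


blogger_multiple = as_multiple(237)        # over 3 times as many
squarespace_multiple = as_multiple(-45.4)  # a little over half as many

# Google AdSense: 100.9 errors on average, 47.2 more than other pages.
adsense_with = 100.9
adsense_without = adsense_with - 47.2
adsense_percent = percent_difference(adsense_with, adsense_without)

print(f"Blogger: {blogger_multiple:.2f}x, Squarespace: {squarespace_multiple:.3f}x")
print(f"AdSense: {adsense_percent:.1f}% more errors")
```

The AdSense result lands within rounding of the percentage shown in the ad network table, which supports the assumed baseline.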
Content Management Systems
|CMS||# (and %) of home pages||Avg. # of errors||% decrease/increase of errors|
There is a wide diversity in the impact that the CMS choice appears to have on accessibility.
|Framework||# (and %) of home pages||Avg. # of errors||% decrease/increase of errors|
With the exception of MooTools and TweenMax, the adoption of any of these frameworks is aligned with additional accessibility errors. This does not necessarily mean that the frameworks caused these errors, but does indicate that home pages with these frameworks have more errors than pages without.
|Library||# (and %) of home pages||Avg. # of errors||% decrease/increase of errors|
|jQuery Migrate||313,391 (31.3%)||61.7||5.1%|
The vast majority of the top one million home pages utilize jQuery. Home pages with jQuery averaged 19.2 more errors than those without jQuery. The presence of jQuery corresponds with nearly 15 million detected errors, or over 25% of all of the accessibility errors we detected. Pages with jQuery were a bit more likely to have alternative text and contrast errors, but much more likely to have empty buttons (2.4 times as many), missing form labels (almost 3 times as many), and empty links (3.4 times as many) than non-jQuery pages. Interestingly, pages with jQuery were twice as likely to have the document language identified as pages without. Pages with jQuery were much more complex (844 elements on average) than other pages (605 elements on average).
|Web Framework||# (and %) of home pages||Avg. # of errors||% decrease/increase of errors|
|ZURB Foundation||25,390 (2.5%)||62.3||4.5%|
Home pages in the sample that utilize the popular Bootstrap framework had 1.3 million more accessibility errors than pages that did not utilize Bootstrap. We can't know from these data if Bootstrap introduced these errors, but there is a strong correspondence of increased errors when Bootstrap is present.
|Ad Network||# (and %) of home pages||Avg. # of errors||% decrease/increase of errors|
|Google AdSense||125,462 (12.5%)||100.9||87.8%|
Pages that utilized any of these popular ad systems had more errors on average than those that did not. Home pages that utilize the very common Google AdSense system averaged 47.2 more errors than other pages, nearly double!
Other common technologies were also associated with more errors. Pages with ReCaptcha had 14.9 more errors on average than those without. Pages with Google Maps averaged 13.9 more errors, those with PHP averaged 7.6 more errors, and those with Java averaged 4.7 more errors.
Here are several other fun facts regarding this research:
- The WebAIM Million database has 168,000,000 data points.
- It took 66.2 days of cumulative computer processing time to download and process all 1,000,000 home pages in the sample. This was shared among 5 AWS instances that ran continuously for 5 days.
- Despite it being 2019, 11,200 home pages had <marquee> and 570 home pages had blinking content.
- 2,099,665 layout tables were detected compared to only 113,737 data tables.
- The most errors detected on a single home page was 26,680!
These data show that there is still significant work to be done to ensure the web is made accessible to everyone. We hope that this research will promote greater interest and effort to this end. While the volume of errors is disconcerting, most of the significant errors are of just a few types. We will publish additional analyses of these data and will conduct similar, more extensive research in the future.
There are countless ways in which these data can be examined and explored. This report only scratches the surface. If you have questions about this research or would like us to analyze the database for something specific, please contact us.