Designing for Screen Reader Compatibility

Overview

Screen readers are audio interfaces. Rather than displaying web content visually for users in a "window" or screen on the monitor, screen readers convert text into synthesized speech so that users can listen to the content. Sighted users usually have a hard time imagining having to always rely on an audio interface because their world is so highly visual. The experience is completely different, to be sure. The miracle is that the option for an audio interface even exists at all. Without screen readers, people who are blind would need to rely on other individuals to read the content out loud to them. The technology makes independent access to information possible for a population that would otherwise always need the support and assistance of others.

Screen readers do not read web content quite like human beings do. The voice may sound somewhat robotic and monotone. In addition, experienced users often like to speed up the reading rate to 300 words per minute or more, which is more than the inexperienced listener can easily understand. In fact, when many people hear a screen reader for the first time, at the normal rate of about 180 words per minute, they complain that it reads too quickly. It takes time to get used to a screen reader, but the interesting thing is that once users get used to it, they can race through content at speeds that can amaze sighted individuals.

Two of the most common screen readers are JAWS, by Freedom Scientific, and Window Eyes, by GW Micro. These programs can read not only web content but also the Windows operating system, word processing programs, and other software. Mac and iOS devices come with VoiceOver. There are many other types of screen readers available.

Content Linearization

Audio interfaces present content linearly to users, one item at a time. This contrasts with the way in which most people use visual interfaces. Sighted users can scan an entire screen almost instantaneously, comprehending the overall layout, the artistic style, and other macro-level aspects of the content. Screen reader users cannot comprehend these macro-level aspects as quickly. The linear progression through the content from beginning to end is somewhat like an automated telephone menu system that does not reveal all of the options at once. Users must progress through such systems in a step-wise manner. The insight that audio interfaces are linearized versions of web content is an important one that should guide web developers throughout the design and engineering process.

Skimming Through Content

Despite the linear nature of audio interfaces, there are some ways in which screen reader users can "skim" through the content.

Headings

One way to skim a page and get an overall impression of its content is to jump from heading to heading. Users can hear an outline of the page's main ideas, then backtrack to read the parts they are most interested in. The main drawback to this technique is that too many pages lack headings. Without headings, this method of skimming through content is completely useless.

Implication: Authors should organize content with headings. To the extent possible, the headings should represent an accurate outline of the content.
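
For example, a page whose headings form an accurate outline might be marked up along these lines (the topic and wording here are invented for illustration; the point is the nesting of heading levels):

    <h1>Annual Report</h1>
    <h2>Financial Summary</h2>
    <h3>Revenue</h3>
    <h3>Expenses</h3>
    <h2>Looking Ahead</h2>

A screen reader user jumping from heading to heading hears this sequence as an outline of the page, so skipping levels or choosing headings for their visual size rather than their meaning undermines the technique.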

Landmarks and page sections

Users can navigate via ARIA landmarks and HTML5 sectioning elements, such as <main>, <nav>, <header>, etc.

Implication: Define appropriate ARIA landmarks and use HTML5 sectioning elements as intended.
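
A minimal page skeleton, offered as a sketch rather than a prescription, might combine HTML5 sectioning elements with explicit ARIA roles for older browser and screen reader combinations that do not map the elements to landmarks on their own (the link targets and text are placeholders):

    <header role="banner">
      <nav role="navigation" aria-label="Main menu">
        <a href="#products">Products</a>
        <a href="#support">Support</a>
      </nav>
    </header>
    <main role="main">
      <h1>Page title</h1>
      <p>Main content goes here.</p>
    </main>
    <footer role="contentinfo">
      <p>Footer content goes here.</p>
    </footer>

With this structure, a screen reader user can jump directly to the navigation, the main content, or the footer without listening to everything in between.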

Paragraphs and page elements

Users can jump from paragraph to paragraph, listening to the first sentence or two before moving on to the next paragraph. This technique is most like the visual skimming technique used by some sighted people. Users can also jump from element to element, such as <div> tags, links, form elements, list items, or other units of content.

Implication: When possible, place the distinguishing information of a paragraph in the first sentence.

Others

In addition to the methods above, screen reader users can also navigate by tables, lists, buttons, forms, links, images, etc.

Implication: Use proper HTML semantic structure with elements marked up appropriately.
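
As a brief illustration (the link destinations and labels are made up), native elements keep these navigation shortcuts working, whereas generic containers do not:

    <!-- Announced as a list of three items and reachable with list navigation keys -->
    <ul>
      <li><a href="/products">Products</a></li>
      <li><a href="/support">Support</a></li>
      <li><a href="/contact">Contact</a></li>
    </ul>

    <!-- A native button is announced as a button and is keyboard-operable;
         a clickable <div> styled to look like a button is neither,
         without extra scripting and ARIA -->
    <button type="button">Search</button>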

Accommodating Differences Between Screen Readers

Screen readers are remarkably similar in their functionality and capabilities, but there are differences between them. Keyboard shortcuts that perform one function in one screen reader often perform a different function, or none at all, in another. The voices of different screen readers do not sound exactly the same. They also have different ways of notifying users of important information, such as which pieces of text are links, which pieces of content are images, and so on.

A reasonable question to ask at this point is whether designers should worry about the differences between screen readers. If content is accessible to one brand of screen reader, will it be accessible to other brands? Do designers have to customize different versions of content to accommodate the different capabilities and styles of the various screen readers? These questions are both intriguing and worrisome. Developers already have to make sure that their content works well in several versions of several browsers on several platforms. This is enough of a headache without also having to worry about different versions of different screen readers on different platforms.

The good news is that the techniques that work for one screen reader almost always work in other screen readers. In some cases, one of the screen readers has capabilities that the others do not have, or handles some types of content better than the other screen readers. Still, developers are almost always better off when they focus on accessibility standards and generally-accepted accessibility techniques than when they focus on screen reader differences. Focusing on screen reader differences can lead to the undesirable situation of having pages designed just for JAWS or just for Window Eyes, which could potentially exclude users of screen readers for which the page was not optimized. This would be like optimizing pages for a certain browser. Only a few years ago, it was common to see pages that explained "this page best viewed in Internet Explorer" (or Netscape). Fortunately, this practice has become less common, and is widely frowned upon. Both browsers and screen readers have paid more attention to standards over the last few years, so the user experience is quite consistent no matter which technology is used. No two technologies are the same, which leads to occasional design headaches, but they are similar enough that there are fewer exceptions to the "rules" than there once were.

How Screen Readers Read Content

This section presents a list of ways that screen readers generally read and pronounce content. There are, of course, differences between screen readers, but the list below describes typical behavior. It is not exhaustive by any means, but it will help developers understand screen readers a little better.

  • Screen readers pause for periods, semicolons, commas, question marks, and exclamation points.
  • Screen readers generally pause at the end of paragraphs.
  • Screen readers try to pronounce acronyms and nonsensical words if they have sufficient vowels/consonants to be pronounceable; otherwise, they spell out the letters. For example, NASA is pronounced as a word, whereas NSF is pronounced as "N. S. F." The acronym URL is pronounced "earl," even though most humans say "U. R. L." The acronym SQL is not pronounced "sequel" by screen readers even though some humans pronounce it that way; screen readers say "S. Q. L."
  • Screen reader users can pause if they didn't understand a word, and go back to listen to it; they can even have the screen reader read words letter by letter. When reading words letter by letter, JAWS distinguishes between upper case and lower case letters by shouting/emphasizing the upper case letters.
  • Screen readers read letters out loud as the user types them, but say "star" or "asterisk" for characters typed into password fields.
  • Screen readers announce the page title (the <title> element in the HTML markup) when first loading a web page.
  • Screen readers will read the alternative text of images, if alt text is present (see the first markup sketch after this list). JAWS precedes the alternative text with the word "graphic." If the image is a link, JAWS precedes the alternative text with "graphic link."
  • Screen readers ignore images without alternative text and say nothing, but users can set their preferences to read the file name.
  • If the image without alternative text is a link, screen readers will generally read the link destination (the href attribute in the HTML markup) or may read the image file name.
  • Screen readers announce headings and identify the heading level. JAWS, for example, precedes <h1> headings with "heading level 1."
  • Some screen readers announce the number of links on a page as soon as the page finishes loading in the browser.
  • JAWS says "same page link" if the link destination is on the same page as the link itself and "visited link" for links that have been previously accessed.
  • Screen readers in table navigation mode inform the user how many rows and columns are in a data table.
  • Users can navigate in any direction from cell to cell in table navigation mode. If the table is marked up correctly, the screen reader will read the column and/or row heading as the user enters each new cell (see the table sketch after this list).
  • Screen readers inform users when they have entered a form. Users have the option to enter form navigation mode (see the labeled form sketch after this list).
  • Screen readers with appropriate language settings can switch languages on the fly if a page or part of a page is marked as a different language. For example, if a Spanish phrase appears in an English page, the screen reader can switch to Spanish pronunciation if the phrase is marked as a Spanish phrase: <span lang="es">Viva la patria</span>.
  • Most screen readers pronounce words correctly in almost every instance, but occasionally they misinterpret the difference between homographs (words that are spelled the same but which have different meanings and/or pronunciations). For example, the word read can be pronounced "reed" or "red," depending on the context: "I must read the newspaper" vs. "I have read the newspaper." A sentence such as "I read the newspaper every day" is actually ambiguous to all readers, humans and screen readers alike. It could mean that the writer reads the newspaper every day or that the writer used to read the newspaper every day. Depending on what the writer meant to say, the word read in that sentence could be pronounced either "reed" or "red." The word content is another example: "I feel content" (meaning happy, with the emphasis on the second syllable [con-TENT]) vs. "Skip to main content" (meaning the subject matter, with the emphasis on the first syllable [CON-tent]).
  • Screen readers read most punctuation by default, such as parentheses, dashes, asterisks, and so on, but not all screen readers choose to read the same pieces of punctuation. Some do not read asterisks by default, for example. Periods, commas, and colons are usually not read out loud, but screen readers generally pause after each. Users can set the verbosity setting in their preferences so that screen readers read more or less punctuation.
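
To illustrate the image-related behavior described above, here is a sketch of the basic cases (the file names and wording are invented):

    <!-- Read by JAWS as "graphic, Acme Corporation logo" -->
    <img src="logo.png" alt="Acme Corporation logo">

    <!-- Empty alt text marks the image as decorative, so screen readers skip it -->
    <img src="divider.png" alt="">

    <!-- For a linked image, the alt text serves as the link text;
         JAWS announces it with "graphic link" -->
    <a href="index.html"><img src="home.png" alt="Acme Corporation home page"></a>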
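
Similarly, a small data table marked up so that row and column headers are announced as the user moves from cell to cell might look like this (the data is invented):

    <table>
      <caption>Office hours</caption>
      <tr>
        <td></td>
        <th scope="col">Opens</th>
        <th scope="col">Closes</th>
      </tr>
      <tr>
        <th scope="row">Monday</th>
        <td>9:00 a.m.</td>
        <td>5:00 p.m.</td>
      </tr>
      <tr>
        <th scope="row">Tuesday</th>
        <td>9:00 a.m.</td>
        <td>5:00 p.m.</td>
      </tr>
    </table>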
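
And here is a form in which each control is explicitly associated with a label, so that the purpose of each field is announced in form navigation mode (the field names and destination are hypothetical):

    <form action="/subscribe" method="post">
      <label for="email">Email address</label>
      <input type="email" id="email" name="email">

      <label for="password">Password</label>
      <input type="password" id="password" name="password">

      <button type="submit">Sign up</button>
    </form>
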
Note

The trial versions of both JAWS and Window Eyes last almost indefinitely, but can be used for only 40 minutes at a time. After each 40-minute session, users must reboot the computer in order to start another 40-minute session.

JAWS: information and trial version download

Window Eyes: information | Window Eyes: trial version download

Important

Using any screen reader for the first time can be a confusing and discouraging experience. Using an audio interface is almost always a little disorienting for sighted users. Also, much of the content on a web page will seem to be inaccessible, when in fact the problem may be that the new user simply does not know how to use the screen reader. Developers who are serious about wanting to know how their content sounds on screen readers will need to either work closely with people who use screen readers on a regular basis or else devote the time to learn how to use a screen reader effectively.