Document and Content Language
The Importance of Identifying Language
Screen readers can "speak" various languages, as long as the content language is identified. If the screen reader does not support or cannot speak the defined language, the user might be informed of the content language, even if that content cannot be properly read.
Defining the document language also supports automated translation of content using tools like Google Translate.
For Level A conformance with the Web Content Accessibility Guidelines (WCAG), the document language must be programmatically defined. For Level AA conformance, any part of a page that is in a language different from the rest of the page must also be identified. This tells the screen reader to switch to that language (if it is able).
Specifying the "language of parts" of the page is only necessary for other-language content that is not generally understood in the document's primary language. "Los Angeles" and "piñata", for example, are Spanish words that are understood by English readers, so it would not be necessary to identify these as being Spanish on an English web page.
Properly defining the content language also allows the browser to display the correct quotation marks for various languages when using the <q> element. The following examples are defined as German and French. The browser has generated the localized quotation marks appropriate to the language.
Mein Computer spricht Deutsch.
Mon ordinateur parle français.
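The markup behind examples like these might look like the following sketch. Placing the lang attribute on a wrapping <p> element is an assumption made for illustration; the text above only states that the <q> element is used and that the language is defined.

```html
<!-- German: no quotation marks are typed; the browser generates them -->
<p lang="de"><q>Mein Computer spricht Deutsch.</q></p>

<!-- French: the same <q> element, with a different language defined -->
<p lang="fr"><q>Mon ordinateur parle français.</q></p>
```

The quotation marks themselves never appear in the source. The browser selects them based on the defined language.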
Additionally, if the language is specified, the browser can present:
- The appropriate characters for non-Latin text
- Localized date and time inputs (such as using MM/DD/YYYY vs. DD/MM/YYYY or 24-hour time vs. AM/PM time)
- Numbers with appropriate comma or period thousands separators
- Proper-language spellchecking for inputs
The lang attribute is used to identify the language of the web page. This attribute must always be added to the <html> tag. It is given a value that identifies the natural language of the page. Adding <html lang="en">, for example, would specify that the page is in English.
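As a minimal sketch, a complete page with its language defined could look like this. The title and paragraph text are placeholders, not taken from any real page.

```html
<!DOCTYPE html>
<!-- lang="en" identifies English as the language of the whole page -->
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Example page</title>
  </head>
  <body>
    <p>This page is written in English.</p>
  </body>
</html>
```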
The lang attribute can also be added to other HTML elements within a page to indicate their natural language. <p lang="ja">, for example, would indicate Japanese as the language for the paragraph.
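A short sketch of a language change within an English page follows. The sentences are invented for illustration.

```html
<!-- Assumes the page itself is declared with <html lang="en"> -->
<p>The shop greeted visitors with a sign written in Japanese:</p>

<!-- This paragraph alone is identified as Japanese -->
<p lang="ja">こんにちは</p>

<p>After this paragraph, the content returns to the page's default language.</p>
```

Because the attribute applies only to the element it is set on (and that element's descendants), the paragraphs before and after it are still treated as English.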
Do not use the lang attribute to specify the language of content that is being linked or navigated to. If a link on an English web page to a Spanish translation presents text of "Spanish", the lang attribute is not used because "Spanish" is an English word. If the link instead presents text of "Español", then lang="es" should be defined on the link.
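The two cases could be marked up as in the sketch below. The /es/ link target is a hypothetical path used only for illustration.

```html
<!-- Link text is the English word "Spanish": no lang attribute is needed -->
<a href="/es/">Spanish</a>

<!-- Link text is the Spanish word "Español": identify the text as Spanish -->
<a href="/es/" lang="es">Español</a>
```

In both cases the lang attribute describes the link text itself, not the page the link leads to.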
When text in one language is read with the pronunciation rules of another, the results can make the content inaccessible. Below is a passage of text in English. The audio recording is a screen reader pronouncing this text as if it had been identified as a different language.
Most people today can hardly conceive of life without the Internet. Some have argued that no other single invention has been more revolutionary since Gutenberg’s printing press in the 1400s. Now, at the click of a mouse, the world can be “at your fingertips”—that is, if you can use a mouse... and see the screen... and hear the audio—in other words, if you don't have a disability of any kind.
Identifying the document language is also important for Acrobat PDF files. The document language can be specified in Acrobat Professional or other PDF editing software.
Screen Reader Support
Basic two-character lang attribute values are usually adequate for screen reader support. Support for three-character values and for extended, script, and region subtags varies based on the browser and screen reader in use, and on the language voices that are supported and installed. When in doubt, test. Support for inline language changes, such as on an <img> element, also varies. When possible, it is best to define the lang attribute on a block-level element, such as a <blockquote> or similar.
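The sketch below contrasts these options. The sample sentences, the image file name, and the particular language values are assumptions chosen for illustration, and actual behavior should be confirmed by testing with the browsers and screen readers in use.

```html
<!-- Language defined on a block-level element: the more reliable choice -->
<blockquote lang="es">
  <p>Mi computadora habla español.</p>
</blockquote>

<!-- Inline language change on an image's alternative text: support varies -->
<img src="sign.png" alt="Bienvenidos" lang="es">

<!-- Basic language code on its own: usually adequate -->
<p lang="pt">Bom dia.</p>

<!-- Language code with a region subtag: support varies -->
<p lang="pt-BR">Bom dia.</p>
```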
To read the content in the defined language, the screen reader must support that language. All modern screen readers have support for numerous languages. In some screen readers, the user must manually install or configure language voices or "language packs".
If a screen reader encounters a lang attribute that specifies a language for which a matching language voice is not installed or supported, it will usually identify the language of the content. The screen reader might pronounce "Spanish", for example, for content with lang="es" if a Spanish language voice is not enabled or installed.
Screen readers will typically attempt to read content that is pronounceable, even if the defined language is not supported. Polish content, for example, is written in Latin characters, so it will be read by a screen reader with an English default voice, though without proper pronunciation or inflection (perhaps sounding like a beginner Polish class). Chinese characters, on the other hand, are not directly pronounceable in English, so the screen reader would not read them, though it may announce "Chinese" to inform the user that Chinese language content is present.