Captions, Transcripts, and Audio Descriptions

Captions

Accessible multimedia (synchronized visual and auditory content) must include captions, which are text versions of speech and other important audio content, so that people who can't hear all of the audio can still access it.

According to US government figures, one person in eight has some functional hearing limitation, and this number will increase as the average age of the population increases. Beyond people with disabilities, captioning helps people who only partially understand the language presented. Captions are also useful in noisy environments like airports, in quiet environments like libraries, and for multimodal learning.

All multimedia content with speech should have accessible captions that are:

Synchronized to appear at approximately the same time as the corresponding audio.
Equivalent to the spoken words and other audio information.
Accessible, or readily available, to those who need it.
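As a rough illustration of the "synchronized" requirement, the sketch below writes timed caption cues to WebVTT, a caption file format commonly used for web video. The article does not prescribe a format, so the choice of WebVTT is an assumption, and the names Cue, format_timestamp, and to_webvtt are hypothetical helpers rather than part of any library.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the beginning of the media
    end: float    # seconds from the beginning of the media
    text: str     # spoken words, or a bracketed description of a sound


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Serialize cues in start-time order as a WebVTT document."""
    blocks = ["WEBVTT"]  # required file header
    for cue in sorted(cues, key=lambda c: c.start):
        if cue.end <= cue.start:
            raise ValueError(f"cue must end after it starts: {cue.text!r}")
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{timing}\n{cue.text}")
    # Cue blocks are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = [
        Cue(0.0, 2.5, "[upbeat music]"),
        Cue(2.5, 5.0, "Welcome to the show."),
    ]
    print(to_webvtt(sample))
```

The sample includes a bracketed sound description alongside speech, since equivalent captions cover important non-speech audio as well as spoken words. The sketch does not validate cue text, so real caption files would need further checks.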

Captions as typically seen on television
Screenshot of The Tonight Show Starring Jimmy Fallon television broadcast. Captions display on the image.

The most common type of captions is "closed" captions, which can be turned on or off. Closed captioning of most pre-recorded television programs is now a legal requirement in most countries, and most live broadcasts (such as news and sports events) also include closed captions that can be easily enabled and viewed on screen.

Captions as seen on DVD or Blu-ray
Screenshot from the movie Avatar. Captions appear on screen: "You crossed the line."

On broadcast television, the style and location of the captions depend on the caption decoder built into the viewer's television receiver or streaming device. In online or streaming video, the browser or video player determines how captions will be displayed. Many decoders and video players allow the user to customize caption size, color, font, and location on the screen.

Captions as seen in a web media player
Screenshot of captions in a web media player

Open captions include the same content as closed captions, but the captions are a permanent part of the video picture and cannot be turned off. The captions are visible to anybody viewing the video clip. This gives the media producer total control (and the user no control) over the way the captions appear, including caption location, size, color, font, and timing.

Note

Also see our article on real-time captions for information on captioning live web multimedia and broadcasts.

Transcripts

For multimedia, a transcript can also help users who can neither hear the audio nor see the video. Beyond the spoken words, a transcript should include descriptions of important audio information (like laughter) and visual information (such as someone entering the room). Transcripts help deaf/blind users interact with content using refreshable Braille devices.
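One way to picture such a transcript is as a time-ordered merge of speech, sound descriptions, and visual descriptions. The sketch below is a minimal illustration under that assumption; the Entry type, its field names, and build_transcript are hypothetical and not taken from any captioning tool.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    time: float        # seconds from the start of the media
    kind: str          # "speech", "sound", or "visual"
    text: str
    speaker: str = ""  # only meaningful for speech entries


def build_transcript(entries: list[Entry]) -> str:
    """Merge speech and descriptions into one plain-text transcript."""
    lines = []
    for entry in sorted(entries, key=lambda e: e.time):
        if entry.kind == "speech":
            prefix = f"{entry.speaker}: " if entry.speaker else ""
            lines.append(prefix + entry.text)
        else:
            # Non-speech audio and visual information goes in brackets.
            lines.append(f"[{entry.text}]")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        Entry(4.0, "sound", "audience laughter"),
        Entry(1.0, "speech", "Good evening, everyone.", speaker="Host"),
        Entry(6.5, "visual", "A guest enters the room"),
    ]
    print(build_transcript(sample))
```

Because the result is plain text, it can be read with a refreshable Braille device or at whatever speed a screen reader user prefers.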

Transcripts also allow anyone who cannot access the web audio, the video, or both to read the content as text instead. For most web video, both captions and a text transcript should be provided. For audio-only content such as a podcast, a transcript will usually suffice; captions are not necessary.

Transcripts make multimedia content searchable by search engines and users. Screen reader users may also prefer a transcript over real-time audio, since most proficient screen reader users set their assistive technology to read at a rate much faster than natural human speech.

Important

In order to be optimally accessible to users with auditory disabilities, web multimedia should include both synchronized captions and a transcript.

Audio Descriptions

Important

Visual content within multimedia must be described via audio in order for the multimedia to be optimally accessible to users with visual disabilities.

Audio descriptions help users with visual disabilities perceive content that is presented only visually, and are necessary for WCAG 2 Level AA conformance. On television, this is often called Descriptive Video Service (DVS). Typically, a narrator describes the visual-only content in the multimedia. Audio descriptions can be provided with the primary video, or in another audio track, or via an alternate version of the video that includes audio descriptions.

Here's a short example of an audio description that you might recognize. Can you visualize what is being described?

Producing audio descriptions can be expensive and time-consuming. However, they are unnecessary if the audio already presents the necessary visual content. If a video displays a list of five important items, the narration should read the items aloud rather than saying, "As you can see, there are five important points." Likewise, instead of saying, "Click here and then here," the presenter should describe what is being clicked. This way, no separate audio description track is necessary.