Captions, Transcripts, and Audio Descriptions


Captions are text versions of the spoken word presented within multimedia. They make the content of web audio and video accessible to those who do not have access to the audio. Though captioning is primarily intended for those who cannot hear the audio, it has also been found to help people who can hear it, including those who are not fluent in the language spoken or for whom it is not their primary language.

Common web accessibility guidelines indicate that captions should be:

  • Synchronized - the text content should appear at approximately the same time that audio would be available
  • Equivalent - content provided in captions should be equivalent to that of the spoken word
  • Accessible - caption content should be readily accessible and available to those who need it

On the web, synchronized, equivalent captions should be provided any time multimedia content (generally meaning both visual and auditory content) is present. This obviously pertains to the use of audio and video played through multimedia players and HTML5 video, but can also pertain to such technologies as Flash or Java when audio content is a part of the multimedia presentation.
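As one illustration of what "synchronized" means in practice, caption files for HTML5 video are commonly written in WebVTT, where each caption (a cue) carries a start and end time. The following is a minimal sketch, not a complete implementation: the `Cue` class and both function names are hypothetical, the timings are invented for the example, and it assumes the cue text contains no blank lines, no `-->` sequence, and no characters that WebVTT requires to be escaped.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the beginning of the media
    end: float    # seconds from the beginning of the media
    text: str     # caption text displayed between start and end


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Serialize cues as WebVTT text, ordered by start time."""
    lines = ["WEBVTT", ""]
    for cue in sorted(cues, key=lambda c: c.start):
        if cue.end <= cue.start:
            raise ValueError(f"cue must end after it starts: {cue!r}")
        lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}")
        lines.append(cue.text)
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical timings for the caption text used in this article's example.
    print(to_webvtt([
        Cue(0.0, 3.5, "the curtain rises on the greatest"),
        Cue(3.5, 6.0, "military experiment ever undertaken."),
    ]))
```

Keeping the timing next to each piece of text is what lets a player show the caption at approximately the moment the words are spoken.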

Captions as typically seen on television
Screenshot of black and white news footage showing battleships in the distance. Captions display on the image which read - the curtain rises on the greatest military experiment ever undertaken.

Captions can be either closed or open. Closed captions can be turned on or off, whereas open captions are always visible.

All television sets with screen sizes of 13 inches and larger must contain the hardware to interpret and display closed captions. Closed captioning of most pre-recorded television programs is now a legal requirement in the United States. Television closed captioning is used by millions of individuals who are deaf or hard of hearing; millions more use it in the classroom or in noisy environments—like bars, restaurants, and airports. As the average age of the population increases, so does the number of people with hearing impairments. According to US government figures, one person in five has some functional hearing limitation. Because of the growing need for access to captions, many live broadcasts (such as news and sports events) and most pre-recorded programs now include closed captions that can be easily enabled and viewed on screen.

Captions as seen on DVD
Screenshot from the movie The Grinch. A girl holding Christmas boxes. Captions read - Can't you feel it? - Merry Christmas!

Closed captions for television are very limited in their formatting, because the caption look, feel, and location are determined by the caption decoder built into the television set. You can get more information about television captioning at Captioning FAQ.

Captions as seen in a web media player
Screenshot of captions in a web media player

Open captions are similar to closed captions and include the same text, but they are a permanent part of the video picture and cannot typically be turned off. They are not decoded by the television set; they are part of the video information itself, so they are visible to anybody viewing the video clip. Producing them typically requires a video editing or encoding program that allows you to overlay titles onto the video. This technique gives you more control over caption location, size, color, font, and timing, but can be very time-consuming and expensive to produce.

For web video, captions can be open, closed, or both. Closed captions are most common, utilizing functionality within video players and browsers to display closed captions on top of or immediately below the video area.
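For HTML5 video specifically, closed captions are usually attached with a `<track kind="captions">` element inside `<video>`, which lets the browser's player turn them on or off. The sketch below generates that markup from Python purely for illustration; the function name, the file names, the default label, and the assumption that the video is an MP4 file are all hypothetical.

```python
from html import escape


def video_with_captions(video_src: str, captions_src: str,
                        lang: str = "en", label: str = "English") -> str:
    """Return HTML5 markup for a video with one closed-caption track."""
    return (
        '<video controls>\n'
        # Assumes an MP4 source; adjust the type for other containers.
        f'  <source src="{escape(video_src)}" type="video/mp4">\n'
        # kind="captions" marks the track as captions; "default" enables it initially.
        f'  <track kind="captions" src="{escape(captions_src)}" '
        f'srclang="{escape(lang)}" label="{escape(label)}" default>\n'
        '</video>'
    )


if __name__ == "__main__":
    print(video_with_captions("talk.mp4", "talk.en.vtt"))
```

Because the caption text lives in a separate file rather than in the picture, viewers can switch the track off, which is what distinguishes this from open captions.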

The most common forms of web multimedia - Flash and HTML5 Video - both support captioning. Older technologies, such as Windows Media Player, QuickTime, and RealPlayer, also support captioning. The formats and techniques for authoring and implementing captions may vary based on the technology used.


Also see our article on real-time captions for information on captioning live web multimedia and broadcasts.


Transcripts are also an important part of making web multimedia content accessible. They allow anyone who cannot access content from web audio or video to read a text transcript instead. Transcripts do not have to be verbatim accounts of the spoken word in a video. They should also contain additional descriptions, explanations, or comments that may be beneficial, such as indications of laughter or an explosion. Transcripts allow deaf-blind users to get content through refreshable Braille and other devices. For most web video, both captions and a text transcript should be provided. For content that is audio only, a transcript will usually suffice.

Transcripts provide a textual version of the content that can be accessed by anyone. They also allow the content of your multimedia to be searchable, both by computers (such as search engines) and by end users. Screen reader users may also prefer the transcript over listening to the audio of the web multimedia. Most proficient screen reader users set their assistive technology to read at a rate much faster than most humans speak. This allows the screen reader user to access the transcript of the video and get the same content in less time than listening to the actual audio content.
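To illustrate the searchability point, here is a minimal sketch that assembles a plain-text transcript from timed caption text and finds where a phrase occurs. The data layout (a list of `(start_seconds, text)` pairs), the function names, and the sample timings are assumptions made for this example, not a standard format.

```python
def build_transcript(cues: list[tuple[float, str]]) -> str:
    """Join caption text in time order into one plain-text transcript."""
    return " ".join(text.strip() for _, text in sorted(cues))


def find_cues(cues: list[tuple[float, str]], query: str) -> list[float]:
    """Return start times of cues whose text contains the query, ignoring case."""
    needle = query.casefold()
    return [start for start, text in sorted(cues) if needle in text.casefold()]


if __name__ == "__main__":
    # Hypothetical cues; bracketed text describes a non-speech sound.
    cues = [
        (3.5, "military experiment ever undertaken."),
        (0.0, "the curtain rises on the greatest"),
        (6.0, "[explosion]"),
    ]
    print(build_transcript(cues))
    print(find_cues(cues, "Explosion"))
```

A transcript built this way still needs human review to add the descriptions and explanations that captions alone may not carry.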


In order to be fully accessible to the maximum number of users, web multimedia should include both synchronized captions AND a descriptive transcript.