Captions are text versions of the spoken word presented within multimedia. Captions allow the content of web audio and video to be accessible to those who do not have access to audio. Though captioning is primarily intended for those who cannot hear the audio, it has also been found to help people who can hear it, including those who are not fluent in the language spoken or for whom it is not their primary language.
Common web accessibility guidelines indicate that captions should be:
- Synchronized - the text content should appear at approximately the same time that audio would be available
- Equivalent - content provided in captions should be equivalent to that of the spoken word
- Accessible - caption content should be readily accessible and available to those who need it
On the web, synchronized, equivalent captions should be provided any time multimedia content (generally meaning both visual and auditory content) is present. This obviously pertains to the use of audio and video played through multimedia players and HTML5 video, but can also pertain to such technologies as Flash or Java when audio content is a part of the multimedia presentation.
[Image: Captions as typically seen on television]
Captions can be either closed or open. Closed captions can be turned on or off, whereas open captions are always visible.
In the United States, all television sets with screen sizes of 13 inches and larger must contain the hardware to interpret and display closed captions, and closed captioning of most pre-recorded television programs is now a legal requirement. Television closed captioning is used by millions of individuals who are deaf or hard of hearing; millions more use it in the classroom or in noisy environments such as bars, restaurants, and airports. As the average age of the population increases, so does the number of people with hearing impairments. According to US government figures, one person in five has some functional hearing limitation. Because of the growing need for access to captions, many live broadcasts (such as news and sports events) and most pre-recorded programs now include closed captions that can be easily enabled and viewed on screen.
[Image: Captions as seen on DVD]
Closed captions for television are very limited in their formatting, because the caption look, feel, and location are determined by the caption decoder built into the television set. You can get more information about television captioning at Captioning FAQ.
[Image: Captions as seen in a web media player]
Open captions contain the same text as closed captions, but they are a permanent part of the video picture and typically cannot be turned off. They are not decoded by the television set; they are part of the video information itself, so they are visible to anybody viewing the video clip. Producing them typically requires a video editing or encoding program that allows you to overlay titles onto the video. This gives you more control over caption location, size, color, font, and timing than closed captions allow, but it can be very time consuming and expensive to produce.
For web video, captions can be open, closed, or both. Closed captions are most common, utilizing functionality within video players and browsers to display closed captions on top of or immediately below the video area.
The most common forms of web multimedia - Flash and HTML5 Video - both support captioning. Older technologies, such as Windows Media Player, QuickTime, and RealPlayer, also support captioning. The formats and techniques for authoring and implementing captions may vary based on the technology used.
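As one illustration of what a caption format looks like, the sketch below writes caption cues as WebVTT, a text format commonly used for HTML5 video captions. WebVTT is not named in the text above, and the `Cue` class and the `format_timestamp` and `to_webvtt` helpers are hypothetical names chosen for this example. It is a minimal sketch and does not cover the full format.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption: start and end times in seconds, plus the text to show."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Build the text of a WebVTT file from a list of cues."""
    blocks = ["WEBVTT"]
    for cue in cues:
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{timing}\n{cue.text}")
    # Blank lines separate the header and each cue block.
    return "\n\n".join(blocks) + "\n"


# Hypothetical cues, including a non-speech sound in brackets.
example_cues = [
    Cue(0.0, 2.5, "Welcome to the presentation."),
    Cue(2.5, 6.0, "[laughter]"),
]
webvtt_text = to_webvtt(example_cues)
```

In an HTML5 page, a file like this would typically be referenced from a `<track>` element inside the `<video>` element. Other players may expect a different caption format.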
Transcripts are also an important part of making web multimedia content accessible. Transcripts allow anyone who cannot access content from web audio or video to read a text transcript instead. Transcripts do not have to be verbatim accounts of the spoken word in a video. They should contain additional descriptions, explanations, or comments that may be beneficial, such as indications of laughter or an explosion. Transcripts allow deaf-blind users to get content through the use of refreshable Braille and other devices. For most web video, both captions and a text transcript should be provided. For content that is audio only, a transcript will usually suffice.
Transcripts provide a textual version of the content that can be accessed by anyone. They also allow the content of your multimedia to be searchable, both by computers (such as search engines) and by end users. Screen reader users may also prefer the transcript over listening to the audio of the web multimedia. Most proficient screen reader users set their assistive technology to read at a rate much faster than most humans speak. This allows the screen reader user to access the transcript of the video and get the same content in less time than listening to the actual audio content.
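Because a transcript is plain text, it is straightforward to assemble a first draft from caption cues and to search it. The sketch below is a minimal illustration: the cue data and the `build_transcript` and `find_lines` helpers are hypothetical, and a real transcript would usually be edited by hand to add descriptions and explanations.

```python
def build_transcript(cues: list[tuple[float, str]]) -> str:
    """Join caption text in time order into a plain-text transcript."""
    ordered = sorted(cues, key=lambda cue: cue[0])
    return "\n".join(text for _, text in ordered)


def find_lines(transcript: str, term: str) -> list[str]:
    """Return transcript lines containing the term, ignoring case."""
    needle = term.lower()
    return [line for line in transcript.splitlines() if needle in line.lower()]


# Hypothetical cues: (start time in seconds, caption text).
example_cues = [
    (4.0, "They are clarity, contrast, and captions."),
    (0.0, "There are three important points."),
    (9.0, "[applause]"),
]
transcript = build_transcript(example_cues)
matches = find_lines(transcript, "Captions")
```

Here `matches` holds only the lines of the transcript that mention the search term, which is the kind of lookup that is not possible against audio alone.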
In order to be fully accessible to the maximum number of users, web multimedia should include both synchronized captions AND a descriptive transcript.
Audio descriptions are intended for users with visual disabilities. They provide additional information about what is visible on the screen, which allows video content to be accessible to those users. Though not commonly used in television and movies, audio descriptions are gaining in popularity. They are helpful on the web if visual content in web video provides important content not available through the audio alone. An example of audio descriptions for something you have probably seen and heard is found below. Can you visualize what is being described?
Listen to Audio Descriptions in MP3 Format (152KB)
If web video is produced with accessibility in mind, then audio descriptions are often unnecessary, as long as visual elements within the video are described in the audio.
Producing audio descriptions can be expensive and time-consuming. When producing a video for the web, the need for audio descriptions can often be avoided. For example, suppose the video displays a list of five important items. If the narrator says, "As you can see, there are five important points," audio descriptions would be necessary to provide the visual content to those with visual disabilities who cannot 'see' what the important points are. However, if the narrator says, "There are five important points. They are..." and then reads or describes each of the points, then the visual content is being conveyed through audio and there is no additional need for audio descriptions.