Captions, Transcripts, and Audio Descriptions
Transcripts
For multimedia, a transcript can also help users who can neither hear the audio nor see the video. Beyond the spoken words, a transcript should include descriptions of important audio information (such as laughter) and visual information (such as someone entering the room). Transcripts allow deafblind users to access content using refreshable Braille devices.
Transcripts also allow anyone who cannot access web audio or video content (or both) to read a text transcript instead. For most web video, both captions and a text transcript should be provided. For audio-only content such as a podcast, a transcript will usually suffice; captions are not necessary.
Transcripts make multimedia content searchable by search engines and users. Screen reader users may also prefer a transcript over real-time audio, since most proficient screen reader users set their assistive technology to read much faster than natural human speech.
In order to be optimally accessible to users with auditory disabilities, web multimedia should include both synchronized captions and a transcript.
Audio Descriptions
Visual content within multimedia must be described via audio in order for the multimedia to be optimally accessible to users with visual disabilities.
Audio descriptions help users with visual disabilities perceive content that is presented only visually, and are necessary for WCAG 2 Level AA conformance. On television, this is often called Descriptive Video Service (DVS). Typically, a narrator describes the visual-only content in the multimedia. Audio descriptions can be included in the primary audio of the video, provided in a separate audio track, or delivered through an alternate version of the video that includes audio descriptions.
Here's a short example of an audio description that you might recognize. Can you visualize what is being described?
Producing audio descriptions can be expensive and time-consuming. However, they are unnecessary if the audio already conveys the important visual content. If a video displays a list of five important points, the narrator should read the points aloud rather than saying, "As you can see, there are five important points." Instead of saying, "Click here and then here," the presenter should describe what is being clicked. This way, no separate audio description track is necessary.