E-mail List Archives
Thread: Transcript vs. Caption
Number of posts in this thread: 11 (In chronological order)
From: Patrick Burke
Date: Thu, Dec 18 2014 1:03PM
Subject: Transcript vs. Caption
Hi all,
We want to double-check our understanding of alternate content for
media (Guideline 1.2).
It appears that a transcript ("text alternative for time-based media")
is sufficient if the content is audio-only or video-only, under
section 1.2.1. Otherwise, captions/audio descriptions are necessary
(1.2.2 & others).
Are there multimedia situations ("synchronized media") where a
transcript is an acceptable alternative? It looks like static text is
ok, but only in specialized tech environments (such as a Silverlight window).
We're preparing a report & want to strongly encourage captions,
rather than transcripts. But we are wondering for example about low
vision users who might prefer to work with static text. So we're
trying to figure out if there are situations where WCAG (Techniques)
recommend a transcript as a preferred method for the equivalent
content. Or require *both* transcript & captioning.
Thanks very much for any assistance,
Patrick
--
Patrick J. Burke
Coordinator
UCLA Disabilities &
Computing Program
Phone: 310 206-6004
E-mail: = EMAIL ADDRESS REMOVED =
Location: 4909 Math Science
Department Contact: = EMAIL ADDRESS REMOVED =
From: John Foliot
Date: Thu, Dec 18 2014 1:42PM
Subject: Re: Transcript vs. Caption
Patrick Burke wrote:
>
> We want to doublecheck our understanding of alternate content for media
> (Guideline 1.2).
>
> It appears that a transcript ("text alternative for time-based media")
> is sufficient if the content is audio-only or video-only, under section
> 1.2.1. Otherwise, captions/audio descriptions are necessary
> (1.2.2 & others).
First, you need to determine what compliance level you are going for. If it
is AA then you will need 3 "alternatives" - Captions, Transcript and Audio
Descriptions(*).
The easiest way to think of this is via user-groups.
For the deaf and hard-of-hearing communities, captions are the necessary
accommodation (SC 1.2.2 - Level A)
For the blind and low-vision communities, the transcript can provide the
necessary accommodation, especially if the transcript combines both the
dialog and necessary explanation of what is on screen (SC 1.2.3 - Level A),
where the "transcript" serves as the Media Alternative called for:
"1.2.3 Audio Description or Media Alternative (Prerecorded): An
alternative for time-based media or audio description of the prerecorded
video content is provided for synchronized media, except when the media is a
media alternative for text and is clearly labeled as such. (Level A)"
For deaf-blind users, or users with cognitive issues, the Transcript (think
in terms of a screenplay) will be their accommodation.
WCAG recognizes that, at Level A conformance, the Transcript can also satisfy
the Audio Description requirement, especially since in practice most
transcripts already serve a functionally equivalent purpose.
>
> We're preparing a report & want to strongly encourage captions, rather
> than transcripts.
Actually, you require both at a minimum, and if you are going for AA
conformance you also need "Audio Descriptions":
"1.2.5 Audio Description (Prerecorded): Audio description is
provided for all prerecorded video content in synchronized media. (Level
AA)"
(* In my opinion this is now an unfortunate choice of wording, based upon
old-world tech - traditionally televisions and film - where the audio
description is tightly bound to the evolving timeline of the media
presentation. We have seen however PoC examples of providing the description
of on-screen activity and related important visual information as text files
that can be voiced by TTS engines, and that allow, for example, the ability
to speed up the 'voice' to be more in sync with what we know most daily
Screen reader users are accustomed to (e.g. 200+ words per minute). This of
course allows you to "cram" more information into the 'silence' between the
on-screen dialog (always a tricky requirement to meet, and one of the
reasons why traditional audio description is so hard at a professional
level).
I have also seen a PoC that used popcorn.js to actually 'pause' the
pre-recorded media stream and render the "descriptive text" on screen, for
further end-user processing.
We are currently putting the finishing touches on a new W3C Note "Media
Accessibility User Requirements"
(http://w3c.github.io/pfwg/media-accessibility-reqs), which we hope will be
finalized and published early in the new year (January?? - seriously, that
close). This document recognizes this new means of providing video
descriptions
(http://w3c.github.io/pfwg/media-accessibility-reqs/#text-video-description),
but at this writing it is unclear whether or not WCAG will move to accept
text files as a functional replacement for "audio description" (which is
what WCAG explicitly requires).)
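The point about faster synthesized speech fitting more description into a gap in the dialog can be made concrete with a little arithmetic. This is only a rough sketch: it assumes speaking time is simply word count divided by words per minute, and the function names, the sample sentence, the 4.5 second pause and the 150 wpm comparison rate are all illustrative assumptions rather than anything defined by WCAG or the MAUR.

```python
def speaking_seconds(text: str, words_per_minute: float) -> float:
    """Estimate how long it takes to voice `text` at the given rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return len(text.split()) / words_per_minute * 60.0


def fits_in_pause(text: str, pause_seconds: float, words_per_minute: float) -> bool:
    """True if the description can be voiced within the gap in the dialog."""
    return speaking_seconds(text, words_per_minute) <= pause_seconds


# Hypothetical 14-word description and a hypothetical 4.5 second pause.
description = "Anna crosses the room, picks up the letter and reads it by the window."
pause = 4.5

slow_ok = fits_in_pause(description, pause, 150)  # assumed narrator pace
fast_ok = fits_in_pause(description, pause, 200)  # rate mentioned above for screen reader users
```

Under these assumptions the same description overruns the pause at the slower rate and fits at 200 words per minute, which is the "cram more into the silence" effect described above.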
At any rate Patrick, to be Level AA conformant, you will need all three.
Recognizing the financial, production and technical limitations
inherent to this requirement (in plain language, this is *Really* hard to
accomplish today), the Canadian Federal Government has specifically issued an
'exemption' for this requirement at this time (with some limitations), with
a stated goal of revisiting this annually to assess the feasibility moving
forward (FWIW). See here: http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=23601
Appendix B.
HTH
JF
------------------------------
John Foliot
Web Accessibility Specialist
W3C Invited Expert - Accessibility
HTML5-a11y Task Force (Media SubTeam)
Co-Founder, Open Web Camp
From: Andrew Kirkpatrick
Date: Thu, Dec 18 2014 2:32PM
Subject: Re: Transcript vs. Caption
John,
The scenario that I don't think you are covering is if the media is just Audio (or just video). In the case of audio-only (SC 1.2.1) then a transcript by itself is just fine, and may in fact be preferable in that it gives the user the ability to read the content of the audio at their own pace.
AWK
From: John Foliot
Date: Thu, Dec 18 2014 2:56PM
Subject: Re: Transcript vs. Caption
Andrew Kirkpatrick wrote:
>
> John,
> The scenario that I don't think you are covering is if the media is
> just Audio (or just video). In the case of audio-only (SC 1.2.1) then
> a transcript by itself is just fine, and may in fact be preferable in
> that it gives the user the ability to read the content of the audio at
> their own pace.
Yes, sorry, in the case of audio-only you are correct.
While I suppose that a video without audio could exist, I think that is
something of a narrow corner-case. In that scenario I believe that the
requirement for 'audio description' remains, and for AA conformance that it
is still an "above and beyond the transcript requirement" (for a strict
reading of WCAG from this end at least). The need for a caption however
would not exist, as the video has no dialog or other important sounds.
Andrew, any thoughts regarding 'video descriptions offered as text' as an
alternative to meeting SC 1.2.5? I've been meaning to follow up on that with
you for some time now :-) - it's tricky because it also presumes that the
non-sighted user (the primary candidate for this SC) can process on-screen
text independent of the media player - a huge assumption to be making I
grant you in advance. Yet I also go back to the PoC that IBM Japan showed at
CSUN 2012, where they built the TTS engine into a standalone player, so...
(Also, if this is too large a discussion for this forum I understand.)
Finally, just in case someone following along is not already aware of this
resource, be sure to check out http://www.dcmp.org/captioningkey/ for great
captioning advice. Also be sure to check out
http://www.icam.k12.in.us/index.php?option=com_content&view=article&id=142&Itemid=83
for audio description resources.
JF
From: John Foliot
Date: Thu, Dec 18 2014 3:07PM
Subject: Re: Transcript vs. Caption
Whoops, I've been sleeping at the switch.
It seems that providing a time-stamped text file for 'descriptions' has
already been added to Techniques:
http://www.w3.org/TR/2014/NOTE-WCAG20-TECHS-20140916/H96
That said however, I'm not sure if that is a workable solution today - it
meets the technical requirement, but I'm not sure it can actually be
actuated successfully - is anyone aware of a working example? (I'm not, but
would love to see one.)
"The user agent makes the cues available to the user in a non-visual
fashion, for instance, by synthesizing them into speech."
Missing in the equation is a) where the description is rendered, and b) the
"handoff" to a TTS engine for non-sighted users. It seems kind of pointless
to provide a text description of what is happening on-screen and not have
the intended audience be able to access that information...
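To make the two missing pieces a little more tangible, here is a minimal sketch of the first half of the job: reading a time-stamped descriptions file and finding the cue that applies at a given playback time. It assumes a simple subset of WebVTT (a header, optional cue identifiers, timing lines and plain text); the names `Cue`, `parse_vtt` and `cue_at` and the sample cues are hypothetical, and the actual handoff of the cue text to a TTS engine or screen reader is deliberately left out.

```python
import re
from typing import List, NamedTuple, Optional


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


# Matches "HH:MM:SS.mmm --> HH:MM:SS.mmm"; the hours part is optional.
_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(h, m, s, ms) -> float:
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_vtt(source: str) -> List[Cue]:
    """Parse a simple WebVTT file into cues (header and identifiers are skipped)."""
    cues = []
    for block in re.split(r"\n\s*\n", source.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = _TIMING.match(line.strip())
            if match:
                g = match.groups()
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def cue_at(cues: List[Cue], time: float) -> Optional[Cue]:
    """Return the description cue active at `time`, if any."""
    for cue in cues:
        if cue.start <= time < cue.end:
            return cue
    return None


# Hypothetical descriptions track.
SAMPLE = """WEBVTT

1
00:00:05.000 --> 00:00:08.500
A red car pulls up outside the house.

00:12.000 --> 00:15.000
Ben steps out
and waves.
"""

cues = parse_vtt(SAMPLE)
```

Where the text returned by `cue_at` gets rendered, and how it reaches a speech engine, is exactly the gap raised in points a) and b) above.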
JF
From: Loretta Guarino Reid
Date: Thu, Dec 18 2014 3:32PM
Subject: Re: Transcript vs. Caption
Several years ago, my summer intern implemented and released 2 Chrome
extensions for this type of functionality, using WebVTT:
* HTML5 Audio Description (via text to speech)
* HTML5 Audio Description (via screenreader)
Anyone can search for them in the Chrome extensions, install and activate.
The web site for the extension describes what is needed to make them work.
An example of a video containing an embedded audio description track that
would work with these extensions:
html5videoguide.net/demos/AuDesc_2012/
From: Jonathan Avila
Date: Thu, Dec 18 2014 6:15PM
Subject: Re: Transcript vs. Caption
> First, you need to determine what compliance level you are going for. If it is AA then you will need 3 "alternatives" - Captions, Transcript and Audio Descriptions(*).
John, I'm trying to figure out what success criteria you are saying would require transcripts for WCAG Level AA multimedia conformance if audio description was available. My read is that SC 1.2.3 and SC 1.2.5 can both be met via audio description. SC 1.2.2 and 1.2.4 can be met through captions. The requirement for transcripts when audio description and captions are present is something that doesn't appear to be clear.
Jonathan
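This reading can be laid out as a small lookup, which may help when checking a given video against a target level. It is a sketch under stated assumptions, not a conformance tool: it covers only prerecorded synchronized media at Levels A and AA (so SC 1.2.4, which concerns live captions, is omitted), it treats "transcript" as shorthand for a full media alternative that also describes the visuals, and the names `Criterion`, `PRERECORDED_SYNCHRONIZED` and `unmet` are hypothetical.

```python
from typing import List, NamedTuple, Set


class Criterion(NamedTuple):
    number: str
    name: str
    level: str
    met_by: List[str]  # any one of these alternatives is enough


# WCAG 2.0 Guideline 1.2, prerecorded synchronized media, Levels A and AA only.
PRERECORDED_SYNCHRONIZED = [
    Criterion("1.2.2", "Captions (Prerecorded)", "A", ["captions"]),
    Criterion(
        "1.2.3",
        "Audio Description or Media Alternative (Prerecorded)",
        "A",
        # "transcript" here assumes a full media alternative, not dialog only.
        ["audio description", "transcript"],
    ),
    Criterion("1.2.5", "Audio Description (Prerecorded)", "AA", ["audio description"]),
]

_RANK = {"A": 1, "AA": 2}


def unmet(provided: Set[str], target_level: str) -> List[str]:
    """List the criteria at or below `target_level` that `provided` does not meet."""
    return [
        c.number
        for c in PRERECORDED_SYNCHRONIZED
        if _RANK[c.level] <= _RANK[target_level] and not (set(c.met_by) & provided)
    ]


captions_and_ad = unmet({"captions", "audio description"}, "AA")
captions_and_transcript = unmet({"captions", "transcript"}, "AA")
```

With captions plus audio description nothing is left unmet at AA, while captions plus a transcript meets Level A but still leaves 1.2.5 open.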
From: John Foliot
Date: Thu, Dec 18 2014 7:26PM
Subject: Re: Transcript vs. Caption
Jonathan Avila wrote:
>
> John, I'm trying to figure out what success criteria you are saying
> would require transcripts for WCAG Level AA multimedia conformance if
> audio description was available. My read is that SC 1.2.3 and SC
> 1.2.5 can both be met via audio description. SC 1.2.2 and 1.2.4 can be
> met through captions. The requirement for transcripts when audio
> description and captions are present is something that doesn't appear to
> be clear.
Hi Jonathan,
So... technically yes, I suppose you could get by ('legally speaking') with
audio description and captions only; however from the Intent of 1.2.3 comes
the following:
"The alternative for time-based media reads something like a
screenplay or book. Unlike audio description, the description of the video
portion is not constrained to just the pauses in the existing dialogue. Full
descriptions are provided of all visual information, including visual
context, actions and expressions of actors, and any other visual material.
In addition, non-speech sounds (laughter, off-screen voices, etc.) are
described, and transcripts of all dialogue are included. The sequence of
description and dialogue transcripts are the same as the sequence in the
synchronized media itself. As a result, the alternative for time-based media
can provide a much more complete representation of the synchronized media
content than audio description alone."
(source:
http://www.w3.org/TR/UNDERSTANDING-WCAG20/media-equiv-audio-desc.html)
I'll suggest here, although I concur that it is NOT a formal AA requirement,
that providing a Transcript also benefits the deaf/blind user, as well as
users with cognition issues.
Old dogs like me will remember there was a lot of concern about releasing
WCAG 2.0 because there was very little there for that particular user-group
- thankfully there is some work afoot now (the COGA Task Force at W3C being
a major example) to shore up those holes. I've been very close to media
accessibility for a while now, and I've always seen the Transcript as a MUST
requirement in my mind, although as you note, it currently isn't. I guess
it's the difference between meeting the minimum compliance bar, and "doing
the right thing" kind of argument...
From a production standpoint however, I'd further argue that once you've got
the dialog converted to text (for the captions) and the descriptions as text
(suggested by Technique H96: Using the track element to provide audio
descriptions), generating a transcript would likely not require a whole
lot more effort (Captions + Descriptions + a tiny bit more if required =
Transcript).
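The "Captions + Descriptions = Transcript" idea can be sketched as a simple merge by start time. This is a minimal illustration, assuming both tracks have already been reduced to (start time, text) pairs; the function name `build_transcript`, the bracket convention for descriptions and the sample cues are all hypothetical, and a real transcript would still need the "tiny bit more" of editorial cleanup.

```python
from typing import List, Tuple

Cue = Tuple[float, str]  # (start time in seconds, text)


def build_transcript(captions: List[Cue], descriptions: List[Cue]) -> str:
    """Interleave caption and description cues in playback order.

    Descriptions are wrapped in square brackets, and when a description and a
    caption start at the same time the description is listed first.
    """
    tagged = [(start, 1, text) for start, text in captions]
    tagged += [(start, 0, f"[{text}]") for start, text in descriptions]
    tagged.sort(key=lambda item: (item[0], item[1]))
    return "\n".join(text for _, _, text in tagged)


# Hypothetical cues for a short scene.
captions = [(9.0, "ANNA: You made it!"), (16.0, "BEN: Sorry I'm late.")]
descriptions = [
    (5.0, "A red car pulls up outside the house."),
    (12.0, "Ben steps out of the car and waves."),
]

transcript = build_transcript(captions, descriptions)
```

The result reads like a rough screenplay, with the bracketed lines carrying the visual information between the lines of dialog.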
Today, I am primarily more concerned about meeting AA for multi-media
overall, as my suspicion is that many folks really don't fully understand
what is required for the audio description part, given that it seems simply
getting captions provided is still an apparent uphill battle for many. The
complexity of creating the descriptions (the script), getting it recorded,
and then synchronizing those audio descriptions with the primary media asset
so that one voice doesn't "step" on the other... add in the lack of any form
of native support in user agents... and, we just don't seem to be there
quite yet technically; yet there it is, an AA Level conformance requirement.
I was unaware of the Chrome extensions that Loretta pointed out (off to
check them out next), and so that shows promise, however having support in
only one browser (via user-installed plugins) feels quite brittle to me
today - is all.
JF
From: Andrew Kirkpatrick
Date: Thu, Dec 18 2014 7:52PM
Subject: Re: Transcript vs. Caption
The full alternative for time-based media (transcript with cc and ad information) kicks in with 1.2.8 at AAA.
Triple A or single A, I do agree that deaf-blind users will find a resource like this useful, if not necessary.
AWK
From: Jonathan Avila
Date: Thu, Dec 18 2014 8:05PM
Subject: Re: Transcript vs. Caption
[John wrote] Full descriptions are provided of all visual information, including visual context, actions and expressions of actors, and any other visual material.
So, if you take the stance of full audio description being a hard requirement, and given that most multimedia will not have pauses for it, then from what I've read and seen discussed before, SC 1.2.5 (AA) cannot be met on a technicality of the definition of audio description. So you could run into a situation where you are meeting SC 1.2.7 with extended audio description and you could meet 1.2.3 with a transcript, but you could not meet SC 1.2.5. I do see that the WCAG working group has G8 (extended audio description) as a sufficient technique for SC 1.2.5, so I hope that we can all agree that an extended audio description would meet SC 1.2.5 despite the definition of audio description in 1.2.5 implying the AD should fit into the pauses.
> I'll suggest here, although concur that it is NOT a formal AA requirement, that providing a Transcript also benefits the deaf/blind user, as well as users with cognition issues.
I concur.
Jonathan
From: John Foliot
Date: Thu, Dec 18 2014 9:12PM
Subject: Re: Transcript vs. Caption
Jonathan Avila wrote:
>
> So, if you take the stance of full audio description being a hard
> requirement and the fact that most multimedia will not have pauses for
> it, then from what I've read and seen discussed before SC 1.2.5 AA cannot
> be met on a technicality of the definition of audio description.
Not exactly. Professional movie studios (Pixar of particular note) are doing
just that now - but as I noted, it is a costly and tricky process, and far
from perfect (there is still the occasional collision of audio tracks). I
recall once chatting with an MTV executive about this, and he indicated to
me that it did add a not insignificant amount to the final cost of
production. I also remember a particular conversation with a blind colleague
who noted that when he watched Shrek, the audio descriptionist voice was
chosen to "blend" into the other cartoon-like voices, for a more seamless
integration. (That sounded way-cool to me at the time!) His point however
was that when providing the script as text (for a TTS engine to render),
it needs to be evaluated against the total presentation: in the case
of Shrek, if it relied on the 'stock' synthesized voice (cranked to 200+
words per minute) it would have intruded into the movie experience, as
opposed to augmenting it. Just something to contemplate...
With the ease of posting videos on the internet today however, the ability
to fulfill this requirement will be (I conjecture) difficult for all but the
largest shops.
This is why I am concerned both about the requirement for "audio"
descriptions (as providing the text 'script' is significantly easier to
accomplish) and about the fact that, outside of the Chrome plugins Loretta
referenced, the actual tools to successfully deliver on the SC are sparse to
non-existent today, making achieving 1.2.5 (AA conformance) extremely
difficult. Not impossible, but without the technical support at the end-user
level, it will require that the content producer fill in the missing User
Agent/AT gaps.
> So
> you could run into a situation where you are meeting SC 1.2.7 extended
> audio description and you could meet 1.2.3 with a transcript but you
> could not meet SC 1.2.5. I do see that the WCAG working group has G8
> Extended audio description as a sufficient technique for SC 1.2.5 -- so
> I hope that we can all agree that an extended audio description would
> meet SC 1.2.5 despite the definition of audio description in 1.2.5
> implying the AD should fit in to the pauses.
Well, if you use the "must fit into the pauses" as the deciding criterion, I
suppose that yes, you've found the loophole (while noting that in fact, that
seems to be the distinction in WCAG today). When we were working on the MAUR
(pronounced Mow-er) - The Media Accessibility User Requirements - we
actually envisioned the need for the end user to be able to pause or "shift"
the primary stream long enough to listen to/process the video
description and/or extended description. See:
http://w3c.github.io/pfwg/media-accessibility-reqs/#time-scale-modification
Cheers!
JF