E-mail List Archives
Thread: PDF Container tags
Number of posts in this thread: 9 (In chronological order)
From: Chagnon | PubCom.com
Date: Mon, Sep 28 2015 1:56PM
Subject: PDF Container tags
This issue comes up quite frequently in our work.
People have hissy fits about the common container tags that become embedded
in PDF tag trees when a PDF is made from InDesign, Word, and other office
software. Everyone has a different take on their purpose, meaning, and
requirements. We're trying to clarify this issue for a student's work.
Questions (and links to reference material) follow:
The defined container tags in the Adobe PDF standard are <DOC>, <PART>,
<ART>, <SECT>, and <DIV>. They are loosely defined in the PDF standard,
ISO 32000-1:2008 (see Table 333 beginning on page 583 of
http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf ).
I say "loosely defined" because the only one that is adequately
defined is <DOC>, which is the root element of the tag structure. Everything
else falls within it. All the other definitions could be debated from now
until the cows come home.
1. Are any of these container tags recognized by today's screen
readers and other AT? The last time we checked (last spring), they were
ignored by screen readers and the PDF tags were read top-to-bottom down the
tag tree regardless of whether there were container tags here and there or
not. Is this still the case?
2. Does it matter if the <DOC> tag is there in the PDF tag tree?
3. From the user's point of view, is there any proposed purpose for
these container tags, now or in the future?
4. And what about <SPAN> tags, do they still interfere with screen
readers and AT?
NOTE: I know these tags can have some purpose for those who create PDFs, but
I'm questioning their purpose for AT.
We couldn't find any references to these container tags when we searched the
PDF/UA standards.
We can't find any references to their correct usage in WCAG, either.
And what happened to the search utility on the WAI website?
http://www.w3.org/WAI/ It's now so difficult to find information there.
Thanks in advance,
--Bevi Chagnon
From: Ryan E. Benson
Date: Mon, Sep 28 2015 3:27PM
Subject: Re: PDF Container tags
Hi Bevi,
> Are any of these container tags recognized by today's screen readers and
> other AT?
To my knowledge, there is currently no way to navigate by them as you can
with ARIA regions.
> Does it matter if the <DOC> tag is there in the PDF tag tree?
DOC isn't a standard tag, so it should be mapped to Document. If it isn't,
Acrobat treats custom tags that have no mapping as P, so the various
structures could essentially be ignored. As for having a <Document>, it
comes down to how much of a purist you are. Not having one will not break
the document, unlike leaving out <html> and <body> in HTML.
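
As a rough illustration of the mapping described above, the sketch below resolves a custom tag name through a role map until it reaches a standard structure type. The function name `resolve_role`, the abbreviated `STANDARD_TYPES` set, and the sample role map are assumptions made for this example. The fallback to P is taken from the description of Acrobat's behavior in this post, not from the specification.

```python
# Abbreviated, illustrative subset of the standard structure types.
STANDARD_TYPES = {
    "Document", "Part", "Art", "Sect", "Div",
    "P", "H", "H1", "H2", "H3", "H4", "H5", "H6",
    "L", "LI", "Lbl", "LBody", "Table", "TR", "TH", "TD",
    "Span", "Link", "Figure",
}


def resolve_role(tag, role_map, fallback="P"):
    """Follow role_map from a custom tag name to a standard type.

    The fallback to "P" mirrors the Acrobat behavior described in this
    thread; it is an assumption, not something the specification mandates.
    """
    seen = set()
    while tag not in STANDARD_TYPES:
        if tag in seen or tag not in role_map:
            # Unmapped tag or circular mapping: give up and use the fallback.
            return fallback
        seen.add(tag)
        tag = role_map[tag]
    return tag


# Hypothetical role map such as an authoring tool might write.
role_map = {"DOC": "Document", "Chapter": "Sect", "Sidebar": "Callout"}
print(resolve_role("DOC", role_map))      # Document
print(resolve_role("Chapter", role_map))  # Sect
print(resolve_role("Sidebar", role_map))  # P (Callout is itself unmapped)
```

The loop follows chained mappings, so a custom tag that maps to another custom tag is still resolved as long as the chain ends at a standard type.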
--
Ryan E. Benson
From: Chagnon | PubCom.com
Date: Mon, Sep 28 2015 3:42PM
Subject: Re: PDF Container tags
Thanks, Ryan.
This being a PDF we're discussing, ARIA isn't available like it is in HTML.
So are you saying that screen readers can't navigate via the PDF container tags at all?
Just looking for clarification to pass along to my student.
--Bevi
From: Chagnon | PubCom.com
Date: Mon, Sep 28 2015 8:29PM
Subject: Re: PDF Container tags
Thanks Ryan. You're confirming my opinion.
And thanks for catching <Document> rather than <DOC>. Sometimes I get my various tagging languages/syntax flipped.
But to clarify my earlier question:
Do these various container tags -- <DOCUMENT>, <PART>, <ART>, <SECT>, and <DIV> -- have any effect on screen readers and other AT?
They are the container tags specified in the PDF 2008 standard tag set. In my experience, we haven't noticed any screen readers acknowledging them in a document, nor stumbling over them either.
Do they cause problems for AT?
Do they provide any benefits for AT users?
--Bevi Chagnon
From: Moore,Michael (Accessibility) (HHSC)
Date: Tue, Sep 29 2015 7:10AM
Subject: Re: PDF Container tags
<bevi>
But to clarify my earlier question:
Do these various container tags -- <DOCUMENT>, <PART>, <ART>, <SECT>, and <DIV> -- have any effect on screen readers and other AT?
</bevi>
These tags do not have any impact on any AT that I have tested with (JAWS, Window-Eyes, NVDA, ZoomText, MAGic, Dragon, or VoiceOver for either OS X or iOS). I have found them useful when remediating a document because they allow me to work on logical chunks in the tag tree. If you split a document, Acrobat Pro will place each split section into a <PART>, which makes it easy to work on a page at a time when you stitch things back together.
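
A minimal sketch of the behavior described above, assuming a tag tree modeled as nested `(tag, text, kids)` tuples. The helpers `leaf`, `group`, and `reading_order` are hypothetical and do not reflect any screen reader's actual implementation. It shows that when grouping tags are skipped, the content comes out in the same top-to-bottom order whether or not the containers are present.

```python
GROUPING_TAGS = {"Document", "Part", "Art", "Sect", "Div"}


def leaf(tag, text):
    """A content-bearing tag with no children."""
    return (tag, text, [])


def group(tag, *kids):
    """A container tag that only holds other tags."""
    return (tag, None, list(kids))


def reading_order(node):
    """Yield (tag, text) pairs in tree order, skipping grouping tags."""
    tag, text, kids = node
    if tag not in GROUPING_TAGS:
        yield (tag, text)
    for kid in kids:
        yield from reading_order(kid)


with_containers = group(
    "Document",
    group("Part",
          leaf("H1", "Introduction"),
          group("Sect", leaf("P", "First paragraph."))),
    group("Part",
          leaf("H1", "Details"),
          leaf("P", "Second paragraph.")),
)

without_containers = group(
    "Document",
    leaf("H1", "Introduction"),
    leaf("P", "First paragraph."),
    leaf("H1", "Details"),
    leaf("P", "Second paragraph."),
)

same = list(reading_order(with_containers)) == list(reading_order(without_containers))
print(same)  # True
```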
Mike Moore
Accessibility Coordinator
Texas Health and Human Services Commission
Civil Rights Office
(512) 438-3431 (Office)
From: Jon Metz
Date: Tue, Sep 29 2015 8:08AM
Subject: Re: PDF Container tags
Hi Bevi,
Standard Grouping structure tags are not exposed to Assistive Technology
(AT), unless the user navigates the tag tree themselves. I'm writing
from my phone, so I can't view 32000 at the moment and have to go by
memory.
I *believe* these tags are intended for a "Strongly Structured" document,
that is, one that is nested using only H tags and subsequent tags.
Unfortunately (fortunately?) AT tends to have lackluster performance
dealing with Strongly Structured documents, so using Grouping tags becomes
optional, with the exception of the <Document> tag, which is required as
the root tag.
32000 does a rather poor job of explaining when to use them, and how. The
order in which they appear in the specification doesn't even help. I'll do
my best to recall their purpose though.
There can be only one <Document> tag, defined as the root. <Part> is used
to mark a segment of the overall document, such as an excerpt from a
book. There can be multiple <Part>s in a document, but if there's only one
<Part>, it's fairly superfluous. <Sect> tags can be nested within each
other, and typically mark separate sections, such as Chapters or Asides.
<Div> tags can also be nested inside each other, and can be useful for
independent groups of content, such as address info. I rarely use these
though, as nesting structure too deeply can actually make it difficult for
screen readers to access the content.
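
One way to see how deeply a tag tree is nested is to measure it. The sketch below assumes a hypothetical model in which each tag is a `(tag, kids)` tuple. It is not tied to any PDF library, and it does not suggest any particular depth as a safe limit.

```python
def max_depth(node):
    """Return the number of levels in the tag tree rooted at node."""
    tag, kids = node
    return 1 + max((max_depth(kid) for kid in kids), default=0)


tree = ("Document", [
    ("Part", [
        ("Sect", [
            ("Div", [("P", [])]),
            ("P", []),
        ]),
    ]),
    ("Part", [("P", [])]),
])

print(max_depth(tree))  # 5: Document > Part > Sect > Div > P
```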
A special kind of Grouping Structure tag, <TOC> contains a list of places
to find content in a document, and can only have other nested <TOC> and
<TOCI> tags. <TOCI> tags can have a <Lbl>, <Reference>, and <Link> tag
associated with them. Note that using <Lbl> outside an <L> tag might
generate a false error in PAC 2.
Now, the following is just my opinion. I prefer to use Grouping tags in my
documents for a few different reasons: First, to improve semantics when
submitting to Fed clients who are disabled and might have Acrobat Pro. In
my opinion, they provide a better idea about the structure of the Document. I
also tend to give Grouping tags a Title, so I can identify their purpose in
the document.
Further, it can be extremely useful when marking up a longer document
(especially PowerPoint conversions) in order to differentiate the
separate components of the file.
Finally, it can be useful when there are significant changes or errors in
only one part of the Document and I need to extract corrupt sections or
replace them with new iterations without retagging the entire thing. For
example, there are times when there's a significant error in the document
and I need to break out sections to find where the troubled page is. I just
break out a section at a time until I find the culprit.
In the end, it's really for the remediator and not the end user, although
it can make it easier for other purposes later, such as using a third party
app to map to HTML or even reflow the document, which can sort of be seen
in use via the free Callas PDFgoHTML plug-in.
Hope this helps.
Jon
From: Andrew Kirkpatrick
Date: Tue, Sep 29 2015 8:09AM
Subject: Re: PDF Container tags
Bevi,
At least with Adobe Reader, Part, Article, Section, Division, and Document tags are exposed for assistive technology to have access to. I do not believe that any AT does anything with this information currently, but that isn't to say that they couldn't.
Thanks,
AWK
Andrew Kirkpatrick
Group Product Manager, Accessibility
Adobe Systems
= EMAIL ADDRESS REMOVED =
http://twitter.com/awkawk
http://blogs.adobe.com/accessibility
From: Duff Johnson
Date: Tue, Sep 29 2015 8:41AM
Subject: Re: PDF Container tags
> At least with Adobe Reader, Part, Article, Section, Division, and Document tags are exposed for assistive technology to have access to.
This is also my understanding.
> I do not believe that any AT does anything with this information currently
Sadly, also true. In no small part, I believe, because these structure types are underspecified in ISO 32000-1. Accordingly, there's no benefit to supporting them, since authoring software and (most) users don't do anything consistent with them.
> , but that isn't to say that they couldn't.
As a practical matter AT developers won't choose to support <Sect>, <Art>, etc. Since the existing specification of these elements is poor, there's also no consistent usage "in the field", so therefore, no reward for an AT developer's efforts.
In fact, as we see from the comments, knowing that AT just ignores them, people use these grouping elements for convenience in remediation as much as (or more than) they do for some sort of semantic gain.
PDF 2.0, which is entering the final stages of drafting, loses some Grouping structure elements. <Part> and <Div> are the only grouping elements that remain. They have clear purposes, and are quite distinct. We did add 1 grouping element: <Aside>. All of these are straightforward and should help remediators and software developers alike all get on the same page with grouping concepts.
We also added the <DocumentFragment> concept, which addresses a wide variety of content that is quite common in PDF form (but never ever seen in HTML). This is likewise defined in a very usable way in ISO 32000-2, certainly as compared to the grouping element concepts in 32000-1.
In all, I hope (and expect) AT developers (not to mention browser developers) to decide that PDF 2.0 is worth supporting. Nothing will make PDF authoring and remediation software come along faster than if AT vendors choose to adopt PDF 2.0.
PDF/UA-2, also under development, will leverage PDF 2.0.
I hope it doesn't offend too much if I mention that the PDF Technical Conference next month in San Jose is an excellent place for developers to learn all about the guts of PDF accessibility technology, PDF/UA, PDF 2.0, PDF/UA-2 and a lot more. More info:
http://www.pdfa.org/2015/06/pdf-technical-conference-2015-program/
Duff.
From: Chagnon | PubCom.com
Date: Wed, Sep 30 2015 10:30AM
Subject: Re: PDF Container tags
Thanks everyone for your comments.
It's clear that:
1. There isn't a clear definition for each of the container tags as to what their intended purposes are.
2. Current specs/definitions are ambiguous and confusing.
3. Current AT doesn't use them, but they are "exposed" to AT so there's the possibility that they could have a useful purpose.
We do have a long way to go in this field, don't we?
--Bevi Chagnon
Bevi Chagnon | www.PubCom.com | = EMAIL ADDRESS REMOVED =
Technologists, Consultants, Trainers, Designers, and Developers
for publishing & communication
| PRINT | WEB | PDF | EPUB | Sec. 508 ACCESSIBILITY |