WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Untagged PDF doc with table structure

From: Lynn Holdsworth
Date: Feb 18, 2015 11:45AM


Hi Bevi,

Thanks for taking the time to write such a comprehensive response.

From creating HTML pages for about half a lifetime, I'd define tags
and structure pretty much the way you do.

But I inferred from this thread, and from talking with someone who
knows a lot more about PDF than I do, that it's possible to have
structure without tags in a PDF document. Is this correct, and if so
how would I recognise it if I were to examine the document's building
blocks?

Best, Lynn
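
One way to examine the building blocks directly, sketched in Python under some loud assumptions. In the PDF specification, the document catalog points to the logical structure tree through a /StructTreeRoot entry, and a file declares itself Tagged PDF through a /MarkInfo dictionary with /Marked true. A file that has the first but not the second is one plausible reading of "structure without tags". The function names below are hypothetical, and scanning raw bytes is only a rough heuristic: it can miss keys stored in compressed object streams or a /MarkInfo dictionary stored as an indirect reference, and it says nothing about tags that Reader adds temporarily in memory.

```python
import re


def describe_pdf_markers(data: bytes) -> dict:
    """Crude scan of raw PDF bytes for structure-related catalog keys.

    Assumption: the catalog is stored uncompressed and /MarkInfo is an
    inline dictionary. Real files may not satisfy either condition.
    """
    has_struct_tree = b"/StructTreeRoot" in data
    marked = re.search(rb"/MarkInfo\s*<<[^>]*/Marked\s+true", data) is not None
    return {"struct_tree_root": has_struct_tree, "marked_true": marked}


def classify(data: bytes) -> str:
    """Turn the two markers into a rough, human-readable verdict."""
    markers = describe_pdf_markers(data)
    if markers["struct_tree_root"] and markers["marked_true"]:
        return "tagged"
    if markers["struct_tree_root"]:
        return "structure tree present, but not marked as Tagged PDF"
    return "no structure tree found"


# Usage with a file on disk (the path is a placeholder):
# with open("document.pdf", "rb") as handle:
#     print(classify(handle.read()))
```

A proper PDF library would parse the catalog rather than search bytes; the scan is used here only to keep the sketch free of dependencies.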

On 18/02/2015, Chagnon | PubCom < <EMAIL REMOVED> > wrote:
> Lynn wrote: " in PDF docs, what's the difference between tags and
> structure?
> "
>
> This is one of the toughest concepts we teachers have to explain! I'd love
> to hear how others describe it. Here's my take:
>
> Tags are labels. Code labels, specifically, that are read by Assistive
> Technologies and are not usually visible to sighted users unless they have
> Acrobat Pro. They let AT users know what's a heading 2, a list of bullets,
> tables, and other parts of the document. Tags also do a lot of work for us,
> such as assisting us in creating bookmarks and tables of contents, creating
> navigation systems, and holding the Alt-text on graphics (Alt-Text is an
> attribute on the figure tag and doesn't stand on its own).
>
> Structure is the sequence of how the document's pieces will be read, or in
> other words, the sequence in which the tagged items are read. Call it
> reading order or tag reading order. The structure of some documents can also
> have nesting qualities, such as all the pieces of a chapter, and all the
> chapters in a book.
>
> An example: If Heading 1 designates a chapter title, then all the paragraphs,
> bullets, tables, and heading 2 items within that chapter will be nested
> inside the main heading 1 tag. This allows AT software to figure out,
> hopefully, what goes with what: that all the tags nested within Heading 1
> make up a chapter.
>
> Structure is created when you have tags (the right tag labels) and a reading
> order (a logical reading order). It is possible that a tagged and structured
> document might not be fully accessible because the tags aren't accurate
> enough or the reading order is out of whack.
>
> Example number 1: In older versions of MS Word, figures would be placed in
> very odd places in the reading order when the document was exported to PDF.
> If paragraph 1 stated "see figure 5", figure 5 itself might end up at the
> very end of the reading order, not near paragraph 1 where it was referenced.
> A sighted person sees figure 5 next to the paragraph, but a screen reader
> user doesn't hear it voiced until the last page, and maybe that's page 360
> of a long government document. So the document is tagged and structured, but
> it's a faulty structure because the reading order is incorrect.
>
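The Word example above can be illustrated with a toy model in Python. The `Tag` class and `reading_order` function are hypothetical and are not how a PDF stores its structure tree; the sketch only assumes that assistive technology walks the tag tree depth-first, in the order the tags appear.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """A toy tag: a label, optional text, and nested child tags."""
    name: str
    text: str = ""
    children: list = field(default_factory=list)


def reading_order(tag: Tag) -> list:
    """Walk the tag tree depth-first and list the content as it would be read."""
    spoken = []
    if tag.text:
        spoken.append(f"{tag.name}: {tag.text}")
    for child in tag.children:
        spoken.extend(reading_order(child))
    return spoken


# Same tags, same content; only the position of the figure differs.
logical = Tag("Document", children=[
    Tag("H1", "Chapter 1"),
    Tag("P", "See figure 5."),
    Tag("Figure", "Figure 5"),
    Tag("P", "Next paragraph."),
])
faulty = Tag("Document", children=[
    Tag("H1", "Chapter 1"),
    Tag("P", "See figure 5."),
    Tag("P", "Next paragraph."),
    Tag("Figure", "Figure 5"),
])
```

Both trees carry identical tags, so a check that only counts tags would pass both; only the walk shows that the figure in `faulty` is reached last.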
> Example number 2: Graphic designers who use desktop publishing programs like
> Adobe InDesign and QuarkXPress create very complex visual layouts. Visually,
> things aren't designed in a traditional top-down, left-to-right pattern but
> instead could be scattered all over the physical page. Here's an example of
> a 2-page magazine spread:
> http://fc02.deviantart.net/fs71/i/2010/082/e/c/Magazine_Layout_Design_1_by_BreakTheRecords.jpg
> (This is just a random sample I pulled up on the Internet, so it is only a
> graphic of a 2-page spread, no live text or Alt-text.)
>
> Note that the article title (or heading 1) appears on page 2, and the body
> text of the story starts on page 1. Backwards! And then there are 2 quotes at
> the top of page 1, so obviously the designer wants us to read those at the
> beginning of the story, also. And here's a similar example:
> https://m1.behance.net/rendition/modules/12455236/disp/322ee0c042b2949607393d8b1f24ad96.jpg
>
> Whew! Getting a tagged, logical reading order from this type of publication
> isn't easy!
>
> Summary:
> Structure equals tagged content placed in a logical reading order.
>
> Well, that's my attempt. Would love to hear how others describe the
> concepts.
>
> --Bevi Chagnon
>
> -----Original Message-----
> From: <EMAIL REMOVED>
> [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn Holdsworth
> Sent: Wednesday, February 18, 2015 12:11 PM
> To: WebAIM Discussion List
> Subject: Re: [WebAIM] Untagged PDF doc with table structure
>
> Thanks so much everyone for weighing in - I've found this a very useful
> thread indeed.
>
> One more question: in PDF docs, what's the difference between tags and
> structure? Ryan mentioned that the doc may include structure but not be
> tagged, and I don't understand the difference.
>
> And thanks Duff for the LinkedIn group suggestions. I'll join at least the
> first one.
>
> Really hoping that Adobe is working on ironing out the accessibility
> glitches in the Download Assistant, as I'd appreciate the chance to learn
> about and use what seems like a great bunch of accessibility features in
> Acrobat.
>
> Best, Lynn
>
> On 18/02/2015, Andrew Kirkpatrick < <EMAIL REMOVED> > wrote:
>> Bim,
>> I was talking about both Acrobat and Reader in my reply, sorry if that
>> wasn't clear. It is the same process for both.
>> AWK
>>
>> -----Original Message-----
>> From: <EMAIL REMOVED>
>> [mailto: <EMAIL REMOVED> ] On Behalf Of Bim Egan
>> Sent: Wednesday, February 18, 2015 7:13 AM
>> To: 'WebAIM Discussion List'
>> Subject: Re: [WebAIM] Untagged PDF doc with table structure
>>
>> Lynn didn't seem to be talking about using Acrobat though. She
>> described the experience of many screen reader users in finding a
>> table in an untagged PDF when opened in Reader, and she asked why
>> this could happen. Her message said that the Acrobat installation
>> wasn't accessible.
>>
>> Bim
>>
>> -----Original Message-----
>> From: <EMAIL REMOVED>
>> [mailto: <EMAIL REMOVED> ] On Behalf Of Andrew
>> Kirkpatrick
>> Sent: 18 February 2015 14:36
>> To: WebAIM Discussion List
>> Subject: Re: [WebAIM] Untagged PDF doc with table structure
>>
>> Jon is correct. When Acrobat (or Reader) opens an untagged document
>> while a client that uses the accessibility API data is running, it
>> will add tags to the document. The result is the same as if
>> an author used the "add tags" feature in Acrobat. You get Acrobat's
>> best interpretation of what the tags should be. That will sometimes
>> result in headings, well-formed tables, lists, and other structures.
>> Authors who use this feature in Acrobat know that you generally need
>> to fix some of the tags.
>>
>>
>>
>> The result is that the document is tagged temporarily and assistive
>> technologies recognize and use the information.
>>
>>
>>
>> The dialogs that you see when opening PDF documents give you some
>> information about what is going on. To understand better, here's my
>> explanation.
>>
>>
>>
>> In Acrobat or Reader preferences there is a "Reading" category. There
>> is a checkbox that is labeled "Confirm before tagging documents". If
>> this is checked, then every time that Reader intends to tag an
>> untagged document the "Reading an untagged document with assistive
>> technology" dialog pops up and the user needs to confirm that this is
>> what they'd like to do. If the user selects cancel then the document
>> won't be tagged and the reading experience will be essentially
>> non-existent.
>>
>>
>>
>> If you elect to allow the tagging, there are other options as
>> mentioned in one of the replies. I recommend using the "infer reading
>> order from document" option.
>>
>>
>>
>> There are other settings related to large documents and auto-tagging.
>> Autotagging takes time, so if you open a very dense 600 page manual
>> you may find that Reader takes a long time to do the tagging. It can,
>> and we are always looking to improve the efficiency of this process.
>> The option for the user is to indicate whether the autotagging should
>> occur only on visible pages, on all pages in the document, or on all
>> pages except if the document is "large". The user gets to define what
>> large means - a user might find that their system is slow at this so
>> sets the limit at 25 pages, or might set it higher if their system
>> handles this process quickly. The down side of only tagging a few
>> pages at a time is that if there are recognized structures on pages
>> that haven't been tagged yet (e.g. a heading on page 51) the user
>> can't use screen reader heading navigation to jump to it because the
>> tags don't exist until the page is in view in the reader.
>>
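The three page-scope options described above can be modelled with a short Python sketch. This is not Acrobat's or Reader's code: the function name, the mode strings, and the choice to treat a document as "large" when its page count is at or above the user's limit are all assumptions made for illustration.

```python
def pages_to_autotag(mode, total_pages, visible_pages, large_threshold):
    """Return the page numbers a viewer would auto-tag under each option.

    mode is one of the hypothetical labels:
      "visible_only"     - tag only the pages currently in view
      "all"              - tag every page in the document
      "all_unless_large" - tag every page, unless the document is large,
                           in which case fall back to the visible pages
    """
    every_page = list(range(1, total_pages + 1))
    in_view = [p for p in sorted(set(visible_pages)) if 1 <= p <= total_pages]

    if mode == "visible_only":
        return in_view
    if mode == "all":
        return every_page
    if mode == "all_unless_large":
        # Assumption: "large" means page count >= the user-defined limit.
        return in_view if total_pages >= large_threshold else every_page
    raise ValueError(f"unknown mode: {mode}")


# A 600-page manual with pages 1 and 2 in view and a user limit of 25 pages.
tagged = pages_to_autotag("all_unless_large", 600, [1, 2], 25)
heading_on_51_reachable = 51 in tagged
```

With these inputs only the visible pages are tagged, so the heading on page 51 is not available to heading navigation, which matches the down side described above.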
>>
>>
>> Hope this helps,
>>
>> AWK
>>
>>
>>
>> -----Original Message-----
>> From: <EMAIL REMOVED>
>> [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn
>> Holdsworth
>> Sent: Wednesday, February 18, 2015 4:36 AM
>> To: WebAIM Discussion List
>> Subject: [WebAIM] Untagged PDF doc with table structure
>>
>>
>>
>> Hi all,
>>
>>
>>
>> Apologies if PDF accessibility is off topic. If so is there a list
>> that covers this?
>>
>>
>>
>> But if not ...
>>
>>
>>
>> I open a PDF document, and Adobe Reader alerts me that it's untagged.
>>
>>
>>
>> So I begin to peruse it using JAWS, and come across a table whose
>> structure is robust enough for me to move around it using the JAWS
>> table keystrokes.
>>
>>
>>
>> Does this mean there *are* tags in the document after all? Or has
>> Adobe Reader used heuristics to add tags to improve the doc's
>> accessibility, since my settings flag up that I'm using a screen reader?
>>
>>
>>
>> I tried to download a trial version of Acrobat Pro so as to examine
>> the document structure, but the download assistant seems inaccessible.
>>
>>
>>
>> Thanks as always, Lynn
>>