WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Untagged PDF doc with table structure

From: Brian Richwine
Date: Feb 18, 2015 12:19PM


Hi,

I've heard accessibility professionals say that a "tagged PDF" is one that
uses the standard set of tags. Some PDFs have structure but use a
non-standard set of tags, so assistive technologies will not know how to
interpret the tags in the document. I came away thinking that any PDF with
tags is a structured PDF and that, better still for accessibility, a PDF
that uses the standardized tags is both structured and "tagged".
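
To make the distinction concrete, here is a minimal Python sketch of the
check described above. It assumes the tag names have already been pulled out
of a document by some other tool; it does not parse a PDF. The function names
are hypothetical. The set of names follows the standard structure types in
the PDF 1.7 specification, and the role map (which lets a custom tag name
point at a standard one) is modeled as a plain dictionary.

```python
# Standard structure types as listed in the PDF 1.7 specification.
STANDARD_TAGS = {
    "Document", "Part", "Art", "Sect", "Div", "BlockQuote", "Caption",
    "TOC", "TOCI", "Index", "NonStruct", "Private",
    "P", "H", "H1", "H2", "H3", "H4", "H5", "H6",
    "L", "LI", "Lbl", "LBody",
    "Table", "TR", "TH", "TD", "THead", "TBody", "TFoot",
    "Span", "Quote", "Note", "Reference", "BibEntry", "Code", "Link", "Annot",
    "Ruby", "RB", "RT", "RP", "Warichu", "WT", "WP",
    "Figure", "Formula", "Form",
}


def resolve(tag, role_map):
    """Follow role-map entries until a standard tag is found, else None."""
    seen = set()
    while tag not in STANDARD_TAGS:
        # Stop on an unmapped custom tag or on a circular mapping.
        if tag in seen or tag not in role_map:
            return None
        seen.add(tag)
        tag = role_map[tag]
    return tag


def unresolved_tags(tags, role_map=None):
    """Return the sorted custom tags that never map to a standard tag."""
    role_map = role_map or {}
    return sorted({t for t in tags if resolve(t, role_map) is None})


if __name__ == "__main__":
    # Hypothetical tag names, as if read from a document's structure tree.
    tags = ["Document", "Heading1", "P", "Sidebar"]
    role_map = {"Heading1": "H1"}
    print(unresolved_tags(tags, role_map))
```

In this sketch a document whose tags all resolve would count as "tagged" in
the sense above, while one with leftover names such as "Sidebar" would be
structured but harder for assistive technology to interpret.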

-Brian

On Wed, Feb 18, 2015 at 1:45 PM, Lynn Holdsworth < <EMAIL REMOVED> >
wrote:

> Hi Bevi,
>
> Thanks for taking the time to write such a comprehensive response.
>
> From creating HTML pages for about half a lifetime, I'd define tags
> and structure pretty much the way you do.
>
> But I inferred from this thread, and from talking with someone who
> knows a lot more about PDF than I do, that it's possible to have
> structure without tags in a PDF document. Is this correct, and if so
> how would I recognise it if I were to examine the document's building
> blocks?
>
> Best, Lynn
>
> On 18/02/2015, Chagnon | PubCom < <EMAIL REMOVED> > wrote:
> > Lynn wrote: "in PDF docs, what's the difference between tags and
> > structure?"
> >
> > This is one of the toughest concepts we teachers have to explain! I'd love
> > to hear how others describe it. Here's my take:
> >
> > Tags are labels. Code labels, specifically, that are read by assistive
> > technologies and are not usually visible to sighted users unless they have
> > Acrobat Pro. They let AT users know what's a heading 2, a bulleted list,
> > a table, and other parts of the document. Tags also do a lot of work for
> > us, such as assisting us in creating bookmarks and tables of contents,
> > creating navigation systems, and holding the Alt-text on graphics (Alt-text
> > is an attribute on the figure tag and doesn't stand on its own).
> >
> > Structure is the sequence in which the document's pieces will be read, or
> > in other words, the sequence in which the tagged items are read. Call it
> > reading order or tag reading order. The structure of some documents can
> > also have nesting qualities, such as all the pieces of a chapter, and all
> > the chapters in a book.
> >
> > An example: If Heading 1 designates a chapter title, then all the
> > paragraphs, bullets, tables, and heading 2 items within that chapter will
> > be nested inside the main heading 1 tag. This allows AT software to figure
> > out, hopefully, what goes with what: that all the tags nested within
> > Heading 1 make up a chapter.
> >
> > Structure is created when you have tags (the right tag labels) and a
> > reading order (a logical reading order). It is possible that a tagged and
> > structured document might not be fully accessible because the tags aren't
> > accurate enough or the reading order is out of whack.
> >
> > Example number 1: In older versions of MS Word, figures would be placed in
> > very odd places in the reading order when the document was exported to a
> > PDF. If paragraph 1 stated "see figure 5", figure 5 itself might end up at
> > the very end of the reading order, not near paragraph 1 where it was
> > referenced. A sighted person sees figure 5 next to the paragraph, but a
> > screen reader user doesn't hear it voiced until the last page, and maybe
> > that's page 360 of a long government document. So the document is tagged
> > and structured, but it's a faulty structure because the reading order is
> > incorrect.
> >
> > Example number 2: Graphic designers who use desktop publishing programs
> > like Adobe InDesign and QuarkXPress create very complex visual layouts.
> > Visually, things aren't designed in a traditional top-down, left-to-right
> > pattern but instead could be scattered all over the physical page. Here's
> > an example of a 2-page magazine spread:
> >
> > http://fc02.deviantart.net/fs71/i/2010/082/e/c/Magazine_Layout_Design_1_by_BreakTheRecords.jpg
> > (This is just a random sample I pulled up on the Internet, so it is only a
> > graphic of a 2-page spread, no live text or Alt-text.)
> >
> > Note that the article title (or heading 1) appears on page 2, and the body
> > text of the story starts on page 1. Backwards! And then there are 2 quotes
> > at the top of page 1, so obviously the designer wants us to read those at
> > the beginning of the story, also. And here's a similar example:
> >
> > https://m1.behance.net/rendition/modules/12455236/disp/322ee0c042b2949607393d8b1f24ad96.jpg
> >
> > Whew! Getting a tagged, logical reading order from this type of
> > publication isn't easy!
> >
> > Summary:
> > Structure equals tagged content placed in a logical reading order.
> >
> > Well, that's my attempt. Would love to hear how others describe the
> > concepts.
> >
> > --Bevi Chagnon
> >
> > -----Original Message-----
> > From: <EMAIL REMOVED>
> > [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn Holdsworth
> > Sent: Wednesday, February 18, 2015 12:11 PM
> > To: WebAIM Discussion List
> > Subject: Re: [WebAIM] Untagged PDF doc with table structure
> >
> > Thanks so much everyone for weighing in - I've found this a very useful
> > thread indeed.
> >
> > One more question: in PDF docs, what's the difference between tags and
> > structure? Ryan mentioned that the doc may include structure but not be
> > tagged, and I don't understand the difference.
> >
> > And thanks Duff for the LinkedIn group suggestions. I'll join at least the
> > first one.
> >
> > Really hoping that Adobe is working on ironing out the accessibility
> > glitches in the DownLoad Assistant, as I'd appreciate the chance to learn
> > about and use what seems like a great bunch of accessibility features in
> > Acrobat.
> >
> > Best, Lynn
> >
> > On 18/02/2015, Andrew Kirkpatrick < <EMAIL REMOVED> > wrote:
> >> Bim,
> >> I was talking about both Acrobat and Reader in my reply, sorry if that
> >> wasn't clear. It is the same process for both.
> >> AWK
> >>
> >> -----Original Message-----
> >> From: <EMAIL REMOVED>
> >> [mailto: <EMAIL REMOVED> ] On Behalf Of Bim Egan
> >> Sent: Wednesday, February 18, 2015 7:13 AM
> >> To: 'WebAIM Discussion List'
> >> Subject: Re: [WebAIM] Untagged PDF doc with table structure
> >>
> >> Lynn didn't seem to be talking about using Acrobat though. She
> >> described the experience of many screen reader users in finding a
> >> table in an untagged PDF when opened in Reader, and she asked why this
> >> could happen. Her message said that the Acrobat installation wasn't
> >> accessible.
> >>
> >> Bim
> >>
> >> -----Original Message-----
> >> From: <EMAIL REMOVED>
> >> [mailto: <EMAIL REMOVED> ] On Behalf Of Andrew
> >> Kirkpatrick
> >> Sent: 18 February 2015 14:36
> >> To: WebAIM Discussion List
> >> Subject: Re: [WebAIM] Untagged PDF doc with table structure
> >>
> >> Jon is correct. When Acrobat opens an untagged document and there is
> >> a client that is using the accessibility API data running, Acrobat (or
> >> Reader) will add tags to the document. The result is the same as if
> >> an author used the "add tags" feature in Acrobat. You get Acrobat's
> >> best interpretation of what the tags should be. That will sometimes
> >> result in headings, well-formed tables, lists, and other structures.
> >> Authors who use this feature in Acrobat know that you generally need to
> >> fix some of the tags.
> >>
> >>
> >>
> >> The result is that the document is tagged temporarily and assistive
> >> technologies recognize and use the information.
> >>
> >>
> >>
> >> The dialogs that you see when opening PDF documents give you some
> >> information about what is going on. To understand better, here's my
> >> explanation.
> >>
> >>
> >>
> >> In Acrobat or Reader preferences there is a "Reading" category. There
> >> is a checkbox that is labeled "Confirm before tagging documents". If
> >> this is checked, then every time that Reader intends to tag an
> >> untagged document the "Reading an untagged document with assistive
> >> technology" dialog pops up and the user needs to confirm that this is
> >> what they'd like to do. If the user selects cancel then the document
> >> won't be tagged and the reading experience will be essentially
> >> non-existent.
> >>
> >>
> >>
> >> If you elect to allow the tagging, there are other options as
> >> mentioned in one of the replies. I recommend using the "infer reading
> >> order from document" option.
> >>
> >>
> >>
> >> There are other settings related to large documents and autotagging.
> >> Autotagging takes time, so if you open a very dense 600-page manual
> >> you may find that Reader takes a long time to do the tagging. It can,
> >> and we are always looking to improve the efficiency of this process.
> >> The option for the user is to indicate whether the autotagging should
> >> occur only on visible pages, on all pages in the document, or on all
> >> pages except if the document is "large". The user gets to define what
> >> large means - a user might find that their system is slow at this so
> >> sets the limit at 25 pages, or might set it higher if their system
> >> handles this process quickly. The down side of only tagging a few
> >> pages at a time is that if there are recognized structures on pages
> >> that haven't been tagged yet (e.g. a heading on page 51) the user
> >> can't use screen reader heading navigation to jump to it because the
> >> tags don't exist until the page is in view in the reader.
> >>
> >>
> >>
> >> Hope this helps,
> >>
> >> AWK
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: <EMAIL REMOVED>
> >> [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn
> >> Holdsworth
> >> Sent: Wednesday, February 18, 2015 4:36 AM
> >> To: WebAIM Discussion List
> >> Subject: [WebAIM] Untagged PDF doc with table structure
> >>
> >>
> >>
> >> Hi all,
> >>
> >>
> >>
> >> Apologies if PDF accessibility is off topic. If so is there a list
> >> that covers this?
> >>
> >>
> >>
> >> But if not ...
> >>
> >>
> >>
> >> I open a PDF document, and Adobe Reader alerts me that it's untagged.
> >>
> >>
> >>
> >> So I begin to peruse it using JAWS, and come across a table whose
> >> structure is robust enough for me to move around it using the JAWS table
> >> keystrokes.
> >>
> >>
> >>
> >> Does this mean there *are* tags in the document after all? Or has
> >> Adobe Reader used heuristics to add tags to improve the doc's
> >> accessibility, since my settings flag up that I'm using a screen reader?
> >>
> >>
> >>
> >> I tried to download a trial version of Acrobat Pro so as to examine
> >> the document structure, but the download assistant seems inaccessible.
> >>
> >>
> >>
> >> Thanks as always, Lynn
> >>