WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Untagged PDF doc with table structure

From: Andrew Kirkpatrick
Date: Feb 18, 2015 12:36PM


Tagged PDF just means that it has tags at all. There is no guarantee that they are correct. This is where PDF/UA helps in that it answers the important question "tagged how?"
AWK
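
To illustrate the point that "has tags" and "has correct tags" are separate questions, here is a minimal Python sketch. It does not parse a real PDF. The `Tag` class, the `find_problems` function, and the two checks are hypothetical, chosen only to show the kind of question that validation beyond "is it tagged?" has to ask. A real PDF/UA checker covers far more than this.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Tag:
    """Hypothetical stand-in for one structure element in a tagged PDF."""
    name: str                       # e.g. "H1", "P", "Figure"
    alt: Optional[str] = None       # alternate text, if any
    children: List["Tag"] = field(default_factory=list)


def walk(tag: Tag) -> Iterator[Tag]:
    """Yield tags depth-first, i.e. in tag-tree order."""
    yield tag
    for child in tag.children:
        yield from walk(child)


def find_problems(root: Tag) -> List[str]:
    """Run two illustrative checks (assumed for this sketch, not a full rule set)."""
    problems = []
    last_level = 0
    for tag in walk(root):
        # Check 1: a figure tag exists but carries no alternate text.
        if tag.name == "Figure" and not tag.alt:
            problems.append("Figure without alternate text")
        # Check 2: numbered headings (H1..H6) should not skip a level.
        if len(tag.name) == 2 and tag.name[0] == "H" and tag.name[1].isdigit():
            level = int(tag.name[1])
            if level > last_level + 1:
                problems.append(
                    f"Heading jumps from level {last_level} to {level}"
                )
            last_level = level
    return problems


# This document is "tagged", yet both checks fail.
doc = Tag("Document", children=[
    Tag("H1"),
    Tag("P"),
    Tag("H3"),       # skips H2
    Tag("Figure"),   # no alternate text
])

print(find_problems(doc))
```

The sample document would count as tagged simply because tags are present, yet both checks report a problem.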

-----Original Message-----
From: <EMAIL REMOVED> [mailto: <EMAIL REMOVED> ] On Behalf Of Brian Richwine
Sent: Wednesday, February 18, 2015 11:20 AM
To: WebAIM Discussion List
Subject: Re: [WebAIM] Untagged PDF doc with table structure

Hi,

I've heard accessibility professionals state that a "tagged PDF" is one that uses a standard set of tags. So some PDFs can have structure but use a non-standard set of tags, and thus assistive technologies will not know how to interpret the tags that are in the document. I came away thinking that a PDF that has tags is a structured PDF, and, even better in terms of accessibility, a PDF that uses standardized tags has structure and is "tagged".

-Brian
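
One way to picture the standard versus non-standard distinction: a PDF structure tree can carry a role map that maps custom tag names onto standard structure types. The Python sketch below follows such a mapping until it reaches a standard type or gives up. The `resolve` function, the sample role map, and the abbreviated `STANDARD_TYPES` set are assumptions for illustration only. A real reader would take the role map from the document itself.

```python
from typing import Dict, Optional

# An abbreviated subset of the standard structure types (not the full list).
STANDARD_TYPES = {
    "Document", "Part", "Sect", "Div", "P",
    "H", "H1", "H2", "H3", "H4", "H5", "H6",
    "L", "LI", "Lbl", "LBody",
    "Table", "TR", "TH", "TD",
    "Figure", "Span", "Link",
}


def resolve(tag: str, role_map: Dict[str, str]) -> Optional[str]:
    """Follow role-map entries until a standard type is reached.

    Returns None when the chain ends on an unmapped custom name or loops.
    """
    seen = set()
    while tag not in STANDARD_TYPES:
        if tag in seen or tag not in role_map:
            return None
        seen.add(tag)
        tag = role_map[tag]
    return tag


# Hypothetical role map: two custom names chain to H1, one leads nowhere.
role_map = {
    "ChapterTitle": "Heading1",
    "Heading1": "H1",
    "Sidebar": "Callout",
}

for name in ("ChapterTitle", "P", "Sidebar"):
    print(name, "->", resolve(name, role_map))
```

In this sketch "ChapterTitle" resolves to a standard heading, while "Sidebar" maps to another custom name with no standard equivalent. That second case is where assistive technology has nothing to go on.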

On Wed, Feb 18, 2015 at 1:45 PM, Lynn Holdsworth < <EMAIL REMOVED> >
wrote:

> Hi Bevi,
>
> Thanks for taking the time to write such a comprehensive response.
>
> From creating HTML pages for about half a lifetime, I'd define tags
> and structure pretty much the way you do.
>
> But I inferred from this thread, and from talking with someone who
> knows a lot more about PDF than I do, that it's possible to have
> structure without tags in a PDF document. Is this correct, and if so
> how would I recognise it if I were to examine the document's building
> blocks?
>
> Best, Lynn
>
> On 18/02/2015, Chagnon | PubCom < <EMAIL REMOVED> > wrote:
> > Lynn wrote: " in PDF docs, what's the difference between tags and
> > structure?
> > "
> >
> > This is one of the toughest concepts we teachers have to explain!
> > I'd love to hear how others describe it. Here's my take:
> >
> > Tags are labels. Code labels, specifically, that are read by
> > Assistive Technologies and are not usually visible to sighted users
> > unless they have Acrobat Pro. They let AT users know what's a
> > heading 2, a list of bullets, tables, and other parts of the
> > document. Tags also do a lot of work for us, such as assisting us in
> > creating bookmarks and tables of contents, creating navigation
> > systems, and holding the Alt-text on graphics (Alt-text is an
> > attribute on the figure tag and doesn't stand alone).
> >
> > Structure is the sequence of how the document's pieces will be read,
> > or in other words, the sequence in which the tagged items are read.
> > Call it reading order or tag reading order. The structure of some
> > documents can also have nesting qualities, such as all the pieces of
> > a chapter, and all the chapters in a book.
> >
> > An example: If Heading 1 designates a chapter title, then all the
> > paragraphs, bullets, tables, and heading 2 items within that chapter
> > will be nested inside the main heading 1 tag. This allows AT
> > software to figure out, hopefully, what goes with what; that all the
> > tags nested within Heading 1 make up a chapter.
> >
> > Structure is created when you have tags (the right tag labels) and a
> > reading order (a logical reading order). It is possible that a
> > tagged and structured document might not be fully accessible because
> > the tags aren't accurate enough or the reading order is out of
> > whack.
> >
> > Example number 1: In older versions of MS Word, figures would be
> > placed in very odd places in the reading order when the document was
> > exported to a PDF. If paragraph 1 stated "see figure 5", figure 5
> > itself might end up at the very end of the reading order, not near
> > paragraph 1 where it was referenced. A sighted person sees figure 5
> > next to the paragraph, but a screen reader user doesn't hear it
> > voiced until the last page, and maybe that's page 360 of a long
> > government document. So the document is tagged and structured, but
> > it's a faulty structure because the reading order is incorrect.
> >
> > Example number 2: Graphic designers who use desktop publishing
> > programs like Adobe InDesign and QuarkXPress create very complex
> > visual layouts. Visually, things aren't designed in a traditional
> > top-down, left-to-right pattern but instead could be scattered all
> > over the physical page. Here's an example of a 2-page magazine
> > spread:
> >
> > http://fc02.deviantart.net/fs71/i/2010/082/e/c/Magazine_Layout_Design_1_by_BreakTheRecords.jpg
> >
> > (This is just a random sample I pulled up on the Internet, so it is
> > only a graphic of a 2-page spread, no live text or Alt-text.)
> >
> > Note that the article title (or heading 1) appears on page 2, and
> > the body text of the story starts on page 1. Backwards! And then
> > there are 2 quotes at the top of page 1, so obviously the designer
> > wants us to read those at the beginning of the story, also. And
> > here's a similar example:
> >
> > https://m1.behance.net/rendition/modules/12455236/disp/322ee0c042b2949607393d8b1f24ad96.jpg
> >
> > Whew! Getting a tagged, logical reading order from this type of
> > publication isn't easy!
> >
> > Summary:
> > Structure equals tagged content placed in a logical reading order.
> >
> > Well, that's my attempt. Would love to hear how others describe the
> > concepts.
> >
> > --Bevi Chagnon
> >
> > -----Original Message-----
> > From: <EMAIL REMOVED>
> > [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn Holdsworth
> > Sent: Wednesday, February 18, 2015 12:11 PM
> > To: WebAIM Discussion List
> > Subject: Re: [WebAIM] Untagged PDF doc with table structure
> >
> > Thanks so much everyone for weighing in - I've found this a very
> > useful thread indeed.
> >
> > One more question: in PDF docs, what's the difference between tags
> > and structure? Ryan mentioned that the doc may include structure but
> > not be tagged, and I don't understand the difference.
> >
> > And thanks Duff for the LinkedIn group suggestions. I'll join at
> > least the first one.
> >
> > Really hoping that Adobe is working on ironing out the accessibility
> > glitches in the DownLoad Assistant, as I'd appreciate the chance to
> > learn about and use what seems like a great bunch of accessibility
> > features in Acrobat.
> >
> > Best, Lynn
> >
> > On 18/02/2015, Andrew Kirkpatrick < <EMAIL REMOVED> > wrote:
> >> Bim,
> >> I was talking about both Acrobat and Reader in my reply, sorry if
> >> that wasn't clear. It is the same process for both.
> >> AWK
> >>
> >> -----Original Message-----
> >> From: <EMAIL REMOVED>
> >> [mailto: <EMAIL REMOVED> ] On Behalf Of Bim Egan
> >> Sent: Wednesday, February 18, 2015 7:13 AM
> >> To: 'WebAIM Discussion List'
> >> Subject: Re: [WebAIM] Untagged PDF doc with table structure
> >>
> >> Lynn didn't seem to be talking about using Acrobat though. She
> >> described the experience of many screen reader users in finding a
> >> table in an untagged
> >> PDF when opened in Reader, and she asked why this could happen. Her
> >> message said that the Acrobat installation wasn't accessible.
> >>
> >> Bim
> >>
> >> -----Original Message-----
> >> From: <EMAIL REMOVED>
> >> [mailto: <EMAIL REMOVED> ] On Behalf Of Andrew
> >> Kirkpatrick
> >> Sent: 18 February 2015 14:36
> >> To: WebAIM Discussion List
> >> Subject: Re: [WebAIM] Untagged PDF doc with table structure
> >>
> >> Jon is correct. When Acrobat opens an untagged document while a
> >> client that uses the accessibility API data is running, Acrobat (or
> >> Reader) will add tags to the document. The result is the same as
> >> if an author used the "add tags" feature in Acrobat. You get
> >> Acrobat's best interpretation of what the tags should be. That
> >> will sometimes result in headings, well-formed tables, lists, and
> >> other structures. Authors who use this feature in Acrobat know that
> >> you generally need to fix some of the tags.
> >>
> >>
> >>
> >> The result is that the document is tagged temporarily and assistive
> >> technologies recognize and use the information.
> >>
> >>
> >>
> >> The dialogs that you see when opening PDF documents give you some
> >> information about what is going on. To understand better, here's
> >> my explanation.
> >>
> >>
> >>
> >> In Acrobat or Reader preferences there is a "Reading" category.
> >> There is a checkbox that is labeled "Confirm before tagging
> >> documents". If this is checked, then every time that Reader
> >> intends to tag an untagged document the "Reading an untagged
> >> document with assistive technology" dialog pops up and the user
> >> needs to confirm that this is what they'd like to do. If the user
> >> selects cancel then the document won't be tagged and the reading
> >> experience will be essentially non-existent.
> >>
> >>
> >>
> >> If you elect to allow the tagging, there are other options as
> >> mentioned in one of the replies. I recommend using the "infer
> >> reading order from document" option.
> >>
> >>
> >>
> >> There are other settings related to large documents and autotagging.
> >> Autotagging takes time, so if you open a very dense 600 page manual
> >> you may find that Reader takes a long time to do the tagging. It
> >> can take a while, and we are always looking to improve the
> >> efficiency of this process. The option for the user is to indicate
> >> whether the autotagging should occur only on visible pages, on all
> >> pages in the document, or on all pages except if the document is
> >> "large". The user gets to define what large means: a user who finds
> >> that their system is slow at this might set the limit at 25 pages,
> >> or might set it higher if their system handles this process
> >> quickly. The down side of only tagging a few pages at a time is
> >> that if there are recognized structures on pages that haven't been
> >> tagged yet (e.g. a heading on page 51) the user can't use screen
> >> reader heading navigation to jump to it because the tags don't
> >> exist until the page is in view in the reader.
> >>
> >>
> >>
> >> Hope this helps,
> >>
> >> AWK
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: <EMAIL REMOVED>
> >> [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn
> >> Holdsworth
> >> Sent: Wednesday, February 18, 2015 4:36 AM
> >> To: WebAIM Discussion List
> >> Subject: [WebAIM] Untagged PDF doc with table structure
> >>
> >>
> >>
> >> Hi all,
> >>
> >>
> >>
> >> Apologies if PDF accessibility is off topic. If so is there a list
> >> that covers this?
> >>
> >>
> >>
> >> But if not ...
> >>
> >>
> >>
> >> I open a PDF document, and Adobe Reader alerts me that it's untagged.
> >>
> >>
> >>
> >> So I begin to peruse it using JAWS, and come across a table whose
> >> structure is robust enough for me to move around it using the JAWS
> >> table keystrokes.
> >>
> >>
> >>
> >> Does this mean there *are* tags in the document after all? Or has
> >> Adobe Reader used heuristics to add tags to improve the doc's
> >> accessibility, since my settings flag up that I'm using a screenreader?
> >>
> >>
> >>
> >> I tried to download a trial version of Acrobat Pro so as to examine
> >> the document structure, but the download assistant seems inaccessible.
> >>
> >>
> >>
> >> Thanks as always, Lynn
> >>