WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Untagged PDF doc with table structure

From: L Snider
Date: Feb 18, 2015 12:51PM


Hi Bevi,

One question on this:
1. Run Acrobat's accessibility checker. This looks at only about 20% of the
document's features, so don't depend on it for a full check.

This is the full report and check, right? If so, what else would you check?

Cheers

Lisa

On Wed, Feb 18, 2015 at 1:41 PM, Chagnon | PubCom < <EMAIL REMOVED> >
wrote:

> Lynn, I too have a strong programming background in HTML, as well as SGML,
> XML, and many other markup languages. So tags plus reading order create the
> document's structure in my mind! In theory, I don't believe a PDF can have
> any structure, good or bad, without tags. All PDFs have a page
> architecture, but that's not the same thing as structure.
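
For anyone who wants to look at a file's building blocks without Acrobat: the PDF specification keeps two separate markers in the document catalog, a /StructTreeRoot entry (the logical structure tree) and a /MarkInfo dictionary whose /Marked flag declares the file a Tagged PDF. Below is a minimal sketch of a byte-level check for both. It assumes the catalog is stored uncompressed and that /MarkInfo is written inline, which is not true of every PDF; the function name and the sample byte strings are hypothetical, and this is a rough heuristic rather than a real parser.

```python
import re


def pdf_structure_markers(data: bytes) -> dict:
    """Crude scan of raw PDF bytes for structure and tagging markers.

    Assumption: the catalog is not hidden inside a compressed object
    stream and /MarkInfo is an inline dictionary. Otherwise this scan
    reports False even though the markers exist.
    """
    has_structure_tree = b"/StructTreeRoot" in data
    marked_as_tagged = (
        re.search(rb"/MarkInfo\s*<<[^>]*?/Marked\s+true", data) is not None
    )
    return {
        "has_structure_tree": has_structure_tree,
        "marked_as_tagged": marked_as_tagged,
    }


# Hypothetical catalog fragments, for illustration only.
structured_only = b"<< /Type /Catalog /StructTreeRoot 5 0 R >>"
tagged = b"<< /Type /Catalog /StructTreeRoot 5 0 R /MarkInfo << /Marked true >> >>"
plain = b"<< /Type /Catalog /Pages 2 0 R >>"

for name, sample in [("structured_only", structured_only),
                     ("tagged", tagged),
                     ("plain", plain)]:
    print(name, pdf_structure_markers(sample))
```

To try it on a real file, read the file in binary mode and pass the bytes to the function. A False result only means the markers were not visible to this simple scan.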
>
> Lynn asked: "if so how would I recognise it if I were to examine the
> document's building blocks"
>
> You have to examine it from several viewpoints in Acrobat Pro. I teach my
> students this method:
> 1. Run Acrobat's accessibility checker. This looks at only about 20% of the
> document's features, so don't depend on it for a full check.
>
> 2. Run down the tag tree, top-to-bottom. I call this the tag reading order.
> Sighted users can arrow down from tag to tag and also see on the
> page which item is highlighted for each tag. They'll see very quickly that
> the figures weren't read at the correct place in the tag tree, or that the
> second half of body text was read first, then the heading 1, then the
> remaining body text.
>
> For screen reader users, this is what your software is using. But it's more
> difficult to tell if the document is correct. Were you able to hear and
> figure out what was read? Did it make sense (not the content itself, but
> the order in which you heard it)? Screen readers also can't tell sometimes if
> it's tagged correctly. Example: Adobe InDesign has a tragic flaw. When a
> sidebar (boxed text that's secondary to the main story) is exported to PDF,
> the conversion isn't correct. All of the text is jumbled together;
> paragraphs are lost, including any headings, bulleted lists, tables,
> figures, etc. So a screen reader user just hears the text run on, blah
> blah blah, but never knows whether they're reading one paragraph, multiple
> paragraphs, headings, or any other parts of a document. My screen reader
> testers often miss these problems; they just can't tell if they're missing
> something or if it's incorrect.
>
> 3. Run down the "real" reading order. This is the Order panel in Acrobat.
> Often overlooked by many in accessible documentation, this is the original
> reading order that's still used by many assistive technologies, including
> braille printers and keyboards. I've never had any of my screen reader
> testers review this because their software has a hard time voicing it in a
> way that makes sense to them. But they can see this reading order another
> way: View / Zoom / Reflow. This utility rejiggers the visual layout on the
> screen to mimic the real reading order. Columns are removed, everything is
> sequential and linear, top to bottom. So if the first item read by a screen
> reader happens to be the photo caption, not heading 1, then you have a
> reading order problem.
>
> 4. After that, the usual review of tags, tables, alt-text, etc. takes
> place.
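
The comparison in steps 2 and 3 can be sketched in a few lines. This is only an illustration under stated assumptions: the two lists are typed in by hand (nothing here reads a PDF), the labels are made up, and the function name first_divergence is hypothetical. It reports the first position where the tag order and the Order panel sequence disagree, and it does not handle lists of different lengths.

```python
def first_divergence(tag_order, content_order):
    """Return (index, tag_item, content_item) for the first mismatch,
    or None if the two sequences agree over their shared length."""
    for index, (tag_item, content_item) in enumerate(zip(tag_order, content_order)):
        if tag_item != content_item:
            return index, tag_item, content_item
    return None


# Hypothetical page: the tag tree starts with the heading,
# but the Order panel starts with the photo caption.
tag_order = ["Heading 1", "Body text", "Figure", "Caption"]
content_order = ["Caption", "Heading 1", "Body text", "Figure"]

problem = first_divergence(tag_order, content_order)
print(problem)
```

A non-None result flags a page where the two reading orders need to be reconciled by hand.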
>
> --Bevi Chagnon
>
> -----Original Message-----
> From: <EMAIL REMOVED>
> [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn Holdsworth
> Sent: Wednesday, February 18, 2015 1:46 PM
> To: WebAIM Discussion List
> Subject: Re: [WebAIM] Untagged PDF doc with table structure
>
> Hi Bevi,
>
> Thanks for taking the time to write such a comprehensive response.
>
> From creating HTML pages for about half a lifetime, I'd define tags and
> structure pretty much the way you do.
>
> But I inferred from this thread, and from talking with someone who knows a
> lot more about PDF than I do, that it's possible to have structure without
> tags in a PDF document. Is this correct, and if so how would I recognise it
> if I were to examine the document's building blocks?
>
> Best, Lynn
>
> On 18/02/2015, Chagnon | PubCom < <EMAIL REMOVED> > wrote:
> > Lynn wrote: "in PDF docs, what's the difference between tags and
> > structure?"
> >
> > This is one of the toughest concepts we teachers have to explain! I'd
> > love to hear how others describe it. Here's my take:
> >
> > Tags are labels. Code labels, specifically, that are read by Assistive
> > Technologies and are not usually visible to sighted users unless they
> > have Acrobat Pro. They let AT users know what's a heading 2, a list of
> > bullets, tables, and other parts of the documents. Tags also do a lot
> > of work for us, such as assisting us in creating bookmarks and tables
> > of contents, creating navigation systems, and holding the Alt-text on
> > graphics (Alt-Text is an attribute on the figure tag and doesn't stand
> > alone on its own).
> >
> > Structure is the sequence of how the document's pieces will be read,
> > or in other words, the sequence in which the tagged items are read.
> > Call it reading order or tag reading order. The structure of some
> > documents can also have nesting qualities, such as all the pieces of a
> > chapter, and all the chapters in a book.
> >
> > An example: If Heading 1 designates a chapter title, then all the
> > paragraphs, bullets, tables, and heading 2 items within that chapter
> > will be nested inside the main heading 1 tag. This allows AT software
> > to figure out, hopefully, what goes with what; that all the tags
> > nested within Heading 1 form a chapter.
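
The nesting described above can be modelled as a small tree. This is a sketch of the idea as stated in the message, not of how Acrobat stores tags internally; the Tag class, the labels, and the sample chapter are all hypothetical. Walking the tree depth-first, parents before children, gives the order in which the tagged items would be read.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    label: str
    children: list = field(default_factory=list)


def reading_order(tag, depth=0):
    """Yield (depth, label) pairs, parent first, then children in order."""
    yield depth, tag.label
    for child in tag.children:
        yield from reading_order(child, depth + 1)


# Hypothetical chapter: everything is nested inside the heading 1 tag.
chapter = Tag("Heading 1: Chapter title", [
    Tag("Paragraph"),
    Tag("Heading 2", [Tag("Paragraph"), Tag("Bulleted list")]),
    Tag("Table"),
])

order = list(reading_order(chapter))
for depth, label in order:
    print("  " * depth + label)
```

The indentation in the printed output mirrors what one would see when expanding the tag tree level by level.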
> >
> > Structure is created when you have tags (the right tag labels) and a
> > reading order (a logical reading order). It is possible that a tagged
> > and structured document might not be fully accessible because the tags
> > aren't accurate enough or the reading order is out of whack.
> >
> > Example number 1: In older versions of MS Word, figures would be
> > placed in very odd places in the reading order when the file was exported to
> > a PDF. If paragraph 1 stated "see figure 5", figure 5 itself might end
> > up at the very end of the reading order, not near paragraph 1 where it
> > was referenced. A sighted person sees figure 5 next to the paragraph,
> > but a screen reader user doesn't hear it voiced until the last page,
> > and maybe that's page 360 of a long government document. So the
> > document is tagged and structured, but it's a faulty structure because
> > the reading order is incorrect.
> >
> > Example number 2: Graphic designers who use desktop publishing
> > programs like Adobe InDesign and QuarkXPress create very complex
> > visual layouts. Visually, things aren't designed in a traditional
> > top down left right pattern but instead could be scattered all over
> > the physical page. Here's an example of a 2-page magazine spread:
> > http://fc02.deviantart.net/fs71/i/2010/082/e/c/Magazine_Layout_Design_1_by_BreakTheRecords.jpg
> > (This is just a random sample I pulled up on the Internet, so it is
> > only a graphic of a 2-page spread, no live text or Alt-text.)
> >
> > Note that the article title (or heading 1) appears on page 2, and the body
> > text of the story starts on page 1. Backwards! And then there are 2
> > quotes at the top of page 1, so obviously the designer wants us to
> > read those at the beginning of the story, also. And here's a similar
> > example:
> > https://m1.behance.net/rendition/modules/12455236/disp/322ee0c042b2949607393d8b1f24ad96.jpg
> >
> > Whew! Getting a tagged, logical reading order from this type of
> > publication isn't easy!
> >
> > Summary:
> > Structure equals tagged content placed in a logical reading order.
> >
> > Well, that's my attempt. Would love to hear how others describe the
> > concepts.
> >
> > --Bevi Chagnon
> >
> > -----Original Message-----
> > From: <EMAIL REMOVED>
> > [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn
> > Holdsworth
> > Sent: Wednesday, February 18, 2015 12:11 PM
> > To: WebAIM Discussion List
> > Subject: Re: [WebAIM] Untagged PDF doc with table structure
> >
> > Thanks so much everyone for weighing in - I've found this a very
> > useful thread indeed.
> >
> > One more question: in PDF docs, what's the difference between tags and
> > structure? Ryan mentioned that the doc may include structure but not
> > be tagged, and I don't understand the difference.
> >
> > And thanks Duff for the LinkedIn group suggestions. I'll join at least
> > the first one.
> >
> > Really hoping that Adobe is working on ironing out the accessibility
> > glitches in the Download Assistant, as I'd appreciate the chance to
> > learn about and use what seems like a great bunch of accessibility
> > features in Acrobat.
> >
> > Best, Lynn
> >
> > On 18/02/2015, Andrew Kirkpatrick < <EMAIL REMOVED> > wrote:
> >> Bim,
> >> I was talking about both Acrobat and Reader in my reply, sorry if
> >> that wasn't clear. It is the same process for both.
> >> AWK
> >>
> >> -----Original Message-----
> >> From: <EMAIL REMOVED>
> >> [mailto: <EMAIL REMOVED> ] On Behalf Of Bim Egan
> >> Sent: Wednesday, February 18, 2015 7:13 AM
> >> To: 'WebAIM Discussion List'
> >> Subject: Re: [WebAIM] Untagged PDF doc with table structure
> >>
> >> Lynn didn't seem to be talking about using Acrobat though. She
> >> described the experience of many screen reader users in finding a
> >> table in an untagged
> >> PDF when opened in Reader, and she asked why this could happen. Her
> >> message said that the Acrobat installation wasn't accessible.
> >>
> >> Bim
> >>
> >> -----Original Message-----
> >> From: <EMAIL REMOVED>
> >> [mailto: <EMAIL REMOVED> ] On Behalf Of Andrew
> >> Kirkpatrick
> >> Sent: 18 February 2015 14:36
> >> To: WebAIM Discussion List
> >> Subject: Re: [WebAIM] Untagged PDF doc with table structure
> >>
> >> Jon is correct. When Acrobat opens an untagged document and there is
> >> a client that is using the accessibility API data running, Acrobat
> >> (or
> >> Reader) will add tags to the document. The result is the same as if
> >> an author used the "add tags" feature in Acrobat. You get Acrobat's
> >> best interpretation of what the tags should be. That will sometimes
> >> result in headings, well-formed tables, lists, and other structures.
> >> Authors who use this feature in Acrobat know that you generally need
> >> to fix some of the tags.
> >>
> >>
> >>
> >> The result is that the document is tagged temporarily and assistive
> >> technologies recognize and use the information.
> >>
> >>
> >>
> >> The dialogs that you see when opening PDF documents give you some
> >> information about what is going on. To understand better, here's my
> >> explanation.
> >>
> >>
> >>
> >> In Acrobat or Reader preferences there is a "Reading" category.
> >> There is a checkbox that is labeled "Confirm before tagging
> >> documents". If this is checked, then every time that Reader intends
> >> to tag an untagged document the "Reading an untagged document with
> >> assistive technology" dialog pops up and the user needs to confirm
> >> that this is what they'd like to do. If the user selects cancel then
> >> the document won't be tagged and the reading experience will be
> >> essentially non-existent.
> >>
> >>
> >>
> >> If you elect to allow the tagging, there are other options as
> >> mentioned in one of the replies. I recommend using the "infer reading
> >> order from document" option.
> >>
> >>
> >>
> >> There are other settings related to large documents and auto-tagging.
> >> Autotagging takes time, so if you open a very dense 600 page manual
> >> you may find that Reader takes a long time to do the tagging. It
> >> can, and we are always looking to improve the efficiency of this
> >> process.
> >> The option for the user is to indicate whether the autotagging should
> >> occur only on visible pages, on all pages in the document, or on all
> >> pages except if the document is "large". The user gets to define
> >> what large means - a user might find that their system is slow at
> >> this so sets the limit at 25 pages, or might set it higher if their
> >> system handles this process quickly. The down side of only tagging a
> >> few pages at a time is that if there are recognized structures on
> >> pages that haven't been tagged yet (e.g. a heading on page 51) the
> >> user can't use screen reader heading navigation to jump to it because
> >> the tags don't exist until the page is in view in the reader.
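
The three auto-tagging options described above amount to a small decision rule, sketched below. This is not Adobe's implementation: the mode names, the function name pages_to_tag, and the default threshold of 25 are hypothetical, and the fallback to visible pages for a "large" document is an assumption drawn from the description.

```python
def pages_to_tag(mode, page_count, visible_pages, large_threshold=25):
    """Return the sorted list of page numbers that would be auto-tagged.

    mode is one of "visible", "all", or "all_unless_large" (hypothetical
    names). Assumption: a document over the threshold falls back to
    tagging only the pages currently in view.
    """
    all_pages = list(range(1, page_count + 1))
    if mode == "visible":
        return sorted(visible_pages)
    if mode == "all":
        return all_pages
    if mode == "all_unless_large":
        if page_count > large_threshold:
            return sorted(visible_pages)
        return all_pages
    raise ValueError(f"unknown mode: {mode}")


# A dense 600-page manual with pages 1 and 2 in view.
tagged_now = pages_to_tag("all_unless_large", 600, {1, 2})
print(tagged_now)
print(51 in tagged_now)
```

Under these assumptions, a heading on page 51 of the large manual has no tag yet, which is why heading navigation cannot jump to it until that page comes into view.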
> >>
> >>
> >>
> >> Hope this helps,
> >>
> >> AWK
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: <EMAIL REMOVED>
> >> [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn
> >> Holdsworth
> >> Sent: Wednesday, February 18, 2015 4:36 AM
> >> To: WebAIM Discussion List
> >> Subject: [WebAIM] Untagged PDF doc with table structure
> >>
> >>
> >>
> >> Hi all,
> >>
> >>
> >>
> >> Apologies if PDF accessibility is off topic. If so is there a list
> >> that covers this?
> >>
> >>
> >>
> >> But if not ...
> >>
> >>
> >>
> >> I open a PDF document, and Adobe Reader alerts me that it's untagged.
> >>
> >>
> >>
> >> So I begin to peruse it using JAWS, and come across a table whose
> >> structure is robust enough for me to move around it using the JAWS
> >> table keystrokes.
> >>
> >>
> >>
> >> Does this mean there *are* tags in the document after all? Or has
> >> Adobe Reader used heuristics to add tags to improve the doc's
> >> accessibility, since my settings flag up that I'm using a screenreader?
> >>
> >>
> >>
> >> I tried to download a trial version of Acrobat Pro so as to examine
> >> the document structure, but the download assistant seems inaccessible.
> >>
> >>
> >>
> >> Thanks as always, Lynn
> >>