WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Untagged PDF doc with table structure


From: Chagnon | PubCom
Date: Feb 18, 2015 3:48PM


Lisa,
Here's an excellent example of a flawed tag tree reading order, which then
creates an out-of-whack structure.
Surprisingly, it's from the US Access Board itself:
http://www.regulations.gov/#!documentDetail;D=ATBCB-2015-0002-0001 (view the
Content section and look for the PDF there).
This is the text of the new ICT draft for Sec. 508. You'll notice in the tag
tree that the figures are all stacked at the top of the tag tree, yet they
appear near the end of the draft, on pages 186-192.
This error creates the following reading order:
1. The agency's seal/logo on page 1.
2. 9 illustrations on pages 186 through 192.
3. The title of the document (tagged with a P tag) on page 1.
4. The remaining pages of the document.
This error occurs because they used an older version of MS Word, which does
this to all graphics: it stacks them at the top of the tag tree, at the end
of the tag tree, or anywhere it feels like throughout the entire document,
regardless of how someone anchors the graphics in the Word document itself.
Word 2013, on the other hand, doesn't make this error and places the graphics
correctly in the PDF tag tree.
It also doesn't help that they used Acrobat 10 to create the PDF from Word.
--Bevi

-----Original Message-----
From: <EMAIL REMOVED>
[mailto: <EMAIL REMOVED> ] On Behalf Of L Snider
Sent: Wednesday, February 18, 2015 3:18 PM
To: WebAIM Discussion List
Subject: Re: [WebAIM] Untagged PDF doc with table structure

Ah okay, I see where you were going now, thanks. Yes, it is like WCAG: you
can do it and it can still be inaccessible :)

Even after all these years, most of it is still manual. Funny how things
have changed and some things are still the same.

I am loving XI Pro, because you only do the full report by default. None of
the messiness of previous versions.

Cheers

Lisa

On Wed, Feb 18, 2015 at 2:08 PM, Chagnon | PubCom < <EMAIL REMOVED> > wrote:

> Yes, always run the full report in Acrobat checker and don't waste
> your time with the other options.
>
> The Acrobat checker tells you if the PDF is tagged, but not if they're
> the right tags.
> It tells you if anything is untagged, which quite often is sidebar
> boxes, captions, and other pieces that were left out of the tag tree.
> It tells you if any graphics are missing Alt-Text.
> Language and file name options are also flagged if missing.
> And sometimes it can detect when the structure might be off, such as
> headings that appear out of order as heading 3, heading 1, heading 6.
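The heading-order check mentioned above boils down to a simple rule: a heading shouldn't jump more than one level deeper than the heading before it. Below is a minimal Python sketch of that kind of rule, assuming the tag reading order has already been flattened into a list of tag names such as "H1" or "P". It is not how Acrobat's checker is implemented, and like the checker it can only spot skipped levels; it can't tell whether those headings are the right tags.

```python
import re
from typing import List, Tuple

# Numbered heading tags "H1" through "H6"; every other tag is ignored.
HEADING = re.compile(r"^H([1-6])$")


def find_heading_skips(tags: List[str]) -> List[Tuple[int, str]]:
    """Return (position, tag) for each heading that goes more than one
    level deeper than the heading before it (e.g. H1 followed by H6).
    Moving back up any number of levels (H3 then H1) is allowed."""
    problems = []
    previous_level = 0  # no heading seen yet
    for position, tag in enumerate(tags):
        match = HEADING.match(tag)
        if not match:
            continue
        level = int(match.group(1))
        if level > previous_level + 1:
            problems.append((position, tag))
        previous_level = level
    return problems


print(find_heading_skips(["H3", "P", "H1", "P", "H6"]))
```

For the "heading 3, heading 1, heading 6" case, this flags the H3 (there is no heading before it) and the H6 (it jumps from heading 1), while the move back up to H1 passes.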
>
> But even with the full report from Acrobat, you're still not getting
> all the information you need. One reason: software can't interpret if
> those are the right tags and if they're in the correct, logical
> reading order. Only humans can assess that!
>
> --Bevi Chagnon
>
>
> -----Original Message-----
> From: <EMAIL REMOVED>
> [mailto: <EMAIL REMOVED> ] On Behalf Of L Snider
> Sent: Wednesday, February 18, 2015 2:51 PM
> To: WebAIM Discussion List
> Subject: Re: [WebAIM] Untagged PDF doc with table structure
>
> Hi Bevi,
>
> One question on this:
> 1. Run Acrobat's accessibility checker. This looks at only about 20%
> of the document's features, so don't depend on it for a full check.
>
> This is the full report and check, right? If so, what else would you
> check?
>
> Cheers
>
> Lisa
>
> On Wed, Feb 18, 2015 at 1:41 PM, Chagnon | PubCom < <EMAIL REMOVED> > wrote:
>
> > Lynn, I too have a strong programming background in HTML, as well as
> > SGML, XML, and many other markup languages. So tags plus reading
> > order create the document's structure in my mind! In theory, I don't
> > believe a PDF can have any structure, good or bad, without tags. All
> > PDFs have a page architecture, but that's not the same thing as
> > structure.
> >
> > Lynn asked: " if so how would I recognise it if I were to examine
> > the document's building blocks "
> >
> > You have to examine it from several viewpoints in Acrobat Pro. I
> > teach my students this method:
> > 1. Run Acrobat's accessibility checker. This looks at only about 20%
> > of the document's features, so don't depend on it for a full check.
> >
> > 2. Run down the tag tree, top-to-bottom. I call this the tag reading
> > order. Sighted users can arrow down from tag to tag and also see
> > which item on the page is highlighted for each tag. They'll see very
> > quickly that the figures weren't read at the correct place in the
> > tag tree, or that the second half of the body text was read first,
> > then the heading 1, then the remaining body text.
> >
> > For screen reader users, this is what your software is using. But
> > it's more difficult to tell if the document is correct. Were you
> > able to hear and figure out what was read? Did it make sense (not
> > the content itself, but the order in which you heard it)? Screen
> > readers also can't tell sometimes if it's tagged correctly. Example:
> > Adobe InDesign has a tragic flaw. When a sidebar (boxed text that's
> > secondary to the main story) is exported to PDF, the conversion
> > isn't correct. All of the text is jumbled together; paragraphs are
> > lost, including any headings, bulleted lists, tables, figures, etc.
> > So a screen reader user just hears the text as one run-on blah blah
> > blah, but never knows if he's reading one paragraph, multiple
> > paragraphs, headings, or any other parts of a document. My screen
> > reader testers often miss these problems; they just can't tell if
> > they're missing something or if it's incorrect.
> >
> > 3. Run down the "real" reading order. This is the Order panel in
> > Acrobat. Often overlooked by many in accessible documentation, this
> > is the original reading order that's still used by many assistive
> > technologies, including braille printers and keyboards. I've never
> > had any of my screen reader testers review this because their
> > software has a hard time voicing it in a way that makes sense to
> > them. But they can see this reading order another way: View > Zoom >
> > Reflow. This utility rejiggers the visual layout on the screen to
> > mimic the real reading order. Columns are removed, and everything is
> > sequential and linear, top to bottom. So if the first item read by a
> > screen reader happens to be the photo caption, not heading 1, then
> > you have a reading order problem.
> >
> > 4. After that, the usual review of tags, tables, alt-text, etc.
> > takes place.
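Step 2, running down the tag tree top to bottom, is a depth-first walk of the tag tree. The Python sketch below models that walk with a hypothetical Node class standing in for the tags in Acrobat's Tags panel (it doesn't read a real PDF), and adds an assumed heuristic, backward_jumps, that flags any tag whose page comes before the page of the tag read just before it. A flag is only a hint for a human reviewer, not proof of a wrong reading order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for one tag in a PDF tag tree."""
    tag: str                    # e.g. "Document", "H1", "P", "Figure"
    text: str = ""              # content or Alt-text shown for the tag
    page: Optional[int] = None  # page where the item appears visually
    children: List["Node"] = field(default_factory=list)


def tag_reading_order(node: Node) -> List[Node]:
    """Depth-first, pre-order walk: the order in which someone arrowing
    down the tag tree (or a screen reader following the tags) meets items."""
    order = [node]
    for child in node.children:
        order.extend(tag_reading_order(child))
    return order


def backward_jumps(order: List[Node]) -> List[Node]:
    """Flag tags whose page comes before the page of the previous tag that
    has a page number: a hint, not proof, that the reading order is off."""
    flagged = []
    last_page = 0
    for node in order:
        if node.page is None:
            continue  # container tags such as "Document" carry no page here
        if node.page < last_page:
            flagged.append(node)
        last_page = node.page
    return flagged


# Simplified version of the figures-stacked-at-the-top problem.
draft = Node("Document", children=[
    Node("Figure", "Agency seal", 1),
    Node("Figure", "Figure 1", 186),
    Node("Figure", "Figure 9", 192),
    Node("P", "Title of the draft", 1),
    Node("P", "Body text", 2),
])
print([n.text for n in backward_jumps(tag_reading_order(draft))])
# ['Title of the draft']
```

Here the flag lands on the title, which is read only after figures from pages 186 through 192. The root cause (the misplaced figures) still has to be spotted by a person looking at the tree.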
> >
> > --Bevi Chagnon
> >
> > -----Original Message-----
> > From: <EMAIL REMOVED>
> > [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn Holdsworth
> > Sent: Wednesday, February 18, 2015 1:46 PM
> > To: WebAIM Discussion List
> > Subject: Re: [WebAIM] Untagged PDF doc with table structure
> >
> > Hi Bevi,
> >
> > Thanks for taking the time to write such a comprehensive response.
> >
> > From creating HTML pages for about half a lifetime, I'd define tags
> > and structure pretty much the way you do.
> >
> > But I inferred from this thread, and from talking with someone who
> > knows a lot more about PDF than I do, that it's possible to have
> > structure without tags in a PDF document. Is this correct, and if so
> > how would I recognise it if I were to examine the document's
> > building blocks?
> >
> > Best, Lynn
> >
> > On 18/02/2015, Chagnon | PubCom < <EMAIL REMOVED> > wrote:
> > > Lynn wrote: "in PDF docs, what's the difference between tags and
> > > structure?"
> > >
> > > This is one of the toughest concepts we teachers have to explain!
> > > I'd love to hear how others describe it. Here's my take:
> > >
> > > Tags are labels: code labels, specifically, that are read by
> > > Assistive Technologies and are not usually visible to sighted
> > > users unless they have Acrobat Pro. They let AT users know what's
> > > a heading 2, a list of bullets, a table, and other parts of the
> > > document. Tags also do a lot of work for us, such as helping us
> > > create bookmarks and tables of contents, build navigation
> > > systems, and hold the Alt-text on graphics (Alt-Text is an
> > > attribute on the Figure tag and can't stand on its own).
> > >
> > > Structure is the sequence of how the document's pieces will be
> > > read, or in other words, the sequence in which the tagged items are
> > > read.
> > > Call it reading order or tag reading order. The structure of some
> > > documents can also have nesting qualities, such as all the pieces
> > > of a chapter, and all the chapters in a book.
> > >
> > > An example: If Heading 1 designates a chapter title, then all the
> > > paragraphs, bullets, tables, and heading 2 items within that
> > > chapter will be nested inside the main heading 1 tag. This allows
> > > AT software to figure out, hopefully, what goes with what: that
> > > all the tags nested within Heading 1 are a chapter.
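To picture that nesting, here is a minimal Python sketch that takes a flat sequence of tags and nests everything after each Heading 1, up to the next Heading 1, under that heading as one chapter. The (tag, text) pairs and the function name are hypothetical illustrations; real tag trees can express the same grouping in other ways, which this sketch doesn't model.

```python
from typing import List, Tuple

Tag = Tuple[str, str]  # hypothetical (tag name, text) pair, e.g. ("H1", "Chapter 1")


def group_into_chapters(tags: List[Tag]) -> List[Tuple[str, List[Tag]]]:
    """Nest every tag that follows a Heading 1 under that heading until the
    next Heading 1 starts a new chapter. Tags before the first H1 are kept
    under an empty chapter title so nothing is silently dropped."""
    chapters: List[Tuple[str, List[Tag]]] = []
    for name, text in tags:
        if name == "H1":
            chapters.append((text, []))
        else:
            if not chapters:
                chapters.append(("", []))
            chapters[-1][1].append((name, text))
    return chapters


flat = [("H1", "Chapter 1"), ("P", "Intro"), ("H2", "Section 1.1"),
        ("L", "Bullets"), ("H1", "Chapter 2"), ("Table", "Data")]
for title, contents in group_into_chapters(flat):
    print(title, [name for name, _ in contents])
```

Everything between the two H1 tags (the paragraph, the heading 2, and the list) ends up inside "Chapter 1", which is the "what goes with what" relationship that nesting makes explicit.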
> > >
> > > Structure is created when you have tags (the right tag labels) and
> > > a reading order (a logical reading order). It is possible that a
> > > tagged and structured document might not be fully accessible
> > > because the tags aren't accurate enough or the reading order is
> > > out of whack.
> > >
> > > Example number 1: In older versions of MS Word, figures would be
> > > placed in very odd places in the reading order when the document was
> > > exported to PDF. If paragraph 1 stated "see figure 5", figure 5
> > > itself might end up at the very end of the reading order, not near
> > > paragraph 1 where it was referenced. A sighted person sees figure
> > > 5 next to the paragraph, but a screen reader user doesn't hear it
> > > voiced until the last page, and maybe that's page 360 of a long
> > > government document. So the document is tagged and structured, but
> > > it's a faulty structure because the reading order is incorrect.
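One rough way to measure the problem in example 1 is to count how many tags a reader passes between the paragraph that says "see figure 5" and the figure itself. The Python sketch below assumes the tag reading order is available as a list of (tag, text) pairs and that each figure's text begins with "Figure N"; both are hypothetical conventions for illustration, not something Word or Acrobat guarantees.

```python
import re
from typing import Dict, List, Tuple

REFERENCE = re.compile(r"\bsee figure (\d+)\b", re.IGNORECASE)
CAPTION = re.compile(r"^figure (\d+)\b", re.IGNORECASE)


def figure_distances(order: List[Tuple[str, str]]) -> Dict[str, int]:
    """For each 'see figure N' in a P tag, return how many positions in the
    tag reading order separate that paragraph from the Figure tag whose text
    starts with 'Figure N'. If several paragraphs cite the same figure, the
    last one wins."""
    figures = {}
    for index, (name, text) in enumerate(order):
        match = CAPTION.match(text)
        if name == "Figure" and match:
            figures[match.group(1)] = index
    distances = {}
    for index, (name, text) in enumerate(order):
        if name != "P":
            continue
        for number in REFERENCE.findall(text):
            if number in figures:
                distances[f"Figure {number}"] = abs(figures[number] - index)
    return distances


order = [("P", "As shown, see figure 5."), ("P", "More text"),
         ("P", "Even more"), ("Figure", "Figure 5: Budget chart")]
print(figure_distances(order))  # {'Figure 5': 3}
```

A small number means the figure is voiced near its reference; in the 360-page case described above, the distance would be hundreds of tags.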
> > >
> > > Example number 2: Graphic designers who use desktop publishing
> > > programs like Adobe InDesign and QuarkXPress create very complex
> > > visual layouts. Visually, things aren't designed in a traditional
> > > top-to-bottom, left-to-right pattern but instead could be scattered
> > > all over the physical page. Here's an example of a 2-page magazine
> > > spread:
> > > http://fc02.deviantart.net/fs71/i/2010/082/e/c/Magazine_Layout_Design_1_by_BreakTheRecords.jpg
> > > (This is just a random sample I pulled up on the Internet, so it is
> > > only a graphic of a 2-page spread, no live text or Alt-text.)
> > >
> > > Note that the article title (or heading 1) appears on page 2, and the
> > > body text of the story starts on page 1. Backwards! And then there
> > > are 2 quotes at the top of page 1, so obviously the designer wants
> > > us to read those at the beginning of the story, also. And here's a
> > > similar example:
> > > https://m1.behance.net/rendition/modules/12455236/disp/322ee0c042b2949607393d8b1f24ad96.jpg
> > >
> > > Whew! Getting a tagged, logical reading order from this type of
> > > publication isn't easy!
> > >
> > > Summary:
> > > Structure equals tagged content placed in a logical reading order.
> > >
> > > Well, that's my attempt. Would love to hear how others describe
> > > the concepts.
> > >
> > > --Bevi Chagnon
> > >
> > > -----Original Message-----
> > > From: <EMAIL REMOVED>
> > > [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn Holdsworth
> > > Sent: Wednesday, February 18, 2015 12:11 PM
> > > To: WebAIM Discussion List
> > > Subject: Re: [WebAIM] Untagged PDF doc with table structure
> > >
> > > Thanks so much everyone for weighing in - I've found this a very
> > > useful thread indeed.
> > >
> > > One more question: in PDF docs, what's the difference between tags
> > > and structure? Ryan mentioned that the doc may include structure
> > > but not be tagged, and I don't understand the difference.
> > >
> > > And thanks Duff for the LinkedIn group suggestions. I'll join at
> > > least the first one.
> > >
> > > Really hoping that Adobe is working on ironing out the
> > > accessibility glitches in the Download Assistant, as I'd
> > > appreciate the chance to learn about and use what seems like a
> > > great bunch of accessibility features in Acrobat.
> > >
> > > Best, Lynn
> > >
> > > On 18/02/2015, Andrew Kirkpatrick < <EMAIL REMOVED> > wrote:
> > >> Bim,
> > >> I was talking about both Acrobat and Reader in my reply, sorry if
> > >> that wasn't clear. It is the same process for both.
> > >> AWK
> > >>
> > >> -----Original Message-----
> > >> From: <EMAIL REMOVED>
> > >> [mailto: <EMAIL REMOVED> ] On Behalf Of Bim Egan
> > >> Sent: Wednesday, February 18, 2015 7:13 AM
> > >> To: 'WebAIM Discussion List'
> > >> Subject: Re: [WebAIM] Untagged PDF doc with table structure
> > >>
> > >> Lynn didn't seem to be talking about using Acrobat though. She
> > >> described the experience of many screen reader users in finding a
> > >> table in an untagged PDF when opened in Reader, and she asked why
> > >> this could happen. Her message said that the Acrobat installation
> > >> wasn't accessible.
> > >>
> > >> Bim
> > >>
> > >> -----Original Message-----
> > >> From: <EMAIL REMOVED>
> > >> [mailto: <EMAIL REMOVED> ] On Behalf Of Andrew Kirkpatrick
> > >> Sent: 18 February 2015 14:36
> > >> To: WebAIM Discussion List
> > >> Subject: Re: [WebAIM] Untagged PDF doc with table structure
> > >>
> > >> Jon is correct. When Acrobat opens an untagged document while a
> > >> client that uses the accessibility API data is running, Acrobat (or
> > >> Reader) will add tags to the document. The result is the same as
> > >> if an author used the "add tags" feature in Acrobat. You get
> > >> Acrobat's best interpretation of what the tags should be. That
> > >> will sometimes result in headings, well-formed tables, lists, and
> > >> other structures. Authors who use this feature in Acrobat know that
> > >> you generally need to fix some of the tags.
> > >>
> > >>
> > >>
> > >> The result is that the document is tagged temporarily and
> > >> assistive technologies recognize and use the information.
> > >>
> > >>
> > >>
> > >> The dialogs that you see when opening PDF documents give you some
> > >> information about what is going on. To understand better, here's
> > >> my explanation.
> > >>
> > >>
> > >>
> > >> In Acrobat or Reader preferences there is a "Reading" category.
> > >> There is a checkbox that is labeled "Confirm before tagging
> > >> documents". If this is checked, then every time that Reader
> > >> intends to tag an untagged document the "Reading an untagged
> > >> document with assistive technology" dialog pops up and the user
> > >> needs to confirm that this is what they'd like to do. If the
> > >> user selects Cancel, then the document won't be tagged and the
> > >> reading experience will be essentially non-existent.
> > >>
> > >>
> > >>
> > >> If you elect to allow the tagging, there are other options as
> > >> mentioned in one of the replies. I recommend using the "infer
> > >> reading order from document" option.
> > >>
> > >>
> > >>
> > >> There are other settings related to large documents and autotagging.
> > >> Autotagging takes time, so if you open a very dense 600-page
> > >> manual you may find that Reader takes a long time to do the
> > >> tagging. It can take a while, and we are always looking to improve
> > >> the efficiency of this process.
> > >> The option for the user is to indicate whether the autotagging
> > >> should occur only on visible pages, on all pages in the document,
> > >> or on all pages except if the document is "large". The user gets
> > >> to define what large means: a user might find that their system
> > >> is slow at this and set the limit at 25 pages, or might set it
> > >> higher if their system handles this process quickly. The downside
> > >> of only tagging a few pages at a time is that if there are
> > >> recognized structures on pages that haven't been tagged yet (e.g.
> > >> a heading on page 51), the user can't use screen reader heading
> > >> navigation to jump to it because the tags don't exist until the
> > >> page is in view in the reader.
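The three autotagging choices can be summed up as a small decision. This Python sketch is only a model of the behavior described above, not Reader's code: the TagScope names are hypothetical, and it assumes that a "large" document (more pages than the user's limit) falls back to tagging just the visible pages.

```python
from enum import Enum
from typing import Iterable, List


class TagScope(Enum):
    """Hypothetical names for the three autotagging choices."""
    VISIBLE_PAGES = "visible pages only"
    ALL_PAGES = "all pages"
    ALL_UNLESS_LARGE = "all pages unless the document is large"


def pages_to_tag(scope: TagScope, page_count: int,
                 visible_pages: Iterable[int], large_limit: int) -> List[int]:
    """Return the 1-based page numbers that would be tagged right now."""
    every_page = list(range(1, page_count + 1))
    if scope is TagScope.ALL_PAGES:
        return every_page
    if scope is TagScope.ALL_UNLESS_LARGE and page_count <= large_limit:
        return every_page
    # Visible-only, or a document above the user's "large" limit (assumed
    # behavior): only pages in view get tags, so headings elsewhere can't
    # be reached with screen reader heading navigation yet.
    return sorted(set(visible_pages))


# A 600-page manual with a 25-page limit and pages 1-2 in view:
tagged = pages_to_tag(TagScope.ALL_UNLESS_LARGE, 600, [1, 2], 25)
print(51 in tagged)  # False: a heading on page 51 has no tag yet
```

Raising large_limit trades a longer wait when the document opens for heading navigation that works across the whole document.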
> > >>
> > >>
> > >>
> > >> Hope this helps,
> > >>
> > >> AWK
> > >>
> > >>
> > >>
> > >> -----Original Message-----
> > >> From: <EMAIL REMOVED>
> > >> [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn Holdsworth
> > >> Sent: Wednesday, February 18, 2015 4:36 AM
> > >> To: WebAIM Discussion List
> > >> Subject: [WebAIM] Untagged PDF doc with table structure
> > >>
> > >>
> > >>
> > >> Hi all,
> > >>
> > >>
> > >>
> > >> Apologies if PDF accessibility is off topic. If so, is there a
> > >> list that covers this?
> > >>
> > >>
> > >>
> > >> But if not ...
> > >>
> > >>
> > >>
> > >> I open a PDF document, and Adobe Reader alerts me that it's untagged.
> > >>
> > >>
> > >>
> > >> So I begin to peruse it using JAWS, and come across a table whose
> > >> structure is robust enough for me to move around it using the
> > >> JAWS table keystrokes.
> > >>
> > >>
> > >>
> > >> Does this mean there *are* tags in the document after all? Or has
> > >> Adobe Reader used heuristics to add tags to improve the doc's
> > >> accessibility, since my settings flag up that I'm using a
> > >> screenreader?
> > >>
> > >>
> > >>
> > >> I tried to download a trial version of Acrobat Pro so as to
> > >> examine the document structure, but the download assistant seems
> > >> inaccessible.
> > >>
> > >>
> > >>
> > >> Thanks as always, Lynn
> > >>