WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Untagged PDF doc with table structure

From: Chagnon | PubCom
Date: Feb 18, 2015 12:41PM


Lynn, I too have a strong programming background in HTML, as well as SGML,
XML, and many other markup languages. So in my mind, tags plus reading order
create the document's structure! In theory, I don't believe a PDF can have
any structure, good or bad, without tags. All PDFs have a page architecture,
but that's not the same thing as structure.

Lynn asked: "If so, how would I recognise it if I were to examine the
document's building blocks?"

You have to examine it from several viewpoints in Acrobat Pro. I teach my
students this method:
1. Run Acrobat's accessibility checker. This looks at only about 20% of the
document's features, so don't depend on it for a full check.

2. Run down the tag tree, top to bottom. I call this the tag reading order.
Sighted users can arrow down from tag to tag and see on the page which item
is highlighted for each tag. They'll quickly spot figures that aren't read
at the correct place in the tag tree, or body text whose second half is read
first, then the heading 1, then the rest of the body text.

For screen reader users, this tag reading order is what your software uses,
but it's more difficult to tell whether the document is correct. Were you
able to hear and figure out what was read? Did it make sense (not the
content itself, but the order in which you heard it)? Screen reader users
sometimes can't tell whether it's tagged correctly, either. Example: Adobe
InDesign has a tragic flaw. When a sidebar (boxed text that's secondary to
the main story) is exported to PDF, it isn't converted correctly. All of
the text is jumbled together; paragraphs are lost, and so are any headings,
bulleted lists, tables, figures, etc. So a screen reader user hears one
run-on stream of text and never knows whether he's reading one paragraph,
multiple paragraphs, headings, or any other part of the document. My screen
reader testers often miss these problems; they just can't tell whether
they're missing something or whether it's incorrect.

3. Run down the "real" reading order. This is the Order panel in Acrobat.
Often overlooked in accessible documentation, this is the original reading
order that's still used by many assistive technologies, including braille
printers and keyboards. I've never had any of my screen reader testers
review this because their software has a hard time voicing it in a way
that makes sense to them. But they can check this reading order another
way: View / Zoom / Reflow. This command rearranges the on-screen layout to
mimic the real reading order. Columns are removed, and everything is
sequential and linear, top to bottom. So if the first item read by a screen
reader happens to be the photo caption, not heading 1, then you have a
reading order problem.

4. After that, the usual review of tags, tables, alt-text, etc. takes place.
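The tag-tree walk in step 2 can be modeled in a few lines. This is a toy sketch, not Acrobat's API: the `Tag` class, `tag_reading_order`, and `first_item_problem` are hypothetical names, and each tag carries a short text label standing in for the page content it marks. A real tagged PDF keeps this tree under the document's StructTreeRoot; walking it depth-first, parent before children, gives the order in which the tags are met.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical stand-in for one node in a PDF tag tree."""
    role: str                 # structure type, e.g. "H1", "P", "Figure"
    text: str = ""            # label for the page content this tag marks
    kids: list[Tag] = field(default_factory=list)


def tag_reading_order(root: Tag) -> list[Tag]:
    """Walk the tree depth-first, parent before children, left to right,
    and return the tags that mark content, in the order they are met."""
    order: list[Tag] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.text:                      # grouping tags (Document, Sect) have no text
            order.append(node)
        stack.extend(reversed(node.kids))  # reversed so the first kid is popped first
    return order


def first_item_problem(order: list[Tag]) -> str | None:
    """Flag the symptom from step 3: the first item read is not the heading 1."""
    if order and order[0].role != "H1":
        return f"First item read is {order[0].role} ({order[0].text!r}), not H1"
    return None


# Example tree: the photo caption was tagged ahead of the heading.
doc = Tag("Document", kids=[
    Tag("Caption", "Photo caption"),
    Tag("H1", "Annual report"),
    Tag("Sect", kids=[Tag("P", "Body text"), Tag("Figure", "Sales chart")]),
])

order = tag_reading_order(doc)
print([t.role for t in order])   # ['Caption', 'H1', 'P', 'Figure']
print(first_item_problem(order))
```

Here the caption sits ahead of the heading 1 in the tree, so the walk reaches it first and the check flags it. In Acrobat you do the same thing by eye, arrowing down the Tags panel.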

--Bevi Chagnon