WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Untagged PDF doc with table structure

From: L Snider
Date: Feb 18, 2015 1:18PM


Ah okay, I see where you were going now, thanks. Yes, it is like WCAG... You
can do it and it can still be inaccessible :)

Even after all these years, most of it is still manual. Funny how things
have changed and some things are still the same.

I am loving XI Pro, because you only do the full report by default. None of
the messiness of previous versions.

Cheers

Lisa

On Wed, Feb 18, 2015 at 2:08 PM, Chagnon | PubCom < <EMAIL REMOVED> >
wrote:

> Yes, always run the full report in the Acrobat checker and don't waste
> your time with the other options.
>
> The Acrobat checker tells you if the PDF is tagged, but not if they're the
> right tags.
> It tells you if anything is untagged, which quite often is sidebar boxes,
> captions, and other pieces that were left out of the tag tree.
> Tells if any graphics are missing Alt-Text.
> Language and file name options are also flagged if missing.
> And sometimes it can detect when the structure might be off, such as
> headings that appear out of order as heading 3, heading 1, heading 6.
>
> But even with the full report from Acrobat, you're still not getting all
> the information you need. One reason: software can't judge whether those
> are the right tags and whether they're in the correct, logical reading
> order. Only humans can assess that!
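
A minimal sketch of the kind of heading-order test a checker can automate, using the heading 3, heading 1, heading 6 sequence from the message above. The function name and the rule (flag any step that goes more than one level deeper than the heading before it) are assumptions for illustration, not Acrobat's actual logic. It also shows the limit described above: the test sees level numbers only and cannot tell whether a heading is the right tag for its text.

```python
def find_heading_problems(levels):
    """Return (index, message) pairs for headings that skip a level.

    `levels` is the list of heading levels in tag order, e.g. [1, 2, 3, 2].
    """
    problems = []
    previous = 0  # treat the start of the document as "level 0"
    for index, level in enumerate(levels):
        # Going more than one level deeper means a level was skipped.
        if level > previous + 1:
            prior = f"heading {previous}" if previous else "the start of the document"
            problems.append((index, f"heading {level} comes right after {prior}"))
        previous = level
    return problems


for index, message in find_heading_problems([3, 1, 6]):
    print(index, message)
```

A well-ordered sequence such as `[1, 2, 3, 2, 1]` produces no messages, which says nothing about whether those headings were applied to the right text.
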
>
> --Bevi Chagnon
>
>
> -----Original Message-----
> From: <EMAIL REMOVED>
> [mailto: <EMAIL REMOVED> ] On Behalf Of L Snider
> Sent: Wednesday, February 18, 2015 2:51 PM
> To: WebAIM Discussion List
> Subject: Re: [WebAIM] Untagged PDF doc with table structure
>
> Hi Bevi,
>
> One question on this:
> 1. Run Acrobat's accessibility checker. This looks at only about 20% of the
> document's features, so don't depend on it for a full check.
>
> This is the full report and check, right? If so, what else would you check?
>
> Cheers
>
> Lisa
>
> On Wed, Feb 18, 2015 at 1:41 PM, Chagnon | PubCom < <EMAIL REMOVED> >
> wrote:
>
> > Lynn, I too have a strong programming background in HTML, as well as
> > SGML, XML, and many other markup languages. So tags plus reading order
> > create the document's structure in my mind! In theory, I don't believe
> > a PDF can have any structure, good or bad, without tags. All PDFs have
> > a page architecture, but that's not the same thing as structure.
> >
> > Lynn asked: " if so how would I recognise it if I were to examine the
> > document's building blocks "
> >
> > You have to examine it from several viewpoints in Acrobat Pro. I teach
> > my students this method:
> > 1. Run Acrobat's accessibility checker. This looks at only about 20%
> > of the document's features, so don't depend on it for a full check.
> >
> > 2. Run down the tag tree, top-to-bottom. I call this the tag reading
> > order.
> > For sighted users, they can arrow down from tag to tag and also see on
> > the page which item is highlighted for each tag. They'll see very
> > quickly that the figures weren't read at the correct place in the tag
> > tree, or that the second half of body text was read first, then the
> > heading 1, then the remaining body text.
> >
> > For screen reader users, this is what your software is using. But it's
> > more difficult to tell if the document is correct. Were you able to
> > hear and figure out what was read? Did it make sense (not the content
> > itself, but the order in which you heard it)? Screen readers also
> > can't tell sometimes if it's tagged correctly. Example: Adobe InDesign
> > has a tragic flaw. When a sidebar (boxed text that's secondary to the
> > main story) is exported to PDF, the conversion isn't correct. All of
> > the text is jumbled together; paragraphs are lost, including any
> > headings, bulleted lists, tables, figures, etc. So a screen reader
> > just hears the text run-on blah blah blah, but never knows if he's
> > reading one paragraph, multiple paragraphs, headings, or any other
> > parts of a document. My screen reader testers often miss these
> > problems; they just can't tell if they're missing something or if it's
> > incorrect.
> >
> > 3. Run down the "real" reading order. This is the Order panel in Acrobat.
> > Often overlooked by many in accessible documentation, this is the
> > original reading order that's still used by many assistive
> > technologies, including braille printers and keyboards. I've never
> > had any of my screen reader testers review this because their software
> > has a hard time voicing it in a way that makes sense to them. But they
> > can see this reading order another way: View / Zoom / Reflow. This
> > utility rejiggers the visual layout on the screen to mimic the real
> > reading order. Columns are removed, everything is sequential and
> > linear, top to bottom. So if the first item read by a screen reader
> > happens to be the photo caption, not heading 1, then you have a reading
> > order problem.
> >
> > 4. After that, the usual review of tags, tables, alt-text, etc. takes
> > place.
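
Step 2 above, running down the tag tree top to bottom, amounts to a depth-first walk. The sketch below uses a made-up `Tag` class as a stand-in for a tag tree node; it is not an Acrobat or PDF library API, and the sample tree simply reproduces the "second half of body text, then heading 1, then the rest" problem described in the message.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical stand-in for one node of a PDF tag tree."""
    kind: str                      # e.g. "H1", "P", "Figure"
    text: str = ""
    children: list = field(default_factory=list)


def tag_reading_order(tag):
    """Yield (kind, text) for every tag that carries text, top to bottom."""
    if tag.text:
        yield tag.kind, tag.text
    for child in tag.children:
        yield from tag_reading_order(child)


document = Tag("Document", children=[
    Tag("P", "Second half of the body text"),
    Tag("H1", "Chapter title"),
    Tag("P", "First half of the body text"),
])

order = list(tag_reading_order(document))
for kind, text in order:
    print(kind, "-", text)

# A crude automated check: the first thing read should be the main heading.
print("starts with H1:", order[0][0] == "H1")
```

The walk can report what comes first, but deciding whether the sequence makes sense still needs a person reading it against the page.
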
> >
> > --Bevi Chagnon
> >
> > -----Original Message-----
> > From: <EMAIL REMOVED>
> > [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn
> > Holdsworth
> > Sent: Wednesday, February 18, 2015 1:46 PM
> > To: WebAIM Discussion List
> > Subject: Re: [WebAIM] Untagged PDF doc with table structure
> >
> > Hi Bevi,
> >
> > Thanks for taking the time to write such a comprehensive response.
> >
> > From creating HTML pages for about half a lifetime, I'd define tags
> > and structure pretty much the way you do.
> >
> > But I inferred from this thread, and from talking with someone who
> > knows a lot more about PDF than I do, that it's possible to have
> > structure without tags in a PDF document. Is this correct, and if so
> > how would I recognise it if I were to examine the document's building
> > blocks?
> >
> > Best, Lynn
> >
> > On 18/02/2015, Chagnon | PubCom < <EMAIL REMOVED> > wrote:
> > > Lynn wrote: " in PDF docs, what's the difference between tags and
> > > structure?
> > > "
> > >
> > > This is one of the toughest concepts we teachers have to explain!
> > > I'd love to hear how others describe it. Here's my take:
> > >
> > > Tags are labels. Code labels, specifically, that are read by
> > > Assistive Technologies and are not usually visible to sighted users
> > > unless they have Acrobat Pro. They let AT users know what's a
> > > heading 2, a list of bullets, tables, and other parts of the
> > > documents. Tags also do a lot of work for us, such as assisting us
> > > in creating bookmarks and tables of contents, creating navigation
> > > systems, and holding the Alt-text on graphics (Alt-Text is an
> > > attribute on the figure tag and doesn't stand alone on its own).
> > >
> > > Structure is the sequence of how the document's pieces will be read,
> > > or in other words, the sequence in which the tagged items are read.
> > > Call it reading order or tag reading order. The structure of some
> > > documents can also have nesting qualities, such as all the pieces of
> > > a chapter, and all the chapters in a book.
> > >
> > > An example: If Heading 1 designates a chapter title, then all the
> > > paragraphs, bullets, tables, and heading 2 items within that chapter
> > > will be nested inside the main heading 1 tag. This allows AT
> > > software to figure out, hopefully, what goes with what; that all the
> > > tags nested within Heading 1 make up a chapter.
> > >
> > > Structure is created when you have tags (the right tag labels) and a
> > > reading order (a logical reading order). It is possible that a
> > > tagged and structured document might not be fully accessible because
> > > the tags aren't accurate enough or the reading order is out of whack.
> > >
> > > Example number 1: In older versions of MS Word, figures would be
> > > placed in very odd places of the reading order when it was exported
> > > to a PDF. If paragraph 1 stated "see figure 5", figure 5 itself
> > > might end up at the very end of the reading order, not near
> > > paragraph 1 where it was referenced. A sighted person sees figure 5
> > > next to the paragraph, but a screen reader user doesn't hear it
> > > voiced until the last page, and maybe that's page 360 of a long
> > > government document. So the document is tagged and structured, but
> > > it's a faulty structure because the reading order is incorrect.
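
To make the figure 5 example concrete, here is a small sketch that measures how far a figure sits from the paragraph that cites it. The reading order is a made-up list of labels and `reading_gap` is a hypothetical helper, so this only illustrates the idea of a misplaced figure, not how any tool extracts the order from a real PDF.

```python
def reading_gap(reading_order, reference, figure):
    """How many items are read between a reference and the figure it cites."""
    return reading_order.index(figure) - reading_order.index(reference) - 1


# Made-up reading order: the figure has drifted to the very end.
reading_order = [
    "H1: Chapter 1",
    "P: ... see figure 5 ...",
    "P: more body text",
    "Table: budget",
    "P: closing text",
    "Figure: figure 5",
]

gap = reading_gap(reading_order, "P: ... see figure 5 ...", "Figure: figure 5")
print(f"{gap} items are read between the reference and the figure")
```
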
> > >
> > > Example number 2: Graphic designers who use desktop publishing
> > > programs like Adobe InDesign and QuarkXPress create very complex
> > > visual layouts. Visually, things aren't designed in a traditional
> > > top-down, left-to-right pattern but instead could be scattered all
> > > over the physical page. Here's an example of a 2-page magazine spread:
> > > http://fc02.deviantart.net/fs71/i/2010/082/e/c/Magazine_Layout_Design_1_by_BreakTheRecords.jpg
> > > (This is just a random sample I pulled up on the Internet, so it is
> > > only a graphic of a 2-page spread, no live text or Alt-text.)
> > >
> > > Note that the article title (or heading 1) appears on page 2, and the
> > > body text of the story starts on page 1. Backwards! And then there
> > > are 2 quotes at the top of page 1, so obviously the designer wants
> > > us to read those at the beginning of the story, also. And here's a
> > > similar example:
> > > https://m1.behance.net/rendition/modules/12455236/disp/322ee0c042b2949607393d8b1f24ad96.jpg
> > >
> > > Whew! Getting a tagged, logical reading order from this type of
> > > publication isn't easy!
> > >
> > > Summary:
> > > Structure equals tagged content placed in a logical reading order.
> > >
> > > Well, that's my attempt. Would love to hear how others describe the
> > > concepts.
> > >
> > > --Bevi Chagnon
> > >
> > > -----Original Message-----
> > > From: <EMAIL REMOVED>
> > > [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn
> > > Holdsworth
> > > Sent: Wednesday, February 18, 2015 12:11 PM
> > > To: WebAIM Discussion List
> > > Subject: Re: [WebAIM] Untagged PDF doc with table structure
> > >
> > > Thanks so much everyone for weighing in - I've found this a very
> > > useful thread indeed.
> > >
> > > One more question: in PDF docs, what's the difference between tags
> > > and structure? Ryan mentioned that the doc may include structure but
> > > not be tagged, and I don't understand the difference.
> > >
> > > And thanks Duff for the LinkedIn group suggestions. I'll join at
> > > least the first one.
> > >
> > > Really hoping that Adobe is working on ironing out the accessibility
> > > glitches in the DownLoad Assistant, as I'd appreciate the chance to
> > > learn about and use what seems like a great bunch of accessibility
> > > features in Acrobat.
> > >
> > > Best, Lynn
> > >
> > > On 18/02/2015, Andrew Kirkpatrick < <EMAIL REMOVED> > wrote:
> > >> Bim,
> > >> I was talking about both Acrobat and Reader in my reply, sorry if
> > >> that wasn't clear. It is the same process for both.
> > >> AWK
> > >>
> > >> -----Original Message-----
> > >> From: <EMAIL REMOVED>
> > >> [mailto: <EMAIL REMOVED> ] On Behalf Of Bim Egan
> > >> Sent: Wednesday, February 18, 2015 7:13 AM
> > >> To: 'WebAIM Discussion List'
> > >> Subject: Re: [WebAIM] Untagged PDF doc with table structure
> > >>
> > >> Lynn didn't seem to be talking about using Acrobat though. She
> > >> described the experience of many screen reader users in finding a
> > >> table in an untagged PDF when opened in Reader, and she asked why
> > >> this could happen. Her message said that the Acrobat installation
> > >> wasn't accessible.
> > >>
> > >> Bim
> > >>
> > >> -----Original Message-----
> > >> From: <EMAIL REMOVED>
> > >> [mailto: <EMAIL REMOVED> ] On Behalf Of Andrew
> > >> Kirkpatrick
> > >> Sent: 18 February 2015 14:36
> > >> To: WebAIM Discussion List
> > >> Subject: Re: [WebAIM] Untagged PDF doc with table structure
> > >>
> > >> Jon is correct. When Acrobat opens an untagged document and a
> > >> client that uses the accessibility API data is running, Acrobat (or
> > >> Reader) will add tags to the document. The result is the same as
> > >> if an author used the "add tags" feature in Acrobat. You get
> > >> Acrobat's best interpretation of what the tags should be. That
> > >> will sometimes result in headings, well-formed tables, lists, and
> > >> other structures. Authors who use this feature in Acrobat know that
> > >> you generally need to fix some of the tags.
> > >>
> > >>
> > >>
> > >> The result is that the document is tagged temporarily and assistive
> > >> technologies recognize and use the information.
> > >>
> > >>
> > >>
> > >> The dialogs that you see when opening PDF documents give you some
> > >> information about what is going on. To understand better, here's
> > >> my explanation.
> > >>
> > >>
> > >>
> > >> In Acrobat or Reader preferences there is a "Reading" category.
> > >> There is a checkbox that is labeled "Confirm before tagging
> > >> documents". If this is checked, then every time that Reader
> > >> intends to tag an untagged document the "Reading an untagged
> > >> document with assistive technology" dialog pops up and the user
> > >> needs to confirm that this is what they'd like to do. If the user
> > >> selects cancel then the document won't be tagged and the reading
> > >> experience will be essentially non-existent.
> > >>
> > >>
> > >>
> > >> If you elect to allow the tagging, there are other options as
> > >> mentioned in one of the replies. I recommend using the "infer
> > >> reading order from document" option.
> > >>
> > >>
> > >>
> > >> There are other settings related to large documents and auto-tagging.
> > >> Autotagging takes time, so if you open a very dense 600 page manual
> > >> you may find that Reader takes a long time to do the tagging. It
> > >> can, and we are always looking to improve the efficiency of this
> > >> process.
> > >> The option for the user is to indicate whether the autotagging
> > >> should occur only on visible pages, on all pages in the document,
> > >> or on all pages except if the document is "large". The user gets
> > >> to define what large means - a user might find that their system is
> > >> slow at this so sets the limit at 25 pages, or might set it higher
> > >> if their system handles this process quickly. The down side of only
> > >> tagging a few pages at a time is that if there are recognized
> > >> structures on pages that haven't been tagged yet (e.g. a heading on
> > >> page 51) the user can't use screen reader heading navigation to
> > >> jump to it because the tags don't exist until the page is in view
> > >> in the reader.
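
The three auto-tagging choices described above can be modelled roughly as follows. Everything here is an assumption made for illustration: the mode names, the function, the default threshold, and the guess that a "large" document falls back to tagging only the visible pages. It is not Reader's implementation.

```python
def pages_to_autotag(mode, page_count, visible_pages, large_threshold=50):
    """Return the set of page numbers that would be tagged on open.

    mode is one of (names are made up for this sketch):
      "visible"      - only the pages currently in view
      "all"          - every page
      "all_if_small" - every page, unless the document counts as large
    """
    all_pages = set(range(1, page_count + 1))
    if mode == "all":
        return all_pages
    if mode == "all_if_small" and page_count <= large_threshold:
        return all_pages
    if mode in ("visible", "all_if_small"):
        # Assumed fallback: large documents are tagged page by page.
        return set(visible_pages) & all_pages
    raise ValueError(f"unknown mode: {mode}")


# A 600-page manual with the limit set at 25 pages: only the visible pages
# are tagged, so a heading on page 51 cannot be reached by heading navigation.
tagged = pages_to_autotag("all_if_small", 600, visible_pages=[1, 2], large_threshold=25)
print(51 in tagged)
```

With the mode set to `"all"`, page 51 is tagged up front, at the cost of the longer wait on open that the message describes.
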
> > >>
> > >>
> > >>
> > >> Hope this helps,
> > >>
> > >> AWK
> > >>
> > >>
> > >>
> > >> -----Original Message-----
> > >> From: <EMAIL REMOVED>
> > >> [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn
> > >> Holdsworth
> > >> Sent: Wednesday, February 18, 2015 4:36 AM
> > >> To: WebAIM Discussion List
> > >> Subject: [WebAIM] Untagged PDF doc with table structure
> > >>
> > >>
> > >>
> > >> Hi all,
> > >>
> > >>
> > >>
> > >> Apologies if PDF accessibility is off topic. If so is there a list
> > >> that covers this?
> > >>
> > >>
> > >>
> > >> But if not ...
> > >>
> > >>
> > >>
> > >> I open a PDF document, and Adobe Reader alerts me that it's untagged.
> > >>
> > >>
> > >>
> > >> So I begin to peruse it using JAWS, and come across a table whose
> > >> structure is robust enough for me to move around it using the JAWS
> > >> table keystrokes.
> > >>
> > >>
> > >>
> > >> Does this mean there *are* tags in the document after all? Or has
> > >> Adobe Reader used heuristics to add tags to improve the doc's
> > >> accessibility, since my settings flag up that I'm using a
> > >> screen reader?
> > >>
> > >>
> > >>
> > >> I tried to download a trial version of Acrobat Pro so as to examine
> > >> the document structure, but the download assistant seems inaccessible.
> > >>
> > >>
> > >>
> > >> Thanks as always, Lynn
> > >>