WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Untagged PDF doc with table structure

From: Ryan E. Benson
Date: Feb 18, 2015 3:50PM


InDesign only recognizes a handful of standard PDF tags. I can't find the
list right now, but I am pretty sure it is in the help. InDesign knows
<Table>, <Tr> and <Td>, for example, but not <TH> or something like that.
PDF tags are case sensitive, so if you create an h1 tag in your InDesign
document, it gets mapped to the <P> tag in the PDF. However, if you create
an H1 tag in InDesign, it is correctly mapped to H1 in the PDF.
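
To make the case-sensitivity point concrete, here is a rough Python sketch of
that kind of lookup. It is only a toy model, not InDesign's or Acrobat's
actual export logic: the function name resolve_tag, the small STANDARD_TAGS
subset, and the "fall back to P" rule are all assumptions made for
illustration. The role_map argument loosely mimics a role map that points
custom tag names at standard ones.

```python
# Toy model only: NOT InDesign's or Acrobat's real export logic.
# STANDARD_TAGS is a small assumed subset of the standard PDF tag names.
STANDARD_TAGS = {"P", "H1", "H2", "Table"}


def resolve_tag(name, role_map=None, fallback="P"):
    """Return the standard tag a custom tag name ends up as.

    Matching is case sensitive, so "h1" is not the same name as "H1".
    role_map (hypothetical) maps custom names to other names; unknown
    names fall back to `fallback`, mirroring the behaviour described above.
    """
    role_map = role_map or {}
    seen = set()
    # Follow role-map entries until a standard tag is reached,
    # guarding against circular mappings.
    while name not in STANDARD_TAGS and name in role_map and name not in seen:
        seen.add(name)
        name = role_map[name]
    return name if name in STANDARD_TAGS else fallback


print(resolve_tag("H1"))  # H1
print(resolve_tag("h1"))  # P (case mismatch, so it falls back)
print(resolve_tag("judys_inserted_copy", {"judys_inserted_copy": "P"}))  # P
```

The point is just that the comparison is an exact string match: nothing
normalizes "h1" to "H1" on your behalf.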

--
Ryan E. Benson

On Wed, Feb 18, 2015 at 3:12 PM, Chagnon | PubCom < <EMAIL REMOVED> >
wrote:

> Agree with Andrew. Bad tags do not create a structure.
>
> Adobe InDesign is famous for creating ridiculous tags in PDFs exported from
> the layout files. Tags like <blue_subhead_with_extra_space_above> or
> <judys_inserted_copy> are some I recently saw. Acrobat erroneously creates
> the tags from the names of the designer's paragraph formatting styles, not
> from what has been programmed to be <h1> or <h2>. So Acrobat's Role Map
> utility has to reinterpret those wild and crazy tags into normal <h1>, <h2>
> etc tags.
>
> A lot depends on 4 things:
>
> 1) The software version in which the source document was created. MS Word
> 2013 tags things more correctly than Word 2010 or 2007. Same with Adobe
> InDesign. Always use the most recent version to create the source
> documents. Standards change, as do the tools we use to build to those
> standards, so the latest software version will always give the best
> results and, hopefully, builds documents to the latest standards. As an
> example, look how the tagging of lists has changed over the past 10 years.
>
> 2) The software version of Acrobat that was used to create the PDF. In MS
> Word, for example, when we select File / Save as PDF, we're invoking an
> Acrobat module (or plug-in) in Word that interprets the Word document to
> create the PDF. Which version of Acrobat did the conversion? Acrobat 11
> does a better job than 10, which does a better job than 9. FYI, you can
> see the versions of Acrobat and the source program in the PDF's File /
> Properties utility. Also, some people use non-Adobe PDF makers, which from my
> experience don't make accessible PDFs at all.
>
> 3) The conversion settings (or preferences) when the PDF was exported from
> the source document. Miss a few checkboxes in the settings and you won't
> get an accessible PDF.
>
> 4) The skill of the person who created the source document and the PDF. If
> they don't know how to use Word's footnote utility and instead insert them
> by hand, then the PDF's footnotes won't be fully accessible. If they're a
> novice user of Adobe InDesign, forget it! The file will be an inaccessible
> nightmare!
>
> --Bevi Chagnon
>
>
>
> -----Original Message-----
> From: <EMAIL REMOVED>
> [mailto: <EMAIL REMOVED> ] On Behalf Of Andrew
> Kirkpatrick
> Sent: Wednesday, February 18, 2015 2:36 PM
> To: WebAIM Discussion List
> Subject: Re: [WebAIM] Untagged PDF doc with table structure
>
> Tagged PDF just means that it has tags at all. There is no guarantee that
> they are correct. This is where PDF/UA helps in that it answers the
> important question "tagged how?"
> AWK
>
> -----Original Message-----
> From: <EMAIL REMOVED>
> [mailto: <EMAIL REMOVED> ] On Behalf Of Brian Richwine
> Sent: Wednesday, February 18, 2015 11:20 AM
> To: WebAIM Discussion List
> Subject: Re: [WebAIM] Untagged PDF doc with table structure
>
> Hi,
>
> I've heard accessibility professionals state that a "tagged PDF" is one
> that uses a standard set of tags. So, some PDFs can have structure, but be
> using a non-standard set of tags and thus assistive technologies will not
> know how to interpret the tags that are in the document. So I came away
> thinking that a PDF that has tags is a structured PDF, and even better (in
> terms of accessibility) a PDF that uses standardized tags has structure and
> is "tagged".
>
> -Brian
>
> On Wed, Feb 18, 2015 at 1:45 PM, Lynn Holdsworth <
> <EMAIL REMOVED> >
> wrote:
>
> > Hi Bevi,
> >
> > Thanks for taking the time to write such a comprehensive response.
> >
> > From creating HTML pages for about half a lifetime, I'd define tags
> > and structure pretty much the way you do.
> >
> > But I inferred from this thread, and from talking with someone who
> > knows a lot more about PDF than I do, that it's possible to have
> > structure without tags in a PDF document. Is this correct, and if so
> > how would I recognise it if I were to examine the document's building
> > blocks?
> >
> > Best, Lynn
> >
> > On 18/02/2015, Chagnon | PubCom < <EMAIL REMOVED> > wrote:
> > > Lynn wrote: "in PDF docs, what's the difference between tags and
> > > structure?"
> > >
> > > This is one of the toughest concepts we teachers have to explain! I'd
> > > love to hear how others describe it. Here's my take:
> > >
> > > Tags are labels. Code labels, specifically, that are read by
> > > Assistive Technologies and are not usually visible to sighted users
> > > unless they have Acrobat Pro. They let AT users know what's a heading 2,
> > > a list of bullets, tables, and other parts of the documents. Tags also
> > > do a lot of work for us, such as assisting us in creating bookmarks and
> > > tables of contents, creating navigation systems, and holding the
> > > Alt-text on graphics (Alt-text is an attribute on the figure tag and
> > > doesn't stand alone on its own).
> > >
> > > Structure is the sequence of how the document's pieces will be read,
> > > or in other words, the sequence in which the tagged items are read.
> > > Call it reading order or tag reading order. The structure of some
> > > documents can also have nesting qualities, such as all the pieces of
> > > a chapter, and all the chapters in a book.
> > >
> > > An example: If Heading 1 designates a chapter title, then all the
> > > paragraphs, bullets, tables, and heading 2 items within that chapter
> > > will be nested inside the main heading 1 tag. This allows AT
> > > software to figure out, hopefully, what goes with what; that all the
> > > tags nested within Heading 1 form a chapter.
> > >
> > > Structure is created when you have tags (the right tag labels) and a
> > > reading order (a logical reading order). It is possible that a
> > > tagged and structured document might not be fully accessible because
> > > the tags aren't accurate enough or the reading order is out of
> > > whack.
> > >
> > > Example number 1: In older versions of MS Word, figures would be
> > > placed in very odd places of the reading order when the document was
> > > exported to a PDF. If paragraph 1 stated "see figure 5", figure 5
> > > itself might end up at the very end of the reading order, not near
> > > paragraph 1 where it was referenced. A sighted person sees figure 5
> > > next to the paragraph, but a screen reader user doesn't hear it voiced
> > > until the last page, and maybe that's page 360 of a long government
> > > document. So the document is tagged and structured, but it's a faulty
> > > structure because the reading order is incorrect.
> > >
> > > Example number 2: Graphic designers who use desktop publishing
> > > programs like Adobe InDesign and QuarkXPress create very complex
> > > visual layouts. Visually, things aren't designed in a traditional
> > > top-down, left-to-right pattern but instead could be scattered all
> > > over the physical page. Here's an example of a 2-page magazine spread:
> > >
> > > http://fc02.deviantart.net/fs71/i/2010/082/e/c/Magazine_Layout_Design_1_by_BreakTheRecords.jpg
> > > (This is just a random sample I pulled up on the Internet, so it is
> > > only a graphic of a 2-page spread, no live text or Alt-text.)
> > >
> > > Note that the article title (or heading 1) appears on page 2, and the
> > > body text of the story starts on page 1. Backwards! And then there are
> > > 2 quotes at the top of page 1, so obviously the designer wants us to
> > > read those at the beginning of the story, also. And here's a similar
> > > example:
> > >
> > > https://m1.behance.net/rendition/modules/12455236/disp/322ee0c042b2949607393d8b1f24ad96.jpg
> > >
> > > Whew! Getting a tagged, logical reading order from this type of
> > > publication isn't easy!
> > >
> > > Summary:
> > > Structure equals tagged content placed in a logical reading order.
> > >
> > > Well, that's my attempt. Would love to hear how others describe the
> > > concepts.
> > >
> > > --Bevi Chagnon
> > >
> > > -----Original Message-----
> > > From: <EMAIL REMOVED>
> > > [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn Holdsworth
> > > Sent: Wednesday, February 18, 2015 12:11 PM
> > > To: WebAIM Discussion List
> > > Subject: Re: [WebAIM] Untagged PDF doc with table structure
> > >
> > > Thanks so much everyone for weighing in - I've found this a very
> > > useful thread indeed.
> > >
> > > One more question: in PDF docs, what's the difference between tags
> > > and structure? Ryan mentioned that the doc may include structure but
> > > not be tagged, and I don't understand the difference.
> > >
> > > And thanks Duff for the LinkedIn group suggestions. I'll join at least
> > > the first one.
> > >
> > > Really hoping that Adobe is working on ironing out the accessibility
> > > glitches in the Download Assistant, as I'd appreciate the chance to
> > > learn about and use what seems like a great bunch of accessibility
> > > features in Acrobat.
> > >
> > > Best, Lynn
> > >
> > > On 18/02/2015, Andrew Kirkpatrick < <EMAIL REMOVED> > wrote:
> > >> Bim,
> > >> I was talking about both Acrobat and Reader in my reply, sorry if
> > >> that wasn't clear. It is the same process for both.
> > >> AWK
> > >>
> > >> -----Original Message-----
> > >> From: <EMAIL REMOVED>
> > >> [mailto: <EMAIL REMOVED> ] On Behalf Of Bim Egan
> > >> Sent: Wednesday, February 18, 2015 7:13 AM
> > >> To: 'WebAIM Discussion List'
> > >> Subject: Re: [WebAIM] Untagged PDF doc with table structure
> > >>
> > >> Lynn didn't seem to be talking about using Acrobat though. She
> > >> described the experience of many screen reader users in finding a
> > >> table in an untagged PDF when opened in Reader, and she asked why
> > >> this could happen. Her message said that the Acrobat installation
> > >> wasn't accessible.
> > >>
> > >> Bim
> > >>
> > >> -----Original Message-----
> > >> From: <EMAIL REMOVED>
> > >> [mailto: <EMAIL REMOVED> ] On Behalf Of Andrew
> > >> Kirkpatrick
> > >> Sent: 18 February 2015 14:36
> > >> To: WebAIM Discussion List
> > >> Subject: Re: [WebAIM] Untagged PDF doc with table structure
> > >>
> > >> Jon is correct. When Acrobat opens an untagged document and a
> > >> client that uses the accessibility API data is running, Acrobat (or
> > >> Reader) will add tags to the document. The result is the same as
> > >> if an author used the "add tags" feature in Acrobat. You get
> > >> Acrobat's best interpretation of what the tags should be. That
> > >> will sometimes result in headings, well-formed tables, lists, and
> > >> other structures. Authors who use this feature in Acrobat know that
> > >> you generally need to fix some of the tags.
> > >>
> > >>
> > >>
> > >> The result is that the document is tagged temporarily and assistive
> > >> technologies recognize and use the information.
> > >>
> > >>
> > >>
> > >> The dialogs that you see when opening PDF documents give you some
> > >> information about what is going on. To understand better, here's
> > >> my explanation.
> > >>
> > >>
> > >>
> > >> In Acrobat or Reader preferences there is a "Reading" category.
> > >> There is a checkbox that is labeled "Confirm before tagging
> > >> documents". If this is checked, then every time that Reader
> > >> intends to tag an untagged document the "Reading an untagged
> > >> document with assistive technology" dialog pops up and the user
> > >> needs to confirm that this is what they'd like to do. If the user
> > >> selects cancel then the document won't be tagged and the reading
> > >> experience will be essentially non-existent.
> > >>
> > >>
> > >>
> > >> If you elect to allow the tagging, there are other options as
> > >> mentioned in one of the replies. I recommend using the "infer
> > >> reading order from document" option.
> > >>
> > >>
> > >>
> > >> There are other settings related to large documents and autotagging.
> > >> Autotagging takes time, so if you open a very dense 600-page manual
> > >> you may find that Reader takes a long time to do the tagging. It
> > >> can, and we are always looking to improve the efficiency of this
> > >> process. The option for the user is to indicate whether the
> > >> autotagging should occur only on visible pages, on all pages in the
> > >> document, or on all pages except if the document is "large". The
> > >> user gets to define what large means - a user might find that their
> > >> system is slow at this so sets the limit at 25 pages, or might set
> > >> it higher if their system handles this process quickly. The down
> > >> side of only tagging a few pages at a time is that if there are
> > >> recognized structures on pages that haven't been tagged yet (e.g. a
> > >> heading on page 51) the user can't use screen reader heading
> > >> navigation to jump to it because the tags don't exist until the
> > >> page is in view in the reader.
> > >>
> > >>
> > >>
> > >> Hope this helps,
> > >>
> > >> AWK
> > >>
> > >>
> > >>
> > >> -----Original Message-----
> > >> From: <EMAIL REMOVED>
> > >> [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn
> > >> Holdsworth
> > >> Sent: Wednesday, February 18, 2015 4:36 AM
> > >> To: WebAIM Discussion List
> > >> Subject: [WebAIM] Untagged PDF doc with table structure
> > >>
> > >>
> > >>
> > >> Hi all,
> > >>
> > >>
> > >>
> > >> Apologies if PDF accessibility is off topic. If so is there a list
> > >> that covers this?
> > >>
> > >>
> > >>
> > >> But if not ...
> > >>
> > >>
> > >>
> > >> I open a PDF document, and Adobe Reader alerts me that it's untagged.
> > >>
> > >>
> > >>
> > >> So I begin to peruse it using JAWS, and come across a table whose
> > >> structure is robust enough for me to move around it using the JAWS
> > >> table keystrokes.
> > >>
> > >>
> > >>
> > >> Does this mean there *are* tags in the document after all? Or has
> > >> Adobe Reader used heuristics to add tags to improve the doc's
> > >> accessibility, since my settings flag up that I'm using a
> > >> screenreader?
> > >>
> > >>
> > >>
> > >> I tried to download a trial version of Acrobat Pro so as to examine
> > >> the document structure, but the download assistant seems inaccessible.
> > >>
> > >>
> > >>
> > >> Thanks as always, Lynn
> > >>