WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Untagged PDF doc with table structure

From: Andrew Kirkpatrick
Date: Feb 18, 2015 4:13PM


Ryan,
I'm not sure what version of InDesign you are using, but InDesign does support TH tags if you use the InDesign table tool and indicate heading rows.

Regarding creating the tags in upper or lower case: if you use the correct, recognized tag name from the PDF spec, then yes, the role map isn't needed. But you can also use the feature that maps styles to tags, and InDesign takes care of the mapping. If you ever have multiple styles that both need to map to H2, you'll benefit from this feature.
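
To make the role map idea concrete, here is a minimal Python sketch of how a custom tag name might be resolved to a standard one. It is only an illustration, not how InDesign or Acrobat implement it: STANDARD_TAGS is a partial list of the standard structure types, the ROLE_MAP entries are made up (two of the style names are borrowed from elsewhere in this thread, and "sidebar_subhead" is hypothetical), and resolve_tag is a name invented for this example.

```python
# Partial list of standard structure types from the PDF spec.
# Names are case-sensitive: "H2" is standard, "h2" is not.
STANDARD_TAGS = {
    "P", "H1", "H2", "H3", "H4", "H5", "H6",
    "L", "LI", "Lbl", "LBody",
    "Table", "TR", "TH", "TD",
    "Figure", "Span",
}

# Hypothetical role map: style-derived tag names -> standard tags.
# Several styles can map to the same standard tag (here, H2).
ROLE_MAP = {
    "blue_subhead_with_extra_space_above": "H2",
    "sidebar_subhead": "H2",
    "judys_inserted_copy": "P",
}


def resolve_tag(tag, role_map=ROLE_MAP):
    """Follow the role map until a standard tag is reached.

    Returns the standard tag name, or None if the tag is neither
    standard nor (transitively) mapped to a standard tag.
    """
    seen = set()
    while tag not in STANDARD_TAGS:
        if tag in seen or tag not in role_map:
            return None  # unmapped tag, or a cycle in the role map
        seen.add(tag)
        tag = role_map[tag]
    return tag


if __name__ == "__main__":
    for name in ("H2", "sidebar_subhead", "h2"):
        print(name, "->", resolve_tag(name))
```

A tag that is already standard needs no role map entry, which is the "role map isn't needed" case above; a lower-case "h2" resolves to nothing in this sketch unless someone adds a mapping for it.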

AWK

-----Original Message-----
From: <EMAIL REMOVED> [mailto: <EMAIL REMOVED> ] On Behalf Of Ryan E. Benson
Sent: Wednesday, February 18, 2015 2:51 PM
To: WebAIM Discussion List
Subject: Re: [WebAIM] Untagged PDF doc with table structure

InDesign only recognizes a handful of standard PDF tags. I can't find the list right now, but I am pretty sure it is in the help. InDesign knows <Table>, <Tr> and <Td>, for example, but not <TH> or something like that.
PDF tags are case sensitive, so if you create an h1 tag for your InDesign document, it gets mapped to the <P> tag in the PDF. However, if you create the
H1 tag in InDesign, it is correctly mapped to H1 in the PDF.

--
Ryan E. Benson

On Wed, Feb 18, 2015 at 3:12 PM, Chagnon | PubCom < <EMAIL REMOVED> >
wrote:

> Agree with Andrew. Bad tags do not create a structure.
>
> Adobe InDesign is famous for creating ridiculous tags in PDFs exported
> from the layout files. Tags like <blue_subhead_with_extra_space_above>
> or <judys_inserted_copy> are some I recently saw. Acrobat erroneously
> creates the tags from the names of the designer's paragraph formatting
> styles, not from what has been programmed to be <h1> or <h2>. So
> Acrobat's Role Map utility has to reinterpret those wild and crazy
> tags into normal <h1>, <h2> etc tags.
>
> A lot depends on 4 things:
>
> 1) The software version in which the source document was created. MS
> Word 2013 tags things more correctly than Word 2010 or 2007. Same with
> Adobe InDesign. Always use the most recent version to create the
> source documents. Standards change, as do the tools we use to meet
> those standards, so the latest software version will always give the
> best results and, hopefully, build documents to the latest standards.
> As an example, look at how the tagging of lists has changed over the
> past 10 years.
>
> 2) The software version of Acrobat that was used to create the PDF. In
> MS Word, for example, when we select File / Save as PDF, we're
> invoking an Acrobat module (or plug-in) in Word that interprets the
> Word document to create the PDF. Which version of Acrobat did the
> conversion? Acrobat 11 does a better job than 10, which does a better
> job than 9. FYI, you can see the versions of Acrobat and the source
> program in the PDF's File / Properties utility. Also, some people use
> non-Adobe PDF makers, which from my experience don't make accessible
> PDFs at all.
>
> 3) The conversion settings (or preferences) when the PDF was exported
> from the source document. Miss a few checkboxes in the settings and
> you won't get an accessible PDF.
>
> 4) The skill of the person who created the source document and the
> PDF. If they don't know how to use Word's footnote utility and instead
> insert them by hand, then the PDF's footnotes won't be fully
> accessible. If they're a novice user of Adobe InDesign, forget it! The
> file will be an inaccessible nightmare!
>
> --Bevi Chagnon
>
>
>
> -----Original Message-----
> From: <EMAIL REMOVED>
> [mailto: <EMAIL REMOVED> ] On Behalf Of Andrew
> Kirkpatrick
> Sent: Wednesday, February 18, 2015 2:36 PM
> To: WebAIM Discussion List
> Subject: Re: [WebAIM] Untagged PDF doc with table structure
>
> Tagged PDF just means that it has tags at all. There is no guarantee
> that they are correct. This is where PDF/UA helps in that it answers
> the important question "tagged how?"
> AWK
>
> -----Original Message-----
> From: <EMAIL REMOVED>
> [mailto: <EMAIL REMOVED> ] On Behalf Of Brian
> Richwine
> Sent: Wednesday, February 18, 2015 11:20 AM
> To: WebAIM Discussion List
> Subject: Re: [WebAIM] Untagged PDF doc with table structure
>
> Hi,
>
> I've heard accessibility professionals state that a "tagged pdf" is
> one that uses a standard set of tags. So, some PDFs can have
> structure, but be using a non-standard set of tags and thus assistive
> technologies will not know how to interpret the tags that are in the
> document. So I came away thinking that a PDF that has tags is a
> structured PDF, and even better (in terms of
> accessibility) a PDF that uses standardized tags has structure and is
> "tagged".
>
> -Brian
>
> On Wed, Feb 18, 2015 at 1:45 PM, Lynn Holdsworth <
> <EMAIL REMOVED> >
> wrote:
>
> > Hi Bevi,
> >
> > Thanks for taking the time to write such a comprehensive response.
> >
> > From creating HTML pages for about half a lifetime, I'd define tags
> > and structure pretty much the way you do.
> >
> > But I inferred from this thread, and from talking with someone who
> > knows a lot more about PDF than I do, that it's possible to have
> > structure without tags in a PDF document. Is this correct, and if so
> > how would I recognise it if I were to examine the document's
> > building blocks?
> >
> > Best, Lynn
> >
> > On 18/02/2015, Chagnon | PubCom < <EMAIL REMOVED> > wrote:
> > > Lynn wrote: "in PDF docs, what's the difference between tags and
> > > structure?"
> > >
> > > This is one of the toughest concepts we teachers have to explain!
> > > I'd love to hear how others describe it. Here's my take:
> > >
> > > Tags are labels. Code labels, specifically, that are read by
> > > Assistive Technologies and are not usually visible to sighted
> > > users unless they have Acrobat Pro. They let AT users know what's
> > > a heading 2, a list of bullets, tables, and other parts of the
> > > document. Tags also do a lot of work for us, such as assisting us
> > > in creating bookmarks and tables of contents, creating navigation
> > > systems, and holding the Alt-text on graphics (Alt-text is an
> > > attribute on the figure tag and doesn't stand on its own).
> > >
> > > Structure is the sequence of how the document's pieces will be
> > > read, or in other words, the sequence in which the tagged items
> > > are read. Call it reading order or tag reading order. The
> > > structure of some documents can also have nesting qualities, such
> > > as all the pieces of a chapter, and all the chapters in a book.
> > >
> > > An example: If Heading 1 designates a chapter title, then all the
> > > paragraphs, bullets, tables, and heading 2 items within that
> > > chapter will be nested inside the main heading 1 tag. This allows
> > > AT software to figure out, hopefully, what goes with what: that
> > > all the tags nested within Heading 1 form a chapter.
> > >
> > > Structure is created when you have tags (the right tag labels) and
> > > a reading order (a logical reading order). It is possible that a
> > > tagged and structured document might not be fully accessible
> > > because the tags aren't accurate enough or the reading order is
> > > out of whack.
> > >
> > > Example number 1: In older versions of MS Word, figures would be
> > > placed in very odd places of the reading order when the document
> > > was exported to a PDF. If paragraph 1 stated "see figure 5",
> > > figure 5 itself might end up at the very end of the reading
> > > order, not near paragraph 1 where it was referenced. A sighted
> > > person sees figure 5 next to the paragraph, but a screen reader
> > > user doesn't hear it voiced until the last page, and maybe that's
> > > page 360 of a long government document. So the document is tagged
> > > and structured, but it's a faulty structure because the reading
> > > order is incorrect.
> > >
> > > Example number 2: Graphic designers who use desktop publishing
> > > programs like Adobe InDesign and QuarkXPress create very complex
> > > visual layouts. Visually, things aren't designed in a traditional
> > > top-down, left-right pattern but instead could be scattered all
> > > over the physical page. Here's an example of a 2-page magazine
> > > spread:
> > >
> > > http://fc02.deviantart.net/fs71/i/2010/082/e/c/Magazine_Layout_Design_1_by_BreakTheRecords.jpg
> > >
> > > (This is just a random sample I pulled up on the Internet, so it
> > > is only a graphic of a 2-page spread, no live text or Alt-text.)
> > >
> > > Note that the article title (or heading 1) appears on page 2, and
> > > the body text of the story starts on page 1. Backwards! And then
> > > there are 2 quotes at the top of page 1, so obviously the designer
> > > wants us to read those at the beginning of the story, also. And
> > > here's a similar example:
> > >
> > > https://m1.behance.net/rendition/modules/12455236/disp/322ee0c042b2949607393d8b1f24ad96.jpg
> > >
> > > Whew! Getting a tagged, logical reading order from this type of
> > > publication isn't easy!
> > >
> > > Summary:
> > > Structure equals tagged content placed in a logical reading order.
> > >
> > > Well, that's my attempt. Would love to hear how others describe
> > > the concepts.
> > >
> > > --Bevi Chagnon
> > >
> > > -----Original Message-----
> > > From: <EMAIL REMOVED>
> > > [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn Holdsworth
> > > Sent: Wednesday, February 18, 2015 12:11 PM
> > > To: WebAIM Discussion List
> > > Subject: Re: [WebAIM] Untagged PDF doc with table structure
> > >
> > > Thanks so much everyone for weighing in - I've found this a very
> > > useful thread indeed.
> > >
> > > One more question: in PDF docs, what's the difference between tags
> > > and structure? Ryan mentioned that the doc may include structure
> > > but not be tagged, and I don't understand the difference.
> > >
> > > And thanks Duff for the LinkedIn group suggestions. I'll join at
> > > least the first one.
> > >
> > > Really hoping that Adobe is working on ironing out the
> > > accessibility glitches in the DownLoad Assistant, as I'd
> > > appreciate the chance to learn about and use what seems like a
> > > great bunch of accessibility features in Acrobat.
> > >
> > > Best, Lynn
> > >
> > > On 18/02/2015, Andrew Kirkpatrick < <EMAIL REMOVED> > wrote:
> > >> Bim,
> > >> I was talking about both Acrobat and Reader in my reply, sorry if
> > >> that wasn't clear. It is the same process for both.
> > >> AWK
> > >>
> > >> -----Original Message-----
> > >> From: <EMAIL REMOVED>
> > >> [mailto: <EMAIL REMOVED> ] On Behalf Of Bim
> > >> Egan
> > >> Sent: Wednesday, February 18, 2015 7:13 AM
> > >> To: 'WebAIM Discussion List'
> > >> Subject: Re: [WebAIM] Untagged PDF doc with table structure
> > >>
> > >> Lynn didn't seem to be talking about using Acrobat though. She
> > >> described the experience of many screen reader users in finding a
> > >> table in an untagged PDF when opened in Reader, and she asked why
> > >> this could happen. Her message said that the Acrobat installation
> > >> wasn't accessible.
> > >>
> > >> Bim
> > >>
> > >> -----Original Message-----
> > >> From: <EMAIL REMOVED>
> > >> [mailto: <EMAIL REMOVED> ] On Behalf Of Andrew
> > >> Kirkpatrick
> > >> Sent: 18 February 2015 14:36
> > >> To: WebAIM Discussion List
> > >> Subject: Re: [WebAIM] Untagged PDF doc with table structure
> > >>
> > >> Jon is correct. When Acrobat opens an untagged document while a
> > >> client that uses the accessibility API data is running, Acrobat
> > >> (or Reader) will add tags to the document. The result is the same
> > >> as if an author used the "add tags" feature in Acrobat. You get
> > >> Acrobat's best interpretation of what the tags should be. That
> > >> will sometimes result in headings, well-formed tables, lists, and
> > >> other structures. Authors who use this feature in Acrobat know
> > >> that you generally need to fix some of the tags.
> > >>
> > >>
> > >>
> > >> The result is that the document is tagged temporarily and
> > >> assistive technologies recognize and use the information.
> > >>
> > >>
> > >>
> > >> The dialogs that you see when opening PDF documents give you some
> > >> information about what is going on. To understand better, here's
> > >> my explanation.
> > >>
> > >>
> > >>
> > >> In Acrobat or Reader preferences there is a "Reading" category.
> > >> There is a checkbox labeled "Confirm before tagging
> > >> documents". If this is checked, then every time that Reader
> > >> intends to tag an untagged document the "Reading an untagged
> > >> document with assistive technology" dialog pops up and the user
> > >> needs to confirm that this is what they'd like to do. If the
> > >> user selects cancel then the document won't be tagged and the
> > >> reading experience will be essentially non-existent.
> > >>
> > >>
> > >>
> > >> If you elect to allow the tagging, there are other options as
> > >> mentioned in one of the replies. I recommend using the "infer
> > >> reading order from document" option.
> > >>
> > >>
> > >>
> > >> There are other settings related to large documents and
> > >> auto-tagging. Auto-tagging takes time, so if you open a very
> > >> dense 600-page manual you may find that Reader takes a long time
> > >> to do the tagging. It can, and we are always looking to improve
> > >> the efficiency of this process. The option for the user is to
> > >> indicate whether the auto-tagging should occur only on visible
> > >> pages, on all pages in the document, or on all pages except if
> > >> the document is "large". The user gets to define what large
> > >> means: a user who finds that their system is slow at this might
> > >> set the limit at 25 pages, or might set it higher if their
> > >> system handles this process quickly. The downside of only
> > >> tagging a few pages at a time is that if there are recognized
> > >> structures on pages that haven't been tagged yet (e.g. a heading
> > >> on page 51) the user can't use screen reader heading navigation
> > >> to jump to it because the tags don't exist until the page is in
> > >> view in the reader.
> > >>
> > >>
> > >>
> > >> Hope this helps,
> > >>
> > >> AWK
> > >>
> > >>
> > >>
> > >> -----Original Message-----
> > >> From: <EMAIL REMOVED>
> > >> [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn
> > >> Holdsworth
> > >> Sent: Wednesday, February 18, 2015 4:36 AM
> > >> To: WebAIM Discussion List
> > >> Subject: [WebAIM] Untagged PDF doc with table structure
> > >>
> > >>
> > >>
> > >> Hi all,
> > >>
> > >>
> > >>
> > >> Apologies if PDF accessibility is off topic. If so is there a
> > >> list that covers this?
> > >>
> > >>
> > >>
> > >> But if not ...
> > >>
> > >>
> > >>
> > >> I open a PDF document, and Adobe Reader alerts me that it's untagged.
> > >>
> > >>
> > >>
> > >> So I begin to peruse it using JAWS, and come across a table whose
> > >> structure is robust enough for me to move around it using the
> > >> JAWS table keystrokes.
> > >>
> > >>
> > >>
> > >> Does this mean there *are* tags in the document after all? Or has
> > >> Adobe Reader used heuristics to add tags to improve the doc's
> > >> accessibility, since my settings flag up that I'm using a
> > >> screenreader?
> > >>
> > >>
> > >>
> > >> I tried to download a trial version of Acrobat Pro so as to
> > >> examine the document structure, but the download assistant seems inaccessible.
> > >>
> > >>
> > >>
> > >> Thanks as always, Lynn
> > >>