WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Untagged PDF doc with table structure

From: Chagnon | PubCom
Date: Feb 18, 2015 4:25PM


Sorry, that's a different issue, Ryan.
I'm talking about the gibberish tags that end up in the PDF regardless of
how a designer sets up the document for tagging in InDesign. The core tag in
the exported PDF ends up being a gibberish version of the paragraph style's
name, not the designated tag.
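
To make the cleanup concrete, here is a minimal sketch of how a role map can
resolve style-derived tags to standard ones. It is a plain-Python model, not
Acrobat's or InDesign's API: the function name, the short list of standard
tags, and the H2 and P targets are assumptions made for illustration. The two
custom tag names are the ones from my earlier message quoted below.

```python
# Illustrative subset of standard PDF structure types (not the full list).
STANDARD_TAGS = {
    "P", "H1", "H2", "H3", "H4", "H5", "H6",
    "L", "LI", "Table", "TR", "TH", "TD", "Figure",
}


def resolve_tag(tag, role_map, standard=STANDARD_TAGS):
    """Follow role-map entries until a standard tag is reached.

    Returns the standard tag, or None when the chain dead-ends or loops.
    """
    seen = set()
    while tag not in standard:
        if tag in seen or tag not in role_map:
            return None  # unmapped or circular: nothing standard to report
        seen.add(tag)
        tag = role_map[tag]
    return tag


# Hypothetical role map: the targets on the right are assumed, not observed.
role_map = {
    "blue_subhead_with_extra_space_above": "H2",
    "judys_inserted_copy": "P",
}

print(resolve_tag("blue_subhead_with_extra_space_above", role_map))
print(resolve_tag("some_unmapped_style", role_map))
```

A tag that resolves to None here is one that assistive technology has no
standard meaning for, which is the case that needs fixing by hand.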

RE: what you mentioned about InDesign's tags, that's sort of true. Let me
clarify InDesign's tagging method.
InDesign does indeed recognize all of the PDF tags; the problem is that it
and Acrobat don't export them as well as they should.

In InDesign, certain tags must be set in the export tag options:
Headings 1 through 6.
Artifacts (for text).
Well, that's all the control you have in InDesign! Everything else is set to
Auto, and Auto does recognize:
- Tables (and if you've set repeating headers, it will put in the TH
tag).
- Lists, both numbered and bulleted.
- Hyperlinks if you've used the hyperlink utility.
- TOCs if you've used InDesign's TOC utility.
- And pretty much the core of any InDesign document.

Except for grouped items, anchored text frames, un-hyperlinked footnotes,
un-hyperlinked indexes, and a whole lot more, the basics of an InDesign
document are tagged correctly in the PDF.
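
As a rough checklist for the settings above, the following sketch flags
paragraph styles that still sit on Auto even though their intended role is
one that has to be chosen by hand. It is only a model of the idea: the data
layout, the function name, and the sample style names are hypothetical, and
nothing here talks to InDesign.

```python
# Roles that, per the list above, must be set by hand in the export tag options.
MUST_SET_BY_HAND = {"H1", "H2", "H3", "H4", "H5", "H6", "Artifact"}


def styles_needing_explicit_tag(styles):
    """Return the names of styles left on "Auto" that need a manual export tag.

    `styles` maps a paragraph style name to (intended_role, export_setting).
    """
    return sorted(
        name
        for name, (role, setting) in styles.items()
        if role in MUST_SET_BY_HAND and setting == "Auto"
    )


# Hypothetical style sheet for illustration.
styles = {
    "Chapter Title": ("H1", "H1"),
    "Blue Subhead": ("H2", "Auto"),
    "Body Text": ("P", "Auto"),
    "Running Footer": ("Artifact", "Auto"),
}

print(styles_needing_explicit_tag(styles))
```

Body text can stay on Auto in this model, which matches the point that
headings and artifacts are the only things you control directly.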

--Bevi Chagnon

-----Original Message-----
From: <EMAIL REMOVED>
[mailto: <EMAIL REMOVED> ] On Behalf Of Ryan E. Benson
Sent: Wednesday, February 18, 2015 5:51 PM
To: WebAIM Discussion List
Subject: Re: [WebAIM] Untagged PDF doc with table structure

InDesign only recognizes a handful of standard PDF tags. I can't find the
list right now, but I am pretty sure it is in the help. InDesign knows
<Table>, <Tr> and <Td>, for example, but not <TH> or something like that.
PDF tags are case sensitive, so if you create an h1 tag in your InDesign
document, it gets mapped to the <P> tag in the PDF. However, if you create
an H1 tag in InDesign, it is correctly mapped to H1 in the PDF.

--
Ryan E. Benson

On Wed, Feb 18, 2015 at 3:12 PM, Chagnon | PubCom < <EMAIL REMOVED> >
wrote:

> Agree with Andrew. Bad tags do not create a structure.
>
> Adobe InDesign is famous for creating ridiculous tags in PDFs exported
> from the layout files. Tags like <blue_subhead_with_extra_space_above>
> or <judys_inserted_copy> are some I recently saw. Acrobat erroneously
> creates the tags from the names of the designer's paragraph formatting
> styles, not from what has been programmed to be <h1> or <h2>. So
> Acrobat's Role Map utility has to reinterpret those wild and crazy
> tags into normal <h1>, <h2> etc tags.
>
> A lot depends on 4 things:
>
> 1) The software version in which the source document was created. MS
> Word 2013 tags things more correctly than Word 2010 or 2007. Same with
> Adobe InDesign. Always use the most recent version to create the
> source documents. Standards change, as do the tools we use to create
> to those standards, so the latest software version will always give
> the best results and, hopefully, build documents to the latest
> standards. As an example, look how the tagging of lists has changed
> over the past 10 years.
>
> 2) The software version of Acrobat that was used to create the PDF. In
> MS Word, for example, when we select File / Save as PDF, we're
> invoking an Acrobat module (or plug-in) in Word that interprets the
> Word document to create the PDF. Which version of Acrobat did the
> conversion? Acrobat 11 does a better job than 10 which does a better
> job than 9. FYI, you can see the versions of Acrobat and the source
> program in the PDF's File / Properties utility. Also, some people use
> non-Adobe PDF makers, which from my experience don't make accessible
> PDFs at all.
>
> 3) The conversion settings (or preferences) when the PDF was exported
> from the source document. Miss a few checkboxes in the settings and
> you won't get an accessible PDF.
>
> 4) The skill of the person who created the source document and the
> PDF. If they don't know how to use Word's footnote utility and instead
> insert them by hand, then the PDF's footnotes won't be fully
> accessible. If they're a novice user of Adobe InDesign, forget it! The
> file will be an inaccessible nightmare!
>
> --Bevi Chagnon
>
>
>
> -----Original Message-----
> From: <EMAIL REMOVED>
> [mailto: <EMAIL REMOVED> ] On Behalf Of Andrew
> Kirkpatrick
> Sent: Wednesday, February 18, 2015 2:36 PM
> To: WebAIM Discussion List
> Subject: Re: [WebAIM] Untagged PDF doc with table structure
>
> Tagged PDF just means that it has tags at all. There is no guarantee
> that they are correct. This is where PDF/UA helps in that it answers
> the important question "tagged how?"
> AWK
>
> -----Original Message-----
> From: <EMAIL REMOVED>
> [mailto: <EMAIL REMOVED> ] On Behalf Of Brian
> Richwine
> Sent: Wednesday, February 18, 2015 11:20 AM
> To: WebAIM Discussion List
> Subject: Re: [WebAIM] Untagged PDF doc with table structure
>
> Hi,
>
> I've heard accessibility professionals state that a "tagged pdf" is
> one that uses a standard set of tags. So, some PDFs can have
> structure, but be using a non-standard set of tags and thus assistive
> technologies will not know how to interpret the tags that are in the
> document. So I came away thinking that a PDF that has tags is a
> structured PDF, and even better (in terms of accessibility) a PDF
> that uses standardized tags has structure and is "tagged".
>
> -Brian
>
> On Wed, Feb 18, 2015 at 1:45 PM, Lynn Holdsworth <
> <EMAIL REMOVED> >
> wrote:
>
> > Hi Bevi,
> >
> > Thanks for taking the time to write such a comprehensive response.
> >
> > From creating HTML pages for about half a lifetime, I'd define tags
> > and structure pretty much the way you do.
> >
> > But I inferred from this thread, and from talking with someone who
> > knows a lot more about PDF than I do, that it's possible to have
> > structure without tags in a PDF document. Is this correct, and if so
> > how would I recognise it if I were to examine the document's
> > building blocks?
> >
> > Best, Lynn
> >
> > On 18/02/2015, Chagnon | PubCom < <EMAIL REMOVED> > wrote:
> > > Lynn wrote: " in PDF docs, what's the difference between tags and
> > > structure?
> > > "
> > >
> > > This is one of the toughest concepts we teachers have to explain!
> > > I'd love to hear how others describe it. Here's my take:
> > >
> > > Tags are labels. Code labels, specifically, that are read by
> > > Assistive Technologies and are not usually visible to sighted
> > > users unless they have Acrobat Pro. They let AT users know what's a
> > > heading 2, a list of bullets, tables, and other parts of the
> > > documents. Tags also do a lot of work for us, such as assisting us
> > > in creating bookmarks and tables of contents, creating navigation
> > > systems, and holding the Alt-text on graphics (Alt-Text is an
> > > attribute on the figure tag and doesn't stand alone on its own).
> > >
> > > Structure is the sequence of how the document's pieces will be
> > > read, or in other words, the sequence in which the tagged items are
> > > read. Call it reading order or tag reading order. The structure of
> > > some documents can also have nesting qualities, such as all the
> > > pieces of a chapter, and all the chapters in a book.
> > >
> > > An example: If Heading 1 designates a chapter title, then all the
> > > paragraphs, bullets, tables, and heading 2 items within that
> > > chapter will be nested inside the main heading 1 tag. This allows
> > > AT software to figure out, hopefully, what goes with what; that
> > > all the tags nested within Heading 1 make up a chapter.
> > >
> > > Structure is created when you have tags (the right tag labels) and
> > > a reading order (a logical reading order). It is possible that a
> > > tagged and structured document might not be fully accessible
> > > because the tags aren't accurate enough or the reading order is
> > > out of whack.
> > >
> > > Example number 1: In older versions of MS Word, figures would be
> > > placed in very odd places of the reading order when it was exported
> > > to a PDF. If paragraph 1 stated "see figure 5", figure 5 itself
> > > might end up at the very end of the reading order, not near
> > > paragraph 1 where it was referenced. A sighted person sees figure 5
> > > next to the paragraph, but a screen reader user doesn't hear it
> > > voiced until the last page, and maybe that's page 360 of a long
> > > government document. So the document is tagged and structured, but
> > > it's a faulty structure because the reading order is incorrect.
> > >
> > > Example number 2: Graphic designers who use desktop publishing
> > > programs like Adobe InDesign and QuarkXPress create very complex
> > > visual layouts. Visually, things aren't designed in a traditional
> > > top down left right pattern but instead could be scattered all over
> > > the physical page. Here's an example of a 2-page magazine spread:
> > >
> > > http://fc02.deviantart.net/fs71/i/2010/082/e/c/Magazine_Layout_Design_1_by_BreakTheRecords.jpg
> > >
> > > (This is just a random sample I pulled up on the Internet, so it is
> > > only a graphic of a 2-page spread, no live text or Alt-text.)
> > >
> > > Note that article title (or heading 1) appears on page 2, and the
> > > body text of the story starts on page 1. Backwards! And then there
> > > are 2 quotes at the top of page 1, so obviously the designer wants
> > > us to read those at the beginning of the story, also. And here's a
> > > similar example:
> > >
> > > https://m1.behance.net/rendition/modules/12455236/disp/322ee0c042b2949607393d8b1f24ad96.jpg
> > >
> > > Whew! Getting a tagged, logical reading order from this type of
> > > publication isn't easy!
> > >
> > > Summary:
> > > Structure equals tagged content placed in a logical reading order.
> > >
> > > Well, that's my attempt. Would love to hear how others describe
> > > the concepts.
> > >
> > > --Bevi Chagnon
> > >
> > > -----Original Message-----
> > > From: <EMAIL REMOVED>
> > > [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn Holdsworth
> > > Sent: Wednesday, February 18, 2015 12:11 PM
> > > To: WebAIM Discussion List
> > > Subject: Re: [WebAIM] Untagged PDF doc with table structure
> > >
> > > Thanks so much everyone for weighing in - I've found this a very
> > > useful thread indeed.
> > >
> > > One more question: in PDF docs, what's the difference between tags
> > > and structure? Ryan mentioned that the doc may include structure
> > > but not be tagged, and I don't understand the difference.
> > >
> > > And thanks Duff for the LinkedIn group suggestions. I'll join at
> > > least the first one.
> > >
> > > Really hoping that Adobe is working on ironing out the
> > > accessibility glitches in the DownLoad Assistant, as I'd
> > > appreciate the chance to learn about and use what seems like a
> > > great bunch of accessibility features in Acrobat.
> > >
> > > Best, Lynn
> > >
> > > On 18/02/2015, Andrew Kirkpatrick < <EMAIL REMOVED> > wrote:
> > >> Bim,
> > >> I was talking about both Acrobat and Reader in my reply, sorry if
> > >> that wasn't clear. It is the same process for both.
> > >> AWK
> > >>
> > >> -----Original Message-----
> > >> From: <EMAIL REMOVED>
> > >> [mailto: <EMAIL REMOVED> ] On Behalf Of Bim
> > >> Egan
> > >> Sent: Wednesday, February 18, 2015 7:13 AM
> > >> To: 'WebAIM Discussion List'
> > >> Subject: Re: [WebAIM] Untagged PDF doc with table structure
> > >>
> > > >> Lynn didn't seem to be talking about using Acrobat though. She
> > > >> described the experience of many screen reader users in finding a
> > > >> table in an untagged PDF when opened in Reader, and she asked why
> > > >> this could happen. Her message said that the Acrobat installation
> > > >> wasn't accessible.
> > >>
> > >> Bim
> > >>
> > >> -----Original Message-----
> > >> From: <EMAIL REMOVED>
> > >> [mailto: <EMAIL REMOVED> ] On Behalf Of Andrew
> > >> Kirkpatrick
> > >> Sent: 18 February 2015 14:36
> > >> To: WebAIM Discussion List
> > >> Subject: Re: [WebAIM] Untagged PDF doc with table structure
> > >>
> > > >> Jon is correct. When Acrobat opens an untagged document and a
> > > >> client that uses the accessibility API data is running, Acrobat
> > > >> (or Reader) will add tags to the document. The result is the same
> > > >> as if an author used the "add tags" feature in Acrobat. You get
> > > >> Acrobat's best interpretation of what the tags should be. That
> > > >> will sometimes result in headings, well-formed tables, lists, and
> > > >> other structures. Authors who use this feature in Acrobat know
> > > >> that you generally need to fix some of the tags.
> > >>
> > >>
> > >>
> > >> The result is that the document is tagged temporarily and
> > >> assistive technologies recognize and use the information.
> > >>
> > >>
> > >>
> > >> The dialogs that you see when opening PDF documents give you some
> > >> information about what is going on. To understand better, here's
> > >> my explanation.
> > >>
> > >>
> > >>
> > > >> In Acrobat or Reader preferences there is a "Reading" category.
> > > >> There is a checkbox that is labeled "Confirm before tagging
> > > >> documents". If this is checked, then every time that Reader
> > > >> intends to tag an untagged document the "Reading an untagged
> > > >> document with assistive technology" dialog pops up and the user
> > > >> needs to confirm that this is what they'd like to do. If the
> > > >> user selects cancel then the document won't be tagged and the
> > > >> reading experience will be essentially non-existent.
> > >>
> > >>
> > >>
> > >> If you elect to allow the tagging, there are other options as
> > >> mentioned in one of the replies. I recommend using the "infer
> > >> reading order from document" option.
> > >>
> > >>
> > >>
> > > >> There are other settings related to large documents and
> > > >> auto-tagging. Autotagging takes time, so if you open a very dense
> > > >> 600 page manual you may find that Reader takes a long time to do
> > > >> the tagging. It can, and we are always looking to improve the
> > > >> efficiency of this process. The option for the user is to indicate
> > > >> whether the autotagging should occur only on visible pages, on all
> > > >> pages in the document, or on all pages except if the document is
> > > >> "large". The user gets to define what large means - a user might
> > > >> find that their system is slow at this so sets the limit at 25
> > > >> pages, or might set it higher if their system handles this process
> > > >> quickly. The down side of only tagging a few pages at a time is
> > > >> that if there are recognized structures on pages that haven't been
> > > >> tagged yet (e.g. a heading on page 51) the user can't use screen
> > > >> reader heading navigation to jump to it because the tags don't
> > > >> exist until the page is in view in the reader.
> > >>
> > >>
> > >>
> > >> Hope this helps,
> > >>
> > >> AWK
> > >>
> > >>
> > >>
> > >> -----Original Message-----
> > >> From: <EMAIL REMOVED>
> > >> [mailto: <EMAIL REMOVED> ] On Behalf Of Lynn
> > >> Holdsworth
> > >> Sent: Wednesday, February 18, 2015 4:36 AM
> > >> To: WebAIM Discussion List
> > >> Subject: [WebAIM] Untagged PDF doc with table structure
> > >>
> > >>
> > >>
> > >> Hi all,
> > >>
> > >>
> > >>
> > >> Apologies if PDF accessibility is off topic. If so is there a
> > >> list that covers this?
> > >>
> > >>
> > >>
> > >> But if not ...
> > >>
> > >>
> > >>
> > >> I open a PDF document, and Adobe Reader alerts me that it's untagged.
> > >>
> > >>
> > >>
> > > >> So I begin to peruse it using JAWS, and come across a table whose
> > > >> structure is robust enough for me to move around it using the
> > > >> JAWS table keystrokes.
> > >>
> > >>
> > >>
> > > >> Does this mean there *are* tags in the document after all? Or has
> > > >> Adobe Reader used heuristics to add tags to improve the doc's
> > > >> accessibility, since my settings flag up that I'm using a
> > > >> screenreader?
> > >>
> > >>
> > >>
> > > >> I tried to download a trial version of Acrobat Pro so as to
> > > >> examine the document structure, but the download assistant seems
> > > >> inaccessible.
> > >>
> > >>
> > >>
> > >> Thanks as always, Lynn
> > >>
> > >> > > >>
> > >> > > >> > > >> <EMAIL REMOVED> <mailto: <EMAIL REMOVED> >
> > >> > > >> > > >> > > >>
> > >> > > >> > > >> > > >> > > >> > > >> > > >>
> > > > > > > > > list messages to <EMAIL REMOVED>
> > >
> > > > > > > > > list messages to <EMAIL REMOVED>
> > >
> > > > > > list messages to <EMAIL REMOVED>
> >
> > > list messages to <EMAIL REMOVED>
> > > list messages to <EMAIL REMOVED>
>
> > > list messages to <EMAIL REMOVED>
>
messages to <EMAIL REMOVED>