WebAIM - Web Accessibility In Mind

E-mail List Archives

Thread: Untagged PDF doc with table structure

Number of posts in this thread: 42 (In chronological order)

From: Lynn Holdsworth
Date: Wed, Feb 18 2015 5:35AM
Subject: Untagged PDF doc with table structure

Hi all,

Apologies if PDF accessibility is off topic. If so is there a list
that covers this?

But if not ...

I open a PDF document, and Adobe Reader alerts me that it's untagged.

So I begin to peruse it using JAWS, and come across a table whose
structure is robust enough for me to move around it using the JAWS
table keystrokes.

Does this mean there *are* tags in the document after all? Or has
Adobe Reader used heuristics to add tags to improve the doc's
accessibility, since my settings flag up that I'm using a
screenreader?

I tried to download a trial version of Acrobat Pro so as to examine
the document structure, but the download assistant seems inaccessible.

Thanks as always, Lynn

From: Jonathan Avila
Date: Wed, Feb 18 2015 5:48AM
Subject: Re: Untagged PDF doc with table structure

It is my experience too that Adobe Reader can add a temporary set of tags to an untagged document.

Acrobat Pro can also apply OCR.

Jon


From: Lynn Holdsworth
Date: Wed, Feb 18 2015 5:58AM
Subject: Re: Untagged PDF doc with table structure

Thanks for the confirmation Jonathan. This is pretty impressive stuff.
I presume that if I were to examine the doc in Acrobat Pro, it would
present with no tags at all. Frustratingly, I'll need to wait until
someone's available to help me download and install it before I can
find out. Really hoping the Acrobat accessibility interface is itself
accessible.

Best, Lynn


From: Michael Bullis
Date: Wed, Feb 18 2015 6:06AM
Subject: Re: Untagged PDF doc with table structure

I would add something you probably already know.
When you open a PDF, Adobe often shows you only three choices: infer reading
order from document, left to right, and use reading order in raw print
stream. The document may be tagged but you have to escape from the initial
choices and re-enter through the menu choice of change reading options.
Then it will often show you the tagged choice. I'm not clear why this is,
but it happens often enough that I pretty much always escape from the
initial screen and then, after JAWS says no document available, I go to the
Edit menu and check for accessibility options. Very often, the tagged
option now appears.

From: Karlen Communications
Date: Wed, Feb 18 2015 7:09AM
Subject: Re: Untagged PDF doc with table structure

Tags are not added to the document when you open an untagged PDF and use
this option. What happens is that "Virtual" Tags are added based on the fact
that you are using a "trusted assistive technology" recognized by Acrobat or
Reader. This allows temporary access to the content and the Tags are not
saved when the document is closed or if you choose to save the document
under a different name.

I just tested this in Acrobat with an untagged PDF. This is a stop-gap tool
to allow access to legacy PDF documents, but since there are still many
untagged PDFs being produced, it is part of our daily toolbox.

If you are going to add actual Tags to a PDF, dismiss this message before
you use the Add Tags tool as I find it tends to confuse the real tagging
process and I get some interesting versions of the Add Tags Report.

Cheers, Karen

From: Harrison, Rita L
Date: Wed, Feb 18 2015 7:18AM
Subject: Re: Untagged PDF doc with table structure

When you see this list, you can simply tab a few times to the "start" button and enter on it. Your document will process. If it's an image, you can also OCR it by tabbing through the options and entering on the OK button.

I hope this helps.


Rita L. Harrison, FDA 508 Coordinator
Lead, 508 Web Task Force
Chairperson, Advisory Committee for Employees with Disabilities (ACED)
OO/OIMT/DBPS/IIB
Web Support Team (WST)
Phone: 805-285-0639
= EMAIL ADDRESS REMOVED =

From: Ryan E. Benson
Date: Wed, Feb 18 2015 7:30AM
Subject: Re: Untagged PDF doc with table structure

I am not 100% sure that is true Karen. There is an additional pane in
Acrobat called the Contents pane. I consider this to hold the guts of the
PDF, or the content. It sounds like whatever software was used to make the
PDF used structure, but didn't create tags. So if Lynn opened up that pane,
I bet she'd see Container <Table> ....., and so on, then JAWS is making the
connection you mentioned, which appears to "read normally." If we then use
"add tags" option, the added tags should roughly mimic the structure from
the contents pane.

Michael said
> The document may be tagged but you have to escape from the initial
> choices and re-enter through the menu choice of change reading options.
> Then it will often show you the tagged choice. I'm not clear why this is,
> but it happens often enough that I pretty much always escape from the
> initial screen

There is an additional setting that needs to be set. You must open the
Pages pane, select all pages, and choose Properties. By default this is set to
unstructured, which is like saying "guess what to do." The other options are
things like use tags, use the document language to pick (left to right), and
one more that I am drawing a blank on.


--
Ryan E. Benson


From: Andrew Kirkpatrick
Date: Wed, Feb 18 2015 7:36AM
Subject: Re: Untagged PDF doc with table structure

Jon is correct. When Acrobat opens an untagged document and there is a client that is using the accessibility API data running, Acrobat (or Reader) will add tags to the document. The result is the same as if an author used the "add tags" feature in Acrobat. You get Acrobat's best interpretation of what the tags should be. That will sometimes result in headings, well-formed tables, lists, and other structures. Authors who use this feature in Acrobat know that you generally need to fix some of the tags.



The result is that the document is tagged temporarily and assistive technologies recognize and use the information.



The dialogs that you see when opening PDF documents give you some information about what is going on. To understand better, here's my explanation.



In Acrobat or Reader preferences there is a "Reading" category. There is a checkbox that is labeled "Confirm before tagging documents". If this is checked, then every time that Reader intends to tag an untagged document the "Reading an untagged document with assistive technology" dialog pops up and the user needs to confirm that this is what they'd like to do. If the user selects cancel then the document won't be tagged and the reading experience will be essentially non-existent.



If you elect to allow the tagging, there are other options as mentioned in one of the replies. I recommend using the "infer reading order from document" option.



There are other settings related to large documents and auto-tagging. Autotagging takes time, so if you open a very dense 600-page manual you may find that Reader takes a long time to do the tagging. It can, and we are always looking to improve the efficiency of this process. The option for the user is to indicate whether the autotagging should occur only on visible pages, on all pages in the document, or on all pages except if the document is "large". The user gets to define what large means: a user might find that their system is slow at this and so set the limit at 25 pages, or might set it higher if their system handles this process quickly. The downside of only tagging a few pages at a time is that if there are recognized structures on pages that haven't been tagged yet (e.g. a heading on page 51) the user can't use screen reader heading navigation to jump to it, because the tags don't exist until the page is in view in the reader.
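
The three auto-tagging scopes described above can be sketched as a small decision function. This is only an illustration of the logic as described in this message, not Reader's actual implementation: the names `TagScope`, `pages_to_tag`, and `can_jump_to_heading` are hypothetical, and so is the assumption that a document counts as "large" when its page count exceeds the user's threshold.

```python
from enum import Enum


class TagScope(Enum):
    """Hypothetical names for the three user choices described above."""
    VISIBLE_PAGES = "only visible pages"
    ALL_PAGES = "all pages"
    ALL_UNLESS_LARGE = "all pages unless the document is large"


def pages_to_tag(scope, page_count, visible_pages, large_threshold=50):
    """Return the set of 1-based page numbers that would get temporary tags.

    Assumption: "large" means page_count > large_threshold.
    """
    everything = set(range(1, page_count + 1))
    if scope is TagScope.ALL_PAGES:
        return everything
    if scope is TagScope.ALL_UNLESS_LARGE and page_count <= large_threshold:
        return everything
    # Either the user asked for visible pages only, or the document is large.
    return set(visible_pages) & everything


def can_jump_to_heading(heading_page, tagged_pages):
    """Heading navigation only works for pages that already have tags."""
    return heading_page in tagged_pages


# A 600-page manual with the "large" limit set to 25 pages, page 1 in view.
tagged = pages_to_tag(TagScope.ALL_UNLESS_LARGE, 600, [1], large_threshold=25)
print(sorted(tagged))                   # [1]
print(can_jump_to_heading(51, tagged))  # False
```

The last two lines mirror the trade-off in the paragraph above: with page-by-page tagging the heading on page 51 is not reachable until that page has been in view.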



Hope this helps,

AWK



From: Jonathan Avila
Date: Wed, Feb 18 2015 8:01AM
Subject: Re: Untagged PDF doc with table structure

The temporary or virtual tags, as Karen calls them, are IMO not visible from the tags tree and can't be saved. I agree with Karen that if you want to use the add tags feature in Acrobat you should start without the temporary or virtual tags.

Jon



From: Duff Johnson
Date: Wed, Feb 18 2015 8:12AM
Subject: Re: Untagged PDF doc with table structure

> Apologies if PDF accessibility is off topic. If so is there a list
> that covers this?

PDF accessibility is not "off topic": PDF is a web format, much as webbies hate to admit it :-) Even so, this is not necessarily the best list for extended discussions on details of PDF accessibility or PDF applications.

Here are some related resources.

On LinkedIn...

"About PDF/UA" - a group operated by the PDF Association to educate specifically on the ISO standard for accessible PDF:

https://www.linkedin.com/groups/About-PDF-UA-accessibility-7470079/about

"Accessible PDF" - a more general group for questions and discussions related to PDF accessibility (not just attending to the ISO standard and supporting documentation)

https://www.linkedin.com/groups/Accessible-PDF-4220500/about

For developers...

For members only, the PDF Association operates the "PDF/UA Competence Center", a technically-oriented group that includes 15+ web-meetings / year and runs an active discussion list. The PDF/UA Competence Center is the body responsible for publishing the Matterhorn Protocol and the PDF/UA Reference Suite.

http://www.pdfa.org/competence-centers/pdfua-competence-center/

I hope these resources are useful.

Full disclosure: I'm the Executive Director of the PDF Association. I am also the "manager" of both of the above-mentioned LinkedIn groups.

Duff.

From: Bim Egan
Date: Wed, Feb 18 2015 8:12AM
Subject: Re: Untagged PDF doc with table structure

Lynn didn't seem to be talking about using Acrobat though. She described
the experience of many screen reader users in finding a table in an untagged
PDF when opened in Reader, and she asked why this could happen. Her
message said that the Acrobat installation wasn't accessible.

Bim

From: Jonathan C Cohn
Date: Wed, Feb 18 2015 8:41AM
Subject: Re: Untagged PDF doc with table structure

Andrew,

Is the only reason to not auto-tag the entire document a time initial issue? I generally have based my decision on the amount of RAM available on my system. If the document is quite large, and one is already using most of the active RAM, will tagging a 100+ page document cause slow page changes because of swapping?
Thanks,


From: Jonathan Avila
Date: Wed, Feb 18 2015 8:45AM
Subject: Re: Untagged PDF doc with table structure

> Is the only reason to not auto-tag the entire document a time initial issue?

It is a time issue. Why wait minutes for the whole document to be tagged, with access blocked to every page of the document, when you can have access on a page-by-page basis immediately?

Jonathan

--
Jonathan Avila
Chief Accessibility Officer
SSB BART Group
= EMAIL ADDRESS REMOVED =

703-637-8957 (o)


From: Andrew Kirkpatrick
Date: Wed, Feb 18 2015 8:47AM
Subject: Re: Untagged PDF doc with table structure

I'm not sure. My impression from discussions on this is that the big hit comes from the tagging process rather than storing the existing tags. If you don't have issues with documents of the same length that are tagged already then I don't think that you would with documents after the autotagging is complete.
AWK

From: Andrew Kirkpatrick
Date: Wed, Feb 18 2015 8:49AM
Subject: Re: Untagged PDF doc with table structure

Bim,
I was talking about both Acrobat and Reader in my reply, sorry if that wasn't clear. It is the same process for both.
AWK

From: Lynn Holdsworth
Date: Wed, Feb 18 2015 10:10AM
Subject: Re: Untagged PDF doc with table structure

Thanks so much everyone for weighing in - I've found this a very
useful thread indeed.

One more question: in PDF docs, what's the difference between tags and
structure? Ryan mentioned that the doc may include structure but not
be tagged, and I don't understand the difference.

And thanks Duff for the LinkedIn group suggestions. I'll join at least
the first one.

Really hoping that Adobe is working on ironing out the accessibility
glitches in the Download Assistant, as I'd appreciate the chance to
learn about and use what seems like a great bunch of accessibility
features in Acrobat.

Best, Lynn


From: Chagnon | PubCom
Date: Wed, Feb 18 2015 11:17AM
Subject: Re: Untagged PDF doc with table structure

Lynn wrote: "in PDF docs, what's the difference between tags and structure?"

This is one of the toughest concepts we teachers have to explain! I'd love
to hear how others describe it. Here's my take:

Tags are labels. Code labels, specifically, that are read by Assistive
Technologies and are not usually visible to sighted users unless they have
Acrobat Pro. They let AT users know what's a heading 2, a list of bullets,
tables, and other parts of the documents. Tags also do a lot of work for us,
such as assisting us in creating bookmarks and tables of contents, creating
navigation systems, and holding the Alt-text on graphics (Alt-Text is an
attribute on the figure tag and doesn't stand alone on its own).

Structure is the sequence of how the document's pieces will be read, or in
other words, the sequence in which the tagged items are read. Call it
reading order or tag reading order. The structure of some documents can also
have nesting qualities, such as all the pieces of a chapter, and all the
chapters in a book.

An example: If Heading 1 designates a chapter title, then all the paragraphs,
bullets, tables, and heading 2 items within that chapter will be nested
inside the main heading 1 tag. This allows AT software to figure out,
hopefully, what goes with what: that all the tags nested within Heading 1
make up a chapter.

Structure is created when you have tags (the right tag labels) and a reading
order (a logical reading order). It is possible that a tagged and structured
document might not be fully accessible because the tags aren't accurate
enough or the reading order is out of whack.

Example number 1: In older versions of MS Word, figures would be placed in
very odd places of the reading order when it was exported to a PDF. If
paragraph 1 stated "see figure 5", figure 5 itself might end up at the very
end of the reading order, not near paragraph 1 where it was referenced. A
sighted person sees figure 5 next to the paragraph, but a screen reader user
doesn't hear it voiced until the last page, and maybe that's page 360 of a
long government document. So the document is tagged and structured, but it's
a faulty structure because the reading order is incorrect.

Example number 2: Graphic designers who use desktop publishing programs like
Adobe InDesign and QuarkXpress create very complex visual layouts. Visually,
things aren't designed in a traditional top-down, left-to-right pattern but
instead could be scattered all over the physical page. Here's an example of
a 2-page magazine spread:
http://fc02.deviantart.net/fs71/i/2010/082/e/c/Magazine_Layout_Design_1_by_BreakTheRecords.jpg
(This is just a random sample I pulled up on the Internet, so it is only a
graphic of a 2-page spread, no live text or Alt-text.)

Note that the article title (or heading 1) appears on page 2, and the body text
of the story starts on page 1. Backwards! And then there are 2 quotes at the
top of page 1, so obviously the designer wants us to read those at the
beginning of the story, also. And here's a similar example:
https://m1.behance.net/rendition/modules/12455236/disp/322ee0c042b2949607393d8b1f24ad96.jpg

Whew! Getting a tagged, logical reading order from this type of publication
isn't easy!

Summary:
Structure equals tagged content placed in a logical reading order.
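
The idea of "tags plus reading order" can be sketched as a tiny tree walk. This is a minimal illustration under stated assumptions, not how Acrobat or any screen reader stores a document: the `Node` class and `reading_order` function are hypothetical, the tag labels (`Sect`, `H1`, `P`, `Figure`) are borrowed from the standard PDF structure types, and the sample content is made up to mirror the "see figure 5" example above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tagged item: a label, optional text, and nested children."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


def reading_order(node):
    """Depth-first walk: items come out in the order they would be read."""
    items = [(node.tag, node.text)] if node.text else []
    for child in node.children:
        items.extend(reading_order(child))
    return items


# Same tags in both trees; only the position of the figure differs.
good = Node("Sect", children=[
    Node("H1", "Chapter 1"),
    Node("P", "See figure 5."),
    Node("Figure", "Figure 5"),
    Node("P", "Next paragraph."),
])
bad = Node("Sect", children=[
    Node("H1", "Chapter 1"),
    Node("P", "See figure 5."),
    Node("P", "Next paragraph."),
    Node("Figure", "Figure 5"),  # pushed to the end of the reading order
])

print([tag for tag, _ in reading_order(good)])  # ['H1', 'P', 'Figure', 'P']
print([tag for tag, _ in reading_order(bad)])   # ['H1', 'P', 'P', 'Figure']
```

Both trees carry identical tags, so a check that only asks "are there tags?" would pass either one; the difference a screen reader user hears comes entirely from the order.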

Well, that's my attempt. Would love to hear how others describe the
concepts.

--Bevi Chagnon

From: Lynn Holdsworth
Date: Wed, Feb 18 2015 11:45AM
Subject: Re: Untagged PDF doc with table structure

Hi Bevi,

Thanks for taking the time to write such a comprehensive response.

From creating HTML pages for about half a lifetime, I'd define tags
and structure pretty much the way you do.

But I inferred from this thread, and from talking with someone who
knows a lot more about PDF than I do, that it's possible to have
structure without tags in a PDF document. Is this correct, and if so
how would I recognise it if I were to examine the document's building
blocks?

Best, Lynn


From: Brian Richwine
Date: Wed, Feb 18 2015 12:19PM
Subject: Re: Untagged PDF doc with table structure

Hi,

I've heard accessibility professionals state that a "tagged pdf" is one
that uses a standard set of tags. So, some PDFs can have structure, but be
using a non-standard set of tags and thus assistive technologies will not
know how to interpret the tags that are in the document. So I came away
thinking that a PDF that has tags is a structured PDF, and even better (in
terms of accessibility) a PDF that uses standardized tags has structure and
is "tagged".
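
One mechanism relevant to non-standard tags is the role map: the PDF specification allows a structure tree to carry a role map that maps custom tag names onto standard structure types. The sketch below shows that lookup in isolation. It is an illustration only: `resolve_tag` is a hypothetical helper, `STANDARD_TYPES` is a small subset of the standard types rather than the full list, and the sample role map is invented.

```python
# A small subset of the standard structure types, for illustration only.
STANDARD_TYPES = {
    "Document", "Part", "Sect", "H1", "H2", "H3", "P",
    "L", "LI", "Table", "TR", "TH", "TD", "Figure",
}


def resolve_tag(tag, role_map):
    """Follow role-map entries until a standard type is reached.

    Returns the standard type, or None if the custom tag is unmapped
    or the mappings loop back on themselves.
    """
    seen = set()
    while tag not in STANDARD_TYPES:
        if tag in seen or tag not in role_map:
            return None
        seen.add(tag)
        tag = role_map[tag]
    return tag


# Invented role map: a publishing tool's custom names mapped to standard ones.
role_map = {"ChapterTitle": "Heading1", "Heading1": "H1"}

print(resolve_tag("ChapterTitle", role_map))  # H1
print(resolve_tag("Sidebar", role_map))       # None
```

A custom tag that resolves to `None` here corresponds to the case described above: the document has structure, but software reading it has no standard meaning to attach to that tag.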

-Brian

On Wed, Feb 18, 2015 at 1:45 PM, Lynn Holdsworth < = EMAIL ADDRESS REMOVED = >
wrote:

> Hi Bevi,
>
> Thanks for taking the time to write such a comprehensive response.
>
> From creating HTML pages for about half a lifetime, I'd define tags
> and structure pretty much the way you do.
>
> But I inferred from this thread, and from talking with someone who
> knows a lot more about PDF than I do, that it's possible to have
> structure without tags in a PDF document. Is this correct, and if so
> how would I recognise it if I were to examine the document's building
> blocks?
>
> Best, Lynn
>
> On 18/02/2015, Chagnon | PubCom < = EMAIL ADDRESS REMOVED = > wrote:
> > Lynn wrote: " in PDF docs, what's the difference between tags and
> > structure?
> > "
> >
> > This is one of the toughest concepts we teachers have to explain! I'd
> love
> > to hear how others describe it. Here's my take:
> >
> > Tags are labels. Code labels, specifically, that are read by Assistive
> > Technologies and are not usually visible to sighted users unless they
> have
> > Acrobat Pro. They let AT users know what's a heading 2, a list of
> bullets,
> > tables, and other parts of the documents. Tags also do a lot of work for
> > us,
> > such as assisting us in creating bookmarks and tables of contents,
> creating
> > navigation systems, and holding the Alt-text on graphics (Alt-Text is an
> > attribute on the figure tag and doesn't stand alone on its own).
> >
> > Structure is the sequence of how the document's pieces will be read, or
> in
> > other words, the sequence in which the tagged items are read. Call it
> > reading order or tag reading order. The structure of some documents can
> > also
> > have nesting qualities, such as all the pieces of a chapter, and all the
> > chapters in a book.
> >
> > An example: If Heading 1 designates a chapter title, then all the
> > paragraph,
> > bullets, tables, and heading 2 items within that chapter will be nested
> > inside the main heading 1 tag. This allows AT software to figure out,
> > hopefully, what goes with what; that all the tags nested within Heading 1
> > is
> > a chapter.
> >
> > Structure is created when you have tags (the right tag labels) and a
> > reading
> > order (a logical reading order). It is possible that a tagged and
> > structured
> > document might not be fully accessible because the tags aren't accurate
> > enough or the reading order is out of whack.
> >
> > Example number 1: In older versions of MS Word, figures would be placed
> in
> > very odd places of the reading order when it was exported to a PDF. If
> > paragraph 1 stated "see figure 5", figure 5 itself might end up at the
> very
> > end of the reading order, not near paragraph 1 where it was referenced. A
> > sighted person sees figure 5 next to the paragraph, but a screen reader
> > user
> > doesn't hear it voiced until the last page, and maybe that's page 360 of
> a
> > long government document. So the document is tagged and structured, but
> > it's
> > a faulty structure because the reading order is incorrect.
> >
> > Example number 2: Graphic designers who use desktop publishing programs
> > like
> > Adobe InDesign and QuarkXpress create very complex visual layouts.
> > Visually,
> > things aren't designed in a traditional top down left right pattern but
> > instead could be scattered all over the physical page. Here's an example
> of
> > a 2-page magazine spread:
> >
> > http://fc02.deviantart.net/fs71/i/2010/082/e/c/Magazine_Layout_Design_1_by_BreakTheRecords.jpg
> > (This is just a random sample I pulled up on the
> > Internet, so it is only a graphic of a 2-page spread, no live text or
> > Alt-text.)
> >
> > Note that article title (or heading 1) appears on page 2, and the body
> text
> > of the story starts on page 1. Backwards! And then there are 2 quotes at
> > the
> > top of page 1, so obviously the designer wants us to read those at the
> > beginning of the story, also. And here's a similar example:
> >
> > https://m1.behance.net/rendition/modules/12455236/disp/322ee0c042b2949607393d8b1f24ad96.jpg
> >
> > Whew! Getting a tagged, logical reading order from this type of
> > publication
> > isn't easy!
> >
> > Summary:
> > Structure equals tagged content placed in a logical reading order.
> >
> > Well, that's my attempt. Would love to hear how others describe the
> > concepts.
> >
> > --Bevi Chagnon
> >
> >

From: Andrew Kirkpatrick
Date: Wed, Feb 18 2015 12:36PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

Tagged PDF just means that it has tags at all. There is no guarantee that they are correct. This is where PDF/UA helps in that it answers the important question "tagged how?"
AWK

From: Chagnon | PubCom
Date: Wed, Feb 18 2015 12:41PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

Lynn, I too have a strong programming background in HTML, as well as SGML,
XML, and many other markup languages. So tags plus reading order create the
document's structure in my mind! In theory, I don't believe a PDF can have
any structure, good or bad, without tags. All PDFs have a page architecture,
but that's not the same thing as structure.

Lynn asked: " if so how would I recognise it if I were to examine the
document's building blocks "

You have to examine it from several viewpoints in Acrobat Pro. I teach my
students this method:
1. Run Acrobat's accessibility checker. This looks at only about 20% of the
document's features, so don't depend on it for a full check.

2. Run down the tag tree, top-to-bottom. I call this the tag reading order.
For sighted users, they can arrow down from tag to tag and also see on the
page which item is highlighted for each tag. They'll see very quickly that
the figures weren't read at the correct place in the tag tree, or that the
second half of body text was read first, then the heading 1, then the
remaining body text.

For screen reader users, this is what your software is using. But it's more
difficult to tell if the document is correct. Were you able to hear and
figure out what was read? Did it make sense (not the content itself, but the
order in which you heard it)? Screen readers also can't tell sometimes if
it's tagged correctly. Example: Adobe InDesign has a tragic flaw. When a
sidebar (boxed text that's secondary to the main story) is exported to PDF,
the conversion isn't correct. All of the text is jumbled together;
paragraphs are lost, including any headings, bulleted lists, tables,
figures, etc. So a screen reader just hears the text run-on blah blah blah,
but never knows if he's reading one paragraph, multiple paragraphs,
headings, or any other parts of a document. My screen reader testers often
miss these problems; they just can't tell if they're missing something or if
it's incorrect.

3. Run down the "real" reading order. This is the Order panel in Acrobat.
Often overlooked by many in accessible documentation, this is the original
reading order that's still used by many assistive technologies, including
braille printers and keyboards. I've never had any of my screen reader
testers review this because their software has a hard time voicing it in a
way that makes sense to them. But they can see this reading order another
way: View / Zoom / Reflow. This utility rejiggers the visual layout on the
screen to mimic the real reading order. Columns are removed, everything is
sequential and linear, top to bottom. So if the first item read by a screen
reader happens to be the photo caption, not heading 1, then you have a
reading order problem.

4. After that, the usual review of tags, tables, alt-text, etc. takes place.

--Bevi Chagnon
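
The "tag reading order" in step 2 can be pictured as a top-to-bottom, depth-first walk of the tag tree. The sketch below is only an illustration under stated assumptions: the `Tag` class, the `tag_reading_order` function, and the sample content are hypothetical stand-ins, not a real PDF parser or anything Acrobat exposes.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Tag:
    """One node of a hypothetical, simplified tag tree."""
    role: str                      # e.g. "H1", "P", "Figure"
    text: str = ""                 # content attached to this tag, if any
    children: List["Tag"] = field(default_factory=list)


def tag_reading_order(node: Tag) -> Iterator[Tag]:
    """Yield tags top to bottom, the way one arrows down the tag tree."""
    yield node
    for child in node.children:
        yield from tag_reading_order(child)


# A made-up tree with the kind of fault described above:
# part of the body text is read before the heading.
doc = Tag("Document", children=[
    Tag("P", "Second half of the body text"),
    Tag("H1", "Chapter title"),
    Tag("P", "First half of the body text"),
    Tag("Figure", "Figure 5"),
])

order = [(t.role, t.text) for t in tag_reading_order(doc) if t.text]
for role, text in order:
    print(f"{role}: {text}")
```

The walk only reports the order; whether that order makes sense is still a judgment a person has to make.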

From: L Snider
Date: Wed, Feb 18 2015 12:51PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

Hi Bevi,

One question on this:
1. Run Acrobat's accessibility checker. This looks at only about 20% of the
document's features, so don't depend on it for a full check.

This is the full report and check, right? If so, what else would you check?

Cheers

Lisa


From: Chagnon | PubCom
Date: Wed, Feb 18 2015 1:08PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

Yes, always run the full report in Acrobat checker and don't waste your time
with the other options.

The Acrobat checker tells you if the PDF is tagged, but not if they're the
right tags.
It tells you if anything is untagged, which quite often is sidebar boxes,
captions, and other pieces that were left out of the tag tree.
Tells if any graphics are missing Alt-Text.
Language and file name options are also flagged if missing.
And sometimes it can detect when the structure might be off, such as
headings that appear out of order as heading 3, heading 1, heading 6.

But even with the full report from Acrobat, you're still not getting all the
information you need. One reason: software can't interpret if those are the
right tags and if they're in the correct, logical reading order. Only humans
can assess that!

--Bevi Chagnon
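
As an illustration of the kind of rule software can check, here is a minimal sketch that flags headings jumping more than one level deeper than the previous heading. The function name and the rule itself are assumptions made for this example; this is not how Acrobat's checker is implemented, and it says nothing about whether a heading is the right tag for its content.

```python
from typing import List


def heading_level_jumps(roles: List[str]) -> List[str]:
    """Report headings that go more than one level deeper than the previous heading."""
    problems = []
    previous = 0  # no heading seen yet
    for role in roles:
        # Only H1..H6 count as numbered headings here; everything else is skipped.
        if len(role) == 2 and role[0] == "H" and role[1] in "123456":
            level = int(role[1])
            if level > previous + 1:
                problems.append(f"{role} follows level {previous}")
            previous = level
    return problems


# The out-of-order sequence mentioned above: heading 3, heading 1, heading 6.
print(heading_level_jumps(["H3", "P", "H1", "H6"]))
```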


From: Chagnon | PubCom
Date: Wed, Feb 18 2015 1:12PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

Agree with Andrew. Bad tags do not create a structure.

Adobe InDesign is famous for creating ridiculous tags in PDFs exported from
the layout files. Tags like <blue_subhead_with_extra_space_above> or
<judys_inserted_copy> are some I recently saw. Acrobat erroneously creates
the tags from the names of the designer's paragraph formatting styles, not
from what has been programmed to be <h1> or <h2>. So Acrobat's Role Map
utility has to reinterpret those wild and crazy tags into normal <h1>, <h2>
etc tags.
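
Conceptually, a role map is a lookup from custom tag names to standard ones. The sketch below assumes a deliberately partial set of standard tags, and the targets chosen for the two style names (H2 and P) are guesses made for illustration; `resolve_role` is a hypothetical helper, not an Acrobat or InDesign function.

```python
# Deliberately partial list of standard tags, enough for this example.
STANDARD = {"P", "H", "H1", "H2", "H3", "H4", "H5", "H6"}


def resolve_role(tag: str, role_map: dict) -> str:
    """Follow role-map entries until a standard tag (or a dead end) is reached."""
    seen = set()
    while tag not in STANDARD and tag in role_map and tag not in seen:
        seen.add(tag)          # guard against circular mappings
        tag = role_map[tag]
    return tag


# Hypothetical role map; the right-hand sides are assumptions.
role_map = {
    "blue_subhead_with_extra_space_above": "H2",
    "judys_inserted_copy": "P",
}

print(resolve_role("blue_subhead_with_extra_space_above", role_map))
print(resolve_role("unmapped_style", role_map))
```

A tag with no entry in the role map comes back unchanged, which is the case where assistive technology has nothing standard to work with.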

A lot depends on 4 things:

1) The software version in which the source document was created. MS Word 2013 tags
things more correctly than Word 2010 or 2007. Same with Adobe InDesign.
Always use the most recent version to create the source documents. Standards
change, as well as the tools we use to create to those standards, so the
latest software version will always give the best results and, hopefully,
build documents to the latest standards. As an example, look how the
tagging of lists has changed over the past 10 years.

2) The software version of Acrobat that was used to create the PDF. In MS
Word, for example, when we select File / Save as PDF, we're invoking an
Acrobat module (or plug-in) in Word that interprets the Word document to
create the PDF. Which version of Acrobat did the conversion? Acrobat 11 does
a better job than 10 which does a better job than 9. FYI, you can see the
versions of Acrobat and the source program in the PDF's File / Properties
utility. Also, some people use non-Adobe PDF makers, which from my
experience don't make accessible PDFs at all.

3) The conversion settings (or preferences) when the PDF was exported from
the source document. Miss a few checkboxes in the settings and you won't get
an accessible PDF.

4) The skill of the person who created the source document and the PDF. If
they don't know how to use Word's footnote utility and instead insert them
by hand, then the PDF's footnotes won't be fully accessible. If they're a
novice user of Adobe InDesign, forget it! The file will be an inaccessible
nightmare!

--Bevi Chagnon



From: L Snider
Date: Wed, Feb 18 2015 1:18PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

Ah okay, I see where you were going now-thanks. Yes, it is like WCAG...You
can do it and it can still be inaccessible :)

Even after all these years, most of it is still manual. Funny how things
have changed and some things are still the same.

I am loving XI Pro, because you only do the full report by default. None of
the messiness of previous versions.

Cheers

Lisa


From: Brian Richwine
Date: Wed, Feb 18 2015 1:25PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

Andrew,

Is there any value to checking if a PDF contains a Marked entry with the
value of true, as referred to in section 10.7 of the PDF spec? I see a lot of
PDFs with a Marked true entry that don't conform to standard structure
types (after accounting for the role map if provided). They definitely
don't often conform to the other items listed in 10.7.1. What is it for?

-Brian
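
For anyone who wants to look for the Marked entry without Acrobat, here is a crude sketch that searches the raw bytes of a file for a MarkInfo dictionary containing `/Marked true`. It assumes the dictionary is written inline and uncompressed; a file that stores it as an indirect reference or inside a compressed object stream will be missed, so a negative result proves nothing. The function name and the sample bytes are made up for the example. As the question itself points out, a positive result only shows that the file claims to be tagged, not that the tags are any good.

```python
import re

# Matches an inline MarkInfo dictionary whose Marked entry is true.
MARKED_TRUE = re.compile(rb"/MarkInfo\s*<<[^>]*?/Marked\s+true\b")


def claims_to_be_tagged(raw_pdf: bytes) -> bool:
    """Return True if the raw bytes contain an inline '/MarkInfo << ... /Marked true'."""
    return MARKED_TRUE.search(raw_pdf) is not None


# Hand-written fragment standing in for a document catalog.
sample = (
    b"1 0 obj\n"
    b"<< /Type /Catalog /MarkInfo << /Marked true >> /StructTreeRoot 5 0 R >>\n"
    b"endobj\n"
)
print(claims_to_be_tagged(sample))
```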

On Wed, Feb 18, 2015 at 2:36 PM, Andrew Kirkpatrick < = EMAIL ADDRESS REMOVED = >
wrote:

> Tagged PDF just means that it has tags at all. There is no guarantee that
> they are correct. This is where PDF/UA helps in that it answers the
> important question "tagged how?"
> AWK
>
>

From: Chagnon | PubCom
Date: Wed, Feb 18 2015 3:48PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

Lisa,
Here's an excellent example of a flawed tag tree reading order, which then
creates an out-of-whack structure.
Surprisingly, it's from the US Access Board itself:
http://www.regulations.gov/#!documentDetail;D=ATBCB-2015-0002-0001 (view the
Content section and look for the PDF there).
This is the text of the new ICT draft for Sec. 508. You'll notice in the tag
tree that the figures are all stacked at the top of the tag tree...yet they
appear in the back portion of the draft on pages 186-192.
This error creates the following reading order:
1. The agency's seal/logo on page 1.
2. 9 illustrations on pages 186 through 192.
3. The title of the document (tagged with a P tag) on page 1.
4. The remaining pages of the document.
This error is because they used an older version of MS Word, which does this
to all graphics...stacks them at the top of the tag tree, or at the end of
the tag tree, or anywhere it feels like it throughout the entire
document...regardless of how someone anchors the graphics in the Word
document itself. Word 2013, on the other hand, doesn't make this error and
places the graphics correctly in the PDF tag tree.
It also doesn't help that they used Acrobat 10 to create the PDF from Word.
--Bevi
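
One rough way to spot this kind of fault is to list the tags in tag-tree order along with the page each one sits on, then look for places where the page number goes backwards. The sketch below is hypothetical: the page numbers follow the description above, the last entry is invented to stand for the rest of the document, and `backward_page_jumps` is an illustrative helper, not a feature of any tool.

```python
from typing import List, Tuple


def backward_page_jumps(tags: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Return (role, page, previous_page) for each tag on an earlier page than the tag before it."""
    jumps = []
    for (_, prev_page), (role, page) in zip(tags, tags[1:]):
        if page < prev_page:
            jumps.append((role, page, prev_page))
    return jumps


# Tags in tag-tree order, paired with the page they appear on.
tag_order = [
    ("Figure", 1),    # agency seal
    ("Figure", 186),  # first of the illustrations
    ("Figure", 192),  # last of the illustrations
    ("P", 1),         # document title
    ("P", 2),         # hypothetical: remaining pages follow
]
print(backward_page_jumps(tag_order))
```

A backward jump is only a hint. Some layouts legitimately read across pages out of sequence, so a person still has to review each one.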

From: Ryan E. Benson
Date: Wed, Feb 18 2015 3:50PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

InDesign only recognizes a handful of standard PDF tags. I can't find the
list right now, but I am pretty sure it is in the help. InDesign knows
<Table>, <Tr> and <Td>, for example, but not <TH> or something like that.
PDF tags are case sensitive, so if you create an h1 tag for your InDesign
document, it gets mapped to the <P> tag in the PDF. However, if you create
an H1 tag in InDesign, it gets mapped correctly to H1 in the PDF.

--
Ryan E. Benson


From: Olaf Drümmer
Date: Wed, Feb 18 2015 4:13PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

Hi Ryan,

On 18 Feb 2015, at 23:50, Ryan E. Benson < = EMAIL ADDRESS REMOVED = > wrote:

> InDesign only recognizes a handful of standard PDF tags. I can't find the
> list right now, but I am pretty sure it is in the help. InDesign knows
> <Table>, <Tr> and <Td>, for example, but not <TH> or something like that.

it does handle <TH> quite well (at least for column headers).

> PDF tags are case sensitive, so if you create an h1 Tag for your inDesign
> document, it gets mapped to the <P> tag in the PDF. However, creating the
> H1 tag in inDesign, it correctly gets mapped to H1 in the PDF.

nope. What you actually do is assign a certain tag to your style sheet, which then gets used during export (and via role mapping in the resulting PDF). The list offered here consists of only H1 through H6 and P (yep, that's it, except for <H> which you do not want to use, and 'Artifact' which is not a tag, but can be handy at times). Most other stuff is just handled properly by InDesign, at least for stuff like lists and tables (with some limitations - e.g. no row headers, no complex table structures) and footnotes and figures and links and (CS 6 or newer) form fields.

Some of the glaring omissions are lack of support for table of contents (TOC / TOCI), something as easy as Caption, or BlockQuote, Quote, Formula (accompanied by lack of support for something like MathML) and a few others.

So the statement
> InDesign only recognizes a handful of standard PDF tags.

has to be turned into its opposite:
> InDesign supports a lot of standard PDF tags.

with the following addition:
> With some very unfortunate [seemingly easy to implement/support] omissions, like support for Caption, or BlockQuote, Quote, Formula and a few others.


Olaf

From: Andrew Kirkpatrick
Date: Wed, Feb 18 2015 4:13PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

Ryan,
I'm not sure what version of InDesign you are using, but InDesign does support TH tags if you use the InDesign table tool and indicate heading rows.

Related to creating the tags with upper or lower case, if you use the correct and recognized tag name from the PDF spec, then yes, the role map isn't needed. But you can also use the feature to map styles to tags and InDesign takes care of the mapping. If you ever have multiple styles that both need to map to H2 then you'll benefit from this feature.

AWK

From: Allayne Woodford
Date: Wed, Feb 18 2015 4:15PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

I use InDesign CS6 and Acrobat Pro XI. The Acrobat tags available to map to from InDesign are: P, H, H1 through H6, and Artifact. Table tags are created automatically for the PDF regardless of the mapping, and they pick up my Table Header Rows as TH tags if I've marked a row as such in the InDesign table properties. But regardless of my mapping, the resulting PDF pulls across all the paragraph styles I've created and uses their names as the tag names that you see in the PDF tag list. Running JAWS over the PDF, though, if I've tagged the paragraph style named 'Document Heading' as an H1 for export, JAWS will announce Heading Level 1.

Ally Woodford
Project Manager | Media Access Australia
Level 3, 616 - 620 Harris St, Ultimo NSW 2007
Tel: 02 9212 6242 Fax: 02 9212 6289 Mobile: 0419 460 797 Web: www.mediaaccess.org.au
                                               
Media Access Australia - inclusion through technology, Access iQ® - creating a web without limits and cap that! - improving literacy with captions.

Follow us on Twitter @mediaaccessaus @AccessiQ @cap_that



From: Chagnon | PubCom
Date: Wed, Feb 18 2015 4:25PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

Sorry, that's a different issue Ryan.
I'm talking about the gibberish tags that end up regardless of how a
designer sets up the document for tagging in InDesign. The core tag in the
exported PDF ends up being a gibberish version of the paragraph style's
name, not the designed tag.

RE: what you mentioned about InDesign's tags, that's sort of true. Let me
clarify InDesign's tagging method.
InDesign does indeed recognize all of the PDF tags; the problem is that it
and Acrobat don't export them as well as they should.

In InDesign, certain tags must be set in the export tag options:
Headings 1 through 6.
Artifacts (for text).
Well, that's all the control you have in InDesign! Everything else is set to
Auto, and Auto does recognize:
- Tables (and if you've set repeating headers, it will put in the TH
tag).
- Lists, both numbered and bulleted.
- Hyperlinks if you've used the hyperlink utility.
- TOCs if you've used InDesign's TOC utility.
- And pretty much the core of any InDesign document.

Except for grouped items, anchored text frames, un-hyperlinked footnotes,
un-hyperlinked indexes, and a whole lot more, the basics of an InDesign
document are tagged correctly in the PDF.

--Bevi Chagnon

From: L Snider
Date: Thu, Feb 19 2015 6:23AM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

That is a very good example, thanks. I have a few of my own, but can't use
them in public-so this is perfect!

Yes, I have found the newest version of Word to be much better, 2003 was
messy...at least some progress is being made.

Thanks again!

Lisa


From: Ryan E. Benson
Date: Thu, Feb 19 2015 1:57PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

Hi Olaf,

>it does handle <TH> quite well (at least for column headers).

Not sure why you think this, since it is not a supported tag, as you
mention below.

> nope. What you actually do is assign a certain tag to your style sheet
> which then gets used during export (and via role mapping in the resulting
> PDF).

I was trying to cut out the jargon. Most people do not know there are
styles and tags within InDesign, in my experience. If they just make a
style, and lazily name it h1, that gets exported and mapped to P - in the
two sample files I tried - in CS6. The user has 3 options: 1- properly
named styles. 2- open up the style and choose the right tag via export tag
options. 3- open the tags pane, use the map styles to tags option, and map
it. This assumes the user opened up the structure pane, and used the "add
untagged items" option. This also creates the tags known to InDesign, of
which there are 9.

> InDesign supports a lot of standard PDF tags.
9 of 34 is 26%. Not sure if you call that a lot. Now if a user sets up
their document properly, and uses the built-in features, of course that goes
up. In my experience, the designers I work with, who have a degree and are
trained in InDesign, don't do or know this.

--
Ryan E. Benson


From: Andrew Kirkpatrick
Date: Thu, Feb 19 2015 2:50PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

Ryan,
Comments below.

> I was trying to cut out the jargon. Most people do not know there are styles and tags within InDesign, in my experience. If they just make a style, and lazily name it h1, that gets exported and mapped to P - in the two sample files I tried - in CS6. The user has 3 options: 1- properly named styles. 2- open up the style and choose the right tag via export tag options. 3- open the tags pane, use the map styles to tags option, and map it. This assumes the user opened up the structure pane, and used the "add untagged items" option. This also creates the tags known to InDesign, of which there are 9.
>
> > InDesign supports a lot of standard PDF tags.
> 9 of 34 is 26%. Not sure if you call that a lot. Now if a user sets up their document properly, and uses the built-in features, of course that goes up. In my experience, the designers I work with, who have a degree and are trained in InDesign, don't do or know this.

Off hand I count:
P
H
H1
H2
H3
H4
H5
H6
TABLE
THEAD
TBODY
TR
TH
TD
LIST
LI
SPAN
Document
Article
Section


Which makes 20. There are a number of inline styles that aren't supported (e.g. code, quote) and there may be others that are supported that I'm not sure of (e.g. TOC).

I think that the situation is a little better than you are characterizing.

AWK


From: Ryan E. Benson
Date: Thu, Feb 19 2015 6:48PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

> Which makes 20. There are a number of inline styles that aren't
> supported (e.g. code, quote) and there may be others that are supported
> that I'm not sure of (e.g. TOC).
> I think that the situation is a little better than you are characterizing.

I haven't dug around in the new docs, but the old documents said that the
quick way to know which tags INDD will reliably carry over to a PDF is to run
"add untagged items" via the structure pane. This gives you
P
H
H1-H6
Article
Artifact
Root/Document

Other things like tables and lists were kind of a flip of a coin. I noticed
better support for lists in CS6, but this is akin to "don't worry about it,
just trust us." However, I will mention that if you click the list icon, it
will be brought over as a list. You cannot do this with headings from what
I can tell, and I didn't touch tables. Working with graphic artists, they are
less likely to do this, because 1- their formal training didn't cover this
- not Adobe's fault. 2- they only have a few days to do a job, so setting
up stuff on every project is beyond a chore.

--
Ryan E. Benson

On Thu, Feb 19, 2015 at 4:50 PM, Andrew Kirkpatrick < = EMAIL ADDRESS REMOVED = >
wrote:

> Ryan,
> Comments below.
>
> I was trying to cut out the jargon. Most people do not know there are
> styles and tags within inDesign, in my experience. If they just make a
> style, and lazily name it h1, that gets exported and mapped to P - in the
> two sample files I tried - in CS6. The user has 3 options. 1- properly
> named styles. 2- Open up the style, choose the right tag via export tag
> options. 3- open the tags pane, use the map styles to tags option, and map
> it. This assumes the user opened up the structure pane, and used the "add
> untagged items" option. This also creates the known tags to inDesign -
> which is 9.
>
> > InDesign supports a lot of standard PDF tags.
> 9 of 34 is 26%. Not sure if you call that a lot. Now if a user sets up
> their document properly, and use the built in features, of course that goes
> up. In my experience, working with designers, who have a degree, and
> trained inDesign, don't do or know this.
>
> Off hand I count:
> P
> H
> H1
> H2
> H3
> H4
> H5
> H6
> TABLE
> THEAD
> TBODY
> TR
> TH
> TD
> LIST
> LI
> SPAN
> Document
> Article
> Section
>
>
> Which makes 20. There are a number of inline styles that aren't
> supported (e.g. code, quote) and there may be others that are supported
> that I'm not sure of (e.g. TOC).
>
> I think that the situation is a little better than you are characterizing.
>
> AWK
>
> --
> Ryan E. Benson
>
> On Wed, Feb 18, 2015 at 6:13 PM, Olaf Drümmer < = EMAIL ADDRESS REMOVED = > wrote:
>
> > Hi Ryan,
> >
> > On 18 Feb 2015, at 23:50, Ryan E. Benson < = EMAIL ADDRESS REMOVED = > wrote:
> >
> > > InDesign only recognizes a handful of standard PDF tags. I can't
> > > find the list right now, but I am pretty sure it is in the help.
> > > InDesign knows <Table>, <Tr> and <Td>, for example, but not <TH> or
> > > something like that.
> >
> > it does handle <TH> quite well (at least for column headers).
> >
> > > PDF tags are case sensitive, so if you create an h1 Tag for your
> > > inDesign document, it gets mapped to the <P> tag in the PDF.
> > > However, creating the
> > > H1 tag in inDesign, it correctly gets mapped to H1 in the PDF.
> >
> > nope. What you actually do is assign a certain tag to your style
> > sheet, which then gets used during export (and via role mapping in the
> > resulting PDF). The list offered here consists of only H1 through H6
> > and P (yep, that's it, except for <H> which you do not want to use, and
> > 'Artifact' which is not a tag, but can be handy at times). Most other
> > stuff is just handled properly by InDesign, at least for stuff like
> > lists and tables (with some limitations - e.g. no row headers, no
> > complex table structures) and footnotes and figures and links and
> > (CS 6 or newer) form fields.
> >
> > Some of the glaring omissions are lack of support for table of
> > contents (TOC / TOCI), something as easy as Caption, or BlockQuote,
> > Quote, Formula (accompanied by lack of support for something like
> > MathML) and a few others.
> >
> > So the statement
> > > InDesign only recognizes a handful of standard PDF tags.
> >
> > has to be turned into its opposite:
> > > InDesign supports a lot of standard PDF tags.
> >
> > with the following addition:
> > > With some very unfortunate [seemingly easy to implement/support]
> > omissions, like support for Caption, or BlockQuote, Quote, Formula and
> > a few others.
> >
> >
> > Olaf
> >
> >

From: Chagnon | PubCom
Date: Thu, Feb 19 2015 8:34PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

Since version 5.5, there's been no need to use the structure pane for accessible PDFs from InDesign. Just use the tools built into current versions of InDesign. It's worth upgrading to the latest version of InDesign, CC 2014, because the export of accessible PDFs is more accurate. Doing it the old way with an outdated version of the software costs you an awful lot more labor.

Accessible PDFs don't need a nested, XML-like structure because at this time, the PDF's tag tree is read sequentially, so which tag is nested inside which is ignored by screen readers and other AT.
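
To illustrate what "read sequentially" means here, a minimal Python sketch follows. The Tag class and both sample trees are hypothetical and are not anything InDesign, Acrobat, or a screen reader exposes. The sketch only shows that, if a reader walks the tag tree in document order and ignores nesting depth, a nested tree and a flat tree with the same leaf order produce the same reading sequence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tag:
    """One node in a simplified, hypothetical PDF tag tree."""
    name: str
    text: str = ""
    children: List["Tag"] = field(default_factory=list)


def reading_order(tag: Tag) -> List[str]:
    """Collect text in depth-first (document) order, ignoring nesting depth."""
    collected = [tag.text] if tag.text else []
    for child in tag.children:
        collected.extend(reading_order(child))
    return collected


# Nested structure: heading and paragraph wrapped in a Sect container.
nested = Tag("Document", children=[
    Tag("Sect", children=[
        Tag("H1", "Budget"),
        Tag("P", "Overview text."),
    ]),
    Tag("P", "Closing note."),
])

# Flat structure: the same content tags placed directly under Document.
flat = Tag("Document", children=[
    Tag("H1", "Budget"),
    Tag("P", "Overview text."),
    Tag("P", "Closing note."),
])

# Both trees yield the same sequence when read sequentially.
print(reading_order(nested) == reading_order(flat))
```

Whether a particular screen reader really treats the tree this way is the assumption stated above, not something the sketch verifies.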

You can read a detailed tutorial I wrote for InDesign Magazine a few years ago for step-by-step instructions: Edition #46 February-March 2012 at www.InDesignMagazine.com. Today's InDesign creates a more accurate PDF than it did when I wrote this article, but the steps and tools remain the same.

Repeating what I wrote in a previous post regarding tags from InDesign:

In InDesign, certain tags must be set in the export tag options for each Paragraph Style:
Headings 1 through 6.
Artifacts (for text).
For everything else, leave the export tag options set to Auto. Auto does recognize:
- Tables (and if you've set repeating headers, it will put in the TH tag).
- Lists, both numbered and bulleted, as long as you've formatted them with a paragraph style setting bullets/numbers. No hand formatting.
- Hyperlinks if you've used the hyperlink utility.
- TOCs if you've used InDesign's TOC utility.
- Figures (add the Alt-text through the Object Export Options utility or through Adobe Bridge).
- And pretty much the core of any InDesign document.

Except for grouped items, anchored text frames, un-hyperlinked footnotes, un-hyperlinked indexes, and a whole lot more advanced features, the basics of an InDesign document are tagged correctly in the PDF.
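
The rule above (set the export tag explicitly for headings and artifacts, leave everything else on Auto) can be written down as a small lookup. This is only a sketch in Python: the paragraph style names and the export_tag_setting helper are made up for illustration and are not part of InDesign or its scripting interface.

```python
# Heading tags that need an explicit export tag: H1 through H6.
HEADING_TAGS = {f"H{level}" for level in range(1, 7)}


def export_tag_setting(intended_role):
    """Return the export tag option to choose for a paragraph style.

    intended_role is a hypothetical label: "H1".."H6", "Artifact",
    or None for ordinary content such as body text, lists and tables.
    """
    if intended_role in HEADING_TAGS:
        return intended_role      # set explicitly
    if intended_role == "Artifact":
        return "Artifact"         # set explicitly
    return "Auto"                 # let the export decide


# Hypothetical paragraph styles and what each one is used for.
paragraph_styles = {
    "Chapter Title": "H1",
    "Section Head": "H2",
    "Running Footer": "Artifact",
    "Body Text": None,
    "Bulleted List": None,
    "Table Cell": None,
}

settings = {
    style: export_tag_setting(role)
    for style, role in paragraph_styles.items()
}

for style, setting in settings.items():
    print(f"{style}: {setting}")
```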

And I have a hands-on class in creating accessible InDesign layouts & PDFs in a few weeks. Two seats are available for online distance learners. Contact me off-list if you'd like more information.

--Bevi Chagnon

— — —
Bevi Chagnon | www.PubCom.com
Consultants, Trainers, Designers, and Developers
For publishing technologies
| Acrobat PDF | Digital Media | XML and Automated Workflows
| GPO | Print | Desktop Publishing | Sec. 508 Accessibility | EPUBs
— — —




From: Jonathan Avila
Date: Thu, Feb 19 2015 8:42PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

> Since version 5.5, there's been no need to use the structure pane for accessible PDFs from InDesign.

Bevi, I think what you are saying is that you can use the articles pane and not worry about the structure pane, as the articles pane sets the content order and that's all that really matters since the nesting isn't needed. While that may be generally true, I find it helpful to see the structure panel and the tags that will be generated, as it gives me an idea of the order of each element -- perhaps I'm just used to it though.

Jonathan

--
Jonathan Avila
Chief Accessibility Officer
SSB BART Group
= EMAIL ADDRESS REMOVED =

703-637-8957 (o)


From: Chagnon | PubCom
Date: Thu, Feb 19 2015 9:33PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

No, not entirely.
The articles panel isn't very helpful in most documents, like long docs.
You have to watch your story threading, anchored objects and layers throughout to get the correct tag structure, but that's just InDesign 101 that you should be doing anyway.
If you've used InDesign's tools correctly, the tag order will follow exactly what you've laid out.
Using the structure pane is a royal pain, as in PITA. It drops XML tags into the document which then get royally botched as normal copy/paste editing actions are done during production. And what a slowdown on the computer!

IMHO, that's a pretty lousy way to use InDesign! For XML, yes, which is what it was designed to do. But that overhead isn't needed for accessible PDFs. I'd rather do other things with my time than wait for my 500-page government document to refresh the structure pane!

--Bevi Chagnon

From: Ryan E. Benson
Date: Thu, Feb 19 2015 10:44PM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

This will be my final note on this thread, since I have gotten a number of
personal notes regarding my comments. My point is simply this: while
inDesign can make accessible PDFs, it is nowhere near being user friendly.
Add somebody's lack of accessibility knowledge, and the issue is compounded
by some magnitude I don't know how to compute. Add to the pile that a fair
number of graphic artists I have worked with see accessibility as "not
their issue." I know that this is a different matter in itself.

Bevi said:
> Since version 5.5, there's been no need to use the structure pane for
accessible PDFs from InDesign.

then

>The articles panel isn't very helpful in most documents, like long docs.
>You have to watch your story threading, anchored objects and layers
throughout to get the correct tag structure, but that's just InDesign 101
that you should be doing anyway.
>If you've used InDesign's tools correctly, the tag order will follow
exactly what you've laid out.

If I missed something, please let me know. There are two ways to affect the
output, the structure pane and the articles pane. So post 5.5, the
structure pane doesn't need to be used, but usually the articles pane isn't
helpful? So what should be used to determine the output without making and
double checking the PDF? While it has been over a year since I read up on
this, most documentation I read basically said not to touch the structure
pane, only the articles pane. I found one tutorial that had step-by-step
instructions. Following these, I think I gave myself a score in the 70s.
Trying it on my own, the score was much worse. Scrapping the new method and
using the structure pane, it was in the low 90s - which is the norm for me,
since there is always something needing a tweak in Acrobat.

> Edition #46 February-March 2012 at www.InDesignMagazine.com
I get a "buy this domain" type page for the link.

--
Ryan E. Benson


From: Jon Metz
Date: Fri, Feb 20 2015 7:32AM
Subject: Re: Untagged PDF doc with table structure
← Previous message | Next message →

Howdy,

Hopefully I can shed some light on this topic a bit more diplomatically.
InDesign has a much stronger engine for making accessible PDFs since
version 5.5, and use of the Structure pane became a deprecated method
within the workflow. The structure pane is still useful in XML workflows
(my knowledge only extends to automation tasks when using Excel or Catalog
data and a way to print). However, using the Structure pane now is a waste
of time.

Using the Articles panel is a necessary step in setting the correct reading
order (nothing to do with the panel in Acrobat) and structure for the tags.
I think the comment "isn't very helpful" is perhaps not really the best way
to describe it. You might expect there to be more that you need to do
besides selecting text boxes on the page and clicking a button, but that's
really all you can do. Problems arise when you need to include content
outside of the added text boxes, as it ends up outside of the final
structure. This is probably what was meant when longer documents were
mentioned.

If you change your Paragraph Style names to be the Tag names (P, H1, H2,
BlockQuote, Code, etc) allowed by the ISO Standard, then they'll be
included as actual Tag names and no Role Mapping will be required when you
move to Acrobat Pro. Usually you can use Character Styles to dramatically
change the visual display of tag styles. InDesign CC does a wonderful job
tagging lists appropriately.
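
As a rough illustration of the naming point, here is a short Python sketch that checks whether a paragraph style name already matches a standard structure type or would need role mapping. The set below reflects my reading of the standard structure types in ISO 32000-1 (PDF 1.7); the needs_role_mapping helper and the sample style names are hypothetical. The comparison is case sensitive, which is why "h1" and "H1" come out differently.

```python
# Standard structure types per ISO 32000-1 (PDF 1.7), as I understand them.
STANDARD_TYPES = {
    # Grouping elements
    "Document", "Part", "Art", "Sect", "Div", "BlockQuote", "Caption",
    "TOC", "TOCI", "Index", "NonStruct", "Private",
    # Block-level elements
    "P", "H", "H1", "H2", "H3", "H4", "H5", "H6",
    "L", "LI", "Lbl", "LBody",
    "Table", "TR", "TH", "TD", "THead", "TBody", "TFoot",
    # Inline-level elements
    "Span", "Quote", "Note", "Reference", "BibEntry", "Code", "Link",
    "Annot", "Ruby", "RB", "RT", "RP", "Warichu", "WT", "WP",
    # Illustration elements
    "Figure", "Formula", "Form",
}


def needs_role_mapping(style_name: str) -> bool:
    """True if the name is not a standard type (case-sensitive match)."""
    return style_name not in STANDARD_TYPES


# Hypothetical paragraph style names from a layout file.
for name in ["H1", "h1", "BlockQuote", "Body Text"]:
    status = "needs role mapping" if needs_role_mapping(name) else "standard"
    print(f"{name}: {status}")
```

This only checks names; whether a given InDesign version exports each of them as the matching tag is a separate question that has to be confirmed in the resulting PDF.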

Avoiding the Structure pane now allows people to focus on using the actual
tools of InDesign. Like most accessibility (I think Whitney Q actually
brought this up on the list a long time ago), setting things up correctly
the first time will greatly reduce the amount of work needed to remediate
later. Any designer who argues that accessibility isn't their job can focus
instead on using the tools in InDesign that they should be using (A source
of pet peeves for any designer who receives another's files!).

Regarding the statement someone made about using the Structure pane to see
the structure of tags, my recommendation would be to use the Tag panel in
Acrobat Pro for that. As a technical aside, with the inclusion of using
PDF/UA as a technical requirement of revised 508 guidelines, there is going
to be a need to do some advanced tricks in Acrobat to achieve this. When
that happens, it's going to be important to focus on keeping up with the
current methodology for fixing PDFs, so people remediating aren't likely to
be confused even further. Just a suggestion really...

As far as "InDesign Magazine" goes, I only know of two magazines, which are
InDesign Secrets (http://indesignsecrets.com/issues) and Layers (
http://layersmagazine.com/).

There's also Adobe:
http://www.adobe.com/accessibility/products/indesign.html

And I can also help if you want to contact me directly. :smile:

Best,
Jonathan


From: Chagnon | PubCom
Date: Fri, Feb 20 2015 9:58AM
Subject: Re: Untagged PDF doc with table structure
← Previous message | No next message

Jon wrote: " As far as "InDesign Magazine" goes, I only know of two magazines, which are InDesign Secrets (http://indesignsecrets.com/issues ) and Layers ( http://layersmagazine.com/)."

Wow. It really hasn't been so long ago that I would forget who signed the check I received for writing the article!

InDesign Magazine is published by http://indesignsecrets.com/issues
InDesignSecrets is the website, while InDesign Magazine is one of its publications. Take a closer look at the flag on the magazine's cover. It says...ta-da! InDesign Magazine! :smile:

At the time I wrote the article for them, they were independent of each other, but in the ensuing years, David Blatner's company has consolidated them under the InDesignSecrets brand, which also includes CreativePro.

--Bevi Chagnon