WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Mismatch between PDF Object Properties Content Tag and Structure Tag

From: Jonathan Avila
Date: Jun 12, 2020 8:33AM


Hi Philip, thank you for your thoughts. In this case the document came from InDesign and the role mappings look fine. Most of the time when I have run into this issue it has affected TalkBack on Android and VoiceOver on iOS, so you are correct that it tends to affect mostly technology that doesn't use the tag structure.

In this case I am having an issue with JAWS and NVDA with Adobe Reader, so perhaps the problem is not this mismatch but something else wrong with my table. The table has correct row/colspan values, an equal number of cells, and correct markup. I did notice the table was missing a bounding box (BBox) array, but adding one didn't seem to help. JAWS and NVDA skip over the table entirely. There are no other attributes, such as a null ActualText property, either. Have you experienced screen readers skipping over content such as tables completely? Any idea what else, like BBox or other attributes, is required to be present in the tag attributes?
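
For anyone who wants to double-check the "equal number of cells" part mechanically, here is a minimal sketch in Python. It assumes the table has already been read out of the tag tree into a plain list of rows, where each cell is a hypothetical `(colspan, rowspan)` pair; it does not read a PDF itself, and the function names are made up for illustration.

```python
def row_widths(rows):
    """Return the effective column count of each row.

    rows: list of rows; each row is a list of (colspan, rowspan) tuples.
    Cells that span down from earlier rows count toward later rows.
    """
    occupied = set()  # (row index, column index) slots already taken
    widths = []
    for r, row in enumerate(rows):
        col = 0
        for colspan, rowspan in row:
            # Skip slots filled by a cell spanning down from a row above.
            while (r, col) in occupied:
                col += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied.add((r + dr, col + dc))
            col += colspan
        cols = [c for (rr, c) in occupied if rr == r]
        widths.append(max(cols) + 1 if cols else 0)
    return widths


def is_regular(rows):
    """True when every row resolves to the same number of columns."""
    return len(set(row_widths(rows))) <= 1
```

A table that passes this check can still be skipped by a screen reader for other reasons; the sketch only rules out one kind of irregularity.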

The arrays such as "k" and "p" seem to hold correct structures for the children and siblings. Any additional details on understanding these structures would be helpful, if someone knows of a resource.
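
As background, in the PDF specification a structure element's /K entry lists its kids (child elements or marked-content IDs) and its /P entry points back to its parent. The sketch below shows one way to check that the two agree. It assumes the tree has been copied into plain Python dicts with "S", "K", and "P" keys, which is a simplified stand-in and not the API of any particular PDF library; the function name is hypothetical.

```python
def find_parent_mismatches(elem, path="StructTreeRoot"):
    """Return paths of kids whose "P" entry is not the element listing them."""
    problems = []
    for i, kid in enumerate(elem.get("K", [])):
        if not isinstance(kid, dict):
            # Integers stand for marked-content IDs (MCIDs), not elements.
            continue
        kid_path = f"{path}/{kid.get('S', '?')}[{i}]"
        if kid.get("P") is not elem:
            problems.append(kid_path)
        problems.extend(find_parent_mismatches(kid, kid_path))
    return problems
```

An empty result only means the parent and kid links are consistent with each other; it says nothing about whether the tags themselves are the right ones.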

I peeked at the preflight PDF structure as well, but it's hard to tell there exactly what might be wrong, as reading the streams is not something I am an expert in.

Jonathan

-----Original Message-----
From: WebAIM-Forum < <EMAIL REMOVED> > On Behalf Of Philip Kiff
Sent: Thursday, June 11, 2020 8:58 PM
To: <EMAIL REMOVED>
Subject: Re: [WebAIM] Mismatch between PDF Object Properties Content Tag and Structure Tag

First, I would double check the Role Map and check that the incorrect structure tag(s) are not somehow being mapped behind the scenes.
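
To illustrate what "mapped behind the scenes" can mean: a Role Map can chain, with one custom tag mapping to another custom tag that in turn maps to a standard one. Here is a rough Python sketch of resolving such a chain. It assumes the Role Map has been copied into a plain dict, and the custom tag names in the usage note are invented for the example.

```python
def resolve_role(tag, role_map):
    """Follow role-map entries from tag until no further mapping applies.

    role_map: dict of structure type -> structure type it maps to.
    A set of visited tags guards against circular mappings.
    """
    seen = set()
    while tag in role_map and tag not in seen:
        seen.add(tag)
        tag = role_map[tag]
    return tag
```

For example, with `{"MyTable": "Tabelle", "Tabelle": "Table"}`, resolving "MyTable" follows both hops, while a tag with no entry comes back unchanged.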

If the Role Map is not involved, then I wonder if the PDFs are coming from specific software tools other than Adobe Acrobat? I've come across some instances of PDF software tools doing some odd things with the container tags in the Content panel that don't match the structure tags. Your example screenshots show the use of Span containers. Both CommonLook Office and the iText PDF library, for example, will generate PDFs that apply Span containers to almost all the content in the Content tree. Both of those tools, however, are capable of producing a (more-or-less) accessible PDF by producing a correct Tag tree with semantically and structurally correct tags, despite the divergence from the containers in the Content tree.

I don't yet understand how the object and tag dictionaries work in the PDF format (!), so I don't know exactly where those structure/tag mismatches get stored in the object or file. In my rudimentary testing with NVDA and JAWS, these content tree structures did not cause problems with access to the content in the accessible tag tree. So I am guessing that the assistive technology that is having trouble with such files may not be using the accessibility tag information to render the content? I'd be curious to know.

I would note that in the cases that I've looked at, if I edit any of those containers or tags with Acrobat Pro, then the mismatch disappears. I don't think you will get this mismatch if you use Acrobat Pro to autotag a file or use its various editing features.

None of this solves the mystery of where that mismatched tag/container/object info is being stored, but it may help to know about other similar cases.

Phil.

Philip Kiff
D4K Communications


On 2020-06-11 17:34, Jonathan Avila wrote:
> I've run into situations where assistive technology does not work properly with PDF documents, and it's often traced back to a mismatch between the content type and the displayed structure tag. For example, the tag is a table tag in the tags panel; in the content panel there either might not be a table container, or perhaps there is a table container and the container tag is a table. However, Acrobat shows the structure tag as a table header cell in the Object Properties dialog. If I look at the tag dictionary everything looks correct. If I look in the container panel and tags panel all is correct, but for some reason the Object Properties dialog will show an incorrect structure tag that isn't showing up anywhere else. I have put some screenshots showing the issue (with alt text) below. If anyone can provide information on how to address this, let me know.
>
> [Object properties dialog showing table header under the content tab
> under structure tag.]
>
> [Object properties dialog showing table tag under the tag tab.]
>
> Jonathan