WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Artifact tag vs. Change tag to artifact in Acrobat

From: Karlen Communications
Date: Mar 8, 2020 10:16AM


Bevi is correct. It is the Tags that are messed up, not the PDF viewer (I only use Adobe Acrobat Pro DC as my viewer) and it is not my adaptive technology (in my case a screen reader which is up to date).

Duff, I am sending you my documents off-list as I don't think the WebAIM list allows attachments. Anyone else who wants my documents, let me know.

I would also encourage EVERYONE on this list who is experiencing horrid tagging from any of the conversion/remediation tools to send their documents not under NDA to Duff.

I've been beta testing applications that produce tagged PDF for about 20 years. When we log bugs or ask for features, no one appears to be listening or taking seriously the fact that some of us "live" in the accessible PDF world 10 and 12 hours a day. If Duff can influence changes that serve those of us with disabilities instead of the conversion tools themselves, then we have to give him and the ISO committee a chance to move in a direction that serves the people trying to remediate and/or access PDF documents.

Regarding the Footnote/Endnote corruption: If a page has 5 footnotes, when I land on the first one, all 5 of the footnotes are read as if they are all part of that first footnote. When I look at the Tags Tree, all of the Footnotes on the page are in one Tag, not separate Tags for each Footnote. It is the same with Endnotes: I land on the first one and all of the Endnotes in the document are read as if they are all part of the first reference. None of the other Endnotes are read because, in the Tags Tree, they are all under a single Tag at the end of the document.
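
As a rough illustration of the footnote problem described above, the sketch below models a Tags Tree with a small, hypothetical `Tag` class and flags any footnote Tag that holds more than one footnote. It does not use a real PDF library. The `Tag` class, the `find_lumped_notes` helper, the use of "Note" as the tag kind, and the assumption that each string in `items` is one footnote body are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tag:
    """Hypothetical stand-in for one node in a PDF Tags Tree."""
    kind: str                                            # e.g. "Document", "P", "Note"
    items: List[str] = field(default_factory=list)       # assumed: one string per footnote body
    children: List["Tag"] = field(default_factory=list)  # nested tags


def find_lumped_notes(tag: Tag) -> List[Tag]:
    """Return every Note tag that holds more than one footnote body."""
    found = [tag] if tag.kind == "Note" and len(tag.items) > 1 else []
    for child in tag.children:
        found.extend(find_lumped_notes(child))
    return found


notes = [f"Footnote {n}" for n in range(1, 6)]

# The structure described above: one Tag housing all 5 footnotes on the page.
broken = Tag("Document", children=[Tag("Note", items=notes)])

# The expected structure: a separate Tag for each footnote.
fixed = Tag("Document", children=[Tag("Note", items=[n]) for n in notes])

print(len(find_lumped_notes(broken)))  # 1
print(len(find_lumped_notes(fixed)))   # 0
```

A check like this only works on a simplified model. In a real document a single footnote can span several content items, so the count alone would not be proof of a lumped Tag.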

Regarding the TOC corruption: First, I've been asking for years that the text "Table of Contents" or "Contents", for which I've used a Subtitle Style, NOT be in the TOC. Even if I separate the text using the ability to modify the Style and add space below the text, it is ALWAYS lumped into the TOC as a <TOCI> Tag. I have to drag it out and edit the Tag to be a Heading. Additionally, the text of a <TOCI> Tag is divided by <Link> Tags as follows - "<TOCI><Link>Introduction<Link>........5" To a screen reader, this appears as follows:

Introduction
.........5

All of the dots are read because the TOCI is not being tagged correctly. When I go back to my accessible document sample that I use for workshops, a TOC was more or less tagged correctly in October 2018 and this new tagging happened sometime after that.

The impact on those of us with disabilities is that if we get a list of links for the TOC, the Heading or navigational point is separated from its page number, which becomes a second navigational point, when it should be a single navigational point of text and page number. Prior to November 2018, when we got a list of links for a TOC, we had the following:

"Introduction...5"

We heard the word Introduction followed by only three dots and then the page number. We understand this to be a TOC entry and a way to navigate to that point in the document.
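
To make the difference concrete, here is a minimal sketch of the two behaviours using a hypothetical `Tag` class rather than a real PDF library. `link_list` shows what a list of links looks like when each <Link> under a <TOCI> is its own entry, and `merged_entry` shows the single "text plus page number" entry described above. Both function names and the rule of shortening four or more dots to three are assumptions made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tag:
    """Hypothetical stand-in for one node in a PDF Tags Tree."""
    kind: str                                            # e.g. "TOCI", "Link"
    text: str = ""                                       # text directly under this tag
    children: List["Tag"] = field(default_factory=list)  # nested tags


def link_list(toci: Tag) -> List[str]:
    """Return one entry per <Link> child, as a list of links would present them."""
    return [child.text for child in toci.children if child.kind == "Link"]


def merged_entry(toci: Tag) -> str:
    """Join the Link text into one entry and shorten the dot leader to three dots."""
    joined = "".join(link_list(toci))
    return re.sub(r"\.{4,}", "...", joined)


# The structure described above: <TOCI><Link>Introduction<Link>........5
toci = Tag("TOCI", children=[Tag("Link", "Introduction"), Tag("Link", "........5")])

print(link_list(toci))     # ['Introduction', '........5']
print(merged_entry(toci))  # Introduction...5
```

The first result is the split entry that makes it easy to lose your place. The second is the single navigational point of text and page number.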

With this new implementation that all four of the applications I tested use, it is easy, even without a learning, cognitive or visual disability, to lose your place in a list of TOC links.

I mentioned that prior to October 2018 the TOCI entries were more or less tagged correctly. What I saw was sometimes the <Link> Tag would be used, sometimes the <Reference> Tag would be used and sometimes both of them would be used within a single <TOCI> Tag.

Again, we are often confronted with bloated Tags Trees that make remediation and QA arduous, time-consuming and confusing.

We ARE the front line people who will always be remediating and providing quality assurance. Not everyone can afford a remediation service for all documents. Besides, we need to know that what we are getting from remediation services is what we paid for. In short, we are not going away and we need reliable tools with reliable results when we create accessible content and need to convert it to other formats.

Please, take this opportunity to send Duff EVERYTHING you can, not under NDA, that you encounter that needs fixing!!!!!!!! And briefly explain the problem!

We have to give the ISO committee a chance to influence both the specifications and developers before we completely abandon the PDF format and focus on applications that convert PDF to something accessible and readable with reliable consistent results.

Cheers, Karen.



-----Original Message-----
From: WebAIM-Forum < <EMAIL REMOVED> > On Behalf Of Duff Johnson
Sent: Sunday, March 8, 2020 7:50 AM
To: WebAIM Discussion List < <EMAIL REMOVED> >
Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat


> The problems Karen has described are visible in the PDF's tag tree.
> It's not her screen reader. The PDFs are inaccessible.

Could I see some examples generated by these various software tools? It would help in addressing the issue.

> Given that you are the head of the PDF Association (a paid-membership trade association), chair of the ISO PDF standards committee, and chair of the ISO PDF/UA committee, it would be better to listen to Karen and others who are on the front lines and experiencing these problems.
>
> Don't shoot the messengers.

Yep, those fancy titles and a buck will get you a cup of coffee... 😊.

I have been listening to and learning from Karen for almost 20 years. No one is shooting anyone. I am just a messenger myself.

I'm delighted to do what I can to help. If there is a systematic error in diverse implementations that is something we can address with formal guidance... and hope the developers pay attention and make it a priority.

Duff.


> -----Original Message-----
> From: WebAIM-Forum < <EMAIL REMOVED> > On Behalf Of Duff Johnson
> Sent: Saturday, March 7, 2020 6:45 PM
> To: WebAIM Discussion List < <EMAIL REMOVED> >
> Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat
>
>> I've now had the opportunity to take a look at four of the PDF tools for conversion and remediation: Acrobat Pro DC, Nuance PowerPDF Advanced now Kofax PowerPDF Advanced, Foxit for Business and the Microsoft conversion tool.
>>
>> All of them are breaking a TOC when the TOC is created to be accessible. This started some time after October 2018.
>
> 4 different tools broke in the same way around the same time….? Are you sure this is a PDF creator rather than a viewer issue?
>
>> We are getting truncated TOCs that are difficult to slog through using adaptive technology. Likewise with Footnotes and Endnotes. What I see in all of the tools mentioned above is the first Footnote or Endnote housing all of the Footnotes on the page or all of the Endnotes on the page and subsequent Footnotes or Endnotes being ignored. This is in the tagging, not the AT or the Viewer...this is apparently how all of the developers for the four tools mentioned above are interpreting the PDF standards.
>
> I'm not clear what you mean by "ignored" - untagged?
>
> Regarding footnotes and endnotes; these structures are not well-defined in PDF 1.7; this is true, and so implementations may differ, which is why the PDF Association has provided extended guidance on such topics in its Best Practice Guide. PDF 2.0 addresses the limitation in the specification.
>
>> I have logged these bugs ad nauseam.
>
> With what response?
>
>> Either the specifications are not correct or every developer is interpreting them in the same incorrect way which is breaking the accessibility of a PDF document. I can't imagine that anyone would create a spec that breaks a TOC or lumps all Footnotes or Endnotes together with the first one then ignores all the rest.
>
> Or, there's a bug in your viewer and you're seeing the same old footnote problem that we sought to address in PDF 2.0.
>
>> There are also <Span> Tags randomly thrown into the Tags Tree containing content that shouldn't need a <Span>.
>> When I convert from PowerPoint I can get <H1> Tags nested under a <Figure> Tag where there is no figure on the slide. I've had accessible Word documents recently have <Sect>, <Part> or now <Div> Tags for EVERY paragraph, bloating a Tags Tree and slowing down QA. Whatever happened to a clean Tags Tree...does anyone remember them?
>>
>> To be fair to the developers, it is difficult to code to a moving target or a specification that shifts 180 degrees with each iteration.
>
> The PDF specification updated in 2008 and then in 2017… and a very modest revision will be published later this year. Tagged PDF was indeed updated in 2017… entirely necessary to resolve certain ambiguities (such as the footnotes issue) and to add support for other requested items such as redactions, line numbers, pronunciation hints, ARIA, MathML, namespaces, etc, etc.
>
>> It is as if the conversion tools throw the content and the Tags up in the air and however they pair themselves is what I get...and what I have to fix...or have to try and read.
>>
>> Perhaps we need a specification that is less arduous to implement for developers? How can all developers make the same "mistakes"?
>
> As above, I am curious about the ToC issue… I have a feeling it's a known bug in a single piece of software, but it would be good to understand the problem better.
>
>> "We" can provide training to document authors and let the conversion/remediation tools developers know where the problems are BUT those of us with disabilities and accessible PDF content are still on the fringes of the radar 20 years after Acrobat 5 was introduced. I tell clients that the tagging tools we have today are worse than Acrobat 5. We've gone backward instead of forward.
>
> I don't know… I think today's tools are far superior to those of the past. But progress is frustratingly slow; I entirely agree. This marketplace (accessibility) doesn't get much love from big software companies, sorry to say. The problem is 10 times harder due to the elephant in the room… authors who don't know / don't care to structure their documents, forcing software to try to account for users who use styles arbitrarily, make "footnotes" by hand, use tab-stops for "tables", etc, etc. All sorts of things that break accessibility IRRESPECTIVE of format. Some say that this is the developers' problem as well; they have to write software that trains the user to do the right thing, or somehow cleans up after the user. All these things are very hard to do when the developer is also trying to allow the author to express themselves.
>
>> As I said in my previous post, it is one thing to have a seat at the table, it is another to be taken seriously and listened to. I am not sure we have either in the PDF universe.
>
>> In the past year, when I look at a Tags Tree converted from an accessible document, it is like a dog's breakfast that needs more remediation than in the past.
>>
>> We can advocate for accessible digital content all we want, but if we don't have the tools that reliably convert one format to another, there is little we can do but support distribution in the source format when we create source documents to be accessible. As people with disabilities or who use adaptive technology to access digital content, we can advocate for what we need from digital content to be accessible but if we aren't part of the specification development/aren't taken seriously/listened to, our voices are unheard and we face a plethora of inaccessible content, in this discussion, inaccessible PDF. We are also then frustrated and discouraged thinking of further participation.
>
>
> I get the frustration; you are right to be frustrated! There are certainly weaknesses in the specifications, but the experience comes from the software and the authors, not the spec, which simply defines types of structures.
>
> Duff.
> > > archives at http://webaim.org/discussion/archives