WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Artifact tag vs. Change tag to artifact in Acrobat


From: Karlen Communications
Date: Mar 7, 2020 2:23PM


I've now had the opportunity to look at four of the PDF tools for conversion and remediation: Acrobat Pro DC, Nuance PowerPDF Advanced (now Kofax PowerPDF Advanced), Foxit for Business, and the Microsoft conversion tool.

All of them break a TOC that was created to be accessible. This started some time after October 2018. We are getting truncated TOCs that are difficult to slog through using adaptive technology. The same is true of Footnotes and Endnotes: in all four tools, the first Footnote or Endnote houses all of the Footnotes or Endnotes on the page, and the subsequent ones are ignored. This is in the tagging, not the AT or the Viewer... this is apparently how the developers of all four tools are interpreting the PDF standards.

I have logged these bugs ad nauseam.

So.....

Either the specifications are not correct or every developer is interpreting them in the same incorrect way, which is breaking the accessibility of PDF documents. I can't imagine that anyone would create a spec that breaks a TOC, or lumps all Footnotes or Endnotes together with the first one and then ignores all the rest.

There are also <Span> Tags randomly thrown into the Tags Tree, containing content that shouldn't need a <Span>. When I convert from PowerPoint, I can get <H1> Tags nested under a <Figure> Tag where there is no figure on the slide. Recently, accessible Word documents have come out with <Sect>, <Part> or now <Div> Tags for EVERY paragraph, bloating the Tags Tree and slowing down QA. Whatever happened to a clean Tags Tree... does anyone remember those?

To be fair to the developers, it is difficult to code to a moving target or a specification that shifts 180 degrees with each iteration.

It is as if the conversion tools throw the content and the Tags up in the air and however they pair themselves is what I get...and what I have to fix...or have to try and read.

Perhaps we need a specification that is less arduous to implement for developers? How can all developers make the same "mistakes"?

"We" can provide training to document authors and let the developers of conversion/remediation tools know where the problems are, BUT those of us with disabilities, and accessible PDF content itself, are still on the fringes of the radar 20 years after Acrobat 5 was introduced. I tell clients that the tagging tools we have today are worse than Acrobat 5. We've gone backward instead of forward.


As I said in my previous post, it is one thing to have a seat at the table, it is another to be taken seriously and listened to. I am not sure we have either in the PDF universe.

In the past year, when I look at a Tags Tree converted from an accessible document, it is like a dog's breakfast that needs more remediation than in the past.

We can advocate for accessible digital content all we want, but if we don't have tools that reliably convert one format to another, there is little we can do except support distribution in the source format when we create source documents to be accessible. As people with disabilities, or people who use adaptive technology to access digital content, we can advocate for what we need for digital content to be accessible. But if we aren't part of specification development, or aren't taken seriously and listened to, our voices go unheard and we face a plethora of inaccessible content (in this discussion, inaccessible PDF). We are then also frustrated and discouraged at the thought of further participation.

Cheers, Karen



-----Original Message-----
From: WebAIM-Forum < <EMAIL REMOVED> > On Behalf Of Duff Johnson
Sent: Saturday, March 7, 2020 3:02 PM
To: WebAIM Discussion List < <EMAIL REMOVED> >
Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat

Hi Karen,

Your questions are good and fair. You've heard some of (my) answers before ad nauseam, but others haven't, so...

> An Artifact Tag that needs to be Artifacted is, and will be, creating
> more remediation work, which increases the cost of remediation. While
> it is a "nice thought" that adaptive technology will "catch up" and
> give us an option to hear "Artifact" (and line numbers are a great
> example of something that perhaps we need access to at various times
> for specific documents), those of us with disabilities and other
> stakeholders seem to be missing in the process of creating PDF
> standards. Even those of us who speak up are not heard.

The PDF (or any other) specification is not responsible for the behavior of software implementing the specification. That is the responsibility of the software developer.

All the standard can do is (try to) establish a common basis of understanding for software developers. We cannot compel the software folks to write better software; that's a marketplace and regulatory matter. In other words, it's up to you, and others who feel as you do, to get in front of AT, viewer and PDF producer software developers and demand something better. They listen to their customers.

> With this Artifact Tag, how do those of us who are using adaptive
> technology distinguish something like line numbers from decorative items on a page?

The <Artifact> tag is a new feature in PDF technology (as of 2017). As such, your software developer must choose to support it.

> With the example of an image marked as decorative, having no
> meaningful contribution to the content of the document being read to
> us as "Artifact, pathpathpath" how does this improve our experience in
> accessing content

Clearly, it does not improve the experience. If this is happening it indicates that the software does not support PDF 2.0.

> from
> what is already an overwhelmingly inaccessible file format due to the
> amount of untagged PDF content out there?

The proportion of PDF files that are tagged is increasing rapidly. Apple's productivity suite, for example, now ONLY makes tagged PDF. A couple of years ago, tagged PDF was about 18% of files being opened; I'm pretty sure it's significantly higher now.

The solution, of course, is to convince people to care enough to… make tagged PDF! It's no different from convincing people to add alt text or make other corrections that enable accessibility.

> How many items in our adaptive technology settings/options are we
> going to have to go through in order to just read a PDF?

Line number support is a new feature in tagged PDF, so inevitably there will be some sort of user education associated with its introduction.

> Why not have a <LineNum> Tag?

PDF 2.0 includes precisely this feature! ISO 32000-2, Table 385 describes a LineNum attribute for <Artifact> elements. An AT that understands PDF 2.0 would thus be able to represent line numbering to AT users.

> Given that since Office 2007, in Word, parts of table gridlines are
> housed in <Span> Tags, <TR>, <TH>, <TD> Tags or just loosely put under
> a <Table> Tag

…if software is adding table structure tags to grid-lines then the software is broken at a conceptual level. A bug report should be submitted. In 2020, software developers who create tagged PDF have little excuse for not doing it right. It's been fully specified and published for 20 years, and successfully implemented by dozens of independent developers around the world.

> , does the implementation of the Artifact Tag mean that now we have to
> hear all of the parts of table gridlines...or underline...or paragraph
> borders...

If PDF 1.7-capable software encounters an <Artifact> element it will be confused and do whatever the developer thinks it should do when it's confused… most likely just read whatever's enclosed by the element without additional semantics… more or less what you've described as your experience. :-(

If PDF 2.0-capable software encounters <Artifact> it should do something far smarter. For example, for a screen-reader one would expect the software to ignore the content marked with <Artifact> while indicating that optional content was available, perhaps with a beep.

> just thinking of the amount of "stuff" on a page or in a document that
> one normally wouldn't "look at" but we will be forced to listen to
> until adaptive technology or IF adaptive technology catches up, makes
> me want to just convert any PDF that I get to something that I can
> actually read quickly, efficiently and not fall behind in education or employment.

An <Artifact> structure element is only intended for optional-to-read content that has positional significance, line numbers being the most obvious example. Accordingly, it would be incorrect to put an <Artifact> element on a cosmetic background image or gridline. Such objects should be marked as artifacts as always, and not included in the structure tree at all.

> Are the PDF Association and the ISO committees reaching out to adaptive
> technology developers to work PDF 2.0 into a development cycle?
> Having a standard that no one understands or knows about doesn't
> really help those of us with disabilities access PDF content.

The PDF Association has certainly reached out to AT developers several times over the years, and even helped (very modestly) to fund NVDA development back in 2014 or so. PDF technology is not a secret. We welcome any and all developers with an interest in PDF technology, and the best practice documents the PDF Association publishes are freely available. We are easy to find. We would LOVE to see more engagement, but it's really on users to convince their vendors; a tiny non-profit industry association cannot compel anything.

> We still don't have a way to let us know how much of a document is
> redacted although I am repeatedly told that the ISO standard gives a
> clear way of doing this.

"How much of a document is redacted" is actually an incredibly difficult concept, and I don't agree that the ISO standard "gives a clear way of doing this".

What ISO 32000 does do (again, in PDF 2.0… NOT in PDF 1.7) is give a clear way of identifying redactions in a document. Informing on the scope of redaction is a different kettle of fish, as redactions are inherently vague about their own scope. Was that a paragraph or an image that was redacted? We can't tell you. Was that 1 word or 3 that was redacted? We can't tell you. This vagueness is baked into the very nature of redaction itself. Accordingly, the specification stops at identifying redactions, and leaves it up to redaction authors and/or viewing software to characterize their scope (e.g., "half a page redacted").

> Visually someone can see the amount of space in a document that has
> been redacted. Those of us using adaptive technology need to be able
> to "see" the same thing. How many adaptive technology developers have
> implemented the ISO "solution?"

As described above, it would require support for PDF 2.0, which is slow in coming.

> I'd really like to see the PDF standards developed with those of us
> who use adaptive technology and have to access PDF documents in mind
> and the "machines" doing the conversion to PDF create the output for
> "us." So far, the machines seem to be winning.

It's really not the standards that are the problem. The standards address all the structures you are interested in (with a few exceptions). It is (a) software and (b) untrained document authors and remediators that are letting you down.

Duff.