WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Artifact tag vs. Change tag to artifact in Acrobat

From: Karlen Communications
Date: Mar 9, 2020 6:26AM


One last word on one of Duff's comments:

The PDF (or any other) specification is not responsible for the behavior of software implementing the specification. That is the responsibility of the software developer.

This is my point. I am reminded of my father, who worked for Ford, buying a Pinto when Pintos were popular. Once reports came out that their gas tanks were exploding on contact, he quickly got rid of it. In the case of the Pinto, an analogy to PDF would be: was it the specifications for the car that were incorrect, or was it the interpretation of the specifications that was incorrect and caused the exploding gas tanks? For the Pinto we probably have an answer. For PDF, given that all of the applications I am looking at are giving the same results, we don't yet have an answer for what is going wrong with the accessibility of PDF documents. Is it the specifications that are wrong or the implementation/interpretation? In either case, something needs to be fixed.

PLEASE send all your NDA horribly tagged documents to Duff!!!!!!! As I say, this may be our last chance to salvage the file format as an accessible one.

Cheers, Karen



-----Original Message-----
From: WebAIM-Forum < <EMAIL REMOVED> > On Behalf Of Duff Johnson
Sent: Saturday, March 7, 2020 3:02 PM
To: WebAIM Discussion List < <EMAIL REMOVED> >
Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat

Hi Karen,

Your questions are good and fair. You've heard some of (my) answers before ad nauseam, but others haven't, so...

> An Artifact Tag that needs to be Artifacted will and is, creating
> more remediation work which increases the cost of remediation. While
> it is a "nice thought" that adaptive technology will "catch up" and
> give us an option to hear "Artifact" and line numbers are a great
> example of something that perhaps we need access to at various times
> for specific documents, those of us with disabilities and other
> stakeholders seem to be missing in the process of creating PDF
> standards. Even those of us who speak up are not heard.

The PDF (or any other) specification is not responsible for the behavior of software implementing the specification. That is the responsibility of the software developer.

All the standard can do is (try to) establish a common basis of understanding for software developers. We cannot compel the software folks to write better software; that's a marketplace and regulatory matter. In other words, it's up to you, and others who feel as you do, to get in front of AT, viewer and PDF producer software developers and demand something better. They listen to their customers.

> With this Artifact Tag, how do those of us who are using adaptive
> technology identify something like line numbers from decorative items on a page.

The <Artifact> tag is a new feature in PDF technology (as of 2017). As such your software developer must choose to support it.

> With the example of an image marked as decorative, having no
> meaningful contribution to the content of the document being read to
> us as "Artifact, pathpathpath" how does this improve our experience in
> accessing content

Clearly, it does not improve the experience. If this is happening it indicates that the software does not support PDF 2.0.

> from
> what is already an overwhelmingly inaccessible file format due to the
> amount of untagged PDF content out there?

The proportion of PDF files that are tagged is increasing rapidly. Apple's productivity suite, for example, now ONLY makes tagged PDF. A couple of years ago tagged PDF was about 18% of files being opened; I'm pretty sure it's significantly higher now.

The solution, of course, is to convince people to care enough to... make tagged PDF! It's no different from convincing people to add alt text, or make other corrections to enable accessibility.

> How many items in our adaptive technology settings/options are we
> going to have to go through in order to just read a PDF?

Line number support is a new feature in tagged PDF, so inevitably there will be some sort of user education associated with its introduction.

> Why not have a <LineNum> Tag?

PDF 2.0 includes precisely this feature! ISO 32000-2, Table 385 describes a LineNum attribute for <Artifact> elements. An AT that understands PDF 2.0 would thus be able to represent line numbering to AT users.

> Given that since Office 2007, in Word, parts of table gridlines are
> housed in <Span> Tags, <TR>, <TH>, <TD> Tags or just loosely put under
> a <Table> Tag

…if software is adding table structure tags to grid-lines then the software is broken at a conceptual level. A bug report should be submitted. In 2020, software developers who create tagged PDF have little excuse for not doing it right. It's been fully specified and published for 20 years, and successfully implemented by dozens of independent developers around the world.

> , does the implementation of the Artifact Tag mean that now we have to
> hear all of the parts of table gridlines...or underline...or paragraph
> borders...

If PDF 1.7-capable software encounters an <Artifact> element it will be confused and do whatever the developer thinks it should do when it's confused… most likely just read whatever's enclosed by the element without additional semantics… more or less what you've described as your experience. :-(

If PDF 2.0-capable software encounters <Artifact> it should do something far smarter. For example, for a screen-reader one would expect the software to ignore the content marked with <Artifact> while indicating that optional content was available, perhaps with a beep.
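
To make the contrast between the two behaviours concrete, here is a minimal sketch in Python. It is a toy model only: the StructElem class, the read_aloud function, the pdf2_aware and announce_artifacts options and the "[beep]" cue are all hypothetical names invented for illustration, not any real AT or PDF library API, and the structure tree is far simpler than a real one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructElem:
    """Toy stand-in for a tagged-PDF structure element (hypothetical)."""
    tag: str
    text: str = ""
    subtype: Optional[str] = None  # e.g. "LineNum" on an <Artifact> element
    children: List["StructElem"] = field(default_factory=list)


def read_aloud(elem, pdf2_aware=True, announce_artifacts=False):
    """Return the list of utterances a reader might produce for elem."""
    if elem.tag == "Artifact" and pdf2_aware:
        if announce_artifacts:
            # User opted in: speak the optional content with a label.
            return [f"[{elem.subtype or 'Artifact'}] {elem.text}"]
        # Default: skip the content but signal that something is there.
        return ["[beep]"]
    # Every other element, and <Artifact> in software that does not
    # know the tag, is read as plain content with no extra semantics.
    spoken = [elem.text] if elem.text else []
    for child in elem.children:
        spoken.extend(read_aloud(child, pdf2_aware, announce_artifacts))
    return spoken


# A paragraph whose line number is tagged with an <Artifact> element.
doc = StructElem("Document", children=[
    StructElem("P", children=[
        StructElem("Artifact", "12", subtype="LineNum"),
        StructElem("Span", "Whereas the parties agree"),
    ]),
])

legacy = read_aloud(doc, pdf2_aware=False)
default = read_aloud(doc)
verbose = read_aloud(doc, announce_artifacts=True)
```

In this sketch the legacy reader speaks the line number as if it were body text, the PDF 2.0-aware reader replaces it with a cue, and a user who wants line numbers can switch them on. Whether a real product exposes such a setting is up to its developer.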

> just thinking of the amount of "stuff" on a page or in a document that
> one normally wouldn't "look at" but we will be forced to listen to
> until adaptive technology or IF adaptive technology catches up, makes
> me want to just convert any PDF that I get to something that I can
> actually read quickly, efficiently and not fall behind in education or employment.

An <Artifact> structure element is only intended for optional-to-read content that has positional significance, line numbers being the most obvious example. Accordingly, it would be incorrect to put an <Artifact> element on a cosmetic background image or gridline. Such objects should be marked as artifact as always, and not included in the structure tree at all.

> Is the PDF Association and the ISO committees reaching out to adaptive
> technology developers to work PDF - 2 into a development cycle?
> Having a standard that no one understands or knows about doesn't
> really help those of us with disabilities access PDF content.

The PDF Association has certainly reached out to AT developers several times over the years, and even helped (very modestly) to fund NVDA development back in 2014 or so. PDF technology is not a secret. We welcome any and all developers with an interest in PDF technology, and the best practice documents the PDF Association publishes are freely available. We are easy to find. We would LOVE to see more engagement, but it's really on users to convince their vendors; a tiny non-profit industry association cannot compel anything.

> We still don't have a way to let us know how much of a document is
> redacted although I am repeatedly told that the ISO standard gives a
> clear way of doing this.

"How much of a document is redacted" is actually an incredibly difficult concept, and I don't agree that the ISO standard "gives a clear way of doing this".

What ISO 32000 does do (again, in PDF 2.0… NOT in PDF 1.7) is give a clear way of identifying redactions in a document. Informing on the scope of redaction is a different kettle of fish, as redactions are inherently vague about their own scope. Was that a paragraph or an image that was redacted? We can't tell you. Was that 1 word or 3 that was redacted? We can't tell you. This vagueness is baked into the very nature of redaction itself. Accordingly, the specification stops at identifying redactions, and leaves it up to redaction authors and/or viewing software to characterize their scope (e.g., "half a page redacted").

> Visually someone can see the amount of space in a document that has
> been redacted. Those of us using adaptive technology need to be able
> to "see" the same thing. How many adaptive technology developers have
> implemented the ISO "solution?"

As described above, this would require PDF 2.0 support, which is slow in coming.

> I'd really like to see the PDF standards developed with those of us
> who use adaptive technology and have to access PDF documents in mind
> and the "machines" doing the conversion to PDF create the output for
> "us." So far, the machines seem to be winning.

It's really not the standards that are the problem. The standards address all the structures you are interested in (with a few exceptions). It is (a) software and (b) untrained document authors and remediators that are letting you down.

Duff.