WebAIM - Web Accessibility In Mind

E-mail List Archives

Thread: Artifact tag vs. Change tag to artifact in Acrobat

for

Number of posts in this thread: 20 (In chronological order)

From: Cindy Jouper
Date: Fri, Mar 06 2020 11:47AM
Subject: Artifact tag vs. Change tag to artifact in Acrobat
No previous message | Next message →

Hi - I am looking through some pdf documents that have been tagged by others, and I see that they are using the <Artifact> tag on items they want to artifact. I've always either right-clicked on the content box and chosen "Change tag to artifact" then deleted the tag, or I've used the Reading Order to mark a tag as background/artifact. Does tagging something as an artifact have the same effect? I just don't see that used as often, and wondered if there is a reason.

Thanks!

Cindy Jouper, CPWA<https://www.accessibilityassociation.org/cpwacertification>
Administrator, Digital Communications
Capital Region ESD 113
6005 Tyee Dr SW | Tumwater, WA 98512

From: Paul Rayius
Date: Fri, Mar 06 2020 11:53AM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

Hi Cindy,
As per the current PDF standard (ISO 32000-1:2008), the way that you're artifacting content is correct, as opposed to using the <Artifact> tag. This is being changed in the new version of the PDF standard (ISO 32000-2) but that's not really supported yet. When it's out and supported it'll have an effect on when a PDF is in reflow but, as mentioned, it's not really supported at this time.

I hope that helps,
Paul

Paul Rayius
Director of Training
CommonLook

-----Original Message-----
From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of Cindy Jouper
Sent: Friday, March 6, 2020 1:48 PM
To: = EMAIL ADDRESS REMOVED =
Subject: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat

Hi - I am looking through some pdf documents that have been tagged by others, and I see that they are using the <Artifact> tag on items they want to artifact. I've always either right-clicked on the content box and chosen "Change tag to artifact" then deleted the tag, or I've used the Reading Order to mark a tag as background/artifact. Does tagging something as an artifact have the same effect? I just don't see that used as often, and wondered if there is a reason.

Thanks!

Cindy Jouper, CPWA<https://www.accessibilityassociation.org/cpwacertification>
Administrator, Digital Communications
Capital Region ESD 113
6005 Tyee Dr SW | Tumwater, WA 98512

From: Duff Johnson
Date: Fri, Mar 06 2020 12:00PM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

Hi Cindy,

> Hi - I am looking through some pdf documents that have been tagged by others, and I see that they are using the <Artifact> tag on items they want to artifact.

As Paul says there is an <Artifact> element defined in PDF 2.0 (published in 2017). Also as he says, there's little support for it today (so far). But if someone is making PDF 2.0 files and tagging PDFs this way I'd love to know it. If it's possible to share one of these files that would be great.

> I've always either right-clicked on the content box and chosen "Change tag to artifact" then deleted the tag, or I've used the Reading Order to mark a tag as background/artifact.

Yes, this is entirely appropriate for content that is irrelevant to the meaning of the document. The <Artifact> tag, however (as defined in PDF 2.0) does something a little different, which is why I'm keen to see your document.

> Does tagging something as an artifact have the same effect? I just don't see that used as often, and wondered if there is a reason.

In PDF 2.0 use of an <Artifact> tag with a PDF 2.0 viewer and supportive AT would allow a piece of content to be optional.

The classic use case is of line-numbers in a document. Sometimes you want to just read the document without hearing line-numbers; sometimes you need to reference a specific line. The <Artifact> structure type in PDF 2.0 provides AT developers with a way to offer this choice to AT users.

Duff.

From: Jonathan Avila
Date: Fri, Mar 06 2020 12:38PM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

Last year when I converted from Word using the Acrobat plugin I started to see the artifact tag show up in the role mappings section. At one point it was mapping certain tags to artifact tags.

Jon

-----Original Message-----
From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of Duff Johnson
Sent: Friday, March 6, 2020 2:01 PM
To: WebAIM Discussion List < = EMAIL ADDRESS REMOVED = >
Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.


Hi Cindy,

> Hi - I am looking through some pdf documents that have been tagged by others, and I see that they are using the <Artifact> tag on items they want to artifact.

As Paul says there is an <Artifact> element defined in PDF 2.0 (published in 2017). Also as he says, there's little support for it today (so far). But if someone is making PDF 2.0 files and tagging PDFs this way I'd love to know it. If it's possible to share one of these files that would be great.

> I've always either right-clicked on the content box and chosen "Change tag to artifact" then deleted the tag, or I've used the Reading Order to mark a tag as background/artifact.

Yes, this is entirely appropriate for content that is irrelevant to the meaning of the document. The <Artifact> tag, however (as defined in PDF 2.0) does something a little different, which is why I'm keen to see your document.

> Does tagging something as an artifact have the same effect? I just don't see that used as often, and wondered if there is a reason.

In PDF 2.0 use of an <Artifact> tag with a PDF 2.0 viewer and supportive AT would allow a piece of content to be optional.

The classic use case is of line-numbers in a document. Sometimes you want to just read the document without hearing line-numbers; sometimes you need to reference a specific line. The <Artifact> structure type in PDF 2.0 provides AT developers with a way to offer this choice to AT users.

Duff.

From: chagnon@pubcom.com
Date: Fri, Mar 06 2020 1:27PM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

As others have stated, the <Artifact> tag is part of the new PDF 2.0 standard, and will also be in the forthcoming PDF/UA-2 standard.

The problem is that no A T we know of recognize the tag. Some screen readers voice it as "Artifact Path Path Path Path Path Path Path..."

The assistive technologies haven't adopted it yet, so while it can have splendiferous potential (as Duff described), it's right now an f-ing "pain in the anatomy" that makes the PDF less accessible.

This is a good example of the lack of coordination between all the stakeholders: the standards committees, PDF software producers, and assistive technologies.

They all have to get on the same page of the hymnal in order to provide any benefit to the end users.

When we get complaints from clients and end users about them, we artifact out the <Artifact> tags. Crazy.

— — —
Bevi Chagnon, founder/CEO | = EMAIL ADDRESS REMOVED =
— — —
PubCom: Technologists for Accessible Design + Publishing
consulting ' training ' development ' design ' sec. 508 services
Upcoming classes at www.PubCom.com/classes
— — —
Latest blog-newsletter – Accessibility Tips at www.PubCom.com/blog

-----Original Message-----
From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of Jonathan Avila
Sent: Friday, March 6, 2020 2:38 PM
To: WebAIM Discussion List < = EMAIL ADDRESS REMOVED = >
Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat

Last year when I converted from Word using the Acrobat plugin I started to see the artifact tag show up in the role mappings section. At one point it was mapping certain tags to artifact tags.

Jon

-----Original Message-----
From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of Duff Johnson
Sent: Friday, March 6, 2020 2:01 PM
To: WebAIM Discussion List < = EMAIL ADDRESS REMOVED = >
Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.


Hi Cindy,

> Hi - I am looking through some pdf documents that have been tagged by others, and I see that they are using the <Artifact> tag on items they want to artifact.

As Paul says there is an <Artifact> element defined in PDF 2.0 (published in 2017). Also as he says, there's little support for it today (so far). But if someone is making PDF 2.0 files and tagging PDFs this way I'd love to know it. If it's possible to share one of these files that would be great.

> I've always either right-clicked on the content box and chosen "Change tag to artifact" then deleted the tag, or I've used the Reading Order to mark a tag as background/artifact.

Yes, this is entirely appropriate for content that is irrelevant to the meaning of the document. The <Artifact> tag, however (as defined in PDF 2.0) does something a little different, which is why I'm keen to see your document.

> Does tagging something as an artifact have the same effect? I just don't see that used as often, and wondered if there is a reason.

In PDF 2.0 use of an <Artifact> tag with a PDF 2.0 viewer and supportive AT would allow a piece of content to be optional.

The classic use case is of line-numbers in a document. Sometimes you want to just read the document without hearing line-numbers; sometimes you need to reference a specific line. The <Artifact> structure type in PDF 2.0 provides AT developers with a way to offer this choice to AT users.

Duff.

From: Philip Kiff
Date: Fri, Mar 06 2020 2:25PM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

I've seen Artifact tags start to pop up in files as well. I hadn't quite
pinned down what was going on, but following Jon's suggestion, I just
now performed a quick test with Word 365 (16.0.12527.20170) 32-bit and
when you generate a PDF using their built-in PDF conversion engine,
images that you have checked with the "Mark as decorative" setting seem
to be tagged with the Artifact tag in the tag tree via the rolemap.

Like Bev, I have found myself having to artifact the Artifact tags in
some documents. "Crazy" is right! And I echo her general complaint about
the frustration we find ourselves in these days where the different
stakeholders are not coordinating their efforts very well.

Phil.

On 2020-03-06 14:38, Jonathan Avila wrote:
> Last year when I converted from Word using the Acrobat plugin I started to see the artifact tag show up in the role mappings section. At one point it was mapping certain tags to artifact tags.
>
> Jon
On 2020-03-06 15:27, = EMAIL ADDRESS REMOVED = wrote:
> [....] This is a good example of the lack of coordination between all the stakeholders: the standards committees, PDF software producers, and assistive technologies.
>
> They all have to get on the same page of the hymnal in order to provide any benefit to the end users.
>
> When we get complaints from clients and end users about them, we artifact out the <Artifact> tags. Crazy.

From: Philip Kiff
Date: Fri, Mar 06 2020 2:31PM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

Err...I mispoke, I didn't mean the built-in PDF conversion engine, but
the Acrobat plugin conversion engine.

On 2020-03-06 16:25, Philip Kiff wrote:
> I've seen Artifact tags start to pop up in files as well. I hadn't
> quite pinned down what was going on, but following Jon's suggestion, I
> just now performed a quick test with Word 365 (16.0.12527.20170)
> 32-bit and when you generate a PDF using their built-in PDF conversion
> engine, images that you have checked with the "Mark as decorative"
> setting seem to be tagged with the Artifact tag in the tag tree via
> the rolemap.
>
> Like Bev, I have found myself having to artifact the Artifact tags in
> some documents. "Crazy" is right! And I echo her general complaint
> about the frustration we find ourselves in these days where the
> different stakeholders are not coordinating their efforts very well.
>
> Phil.
>
> On 2020-03-06 14:38, Jonathan Avila wrote:
>> Last year when I converted from Word using the Acrobat plugin I
>> started to see the artifact tag show up in the role mappings
>> section.  At one point it was mapping certain tags to artifact tags.
>>
>> Jon
> On 2020-03-06 15:27, = EMAIL ADDRESS REMOVED = wrote:
>> [....] This is a good example of the lack of coordination between all
>> the stakeholders: the standards committees, PDF software producers,
>> and assistive technologies.
>>
>> They all have to get on the same page of the hymnal in order to
>> provide any benefit to the end users.
>>
>> When we get complaints from clients and end users about them, we
>> artifact out the <Artifact> tags. Crazy.
> > > >

From: Karlen Communications
Date: Sat, Mar 07 2020 8:59AM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

An Artifact Tag that needs to be Artifacted will and is, creating more
remediation work which increases the cost of remediation. While it is a nice
thought" that adaptive technology will "catch up" and give us an option to
hear "Artifact" and line numbers are a great example of something that
perhaps we need access to at various times for specific documents, those of
us with disabilities and other stakeholders seem to be missing in the
process of creating PDF standards. Even those of us who speak up are not
heard.

With this Artifact Tag, how do those of us who are using adaptive technology
identify something like line numbers from decorative items on a page. With
the example of an image marked as decorative, having no meaningful
contribution to the content of the document being read to us as "Artifact,
pathpathpath" how does this improve our experience in accessing content from
what is already an overwhelmingly inaccessible file format due to the amount
of untagged PDF content out there? How many items in our adaptive technology
settings/options are we going to have to go through in order to just read a
PDF?

Why not have a <LineNum> Tag?

Given that since Office 2007, in Word, parts of table gridlines are housed
in <Span> Tags, <TR>, <TH>, <TD> Tags or just loosely put under a <Table>
Tag, does the implementation of the Artifact Tag mean that now we have to
hear all of the parts of table gridlines...or underline...or paragraph
borders...just thinking of the amount of "stuff" on a page or in a document
that one normally wouldn't "look at" but we will be forced to listen to
until adaptive technology or IF adaptive technology catches up, makes me
want to just convert any PDF that I get to something that I can actually
read quickly, efficiently and not fall behind in education or employment.

Is the PDF Association and the ISO committees reaching out to adaptive
technology developers to work PDF - 2 into a development cycle? Having a
standard that no one understands or knows about doesn't really help those of
us with disabilities access PDF content.

We still don't have a way to let us know how much of a document is redacted
although I am repeatedly told that the ISO standard gives a clear way of
doing this. Visually someone can see the amount of space in a document that
has been redacted. Those of us using adaptive technology need to be able to
"see" the same thing. How many adaptive technology developers have
implemented the ISO "solution?"

I'd really like to see the PDF standards developed with those of us who use
adaptive technology and have to access PDF documents in mind and the
"machines" doing the conversion to PDF create the output for "us." So far,
the machines seem to be winning.

Cheers, Karen


-----Original Message-----
From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of
Philip Kiff
Sent: Friday, March 6, 2020 4:26 PM
To: = EMAIL ADDRESS REMOVED =
Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat

I've seen Artifact tags start to pop up in files as well. I hadn't quite
pinned down what was going on, but following Jon's suggestion, I just now
performed a quick test with Word 365 (16.0.12527.20170) 32-bit and when you
generate a PDF using their built-in PDF conversion engine, images that you
have checked with the "Mark as decorative" setting seem to be tagged with
the Artifact tag in the tag tree via the rolemap.

Like Bev, I have found myself having to artifact the Artifact tags in some
documents. "Crazy" is right! And I echo her general complaint about the
frustration we find ourselves in these days where the different stakeholders
are not coordinating their efforts very well.

Phil.

On 2020-03-06 14:38, Jonathan Avila wrote:
> Last year when I converted from Word using the Acrobat plugin I started to
see the artifact tag show up in the role mappings section. At one point it
was mapping certain tags to artifact tags.
>
> Jon
On 2020-03-06 15:27, = EMAIL ADDRESS REMOVED = wrote:
> [....] This is a good example of the lack of coordination between all the
stakeholders: the standards committees, PDF software producers, and
assistive technologies.
>
> They all have to get on the same page of the hymnal in order to provide
any benefit to the end users.
>
> When we get complaints from clients and end users about them, we artifact
out the <Artifact> tags. Crazy.
http://webaim.org/discussion/archives

From: Duff Johnson
Date: Sat, Mar 07 2020 1:02PM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

Hi Karen,

Your questions are good and fair. You've heard some of (my) answers before ad nauseum, but others haven't, so...

> An Artifact Tag that needs to be Artifacted will and is, creating more
> remediation work which increases the cost of remediation. While it is a nice
> thought" that adaptive technology will "catch up" and give us an option to
> hear "Artifact" and line numbers are a great example of something that
> perhaps we need access to at various times for specific documents, those of
> us with disabilities and other stakeholders seem to be missing in the
> process of creating PDF standards. Even those of us who speak up are not
> heard.

The PDF (or any other) specification is not responsible for the behavior of software implementing the specification. That is the responsibility of the software developer.

All the standard can do is (try to) establish a common basis of understanding for software developers. We cannot compel the software folks to write better software; that's a marketplace and regulatory matter. In other words, it's up to you, and others who feel as you do, to get in front of AT, viewer and PDF producer software developers and demand something better. They listen to their customers.

> With this Artifact Tag, how do those of us who are using adaptive technology
> identify something like line numbers from decorative items on a page.

The <Artifact> tag is a new feature in PDF technology (as of 2017). As such your software developer must choose to support it.

> With the example of an image marked as decorative, having no meaningful
> contribution to the content of the document being read to us as "Artifact,
> pathpathpath" how does this improve our experience in accessing content

Clearly, it does not improve the experience. If this is happening it indicates that the software does not support PDF 2.0.

> from
> what is already an overwhelmingly inaccessible file format due to the amount
> of untagged PDF content out there?

The proportion of PDF files that are tagged is increasing rapidly. Apple's productivity suite, for example, now ONLY makes tagged PDF. A couple of years ago tagged PDF was about 18% of files being opened; I'm pretty sure it's significantly higher now.

The solution, of course, is to convince people to care enough to…. make tagged PDF! It's no different from convincing people to add alt. text, or make other corrections to enable accessibility.

> How many items in our adaptive technology
> settings/options are we going to have to go through in order to just read a
> PDF?

Line number support is a new feature in tagged PDF, so inevitably there will be some sort of user education associated with its introduction.

> Why not have a <LineNum> Tag?

PDF 2.0 includes precisely this feature! ISO 32000-2, Table 385 describes a LineNum attribute for <Artifact> elements. An AT that understands PDF 2.0 would thus be able to represent line numbering to AT users.

> Given that since Office 2007, in Word, parts of table gridlines are housed
> in <Span> Tags, <TR>, <TH>, <TD> Tags or just loosely put under a <Table>
> Tag

…if software is adding table structure tags to grid-lines then the software is broken at a conceptual level. A bug report should be submitted. In 2020, software developers who create tagged PDF have little excuse for not doing it right. It's been fully specified and published for 20 years, and successfully implemented by dozens of independent developers around the world.

> , does the implementation of the Artifact Tag mean that now we have to
> hear all of the parts of table gridlines...or underline...or paragraph
> borders...

If PDF 1.7-capable software encounters an <Artifact> element it will be confused and do whatever the developer thinks it should do when it's confused… most likely just read whatever's enclosed by the element without additional semantics… more or less what you've described as your experience. :-(

If PDF 2.0-capable software encounters <Artifact> it should do something far smarter. For example, for a screen-reader one would expect the software to ignore the content marked with <Artifact> while indicating that optional content was available, perhaps with a beep.

> just thinking of the amount of "stuff" on a page or in a document
> that one normally wouldn't "look at" but we will be forced to listen to
> until adaptive technology or IF adaptive technology catches up, makes me
> want to just convert any PDF that I get to something that I can actually
> read quickly, efficiently and not fall behind in education or employment.

An <Artifact> structure element is only intended for optional-to-read content that has positional significance; line numbers been the most obvious example. Accordingly, it would be incorrect to put an <Artifact> element on a cosmetic background image or gridline. Such objects should be marked as artifact as always, and not included in the structure tree at all.

> Is the PDF Association and the ISO committees reaching out to adaptive
> technology developers to work PDF - 2 into a development cycle? Having a
> standard that no one understands or knows about doesn't really help those of
> us with disabilities access PDF content.

The PDF Association has certainly reached out to AT developers several times over the years, and even helped (very modestly) to fund NVDA development back in 2014 or so. PDF technology is not a secret. We welcome any and all developers with an interest in PDF technology, and the best practice documents the PDF Association publishes are freely available. We are easy to find. We would LOVE to see more engagement, but it's really on users to convince their vendors; a tiny non-profit industry association cannot compel anything.

> We still don't have a way to let us know how much of a document is redacted
> although I am repeatedly told that the ISO standard gives a clear way of
> doing this.

"How much of a document is redacted" is actually an incredibly difficult concept, and I don't agree that the ISO standard "gives a clear way of doing this".

What ISO 32000 does do (again, in PDF 2.0… NOT in PDF 1.7) is give a clear way of identifying reductions in a document. Informing on the scope of redaction is a different kettle of fish, as redactions are inherently vague about their own scope. Was that a paragraph or an image that was redacted? We can't tell you. Was that 1 word or 3 that was redacted? We can't tell you. This vagueness is baked into the very nature of redaction itself. Accordingly, the specification stops at identifying redactions, and leaves it up to redaction authors and/or viewing software to characterize their scope (e.g., "half a page redacted").

> Visually someone can see the amount of space in a document that
> has been redacted. Those of us using adaptive technology need to be able to
> "see" the same thing. How many adaptive technology developers have
> implemented the ISO "solution?"

As described above it would require support for PDF 2.0, support for which is slow in coming.

> I'd really like to see the PDF standards developed with those of us who use
> adaptive technology and have to access PDF documents in mind and the
> "machines" doing the conversion to PDF create the output for "us." So far,
> the machines seem to be winning.

It's really not the standards that are the problem. The standards address all the structures you are interested in (with a few exceptions). It is (a) software and (b) untrained document authors and remediators that are letting you down.

Duff.

From: Karlen Communications
Date: Sat, Mar 07 2020 2:23PM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

I've now had the opportunity to take a look at four of the PDF tools for conversion and remediation: Acrobat Pro DC, Nuance PowerPDF Advanced now Kofax PowerPDF Advanced, Foxit for Business and the Microsoft conversion tool.

All of them are breaking a TOC when the TOC is created to be accessible. This started some time after October 2018. We are getting truncated TOC's that are difficult to slog through using adaptive technology. Likewise with Footnotes and Endnotes. What I see in all of the tools available mentioned above is the first Footnote or Endnote housing all of the Footnotes on the page or all of the Endnotes on the page and subsequent Footnotes or Endnotes being ignored. This is in the tagging, not the AT or the Viewer...this is apparently how all of the developers for the four tools mentioned above are interpreting the PDF standards.

I have logged these bugs ad nauseum.

So.....

Either the specifications are not correct or every developer is interpreting them in the same incorrect way which is breaking the accessibility of a PDF document. I can't imagine that anyone would create a spec that breaks a TOC or lumps all Footnotes or Endnotes together with the first one then ignores all the rest.

There are also <Span> Tags randomly thrown into the Tags Tree containing content that shouldn't need a <Span>. When I convert from PowerPoint I can get <H1> Tags nested under a <Figure> Tag where there is no figure on the slide. I've had accessible Word documents recently have <Sect>, <Part> or now <Div> Tags for EVERY paragraph, bloating a Tags Tree and slowing down QA. Whatever happened to a clean Tags Tree...does anyone remember them?

To be fair to the developers, it is difficult to code to a moving target or a specification that shifts 180 degrees with each iteration.

It is as if the conversion tools throw the content and the Tags up in the air and however they pair themselves is what I get...and what I have to fix...or have to try and read.

Perhaps we need a specification that is less arduous to implement for developers? How can all developers make the same "mistakes"?

"We" can provide training to document authors and let the conversion/remediation tools developers know where the problems are BUT those of us with disabilities and accessible PDF content are still on the fringes of the radar 20 years after Acrobat 5 was introduced. I tell clients that the tagging tools we have today are worse than Acrobat 5. We've gone backward instead of forward.


As I said in my previous post, it is one thing to have a seat at the table, it is another to be taken seriously and listened to. I am not sure we have either in the PDF universe.

In the past year, when I look at a Tags Tree converted from an accessible document, it is like a dog's breakfast that needs more remediation than in the past.

We can advocate for accessible digital content all we want, but if we don't have the tools that reliably convert one format to another, there is little we can do but support distribution in the source format when we create source documents to be accessible. As people with disabilities or who use adaptive technology to access digital content, we can advocate for what we need from digital content to be accessible but if we aren't part of the specification development/aren't taken seriously/listened to, our voices are unheard and we face a plethora of inaccessible content, in this discussion, inaccessible PDF. We are also then frustrated and discouraged thinking of further participation.

Cheers, Karen



-----Original Message-----
From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of Duff Johnson
Sent: Saturday, March 7, 2020 3:02 PM
To: WebAIM Discussion List < = EMAIL ADDRESS REMOVED = >
Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat

Hi Karen,

Your questions are good and fair. You've heard some of (my) answers before ad nauseum, but others haven't, so...

> An Artifact Tag that needs to be Artifacted will and is, creating
> more remediation work which increases the cost of remediation. While
> it is a nice thought" that adaptive technology will "catch up" and
> give us an option to hear "Artifact" and line numbers are a great
> example of something that perhaps we need access to at various times
> for specific documents, those of us with disabilities and other
> stakeholders seem to be missing in the process of creating PDF
> standards. Even those of us who speak up are not heard.

The PDF (or any other) specification is not responsible for the behavior of software implementing the specification. That is the responsibility of the software developer.

All the standard can do is (try to) establish a common basis of understanding for software developers. We cannot compel the software folks to write better software; that's a marketplace and regulatory matter. In other words, it's up to you, and others who feel as you do, to get in front of AT, viewer and PDF producer software developers and demand something better. They listen to their customers.

> With this Artifact Tag, how do those of us who are using adaptive
> technology identify something like line numbers from decorative items on a page.

The <Artifact> tag is a new feature in PDF technology (as of 2017). As such your software developer must choose to support it.

> With the example of an image marked as decorative, having no
> meaningful contribution to the content of the document being read to
> us as "Artifact, pathpathpath" how does this improve our experience in
> accessing content

Clearly, it does not improve the experience. If this is happening it indicates that the software does not support PDF 2.0.

> from
> what is already an overwhelmingly inaccessible file format due to the
> amount of untagged PDF content out there?

The proportion of PDF files that are tagged is increasing rapidly. Apple's productivity suite, for example, now ONLY makes tagged PDF. A couple of years ago tagged PDF was about 18% of files being opened; I'm pretty sure it's significantly higher now.

The solution, of course, is to convince people to care enough to…. make tagged PDF! It's no different from convincing people to add alt. text, or make other corrections to enable accessibility.

> How many items in our adaptive technology settings/options are we
> going to have to go through in order to just read a PDF?

Line number support is a new feature in tagged PDF, so inevitably there will be some sort of user education associated with its introduction.

> Why not have a <LineNum> Tag?

PDF 2.0 includes precisely this feature! ISO 32000-2, Table 385 describes a LineNum attribute for <Artifact> elements. An AT that understands PDF 2.0 would thus be able to represent line numbering to AT users.

> Given that since Office 2007, in Word, parts of table gridlines are
> housed in <Span> Tags, <TR>, <TH>, <TD> Tags or just loosely put under
> a <Table> Tag

…if software is adding table structure tags to grid-lines then the software is broken at a conceptual level. A bug report should be submitted. In 2020, software developers who create tagged PDF have little excuse for not doing it right. It's been fully specified and published for 20 years, and successfully implemented by dozens of independent developers around the world.

> , does the implementation of the Artifact Tag mean that now we have to
> hear all of the parts of table gridlines...or underline...or paragraph
> borders...

If PDF 1.7-capable software encounters an <Artifact> element it will be confused and do whatever the developer thinks it should do when it's confused… most likely just read whatever's enclosed by the element without additional semantics… more or less what you've described as your experience. :-(

If PDF 2.0-capable software encounters <Artifact> it should do something far smarter. For example, for a screen-reader one would expect the software to ignore the content marked with <Artifact> while indicating that optional content was available, perhaps with a beep.

> just thinking of the amount of "stuff" on a page or in a document that
> one normally wouldn't "look at" but we will be forced to listen to
> until adaptive technology or IF adaptive technology catches up, makes
> me want to just convert any PDF that I get to something that I can
> actually read quickly, efficiently and not fall behind in education or employment.

An <Artifact> structure element is only intended for optional-to-read content that has positional significance; line numbers been the most obvious example. Accordingly, it would be incorrect to put an <Artifact> element on a cosmetic background image or gridline. Such objects should be marked as artifact as always, and not included in the structure tree at all.

> Is the PDF Association and the ISO committees reaching out to adaptive
> technology developers to work PDF - 2 into a development cycle?
> Having a standard that no one understands or knows about doesn't
> really help those of us with disabilities access PDF content.

The PDF Association has certainly reached out to AT developers several times over the years, and even helped (very modestly) to fund NVDA development back in 2014 or so. PDF technology is not a secret. We welcome any and all developers with an interest in PDF technology, and the best practice documents the PDF Association publishes are freely available. We are easy to find. We would LOVE to see more engagement, but it's really on users to convince their vendors; a tiny non-profit industry association cannot compel anything.

> We still don't have a way to let us know how much of a document is
> redacted although I am repeatedly told that the ISO standard gives a
> clear way of doing this.

"How much of a document is redacted" is actually an incredibly difficult concept, and I don't agree that the ISO standard "gives a clear way of doing this".

What ISO 32000 does do (again, in PDF 2.0… NOT in PDF 1.7) is give a clear way of identifying reductions in a document. Informing on the scope of redaction is a different kettle of fish, as redactions are inherently vague about their own scope. Was that a paragraph or an image that was redacted? We can't tell you. Was that 1 word or 3 that was redacted? We can't tell you. This vagueness is baked into the very nature of redaction itself. Accordingly, the specification stops at identifying redactions, and leaves it up to redaction authors and/or viewing software to characterize their scope (e.g., "half a page redacted").

> Visually someone can see the amount of space in a document that has
> been redacted. Those of us using adaptive technology need to be able
> to "see" the same thing. How many adaptive technology developers have
> implemented the ISO "solution?"

As described above it would require support for PDF 2.0, support for which is slow in coming.

> I'd really like to see the PDF standards developed with those of us
> who use adaptive technology and have to access PDF documents in mind
> and the "machines" doing the conversion to PDF create the output for
> "us." So far, the machines seem to be winning.

It's really not the standards that are the problem. The standards address all the structures you are interested in (with a few exceptions). It is (a) software and (b) untrained document authors and remediators that are letting you down.

Duff.

From: Jeffrey (JDS)
Date: Sat, Mar 07 2020 4:00PM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

Maybe we should advocate for other formants as primary format.
This mess is never going to be fixed in PDFs.

-----Original Message-----
From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of Karlen Communications
Sent: March 7, 2020 4:24 PM
To: 'WebAIM Discussion List' < = EMAIL ADDRESS REMOVED = >
Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat

I've now had the opportunity to take a look at four of the PDF tools for conversion and remediation: Acrobat Pro DC, Nuance PowerPDF Advanced now Kofax PowerPDF Advanced, Foxit for Business and the Microsoft conversion tool.

All of them are breaking a TOC when the TOC is created to be accessible. This started some time after October 2018. We are getting truncated TOC's that are difficult to slog through using adaptive technology. Likewise with Footnotes and Endnotes. What I see in all of the tools available mentioned above is the first Footnote or Endnote housing all of the Footnotes on the page or all of the Endnotes on the page and subsequent Footnotes or Endnotes being ignored. This is in the tagging, not the AT or the Viewer...this is apparently how all of the developers for the four tools mentioned above are interpreting the PDF standards.

I have logged these bugs ad nauseum.

So.....

Either the specifications are not correct or every developer is interpreting them in the same incorrect way which is breaking the accessibility of a PDF document. I can't imagine that anyone would create a spec that breaks a TOC or lumps all Footnotes or Endnotes together with the first one then ignores all the rest.

There are also <Span> Tags randomly thrown into the Tags Tree containing content that shouldn't need a <Span>. When I convert from PowerPoint I can get <H1> Tags nested under a <Figure> Tag where there is no figure on the slide. I've had accessible Word documents recently have <Sect>, <Part> or now <Div> Tags for EVERY paragraph, bloating a Tags Tree and slowing down QA. Whatever happened to a clean Tags Tree...does anyone remember them?

To be fair to the developers, it is difficult to code to a moving target or a specification that shifts 180 degrees with each iteration.

It is as if the conversion tools throw the content and the Tags up in the air and however they pair themselves is what I get...and what I have to fix...or have to try and read.

Perhaps we need a specification that is less arduous to implement for developers? How can all developers make the same "mistakes"?

"We" can provide training to document authors and let the conversion/remediation tools developers know where the problems are BUT those of us with disabilities and accessible PDF content are still on the fringes of the radar 20 years after Acrobat 5 was introduced. I tell clients that the tagging tools we have today are worse than Acrobat 5. We've gone backward instead of forward.


As I said in my previous post, it is one thing to have a seat at the table, it is another to be taken seriously and listened to. I am not sure we have either in the PDF universe.

In the past year, when I look at a Tags Tree converted from an accessible document, it is like a dog's breakfast that needs more remediation than in the past.

We can advocate for accessible digital content all we want, but if we don't have the tools that reliably convert one format to another, there is little we can do but support distribution in the source format when we create source documents to be accessible. As people with disabilities or who use adaptive technology to access digital content, we can advocate for what we need from digital content to be accessible but if we aren't part of the specification development/aren't taken seriously/listened to, our voices are unheard and we face a plethora of inaccessible content, in this discussion, inaccessible PDF. We are also then frustrated and discouraged thinking of further participation.

Cheers, Karen



-----Original Message-----
From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of Duff Johnson
Sent: Saturday, March 7, 2020 3:02 PM
To: WebAIM Discussion List < = EMAIL ADDRESS REMOVED = >
Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat

Hi Karen,

Your questions are good and fair. You've heard some of (my) answers before ad nauseum, but others haven't, so...

> An Artifact Tag that needs to be Artifacted will and is, creating
> more remediation work which increases the cost of remediation. While
> it is a nice thought" that adaptive technology will "catch up" and
> give us an option to hear "Artifact" and line numbers are a great
> example of something that perhaps we need access to at various times
> for specific documents, those of us with disabilities and other
> stakeholders seem to be missing in the process of creating PDF
> standards. Even those of us who speak up are not heard.

The PDF (or any other) specification is not responsible for the behavior of software implementing the specification. That is the responsibility of the software developer.

All the standard can do is (try to) establish a common basis of understanding for software developers. We cannot compel the software folks to write better software; that's a marketplace and regulatory matter. In other words, it's up to you, and others who feel as you do, to get in front of AT, viewer and PDF producer software developers and demand something better. They listen to their customers.

> With this Artifact Tag, how do those of us who are using adaptive
> technology identify something like line numbers from decorative items on a page.

The <Artifact> tag is a new feature in PDF technology (as of 2017). As such your software developer must choose to support it.

> With the example of an image marked as decorative, having no
> meaningful contribution to the content of the document being read to
> us as "Artifact, pathpathpath" how does this improve our experience in
> accessing content

Clearly, it does not improve the experience. If this is happening it indicates that the software does not support PDF 2.0.

> from
> what is already an overwhelmingly inaccessible file format due to the
> amount of untagged PDF content out there?

The proportion of PDF files that are tagged is increasing rapidly. Apple's productivity suite, for example, now ONLY makes tagged PDF. A couple of years ago tagged PDF was about 18% of files being opened; I'm pretty sure it's significantly higher now.

The solution, of course, is to convince people to care enough to…. make tagged PDF! It's no different from convincing people to add alt. text, or make other corrections to enable accessibility.

> How many items in our adaptive technology settings/options are we
> going to have to go through in order to just read a PDF?

Line number support is a new feature in tagged PDF, so inevitably there will be some sort of user education associated with its introduction.

> Why not have a <LineNum> Tag?

PDF 2.0 includes precisely this feature! ISO 32000-2, Table 385 describes a LineNum attribute for <Artifact> elements. An AT that understands PDF 2.0 would thus be able to represent line numbering to AT users.

> Given that since Office 2007, in Word, parts of table gridlines are
> housed in <Span> Tags, <TR>, <TH>, <TD> Tags or just loosely put under
> a <Table> Tag

…if software is adding table structure tags to grid-lines then the software is broken at a conceptual level. A bug report should be submitted. In 2020, software developers who create tagged PDF have little excuse for not doing it right. It's been fully specified and published for 20 years, and successfully implemented by dozens of independent developers around the world.

> , does the implementation of the Artifact Tag mean that now we have to
> hear all of the parts of table gridlines...or underline...or paragraph
> borders...

If PDF 1.7-capable software encounters an <Artifact> element it will be confused and do whatever the developer thinks it should do when it's confused… most likely just read whatever's enclosed by the element without additional semantics… more or less what you've described as your experience. :-(

If PDF 2.0-capable software encounters <Artifact> it should do something far smarter. For example, for a screen-reader one would expect the software to ignore the content marked with <Artifact> while indicating that optional content was available, perhaps with a beep.

> just thinking of the amount of "stuff" on a page or in a document that
> one normally wouldn't "look at" but we will be forced to listen to
> until adaptive technology or IF adaptive technology catches up, makes
> me want to just convert any PDF that I get to something that I can
> actually read quickly, efficiently and not fall behind in education or employment.

An <Artifact> structure element is only intended for optional-to-read content that has positional significance; line numbers been the most obvious example. Accordingly, it would be incorrect to put an <Artifact> element on a cosmetic background image or gridline. Such objects should be marked as artifact as always, and not included in the structure tree at all.

> Is the PDF Association and the ISO committees reaching out to adaptive
> technology developers to work PDF - 2 into a development cycle?
> Having a standard that no one understands or knows about doesn't
> really help those of us with disabilities access PDF content.

The PDF Association has certainly reached out to AT developers several times over the years, and even helped (very modestly) to fund NVDA development back in 2014 or so. PDF technology is not a secret. We welcome any and all developers with an interest in PDF technology, and the best practice documents the PDF Association publishes are freely available. We are easy to find. We would LOVE to see more engagement, but it's really on users to convince their vendors; a tiny non-profit industry association cannot compel anything.

> We still don't have a way to let us know how much of a document is
> redacted although I am repeatedly told that the ISO standard gives a
> clear way of doing this.

"How much of a document is redacted" is actually an incredibly difficult concept, and I don't agree that the ISO standard "gives a clear way of doing this".

What ISO 32000 does do (again, in PDF 2.0… NOT in PDF 1.7) is give a clear way of identifying reductions in a document. Informing on the scope of redaction is a different kettle of fish, as redactions are inherently vague about their own scope. Was that a paragraph or an image that was redacted? We can't tell you. Was that 1 word or 3 that was redacted? We can't tell you. This vagueness is baked into the very nature of redaction itself. Accordingly, the specification stops at identifying redactions, and leaves it up to redaction authors and/or viewing software to characterize their scope (e.g., "half a page redacted").

> Visually someone can see the amount of space in a document that has
> been redacted. Those of us using adaptive technology need to be able
> to "see" the same thing. How many adaptive technology developers have
> implemented the ISO "solution?"

As described above it would require support for PDF 2.0, support for which is slow in coming.

> I'd really like to see the PDF standards developed with those of us
> who use adaptive technology and have to access PDF documents in mind
> and the "machines" doing the conversion to PDF create the output for
> "us." So far, the machines seem to be winning.

It's really not the standards that are the problem. The standards address all the structures you are interested in (with a few exceptions). It is (a) software and (b) untrained document authors and remediators that are letting you down.

Duff.

From: Duff Johnson
Date: Sat, Mar 07 2020 4:44PM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

> I've now had the opportunity to take a look at four of the PDF tools for conversion and remediation: Acrobat Pro DC, Nuance PowerPDF Advanced now Kofax PowerPDF Advanced, Foxit for Business and the Microsoft conversion tool.
>
> All of them are breaking a TOC when the TOC is created to be accessible. This started some time after October 2018.

4 different tools broke in the same way around the same time….? Are you sure this is a PDF creator rather than a viewer issue?

> We are getting truncated TOC's that are difficult to slog through using adaptive technology. Likewise with Footnotes and Endnotes. What I see in all of the tools available mentioned above is the first Footnote or Endnote housing all of the Footnotes on the page or all of the Endnotes on the page and subsequent Footnotes or Endnotes being ignored. This is in the tagging, not the AT or the Viewer...this is apparently how all of the developers for the four tools mentioned above are interpreting the PDF standards.

I'm not clear what you mean by "ignored" - untagged?

Regarding footnotes and endnotes; these structures are not well-defined in PDF 1.7; this is true, and so implementations may differ, which is why the PDF Association has provided extended guidance on such topics in its Best Practice Guide. PDF 2.0 addresses the limitation in the specification.

> I have logged these bugs ad nauseum.

With what response?

> Either the specifications are not correct or every developer is interpreting them in the same incorrect way which is breaking the accessibility of a PDF document. I can't imagine that anyone would create a spec that breaks a TOC or lumps all Footnotes or Endnotes together with the first one then ignores all the rest.

Or, there's a bug in your viewer and you're seeing the same old footnote problem that we sought to address in PDF 2.0.

> There are also <Span> Tags randomly thrown into the Tags Tree containing content that shouldn't need a <Span>.
> When I convert from PowerPoint I can get <H1> Tags nested under a <Figure> Tag where there is no figure on the slide. I've had accessible Word documents recently have <Sect>, <Part> or now <Div> Tags for EVERY paragraph, bloating a Tags Tree and slowing down QA. Whatever happened to a clean Tags Tree...does anyone remember them?
>
> To be fair to the developers, it is difficult to code to a moving target or a specification that shifts 180 degrees with each iteration.

The PDF specification updated in 2008 and then in 2017… and a very modest revision will be published later this year. Tagged PDF was indeed updated in 2017… entirely necessary to resolve certain ambiguities (such as the footnotes issue) and to add support for other requested items such as redactions, line numbers, pronunciation hints, ARIA, MathML, namespaces, etc, etc.

> It is as if the conversion tools throw the content and the Tags up in the air and however they pair themselves is what I get...and what I have to fix...or have to try and read.
>
> Perhaps we need a specification that is less arduous to implement for developers? How can all developers make the same "mistakes"?

As above, I am curious about the ToC issue… I have a feeling it's a known bug in a single piece of software, but it would be good to understand the problem better.

> "We" can provide training to document authors and let the conversion/remediation tools developers know where the problems are BUT those of us with disabilities and accessible PDF content are still on the fringes of the radar 20 years after Acrobat 5 was introduced. I tell clients that the tagging tools we have today are worse than Acrobat 5. We've gone backward instead of forward.

I don't know… I think today's tools are far superior to those of the past. But progress is frustratingly slow; I entirely agree. This marketplace (accessibility) doesn't get much love from big software companies, sorry to say. The problem is 10 times harder due to the elephant in the room… authors who don't know / don't care to structure their documents, forcing software to try to account for users who use styles arbitrarily, make "footnotes" by hand, use tab-stops for "tables", etc, etc. All sorts of things that break accessibility IRRESPECTIVE of format. Some say that this is the developers' problem as well; they have to write software that trains the user to do the right thing, or somehow cleans up after the user. All these things are very hard to do when the developer is also trying to allow the author to express themselves.

> As I said in my previous post, it is one thing to have a seat at the table, it is another to be taken seriously and listened to. I am not sure we have either in the PDF universe.

> In the past year, when I look at a Tags Tree converted from an accessible document, it is like a dog's breakfast that needs more remediation than in the past.
>
> We can advocate for accessible digital content all we want, but if we don't have the tools that reliably convert one format to another, there is little we can do but support distribution in the source format when we create source documents to be accessible. As people with disabilities or who use adaptive technology to access digital content, we can advocate for what we need from digital content to be accessible but if we aren't part of the specification development/aren't taken seriously/listened to, our voices are unheard and we face a plethora of inaccessible content, in this discussion, inaccessible PDF. We are also then frustrated and discouraged thinking of further participation.


I get the frustration; you are right to be frustrated! There are certainly weaknesses in the specifications, but the experience comes from the software and the authors, not the spec, which simply defines types of structures.

Duff.

From: chagnon@pubcom.com
Date: Sun, Mar 08 2020 12:14AM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

Hi Duff,
The problems Karen has described are visible in the PDF's tag tree.
It's not her screen reader. The PDFs are inaccessible.

Given that you are the head of the PDF Association (a paid-membership trade association), chair of the ISO PDF standards committee, and chair of the ISO PDF/UA committee, it would be better to listen to Karen and others who are on the front lines and experiencing these problems.

Don't shoot the messengers.

— — —
Bevi Chagnon, founder/CEO | = EMAIL ADDRESS REMOVED =
— — —
PubCom: Technologists for Accessible Design + Publishing
consulting ' training ' development ' design ' sec. 508 services
Upcoming classes at www.PubCom.com/classes
— — —
Latest blog-newsletter – Accessibility Tips at www.PubCom.com/blog

-----Original Message-----
From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of Duff Johnson
Sent: Saturday, March 7, 2020 6:45 PM
To: WebAIM Discussion List < = EMAIL ADDRESS REMOVED = >
Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat

> I've now had the opportunity to take a look at four of the PDF tools for conversion and remediation: Acrobat Pro DC, Nuance PowerPDF Advanced now Kofax PowerPDF Advanced, Foxit for Business and the Microsoft conversion tool.
>
> All of them are breaking a TOC when the TOC is created to be accessible. This started some time after October 2018.

4 different tools broke in the same way around the same time….? Are you sure this is a PDF creator rather than a viewer issue?

> We are getting truncated TOC's that are difficult to slog through using adaptive technology. Likewise with Footnotes and Endnotes. What I see in all of the tools available mentioned above is the first Footnote or Endnote housing all of the Footnotes on the page or all of the Endnotes on the page and subsequent Footnotes or Endnotes being ignored. This is in the tagging, not the AT or the Viewer...this is apparently how all of the developers for the four tools mentioned above are interpreting the PDF standards.

I'm not clear what you mean by "ignored" - untagged?

Regarding footnotes and endnotes; these structures are not well-defined in PDF 1.7; this is true, and so implementations may differ, which is why the PDF Association has provided extended guidance on such topics in its Best Practice Guide. PDF 2.0 addresses the limitation in the specification.

> I have logged these bugs ad nauseum.

With what response?

> Either the specifications are not correct or every developer is interpreting them in the same incorrect way which is breaking the accessibility of a PDF document. I can't imagine that anyone would create a spec that breaks a TOC or lumps all Footnotes or Endnotes together with the first one then ignores all the rest.

Or, there's a bug in your viewer and you're seeing the same old footnote problem that we sought to address in PDF 2.0.

> There are also <Span> Tags randomly thrown into the Tags Tree containing content that shouldn't need a <Span>.
> When I convert from PowerPoint I can get <H1> Tags nested under a <Figure> Tag where there is no figure on the slide. I've had accessible Word documents recently have <Sect>, <Part> or now <Div> Tags for EVERY paragraph, bloating a Tags Tree and slowing down QA. Whatever happened to a clean Tags Tree...does anyone remember them?
>
> To be fair to the developers, it is difficult to code to a moving target or a specification that shifts 180 degrees with each iteration.

The PDF specification updated in 2008 and then in 2017… and a very modest revision will be published later this year. Tagged PDF was indeed updated in 2017… entirely necessary to resolve certain ambiguities (such as the footnotes issue) and to add support for other requested items such as redactions, line numbers, pronunciation hints, ARIA, MathML, namespaces, etc, etc.

> It is as if the conversion tools throw the content and the Tags up in the air and however they pair themselves is what I get...and what I have to fix...or have to try and read.
>
> Perhaps we need a specification that is less arduous to implement for developers? How can all developers make the same "mistakes"?

As above, I am curious about the ToC issue… I have a feeling it's a known bug in a single piece of software, but it would be good to understand the problem better.

> "We" can provide training to document authors and let the conversion/remediation tools developers know where the problems are BUT those of us with disabilities and accessible PDF content are still on the fringes of the radar 20 years after Acrobat 5 was introduced. I tell clients that the tagging tools we have today are worse than Acrobat 5. We've gone backward instead of forward.

I don't know… I think today's tools are far superior to those of the past. But progress is frustratingly slow; I entirely agree. This marketplace (accessibility) doesn't get much love from big software companies, sorry to say. The problem is 10 times harder due to the elephant in the room… authors who don't know / don't care to structure their documents, forcing software to try to account for users who use styles arbitrarily, make "footnotes" by hand, use tab-stops for "tables", etc, etc. All sorts of things that break accessibility IRRESPECTIVE of format. Some say that this is the developers' problem as well; they have to write software that trains the user to do the right thing, or somehow cleans up after the user. All these things are very hard to do when the developer is also trying to allow the author to express themselves.

> As I said in my previous post, it is one thing to have a seat at the table, it is another to be taken seriously and listened to. I am not sure we have either in the PDF universe.

> In the past year, when I look at a Tags Tree converted from an accessible document, it is like a dog's breakfast that needs more remediation than in the past.
>
> We can advocate for accessible digital content all we want, but if we don't have the tools that reliably convert one format to another, there is little we can do but support distribution in the source format when we create source documents to be accessible. As people with disabilities or who use adaptive technology to access digital content, we can advocate for what we need from digital content to be accessible but if we aren't part of the specification development/aren't taken seriously/listened to, our voices are unheard and we face a plethora of inaccessible content, in this discussion, inaccessible PDF. We are also then frustrated and discouraged thinking of further participation.


I get the frustration; you are right to be frustrated! There are certainly weaknesses in the specifications, but the experience comes from the software and the authors, not the spec, which simply defines types of structures.

Duff.

From: Duff Johnson
Date: Sun, Mar 08 2020 5:49AM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

> The problems Karen has described are visible in the PDF's tag tree.
> It's not her screen reader. The PDFs are inaccessible.

Could I see some examples generated from these various software? It would help in addressing the issue.

> Given that you are the head of the PDF Association (a paid-membership trade association), chair of the ISO PDF standards committee, and chair of the ISO PDF/UA committee, it would be better to listen to Karen and others who are on the front lines and experiencing these problems.
>
> Don't shoot the messengers.

Yep, those fancy titles and a buck will get you a cup of coffee... 😊.

I have been listening to and learning from Karen for almost 20 years. No one is shooting anyone. I am just a messenger myself.

I'm delighted to do what I can to help. If there is a systematic error in diverse implementations that is something we can address with formal guidance... and hope the developers pay attention and make it a priority.

Duff.


> -----Original Message-----
> From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of Duff Johnson
> Sent: Saturday, March 7, 2020 6:45 PM
> To: WebAIM Discussion List < = EMAIL ADDRESS REMOVED = >
> Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat
>
>> I've now had the opportunity to take a look at four of the PDF tools for conversion and remediation: Acrobat Pro DC, Nuance PowerPDF Advanced now Kofax PowerPDF Advanced, Foxit for Business and the Microsoft conversion tool.
>>
>> All of them are breaking a TOC when the TOC is created to be accessible. This started some time after October 2018.
>
> 4 different tools broke in the same way around the same time….? Are you sure this is a PDF creator rather than a viewer issue?
>
>> We are getting truncated TOC's that are difficult to slog through using adaptive technology. Likewise with Footnotes and Endnotes. What I see in all of the tools available mentioned above is the first Footnote or Endnote housing all of the Footnotes on the page or all of the Endnotes on the page and subsequent Footnotes or Endnotes being ignored. This is in the tagging, not the AT or the Viewer...this is apparently how all of the developers for the four tools mentioned above are interpreting the PDF standards.
>
> I'm not clear what you mean by "ignored" - untagged?
>
> Regarding footnotes and endnotes; these structures are not well-defined in PDF 1.7; this is true, and so implementations may differ, which is why the PDF Association has provided extended guidance on such topics in its Best Practice Guide. PDF 2.0 addresses the limitation in the specification.
>
>> I have logged these bugs ad nauseum.
>
> With what response?
>
>> Either the specifications are not correct or every developer is interpreting them in the same incorrect way which is breaking the accessibility of a PDF document. I can't imagine that anyone would create a spec that breaks a TOC or lumps all Footnotes or Endnotes together with the first one then ignores all the rest.
>
> Or, there's a bug in your viewer and you're seeing the same old footnote problem that we sought to address in PDF 2.0.
>
>> There are also <Span> Tags randomly thrown into the Tags Tree containing content that shouldn't need a <Span>.
>> When I convert from PowerPoint I can get <H1> Tags nested under a <Figure> Tag where there is no figure on the slide. I've had accessible Word documents recently have <Sect>, <Part> or now <Div> Tags for EVERY paragraph, bloating a Tags Tree and slowing down QA. Whatever happened to a clean Tags Tree...does anyone remember them?
>>
>> To be fair to the developers, it is difficult to code to a moving target or a specification that shifts 180 degrees with each iteration.
>
> The PDF specification updated in 2008 and then in 2017… and a very modest revision will be published later this year. Tagged PDF was indeed updated in 2017… entirely necessary to resolve certain ambiguities (such as the footnotes issue) and to add support for other requested items such as redactions, line numbers, pronunciation hints, ARIA, MathML, namespaces, etc, etc.
>
>> It is as if the conversion tools throw the content and the Tags up in the air and however they pair themselves is what I get...and what I have to fix...or have to try and read.
>>
>> Perhaps we need a specification that is less arduous to implement for developers? How can all developers make the same "mistakes"?
>
> As above, I am curious about the ToC issue… I have a feeling it's a known bug in a single piece of software, but it would be good to understand the problem better.
>
>> "We" can provide training to document authors and let the conversion/remediation tools developers know where the problems are BUT those of us with disabilities and accessible PDF content are still on the fringes of the radar 20 years after Acrobat 5 was introduced. I tell clients that the tagging tools we have today are worse than Acrobat 5. We've gone backward instead of forward.
>
> I don't know… I think today's tools are far superior to those of the past. But progress is frustratingly slow; I entirely agree. This marketplace (accessibility) doesn't get much love from big software companies, sorry to say. The problem is 10 times harder due to the elephant in the room… authors who don't know / don't care to structure their documents, forcing software to try to account for users who use styles arbitrarily, make "footnotes" by hand, use tab-stops for "tables", etc, etc. All sorts of things that break accessibility IRRESPECTIVE of format. Some say that this is the developers' problem as well; they have to write software that trains the user to do the right thing, or somehow cleans up after the user. All these things are very hard to do when the developer is also trying to allow the author to express themselves.
>
>> As I said in my previous post, it is one thing to have a seat at the table, it is another to be taken seriously and listened to. I am not sure we have either in the PDF universe.
>
>> In the past year, when I look at a Tags Tree converted from an accessible document, it is like a dog's breakfast that needs more remediation than in the past.
>>
>> We can advocate for accessible digital content all we want, but if we don't have the tools that reliably convert one format to another, there is little we can do but support distribution in the source format when we create source documents to be accessible. As people with disabilities or who use adaptive technology to access digital content, we can advocate for what we need from digital content to be accessible but if we aren't part of the specification development/aren't taken seriously/listened to, our voices are unheard and we face a plethora of inaccessible content, in this discussion, inaccessible PDF. We are also then frustrated and discouraged thinking of further participation.
>
>
> I get the frustration; you are right to be frustrated! There are certainly weaknesses in the specifications, but the experience comes from the software and the authors, not the spec, which simply defines types of structures.
>
> Duff.
> > > >
> > > >

From: Karlen Communications
Date: Sun, Mar 08 2020 10:16AM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

Bevi is correct. It is the Tags that are messed up, not the PDF viewer (I only use Adobe Acrobat Pro DC as my viewer) and it is not my adaptive technology (in my case a screen reader which is up to date).

Duff, I am sending you my documents off-list as I don't think the WebAIM list allows attachments. Anyone else who wants my documents, let me know.

I would also encourage EVERYONE on this list who is experiencing horrid tagging from any of the conversion/remediation tools to send their documents not under NDA to Duff.

I've been beta testing applications that produce tagged PDF for about 20 years and no one appears to be listening or taking the fact that some of us "live" in the accessible PDF world 10 and 12 hours a day seriously when we log bugs or ask for features. If Duff can influence changes that serve those of us with disabilities instead of the conversion tools themselves, then we have to give him and the ISO committee a chance to move in a direction that serves the people trying to remediate and/or access PDF documents.

Regarding the Footnote/Endnote corruption: If a page has 5 footnotes, when I land on the first one, all 5 of the footnote references are read as if they are all part of that first footnote. When I look at the Tags Tree, all of the Footnotes on the page are in one Tag, not separate Tags for each Footnote. It is the same with Endnote, I land on the first one and all of the Endnotes in the document are read as if they are all part of the first reference and none of the other Endnotes are read because in the Tags Tree, they are all under a single Tag at the end of the document.

Regarding The TOC corruption: First, I've been asking for years that the text "Table of Content" or "Contents" which I've used a Subtitle Style for NOT be in the TOC. Even if I separate the text using the ability to modify the Style and add space below the text, it is ALWAYS lumped into the TOC as a <TOCI> Tag. I have to drag it out and edit the Tag to be a Heading. Additionally, the text of a <TOCI> Tag is divided by <Link> Tags as follow - "<TOCI><Link>Introduction<Link>........5" To a screen reader, this appears as follows:

Introduction
.........5

With all of the dots being read because the TOCI is not being tagged correctly. When I go back to my accessible document sample that I use for workshops, a TOC was more or less tagged correctly in October 2018 and this new tagging happened sometime after that.

The impact to those of us with disabilities is that if we get a list of links for the TOC, the Heading or navigational point is separated from its page number which is a second navigational point when it should be a single navigational point of text and page number. Prior to November 2018, when we got a list of links for a TOC, we had the following:

"Introduction...5"

Where we heard the word Introduction followed by only three dots and then the page number. We understand this to be a TOC entry and way to navigate to that point in the document.

With this new implementation that all four of the applications I tested use, it is easy, even without a learning, cognitive or visual disability to lose your place in a list of TOC links.

I mentioned that prior to October 2018 the TOCI entries were more or less tagged correctly. What I saw was sometimes the <Link> Tag would be used, sometimes the <Reference> Tag would be used and sometimes both of them would be used within a single <TOCI> Tag.

Again, we are often confronted with bloated Tags Trees that make remediation and QA arduous, time consuming and confusing.

We ARE the front line people who will always be remediating and providing quality assurance. Not everyone can afford a remediation service for all documents. Besides, we need to know that what we are getting from remediation services is what we paid for. In short, we are not going away and we need reliable tools with reliable results when we create accessible content and need to convert it to other formats.

Please, take this opportunity to send Duff EVERYTHING you can, not under NDA, that you encounter that needs fixing!!!!!!!! And briefly explain the problem!

We have to give the ISO committee a chance to influence both the specifications and developers before we completely abandon the PDF format and focus on applications that convert PDF to something accessible and readable with reliable consistent results.

Cheers, Karen.



-----Original Message-----
From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of Duff Johnson
Sent: Sunday, March 8, 2020 7:50 AM
To: WebAIM Discussion List < = EMAIL ADDRESS REMOVED = >
Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat


> The problems Karen has described are visible in the PDF's tag tree.
> It's not her screen reader. The PDFs are inaccessible.

Could I see some examples generated from these various software? It would help in addressing the issue.

> Given that you are the head of the PDF Association (a paid-membership trade association), chair of the ISO PDF standards committee, and chair of the ISO PDF/UA committee, it would be better to listen to Karen and others who are on the front lines and experiencing these problems.
>
> Don't shoot the messengers.

Yep, those fancy titles and a buck will get you a cup of coffee... 😊.

I have been listening to and learning from Karen for almost 20 years. No one is shooting anyone. I am just a messenger myself.

I'm delighted to do what I can to help. If there is a systematic error in diverse implementations that is something we can address with formal guidance... and hope the developers pay attention and make it a priority.

Duff.


> -----Original Message-----
> From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of
> Duff Johnson
> Sent: Saturday, March 7, 2020 6:45 PM
> To: WebAIM Discussion List < = EMAIL ADDRESS REMOVED = >
> Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in
> Acrobat
>
>> I've now had the opportunity to take a look at four of the PDF tools for conversion and remediation: Acrobat Pro DC, Nuance PowerPDF Advanced now Kofax PowerPDF Advanced, Foxit for Business and the Microsoft conversion tool.
>>
>> All of them are breaking a TOC when the TOC is created to be accessible. This started some time after October 2018.
>
> 4 different tools broke in the same way around the same time….? Are you sure this is a PDF creator rather than a viewer issue?
>
>> We are getting truncated TOC's that are difficult to slog through using adaptive technology. Likewise with Footnotes and Endnotes. What I see in all of the tools available mentioned above is the first Footnote or Endnote housing all of the Footnotes on the page or all of the Endnotes on the page and subsequent Footnotes or Endnotes being ignored. This is in the tagging, not the AT or the Viewer...this is apparently how all of the developers for the four tools mentioned above are interpreting the PDF standards.
>
> I'm not clear what you mean by "ignored" - untagged?
>
> Regarding footnotes and endnotes; these structures are not well-defined in PDF 1.7; this is true, and so implementations may differ, which is why the PDF Association has provided extended guidance on such topics in its Best Practice Guide. PDF 2.0 addresses the limitation in the specification.
>
>> I have logged these bugs ad nauseum.
>
> With what response?
>
>> Either the specifications are not correct or every developer is interpreting them in the same incorrect way which is breaking the accessibility of a PDF document. I can't imagine that anyone would create a spec that breaks a TOC or lumps all Footnotes or Endnotes together with the first one then ignores all the rest.
>
> Or, there's a bug in your viewer and you're seeing the same old footnote problem that we sought to address in PDF 2.0.
>
>> There are also <Span> Tags randomly thrown into the Tags Tree containing content that shouldn't need a <Span>.
>> When I convert from PowerPoint I can get <H1> Tags nested under a <Figure> Tag where there is no figure on the slide. I've had accessible Word documents recently have <Sect>, <Part> or now <Div> Tags for EVERY paragraph, bloating a Tags Tree and slowing down QA. Whatever happened to a clean Tags Tree...does anyone remember them?
>>
>> To be fair to the developers, it is difficult to code to a moving target or a specification that shifts 180 degrees with each iteration.
>
> The PDF specification updated in 2008 and then in 2017… and a very modest revision will be published later this year. Tagged PDF was indeed updated in 2017… entirely necessary to resolve certain ambiguities (such as the footnotes issue) and to add support for other requested items such as redactions, line numbers, pronunciation hints, ARIA, MathML, namespaces, etc, etc.
>
>> It is as if the conversion tools throw the content and the Tags up in the air and however they pair themselves is what I get...and what I have to fix...or have to try and read.
>>
>> Perhaps we need a specification that is less arduous to implement for developers? How can all developers make the same "mistakes"?
>
> As above, I am curious about the ToC issue… I have a feeling it's a known bug in a single piece of software, but it would be good to understand the problem better.
>
>> "We" can provide training to document authors and let the conversion/remediation tools developers know where the problems are BUT those of us with disabilities and accessible PDF content are still on the fringes of the radar 20 years after Acrobat 5 was introduced. I tell clients that the tagging tools we have today are worse than Acrobat 5. We've gone backward instead of forward.
>
> I don't know… I think today's tools are far superior to those of the past. But progress is frustratingly slow; I entirely agree. This marketplace (accessibility) doesn't get much love from big software companies, sorry to say. The problem is 10 times harder due to the elephant in the room… authors who don't know / don't care to structure their documents, forcing software to try to account for users who use styles arbitrarily, make "footnotes" by hand, use tab-stops for "tables", etc, etc. All sorts of things that break accessibility IRRESPECTIVE of format. Some say that this is the developers' problem as well; they have to write software that trains the user to do the right thing, or somehow cleans up after the user. All these things are very hard to do when the developer is also trying to allow the author to express themselves.
>
>> As I said in my previous post, it is one thing to have a seat at the table, it is another to be taken seriously and listened to. I am not sure we have either in the PDF universe.
>
>> In the past year, when I look at a Tags Tree converted from an accessible document, it is like a dog's breakfast that needs more remediation than in the past.
>>
>> We can advocate for accessible digital content all we want, but if we don't have the tools that reliably convert one format to another, there is little we can do but support distribution in the source format when we create source documents to be accessible. As people with disabilities or who use adaptive technology to access digital content, we can advocate for what we need from digital content to be accessible but if we aren't part of the specification development/aren't taken seriously/listened to, our voices are unheard and we face a plethora of inaccessible content, in this discussion, inaccessible PDF. We are also then frustrated and discouraged thinking of further participation.
>
>
> I get the frustration; you are right to be frustrated! There are certainly weaknesses in the specifications, but the experience comes from the software and the authors, not the spec, which simply defines types of structures.
>
> Duff.
> > > archives at http://webaim.org/discussion/archives
> >
> > > archives at http://webaim.org/discussion/archives
>

From: Karlen Communications
Date: Sun, Mar 08 2020 10:18AM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

...and get your colleagues not on this list to send him examples of what is going wrong with tagged PDF!!!!!!!!

This may be our last chance!

Cheers, Karen

-----Original Message-----
From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of Duff Johnson
Sent: Sunday, March 8, 2020 7:50 AM
To: WebAIM Discussion List < = EMAIL ADDRESS REMOVED = >
Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat


> The problems Karen has described are visible in the PDF's tag tree.
> It's not her screen reader. The PDFs are inaccessible.

Could I see some examples generated from these various software? It would help in addressing the issue.

> Given that you are the head of the PDF Association (a paid-membership trade association), chair of the ISO PDF standards committee, and chair of the ISO PDF/UA committee, it would be better to listen to Karen and others who are on the front lines and experiencing these problems.
>
> Don't shoot the messengers.

Yep, those fancy titles and a buck will get you a cup of coffee... 😊.

I have been listening to and learning from Karen for almost 20 years. No one is shooting anyone. I am just a messenger myself.

I'm delighted to do what I can to help. If there is a systematic error in diverse implementations that is something we can address with formal guidance... and hope the developers pay attention and make it a priority.

Duff.


> -----Original Message-----
> From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of
> Duff Johnson
> Sent: Saturday, March 7, 2020 6:45 PM
> To: WebAIM Discussion List < = EMAIL ADDRESS REMOVED = >
> Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in
> Acrobat
>
>> I've now had the opportunity to take a look at four of the PDF tools for conversion and remediation: Acrobat Pro DC, Nuance PowerPDF Advanced now Kofax PowerPDF Advanced, Foxit for Business and the Microsoft conversion tool.
>>
>> All of them are breaking a TOC when the TOC is created to be accessible. This started some time after October 2018.
>
> 4 different tools broke in the same way around the same time….? Are you sure this is a PDF creator rather than a viewer issue?
>
>> We are getting truncated TOC's that are difficult to slog through using adaptive technology. Likewise with Footnotes and Endnotes. What I see in all of the tools available mentioned above is the first Footnote or Endnote housing all of the Footnotes on the page or all of the Endnotes on the page and subsequent Footnotes or Endnotes being ignored. This is in the tagging, not the AT or the Viewer...this is apparently how all of the developers for the four tools mentioned above are interpreting the PDF standards.
>
> I'm not clear what you mean by "ignored" - untagged?
>
> Regarding footnotes and endnotes; these structures are not well-defined in PDF 1.7; this is true, and so implementations may differ, which is why the PDF Association has provided extended guidance on such topics in its Best Practice Guide. PDF 2.0 addresses the limitation in the specification.
>
>> I have logged these bugs ad nauseum.
>
> With what response?
>
>> Either the specifications are not correct or every developer is interpreting them in the same incorrect way which is breaking the accessibility of a PDF document. I can't imagine that anyone would create a spec that breaks a TOC or lumps all Footnotes or Endnotes together with the first one then ignores all the rest.
>
> Or, there's a bug in your viewer and you're seeing the same old footnote problem that we sought to address in PDF 2.0.
>
>> There are also <Span> Tags randomly thrown into the Tags Tree containing content that shouldn't need a <Span>.
>> When I convert from PowerPoint I can get <H1> Tags nested under a <Figure> Tag where there is no figure on the slide. I've had accessible Word documents recently have <Sect>, <Part> or now <Div> Tags for EVERY paragraph, bloating a Tags Tree and slowing down QA. Whatever happened to a clean Tags Tree...does anyone remember them?
>>
>> To be fair to the developers, it is difficult to code to a moving target or a specification that shifts 180 degrees with each iteration.
>
> The PDF specification updated in 2008 and then in 2017… and a very modest revision will be published later this year. Tagged PDF was indeed updated in 2017… entirely necessary to resolve certain ambiguities (such as the footnotes issue) and to add support for other requested items such as redactions, line numbers, pronunciation hints, ARIA, MathML, namespaces, etc, etc.
>
>> It is as if the conversion tools throw the content and the Tags up in the air and however they pair themselves is what I get...and what I have to fix...or have to try and read.
>>
>> Perhaps we need a specification that is less arduous to implement for developers? How can all developers make the same "mistakes"?
>
> As above, I am curious about the ToC issue… I have a feeling it's a known bug in a single piece of software, but it would be good to understand the problem better.
>
>> "We" can provide training to document authors and let the conversion/remediation tools developers know where the problems are BUT those of us with disabilities and accessible PDF content are still on the fringes of the radar 20 years after Acrobat 5 was introduced. I tell clients that the tagging tools we have today are worse than Acrobat 5. We've gone backward instead of forward.
>
> I don't know… I think today's tools are far superior to those of the past. But progress is frustratingly slow; I entirely agree. This marketplace (accessibility) doesn't get much love from big software companies, sorry to say. The problem is 10 times harder due to the elephant in the room… authors who don't know / don't care to structure their documents, forcing software to try to account for users who use styles arbitrarily, make "footnotes" by hand, use tab-stops for "tables", etc, etc. All sorts of things that break accessibility IRRESPECTIVE of format. Some say that this is the developers' problem as well; they have to write software that trains the user to do the right thing, or somehow cleans up after the user. All these things are very hard to do when the developer is also trying to allow the author to express themselves.
>
>> As I said in my previous post, it is one thing to have a seat at the table, it is another to be taken seriously and listened to. I am not sure we have either in the PDF universe.
>
>> In the past year, when I look at a Tags Tree converted from an accessible document, it is like a dog's breakfast that needs more remediation than in the past.
>>
>> We can advocate for accessible digital content all we want, but if we don't have the tools that reliably convert one format to another, there is little we can do but support distribution in the source format when we create source documents to be accessible. As people with disabilities or who use adaptive technology to access digital content, we can advocate for what we need from digital content to be accessible but if we aren't part of the specification development/aren't taken seriously/listened to, our voices are unheard and we face a plethora of inaccessible content, in this discussion, inaccessible PDF. We are also then frustrated and discouraged thinking of further participation.
>
>
> I get the frustration; you are right to be frustrated! There are certainly weaknesses in the specifications, but the experience comes from the software and the authors, not the spec, which simply defines types of structures.
>
> Duff.
> > > archives at http://webaim.org/discussion/archives
> >
> > > archives at http://webaim.org/discussion/archives
>

From: Karlen Communications
Date: Mon, Mar 09 2020 6:26AM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

One last word on one of Duff's comments:

The PDF (or any other) specification is not responsible for the behavior of software implementing the specification. That is the responsibility of the software developer.

This is my point. I am reminded of my father, who worked for Ford, buying a Pinto when Pinto's were popular. Once reports came out that their gas tanks were exploding on contact he quickly got rid of it. In the case of the Pinto, an analogy to PDF would be: was it the specifications for the car that were incorrect or was it the interpretation of the specifications that was incorrect and caused the exploding gas tanks? For the Pinto we probably have an answer, given that all of the applications I am looking at are giving the same results, we don't have an answer for what is going wrong with the accessibility of PDF documents yet. Is it the specifications that are wrong or the implementation/interpretation? In either case, something needs to be fixed.

PLEASE send all your NDA horribly tagged documents to Duff!!!!!!! As I say, this may be our last chance to salvage the file format as an accessible one.

Cheers, Karen



-----Original Message-----
From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of Duff Johnson
Sent: Saturday, March 7, 2020 3:02 PM
To: WebAIM Discussion List < = EMAIL ADDRESS REMOVED = >
Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat

Hi Karen,

Your questions are good and fair. You've heard some of (my) answers before ad nauseum, but others haven't, so...

> An Artifact Tag that needs to be Artifacted will and is, creating
> more remediation work which increases the cost of remediation. While
> it is a nice thought" that adaptive technology will "catch up" and
> give us an option to hear "Artifact" and line numbers are a great
> example of something that perhaps we need access to at various times
> for specific documents, those of us with disabilities and other
> stakeholders seem to be missing in the process of creating PDF
> standards. Even those of us who speak up are not heard.

The PDF (or any other) specification is not responsible for the behavior of software implementing the specification. That is the responsibility of the software developer.

All the standard can do is (try to) establish a common basis of understanding for software developers. We cannot compel the software folks to write better software; that's a marketplace and regulatory matter. In other words, it's up to you, and others who feel as you do, to get in front of AT, viewer and PDF producer software developers and demand something better. They listen to their customers.

> With this Artifact Tag, how do those of us who are using adaptive
> technology identify something like line numbers from decorative items on a page.

The <Artifact> tag is a new feature in PDF technology (as of 2017). As such your software developer must choose to support it.

> With the example of an image marked as decorative, having no
> meaningful contribution to the content of the document being read to
> us as "Artifact, pathpathpath" how does this improve our experience in
> accessing content

Clearly, it does not improve the experience. If this is happening it indicates that the software does not support PDF 2.0.

> from
> what is already an overwhelmingly inaccessible file format due to the
> amount of untagged PDF content out there?

The proportion of PDF files that are tagged is increasing rapidly. Apple's productivity suite, for example, now ONLY makes tagged PDF. A couple of years ago tagged PDF was about 18% of files being opened; I'm pretty sure it's significantly higher now.

The solution, of course, is to convince people to care enough to…. make tagged PDF! It's no different from convincing people to add alt. text, or make other corrections to enable accessibility.

> How many items in our adaptive technology settings/options are we
> going to have to go through in order to just read a PDF?

Line number support is a new feature in tagged PDF, so inevitably there will be some sort of user education associated with its introduction.

> Why not have a <LineNum> Tag?

PDF 2.0 includes precisely this feature! ISO 32000-2, Table 385 describes a LineNum attribute for <Artifact> elements. An AT that understands PDF 2.0 would thus be able to represent line numbering to AT users.

> Given that since Office 2007, in Word, parts of table gridlines are
> housed in <Span> Tags, <TR>, <TH>, <TD> Tags or just loosely put under
> a <Table> Tag

…if software is adding table structure tags to grid-lines then the software is broken at a conceptual level. A bug report should be submitted. In 2020, software developers who create tagged PDF have little excuse for not doing it right. It's been fully specified and published for 20 years, and successfully implemented by dozens of independent developers around the world.

> , does the implementation of the Artifact Tag mean that now we have to
> hear all of the parts of table gridlines...or underline...or paragraph
> borders...

If PDF 1.7-capable software encounters an <Artifact> element it will be confused and do whatever the developer thinks it should do when it's confused… most likely just read whatever's enclosed by the element without additional semantics… more or less what you've described as your experience. :-(

If PDF 2.0-capable software encounters <Artifact> it should do something far smarter. For example, for a screen-reader one would expect the software to ignore the content marked with <Artifact> while indicating that optional content was available, perhaps with a beep.

> just thinking of the amount of "stuff" on a page or in a document that
> one normally wouldn't "look at" but we will be forced to listen to
> until adaptive technology or IF adaptive technology catches up, makes
> me want to just convert any PDF that I get to something that I can
> actually read quickly, efficiently and not fall behind in education or employment.

An <Artifact> structure element is only intended for optional-to-read content that has positional significance; line numbers been the most obvious example. Accordingly, it would be incorrect to put an <Artifact> element on a cosmetic background image or gridline. Such objects should be marked as artifact as always, and not included in the structure tree at all.

> Is the PDF Association and the ISO committees reaching out to adaptive
> technology developers to work PDF - 2 into a development cycle?
> Having a standard that no one understands or knows about doesn't
> really help those of us with disabilities access PDF content.

The PDF Association has certainly reached out to AT developers several times over the years, and even helped (very modestly) to fund NVDA development back in 2014 or so. PDF technology is not a secret. We welcome any and all developers with an interest in PDF technology, and the best practice documents the PDF Association publishes are freely available. We are easy to find. We would LOVE to see more engagement, but it's really on users to convince their vendors; a tiny non-profit industry association cannot compel anything.

> We still don't have a way to let us know how much of a document is
> redacted although I am repeatedly told that the ISO standard gives a
> clear way of doing this.

"How much of a document is redacted" is actually an incredibly difficult concept, and I don't agree that the ISO standard "gives a clear way of doing this".

What ISO 32000 does do (again, in PDF 2.0… NOT in PDF 1.7) is give a clear way of identifying reductions in a document. Informing on the scope of redaction is a different kettle of fish, as redactions are inherently vague about their own scope. Was that a paragraph or an image that was redacted? We can't tell you. Was that 1 word or 3 that was redacted? We can't tell you. This vagueness is baked into the very nature of redaction itself. Accordingly, the specification stops at identifying redactions, and leaves it up to redaction authors and/or viewing software to characterize their scope (e.g., "half a page redacted").

> Visually someone can see the amount of space in a document that has
> been redacted. Those of us using adaptive technology need to be able
> to "see" the same thing. How many adaptive technology developers have
> implemented the ISO "solution?"

As described above it would require support for PDF 2.0, support for which is slow in coming.

> I'd really like to see the PDF standards developed with those of us
> who use adaptive technology and have to access PDF documents in mind
> and the "machines" doing the conversion to PDF create the output for
> "us." So far, the machines seem to be winning.

It's really not the standards that are the problem. The standards address all the structures you are interested in (with a few exceptions). It is (a) software and (b) untrained document authors and remediators that are letting you down.

Duff.

From: chagnon@pubcom.com
Date: Mon, Mar 09 2020 8:28AM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

Quote: " PLEASE send all your NDA horribly tagged documents to Duff!!!!!!! As I say, this may be our last chance to salvage the file format as an accessible one." /End Quote

Duff's email is:
= EMAIL ADDRESS REMOVED =

— — —
Bevi Chagnon, founder/CEO | = EMAIL ADDRESS REMOVED =
— — —
PubCom: Technologists for Accessible Design + Publishing
consulting ' training ' development ' design ' sec. 508 services
Upcoming classes at www.PubCom.com/classes
— — —
Latest blog-newsletter – Accessibility Tips at www.PubCom.com/blog

-----Original Message-----
From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of Karlen Communications
Sent: Monday, March 9, 2020 8:27 AM
To: 'WebAIM Discussion List' < = EMAIL ADDRESS REMOVED = >
Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat

One last word on one of Duff's comments:

The PDF (or any other) specification is not responsible for the behavior of software implementing the specification. That is the responsibility of the software developer.

This is my point. I am reminded of my father, who worked for Ford, buying a Pinto when Pinto's were popular. Once reports came out that their gas tanks were exploding on contact he quickly got rid of it. In the case of the Pinto, an analogy to PDF would be: was it the specifications for the car that were incorrect or was it the interpretation of the specifications that was incorrect and caused the exploding gas tanks? For the Pinto we probably have an answer, given that all of the applications I am looking at are giving the same results, we don't have an answer for what is going wrong with the accessibility of PDF documents yet. Is it the specifications that are wrong or the implementation/interpretation? In either case, something needs to be fixed.

PLEASE send all your NDA horribly tagged documents to Duff!!!!!!! As I say, this may be our last chance to salvage the file format as an accessible one.

Cheers, Karen

From: Jared Smith
Date: Mon, Mar 09 2020 8:40AM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | Next message →

I fully understand the frustration of this thread, but let's please return
to constructive conversation or move on. Personal attacks like this are not
tolerated here.

Thanks,

Jared Smith
WebAIM.org

From: Karlen Communications
Date: Mon, Mar 09 2020 9:02AM
Subject: Re: Artifact tag vs. Change tag to artifact in Acrobat
← Previous message | No next message

I apologize to the list. I thought I was being careful to talk about the
standards and the file format rather than an individual. Won't happen again.

Cheers, Karen

-----Original Message-----
From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of Jared
Smith
Sent: Monday, March 9, 2020 10:40 AM
To: WebAIM Discussion List < = EMAIL ADDRESS REMOVED = >
Subject: Re: [WebAIM] Artifact tag vs. Change tag to artifact in Acrobat

I fully understand the frustration of this thread, but let's please return
to constructive conversation or move on. Personal attacks like this are not
tolerated here.

Thanks,

Jared Smith
WebAIM.org
http://webaim.org/discussion/archives