WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: PDF/A accessibility

From: Karlen Communications
Date: Sep 7, 2011 7:03AM


Agreed.

Cheers, Karen

-----Original Message-----
From: <EMAIL REMOVED>
[mailto: <EMAIL REMOVED> ] On Behalf Of Duff Johnson
Sent: September-07-11 8:38 AM
To: WebAIM Discussion List
Subject: Re: [WebAIM] PDF/A accessibility

I'll offer a couple of minor clarifications.... (and here I will earn - once
again - my well-deserved reputation for pedantry... I beg forgiveness in
advance....)

On Sep 7, 2011, at 6:43 AM, Karlen Communications wrote:

> This is one of the reasons it is available.

The raw print stream is 'available' for two reasons:

1) It has to be available - otherwise users could not print the document.
2) From 1993 to 1999 the raw print stream was the ONLY way to extract content
from PDF for accessibility (and other) purposes.

Neither of these reasons is directed towards accessibility. The raw print
stream is not an accessibility model or mechanism in its own right. The only
'purpose' of the raw print stream is rendering the page on-screen or
in-print. You may get acceptable results when using it for other purposes,
but only as a coincidence, not by design.

> For many untagged documents it
> can provide the best rendering of content depending on how complicated
> the layout is and what the raw print stream is.

In the absence of tags, the raw print stream is the ONLY available option
:-(. Of course, some software may try to 'be clever' - and interpret the
print stream in order to (attempt to) impose logical structure... or "hack"
the print stream for subsequent use by software that cannot read tags. These
approaches have very limited utility. Far better to simply insist on
correctly tagged PDF (and on software with the brains to read PDF tags).

> It is also why we have the
> option for left to right, top to bottom for untagged documents...or
> 'randomly tagged' documents (documents where Tags have been added but
> not verified or corrected).
>
> In some documents, when you have to use either OCR followed by the
> "virtual Tags" or just the "virtual Tags", the character spacing is off
> and words do run together. Being able to get a different view of the
> content often helps in decoding/reading content.

There are many sources of "run-together" words, and they are all, without
exception, evidence of a poorly constructed PDF file.

Duff Johnson

US Committee for ISO/DIS 14289 (PDF/UA), Chair

p +1.617.283.4226
e <EMAIL REMOVED>
t http://www.twitter.com/duffjohnson
w http://www.duff-johnson.com