WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: PDF/A accessibility

From: Duff Johnson
Date: Sep 7, 2011 6:36AM


I'll offer a couple of minor clarifications (and here I will once again earn my well-deserved reputation for pedantry; I beg forgiveness in advance).

On Sep 7, 2011, at 6:43 AM, Karlen Communications wrote:

> This is one of the reasons it is available.

The raw print stream is 'available' for two reasons:

1) It has to be available - otherwise users could not print the document.
2) From 1993-1999 the raw print stream was the ONLY way to extract content from PDF for accessibility (and other) purposes.

Neither of these reasons is directed towards accessibility. The raw print stream is not an accessibility model or mechanism in its own right. The only 'purpose' of the raw print stream is rendering the page on-screen or in print. You may get acceptable results when you use it for something else, but only by coincidence, not by design.

> For many untagged documents it
> can provide the best rendering of content depending on how complicated the
> layout is and what the raw print stream is.

In the absence of tags, the raw print stream is the ONLY available option :-( Of course, some software may try to 'be clever' and interpret the print stream in order to (attempt to) impose logical structure, or "hack" the print stream for subsequent use by software that cannot read tags. These approaches have very limited utility. It is far better to simply insist on correctly tagged PDF (and on software with the brains to read PDF tags).
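To make the point concrete, here is a minimal sketch in Python. It assumes a hypothetical two-column page whose text fragments are painted right column first. The `Fragment` type, the coordinates and the labels are illustrative only and do not come from any real PDF library. It shows that neither the paint order nor a "left to right, top to bottom" sort is guaranteed to recover the logical reading order that tags would supply.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """A hypothetical piece of text as it might appear in the print stream."""
    x: float
    y: float  # PDF-style coordinates: a larger y is higher on the page
    text: str


def paint_order(fragments):
    """Read the fragments in the order they were painted."""
    return [f.text for f in fragments]


def top_to_bottom_left_to_right(fragments):
    """Guess a reading order from position alone: rows first, then columns."""
    return [f.text for f in sorted(fragments, key=lambda f: (-f.y, f.x))]


# Illustrative page: column A on the left, column B on the right,
# with the right column painted first. Logical order is A1, A2, B1, B2.
PAGE = [
    Fragment(300, 700, "B1"),
    Fragment(300, 680, "B2"),
    Fragment(50, 700, "A1"),
    Fragment(50, 680, "A2"),
]

print(paint_order(PAGE))                  # ['B1', 'B2', 'A1', 'A2']
print(top_to_bottom_left_to_right(PAGE))  # ['A1', 'B1', 'A2', 'B2']
```

Both results differ from the logical order A1, A2, B1, B2. A cleverer heuristic could detect the columns in this toy case, but it would still be guessing at what tags state explicitly.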

> It is also why we have the
> option for left to right, top to bottom for untagged documents...or
> 'randomly tagged' documents (documents where Tags have been added but not
> verified or corrected).
>
> In some documents when you either have to use OCR then the "virtual Tags" or
> just the "virtual Tags" the character spacing is off and words do run
> together. Being able to get a different view of the content often helps in
> decoding/reading content.

There are many sources of "run-together" words, and they are all, without exception, evidence of a poorly constructed PDF file.
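One such source can be sketched as follows, assuming a file that positions each word by coordinates and never writes an actual space character. An extractor then has to infer spaces from the gaps between runs. The `join_runs` function, the run widths and the thresholds below are hypothetical values chosen for illustration, not the behaviour of any particular reader.

```python
def join_runs(runs, gap_threshold):
    """Join text runs on one line, inserting a space only where the
    horizontal gap between consecutive runs exceeds gap_threshold.

    runs: list of (x_start, width, text), sorted left to right.
    """
    pieces = []
    prev_end = None
    for x_start, width, text in runs:
        if prev_end is not None and x_start - prev_end > gap_threshold:
            pieces.append(" ")
        pieces.append(text)
        prev_end = x_start + width
    return "".join(pieces)


# Illustrative line: three words placed by position, with gaps of 2.0 and 1.5 units.
RUNS = [(0.0, 20.0, "raw"), (22.0, 30.0, "print"), (53.5, 36.0, "stream")]

print(join_runs(RUNS, gap_threshold=1.0))  # raw print stream
print(join_runs(RUNS, gap_threshold=1.8))  # raw printstream
print(join_runs(RUNS, gap_threshold=3.0))  # rawprintstream
```

The same page yields different text depending on a guessed threshold, which is why the words have to be correct in the file itself rather than reconstructed by the reader.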

Duff Johnson

US Committee for ISO/DIS 14289 (PDF/UA), Chair

p +1.617.283.4226
t http://www.twitter.com/duffjohnson
w http://www.duff-johnson.com