WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: PDF conversions?


From: Terence de Giere
Date: Oct 19, 2002 7:27AM

PDF files do represent a major problem.

The following links are to various possible solutions. For quick
accessibility you may be out of luck. As most software developers have
not been concerned with accessibility, and conversion tools tend to
produce wretched HTML, whatever comes out the end of the conversion
process is likely to have accessibility problems. This is especially
true of publishing programs and other similar document programs, which
are almost always strictly visually oriented. Of the tools below I have
only used the Adobe online tools.

Adobe's online tools: These produce linear, format-free HTML. They
preserve nothing of the document's original appearance, and no images.
* http://www.adobe.com/products/acrobat/access_simple_form.html
* http://www.adobe.com/products/acrobat/access_adv_form.html

pdftohtml on SourceForge: These are UNIX-based command line tools, with
Windows binaries available. I think there is a Windows GUI front end for
this program as well. I do not know what is lost or preserved with this
tool.
* http://sourceforge.net/projects/pdftohtml/

The following commercial tools at least give the option of talking to
customer service or perhaps company technical support to get information
on whether these tools can produce accessible HTML.

The Magellan program appears to use Netscape layers technology (which is
nonstandard HTML) and CSS, or a table-based layout for older browsers.
It requires the full version of Adobe Acrobat.
* http://www.bcltechnologies.com/products/magellan/magellan.htm
(a demo version is available)

Cheryl Kirkpatrick in a post to the forum mentioned OmniPage Pro 12,
which might be a potential intermediate solution. But one always has to
be wary of products that try to preserve the exact visual format of
another file format. I would try to get the company to produce a sample
of output that came from one of your own PDF files before buying.
* http://www.scansoft.com/omnipage/

Another thing to try is, if the original document that the PDF was made
from is available, the program that created it may have an HTML save-as
option. But even here problems with accessibility are likely. Microsoft
Word creates really messy HTML, but there are tools to clean it up.
Corel WordPerfect creates cleaner HTML. Saving a Word file as
WordPerfect, opening it and saving to HTML in WordPerfect would be an
example of a route that might get cleaner HTML. Images in particular are
a problem because a process like this outputs images without ALT text,
and without easy-to-identify file names. Image-based PDFs are the most
difficult if it becomes necessary to describe the image. If the image
is, say, a graph, one then has to basically convert the data in the
graph into a format that makes sense in audio, and that may take a lot
of time. Making adequate descriptions of even simple photographs is no
easy task if one wants to create a description that is 'equivalent' to
the visual experience.
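
As a rough illustration of the ALT text problem, here is a minimal
Python sketch that lists the images in a converted HTML file that have
no ALT attribute at all. The class and function names and the sample
markup are my own placeholders, not output from any of the tools above.
It only finds the gaps; writing a description that is actually
equivalent still has to be done by a person.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "img":
            return
        attr_map = dict(attrs)
        # alt="" is legitimate for purely decorative images, so only
        # flag images where the attribute is absent altogether.
        if "alt" not in attr_map:
            self.missing.append(attr_map.get("src", "(no src)"))


def images_missing_alt(html_text):
    """Return the src values of images that lack an alt attribute."""
    finder = MissingAltFinder()
    finder.feed(html_text)
    finder.close()
    return finder.missing


# Hypothetical fragment of converter output, for illustration only.
sample = (
    '<p><IMG SRC="img001.gif">'
    '<img src="logo.gif" alt="Company logo">'
    '<img src="spacer.gif" alt=""></p>'
)
print(images_missing_alt(sample))  # prints ['img001.gif']
```

A list like this at least shows how much description work is waiting
after a conversion, and which of the anonymous file names need a human
to look at them.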

The PDF production chain until just recently has been concerned with
visual-only document software producing a visual-only PDF file that
reproduces the visual appearance of the original program's output. And
now to get more universal access, we want to convert to HTML, again
preserving the appearance of the now intermediate PDF version. For
this to work well, the original program needs to have a way to construct
the document with accessible tagging that can be preserved in an Acrobat
5.0 document, which can then be preserved in an HTML conversion, which
is even more complex because of content linearization issues.

The whole process of document production for accessibility has to be
re-engineered from the starting point, not the end point. There is some
potential here - Open Office 6, for example, has a file format that is
simply zipped XML files; if a proper accessible tag structure could be
developed, conversion to an accessible tagged PDF could be developed,
which in turn could be reconverted to an accessible HTML format. But it
would probably be better to develop a system where the original
accessible file was always available to be converted directly to
accessible HTML or accessible XHTML without an intermediate PDF step.
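
To show why a zipped XML format is promising as a starting point, here
is a minimal Python sketch that opens such a package and looks at the
XML inside, using only the standard library. The member name
content.xml matches what I understand the Open Office package to
contain, but the stand-in package built below and its element names are
invented for the demonstration and are not the real schema.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def list_xml_parts(package):
    """Return the names of the XML members inside a zipped document."""
    with zipfile.ZipFile(package) as zf:
        return sorted(n for n in zf.namelist() if n.endswith(".xml"))


def read_root_tag(package, member="content.xml"):
    """Parse one XML member and return the tag of its root element."""
    with zipfile.ZipFile(package) as zf:
        with zf.open(member) as fh:
            return ET.parse(fh).getroot().tag


# Build a tiny stand-in package in memory. A real file on disk would be
# passed by path instead; these contents are placeholders only.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("content.xml", "<document><p>Hello</p></document>")
    zf.writestr("meta.xml", "<meta><title>Demo</title></meta>")
    zf.writestr("Pictures/img001.gif", b"GIF89a")

print(list_xml_parts(buf))  # prints ['content.xml', 'meta.xml']
print(read_root_tag(buf))   # prints document
```

Because the document structure is ordinary XML, a converter could walk
it directly and emit headings, lists and image descriptions as HTML,
with no visual-only step in between.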

Converting older PDFs using Acrobat 5.0 and adding the accessible
tagging might be a first step, but then, will any of that work be
preserved by the conversion programs?
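
Before testing what a conversion program preserves, it helps to know
whether a given PDF carries any tagging at all. A tagged PDF has a
StructTreeRoot entry in its document catalog, so the Python sketch
below simply searches the raw bytes for that name. This is a crude
heuristic of my own, not a real PDF parser: it can miss the entry if
the catalog is stored in compressed form, and finding it says nothing
about whether the tagging is any good. The byte strings are made-up
fragments, not complete PDF files.

```python
def looks_tagged(pdf_bytes):
    """Crude check: do the raw bytes mention a structure tree root?"""
    return b"/StructTreeRoot" in pdf_bytes


def looks_tagged_file(path):
    """Apply the same crude check to a PDF file on disk."""
    with open(path, "rb") as fh:
        return looks_tagged(fh.read())


# Made-up catalog fragments for illustration, not complete PDF files.
untagged = b"%PDF-1.3\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
tagged = (
    b"%PDF-1.4\n1 0 obj\n"
    b"<< /Type /Catalog /Pages 2 0 R /StructTreeRoot 5 0 R >>\nendobj\n"
)

print(looks_tagged(untagged))  # prints False
print(looks_tagged(tagged))    # prints True
```

Running something like this over an archive of older PDFs would give a
quick count of how many files need tagging added before any conversion
is attempted.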

Engraved stone tablets are probably more accessible than most
end-of-line PDF files - one can feel the letters; the download time is
horrendous though.

Terence de Giere
