WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: PDF Accessibility

From: Terence de Giere
Date: Dec 3, 2002 7:35PM




Tim ---

I am echoing Jon Gunderson's note. The Portable Document Format (PDF) is
one step further down the chain in production, and while it has many
advantages for most users, especially for print production, I find its
accessibility and usability a lot more compromised than those of HTML
documents. HTML has a standardized tag set that is well supported by
assistive technology and can be accessed with any Web browser. HTML's
relative wealth of structural tags provides the user of special
technology with information that is missing from plain text and from
documents designed to be read by applications like Acrobat and Flash,
their recent accessibility improvements notwithstanding.

There seems to be a dearth of accessible examples for these formerly
inaccessible technologies that now have some support in screen readers.
"For any non text element there should be a text alternative" is one of
the main pivots for providing an accessible form of information that can
be read by assistive technology. With screen reader support, PDF and
Flash now have some inroads into the accessibility arena, but the
process of finding an optimum method of doing this still seems to be in
the early stages. These formats still do not work with many kinds of
access technology. I browse with the Lynx text browser, and these files
are not accessible at all.

For PDF, one can use the access.adobe.com service and convert the file
to HTML or text, but that forces the user to jump through hoops to get
what a simple link to a regular HTML file would provide. If a user is
using IBM Home Page Reader (HPR), this reader can be set (the default
setting) to convert a PDF file to HTML. The process IBM used is linking
to the access.adobe.com services. If the PDF file is on the hard drive,
HPR cannot convert it. PDF files are slow to load and the conversion
sometimes fails. The resulting HTML files are not especially good. For
stand-alone PDF files, a file that was deliberately and successfully
made accessible, together with a screen reader, is a must if the user
wants to experience the file directly.

I tried some commercial tools to convert PDFs to HTML. One crashed
trying to process the files. The other did a respectable job, but the
results were visually oriented, the HTML was in frames, and all the
frameset HTML (missing the required TITLE attribute for frames) was
generated using JavaScript, a sure-fire formula for disaster in special
technology land, except under controlled conditions.
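
As a rough illustration of the frame problem just described, here is a minimal sketch of a check for frames that lack a TITLE attribute. It uses only Python's standard html.parser module. The class name, function name, and sample markup are hypothetical and are not taken from any of the conversion tools mentioned above.

```python
from html.parser import HTMLParser


class FrameTitleChecker(HTMLParser):
    """Record frames that lack a non-empty title attribute."""

    def __init__(self):
        super().__init__()
        self.untitled = []  # src values of frames with no usable title

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag in ("frame", "iframe"):
            attr_map = dict(attrs)
            title = attr_map.get("title") or ""
            if not title.strip():
                self.untitled.append(attr_map.get("src", "(no src)"))


def frames_missing_title(html_text):
    """Return the src of every frame in html_text that has no title."""
    checker = FrameTitleChecker()
    checker.feed(html_text)
    checker.close()
    return checker.untitled


# Hypothetical frameset, written as static markup for the example.
sample = """
<frameset cols="20%,80%">
  <frame src="nav.html" title="Navigation">
  <frame src="content.html">
</frameset>
"""
print(frames_missing_title(sample))
```

A static check like this only sees frames that are present in the markup itself. A frameset written out by JavaScript at load time, as in the converted files described above, would not be seen by it at all, which is part of why that approach is so fragile.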

I think the best solution is to review the whole process of document
production in an organization, and find the best point to branch off
various format versions of documents. Retrofitting of older documents
can be assigned to those destined for Hell. Along with word processors,
many desktop publishing programs will export HTML, such as Pagemaker and
Framemaker. Will this HTML, with cleanup and testing for accessibility,
be less problematic than making a PDF version of the document accessible
or converting the PDF file to HTML or text? There are likely fewer
problems with mangled document structure if one gets the HTML version
from the original application, when this is an option, rather than
after processing to PDF.

Applications like Structured Framemaker can use an SGML or XML
Document Type Definition to create documents, but programming is
required to create different formats, and the typical user in the
company would not normally have the skill to set up the required system. But the
document could be reused with different pieces of software that support
SGML or XML. With some limitations, the same XML document could be used
with a number of products, such as WordPerfect, Ventura Publisher, and
XMetaL Pro. With a style sheet, XML could be put directly on the
Web, although XML without the HTML tag set is not particularly
accessible to non-visual technologies. Extensible Stylesheet Language
Transformations (XSLT) could transform it into XHTML, and with an
attached style sheet this could go directly on the Web.
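
To make the XML-to-XHTML step concrete, here is a minimal sketch of the kind of mapping such a transformation performs. It is not XSLT: Python's standard library has no XSLT processor, so the sketch does the same job by hand with xml.etree.ElementTree. The source vocabulary (document, title, section, heading, para) is invented for the example and does not correspond to any DTD mentioned above.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def to_xhtml(source_xml):
    """Map a small, hypothetical document vocabulary onto XHTML elements."""
    doc = ET.fromstring(source_xml)

    html = ET.Element("html", {"xmlns": XHTML_NS})
    head = ET.SubElement(html, "head")
    title_text = doc.findtext("title", default="Untitled")
    ET.SubElement(head, "title").text = title_text

    body = ET.SubElement(html, "body")
    # The document title becomes the top-level heading.
    ET.SubElement(body, "h1").text = title_text
    for section in doc.findall("section"):
        # Each section heading becomes an h2, each para a p.
        ET.SubElement(body, "h2").text = section.findtext("heading", default="")
        for para in section.findall("para"):
            ET.SubElement(body, "p").text = para.text
    return ET.tostring(html, encoding="unicode")


# Hypothetical structured source document.
sample = """
<document>
  <title>Agent Handbook</title>
  <section>
    <heading>Coverage</heading>
    <para>Policies are reviewed once a year.</para>
  </section>
</document>
"""
print(to_xhtml(sample))
```

The point of the sketch is that the structural information (title, headings, paragraphs) already exists in the source, so the HTML tag set that assistive technology understands can be generated from it rather than reconstructed afterwards.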

Thus structured document production might be an internal solution for a
company, but it has to be carefully worked out. For example, image
formats for printing require much higher resolution than for onscreen
use, and there needs to be a way to maintain alternate text and longer
descriptions of important images. Completely automated systems that
can handle details like this are likely very expensive. Most document
handling in companies seems to grow organically in a variety of formats
depending on the initial intended use, with no thought of how all that
information might need to be connected and presented at a later date.
There is also the psychology of the users. Structured document creation
does not seem to be a skill that intuitively comes to most users, so
training is required.
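
One way to picture the image bookkeeping mentioned above is a single record per image that carries both resolutions together with the text alternatives. The following is only a sketch under assumed names: ImageAsset, its fields, and the file names are all hypothetical, and a real production system would need far more than this.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class ImageAsset:
    """One image, kept once, with everything each output format needs."""

    print_file: str    # high-resolution version for print production
    screen_file: str   # low-resolution version for onscreen use
    alt_text: str      # short text alternative
    long_description_url: Optional[str] = None  # longer description, if any

    def to_html(self):
        """Emit an HTML img element using the onscreen version."""
        parts = [
            f'src="{escape(self.screen_file)}"',
            f'alt="{escape(self.alt_text)}"',
        ]
        if self.long_description_url:
            parts.append(f'longdesc="{escape(self.long_description_url)}"')
        return "<img " + " ".join(parts) + ">"


# Hypothetical asset: the file names are made up for the example.
chart = ImageAsset(
    print_file="claims-300dpi.tif",
    screen_file="claims-72dpi.gif",
    alt_text="Bar chart of claims by region",
    long_description_url="claims-description.html",
)
print(chart.to_html())
```

Because the alternate text travels with the image record, it is written once and reused in every format that is branched off later, instead of being re-entered by hand for each version.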

To be really accessible one always must provide an alternate form of the
document, at least for the Internet. State Farm, besides having a public
Web site also has an intranet, and I believe an extranet, or dial in to
the intranet by its agents, and here it may be possible to coordinate a
standard set of technologies that provide an accessible result without
so many options needing to be addressed, because the hardware and
software can be controlled by company policy. One needs to address the
economic issues, and in a controlled setting, perhaps accessible PDFs
and a standard requirement for a specific screen reader might be a
solution. On the other hand, downloading an HTML file is normally a lot
faster than PDF, and could be designed to download in reasonable time to
a Personal Digital Assistant as well. There is a trend in companies
toward smaller devices, especially for people working in the field.

PDF is so easy to create from so many applications that its ease of use
and speed in producing an electronic version of a document make it the
path of least resistance, but accessible documents still require a layer
of training, human intelligence, and usually manual intervention to do
the job properly. HTML is a preferred format for accessibility because
accessibility testing tools and accessibility retrofitting tools are
readily available, and there are online testing services and tools that
can check usability to some extent. The accessibility tool set is much
more mature for HTML than for PDF. I feel HTML (and XHTML) is the best
format because support is closer to universal. The problem is getting
from the other formats.

We should also take into account that print formats are not as usable on
the Web visually as a well designed accessible Web page written in HTML.
Acrobat Reader can enlarge pages, and reflow text now under some
circumstances, but it still does not have as much flexibility in
rendering as HTML. After all, PDF was designed to accurately replicate
the visual layout used for printing. Designing the visual format for
ease of use onscreen requires smaller, landscape format pages similar to
a PowerPoint presentation; this seems to work well for PDFs designed
specifically for onscreen reading. The Tagged PDF document Jon mentioned
(http://cita.rehab.uiuc.edu/courses/2003-02-REHAB711NC/2003-02-Brochure.pdf)
is rendered in Acrobat in vertical pages but is composed in landscape
format. This format is fine for a document to be printed but not viewed
onscreen. The pages visually appear on their side onscreen, and one has
to tilt the head 90 degrees and scroll a lot to read it. It converts to
text well, and fairly well to HTML using Acrobat and Adobe's online
tools, but in IBM Home Page Reader, the automated PDF to HTML conversion
process failed, and the reader announced "No item on this page" even
though this is a tagged PDF file. The landscape format on vertical pages
seems to have broken the process. I was not able to try it in my screen
reader which is not currently installed. (I had a system crash a while
back and have not reinstalled everything yet. I have a certain bias to
non-screen reader special access technology, because when any one
technology is dominant, certain kinds of accessibility issues tend to
get ignored, and browsing with the 'other' technologies reveals those
weaknesses, just as browsing with a graphical browser other than
Internet Explorer now reveals problems with visual and functional
aspects of a Web site.)

It is these unexpected details that make accessibility with embedded
applications or plug-ins such a headache. PDF and Flash were not
initially designed to be very flexible in rendering or presentation but
the need to conform to Section 508 rules for software applications and
plug-ins has forced the issue for the vendors, who of course want to sell
their products to the government. HTML works better because it is a
text-based language, and more resources have been spent on
accessibility solutions.

Terence de Giere
<EMAIL REMOVED>

----------------------------------------------------------
Tim Harshbarger wrote:

Subject: PDF Accessibility
From: Tim Harshbarger < <EMAIL REMOVED> >
Date: Mon, 2 Dec 2002 09:21:27 -0600
To: <EMAIL REMOVED>

Hi,

I am putting together material for a presentation on the accessibility
of PDF documents and accessible alternatives for people here who produce
internal communications. There are a couple questions I have.

Where can I locate an example of an accessible PDF document created
using Acrobat 5.0?

Also, do PDF documents have any advantages over other formats, like
HTML? I am trying to find answers and possible replies to this question,
because I expect somebody to say something like "But we need to use PDF
because..." I already have heard some of them. That is the reason I am
giving the presentation.

Any information or advice would be greatly appreciated.

Thanks!
Tim
Tim Harshbarger
Disability Support
State Farm Insurance Companies
Phone: (309) 766-0154
E-mail: <EMAIL REMOVED>






----
To subscribe, unsubscribe, or view list archives,
visit http://www.webaim.org/discussion/