WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Html from a .pdf file, what is the best way?

From: Simius Puer
Date: Apr 7, 2010 11:30AM


Hi Birkir

That's not borderline at all - it's a very valid question.

I would argue that a single HTML format serving both disabled and
non-disabled users is the most widely accepted approach. Having more than
one format creates additional work in managing the content (unless it is
managed from a single source and output to multiple formats).

Converting PDF to HTML is a huge topic (as is the accessibility of PDFs) and
if you scan the archives on this list you should find plenty of material
covering it.

The key point from all the discussions is "it all depends on the quality and
consistency of the source document". Most PDFs are generated from source
documents (such as MS Word) that have little or no structural mark-up, so
converting that PDF (or indeed the source document) to HTML with an
automated tool will ultimately fail. This is not the fault of the source
format, nor of the auto-converter, but of the skills of the people creating
the source documentation: rubbish in, rubbish out!
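
As a rough illustration of the "rubbish in, rubbish out" point, here is a
minimal Python sketch that counts how much structural mark-up survives in a
piece of converted HTML. The two sample strings, the STRUCTURAL_TAGS list and
the count_structure helper are my own assumptions for illustration, not the
output of any particular converter, and a count like this is no substitute
for a proper accessibility review.

```python
from html.parser import HTMLParser

# Tags treated here as "structural"; this list is an illustrative assumption.
STRUCTURAL_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6",
                   "ul", "ol", "li", "table", "th"}


class StructureCounter(HTMLParser):
    """Counts structural start tags versus all other start tags."""

    def __init__(self):
        super().__init__()
        self.structural = 0
        self.other = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in STRUCTURAL_TAGS:
            self.structural += 1
        else:
            self.other += 1


def count_structure(html):
    """Return (structural_tag_count, other_tag_count) for an HTML string."""
    parser = StructureCounter()
    parser.feed(html)
    parser.close()
    return parser.structural, parser.other


# Hypothetical converter output from an unstructured source: the "heading"
# is only big bold text, so no structure survives the conversion.
unstructured = ('<p><b><font size="5">Annual Report</font></b></p>'
                '<p>Body text.</p>')

# Hand-written equivalent that marks the heading up as a heading.
structured = "<h1>Annual Report</h1><p>Body text.</p>"

print(count_structure(unstructured))
print(count_structure(structured))
```

In the first sample the title only looks like a heading, so a screen reader
has nothing to navigate by; in the second the same content carries a real
heading element.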

From experience, if this is a one-off document then it would probably benefit
from a manual conversion carried out by a trained professional.

*Important caveat*: the HTML document will only be as accessible as the
website through which it is made available!

One article from our website may be of interest to you:

- Convert PDF, Word (and other formats) to accessible, semantic and lean
HTML
http://www.simiusweb.ie/document_conversion_to_xhtml.html

I'd suggest your best approach would be to find an in-house resource who
understands HTML and accessibility to convert the document manually, or to
use a bureau service provided by a company (that speaks the native language
of the document).

Best regards