WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Html from a .pdf file, what is the best way?

From: Eoin Campbell
Date: Apr 8, 2010 3:27AM


Since it's 2,000 pages, the best thing you can do to ensure accessibility
is to have a shorter executive summary that people might actually read.

You should also ensure that there is an internal search facility limited
to the document itself, so that people can quickly search through it to
find bits of relevance to them.
The document should be broken into reasonably sized chunks too,
probably to the section level within chapters.
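
As a rough illustration of the chunking and search ideas above, here is a
minimal Python sketch. It assumes the plain-text version marks sections with
numbered headings such as "1.2 Title"; that heading pattern and all of the
function names are hypothetical, so adjust them to whatever the real document
uses. Python's `\w` and `lower()` are Unicode-aware, so Icelandic letters
should be matched, but that is worth checking against the actual text.

```python
import re
from collections import defaultdict

# Assumption: headings look like "1 Title" or "1.2 Title" on their own line.
# Body lines that happen to start with a number would be misread as headings.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(\S.*)$")


def split_sections(text):
    """Split plain text into (number, title, body) chunks at numbered headings."""
    sections = []
    current = None
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            current = {
                "number": match.group(1),
                "title": match.group(2).strip(),
                "lines": [],
            }
            sections.append(current)
        elif current is not None:
            # Text before the first heading is ignored in this sketch.
            current["lines"].append(line)
    return [
        (s["number"], s["title"], "\n".join(s["lines"]).strip())
        for s in sections
    ]


def build_index(sections):
    """Map each lower-cased word to the set of section numbers containing it."""
    index = defaultdict(set)
    for number, title, body in sections:
        for word in re.findall(r"\w+", f"{title} {body}".lower()):
            index[word].add(number)
    return index


def search(index, query):
    """Return section numbers (sorted as strings) containing every query word."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

Each chunk could then be written out as its own HTML page with a real heading
element, and the index gives a search that is limited to the document itself.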

You should ask for the original source, which I imagine is something like
QuarkXPress or InDesign, as this is probably much easier to convert
to structured HTML than PDF or plain text.

Depending on your budget (do you have a budget?), there are companies
and products that attempt to convert PDF into structured formats, but
I am not sure whether they would handle the Icelandic characters and language.

Perhaps a manually formatted accessible summary, and a searchable but less
accessible full-text set of document sections would be a reasonable
compromise.

You should be cautious about the plain-text version. If it is generated
from the PDF, the text might not be in the correct reading order.


Birkir Gunnarsson wrote:
> I apologize if this question is borderline topic.
> There is a big government report being published in a couple of weeks in my
> home country, over 2000 pages, but one which will interest a lot of people.
> I was contacted this morning and asked what would be the best way to make
> its contents accessible to our blind/VI users.
> They have it as plain text and as a series of .pdf files.
> I believe a .pdf file of this size (each of them over 150 pages) may cause
> problems with Adobe Reader accessibility, unless the buffer is set to 30
> pages or less (please correct me if I am wrong here).
> Also, if there is a link on page 2 in that document that refers to page,
> say, 120, what happens with Adobe Reader in this case? Assume the
> reader clicks on the link, will Adobe Reader load page 120 and the following
> 30 pages into a buffer?
> I am just not sure if .pdf is a good format, I am not sure if the .txt
> format is good either, since it does not allow for any textlinks and it is
> an awfully large document.
> But, assuming I get to the person with the source document, how hard is it
> to export it to a marked up html (headings etc)?
> I would think that would be the ideal format for a very large document in many
> sections, for a blind user.
> If anyone has an opinion on this it would be most appreciated.

--
Eoin Campbell
<EMAIL REMOVED>