E-mail List Archives

Re: Use of Headings

From: Duff Johnson
Date: Jul 28, 2010 4:36AM


These are interesting points, and I'm happy to provide some additional information.

On Jul 28, 2010, at 4:38 AM, Simius Puer wrote:

> Several major problems I can see:
>
> 1. You are getting someone unskilled in authoring for the web to create
> the content. Content authors either need to be educated in applying
> semantic structure to their documents, or the conversion of the material
> should be left to someone in the web team.

...or the available tool (in the example case, MS Word) simply can't do the right thing by itself, no matter who is using it.

> 2. By auto converting from Word to PDF with a source document that has no
> accessibility (I'm guessing as tables are used for layout that other
> structures are also missing - heading etc) you are ending up with an
> inaccessible PDF. The simple rule of rubbish in - rubbish out (talking
> about the quality of the mark-up/tagging, not the actual content).

No. Tables are used for layout because tables provide end-users with layout capabilities in addition to semantic-structure capabilities.

The problem is simply that software developers have yet to provide conventional facilities that let users distinguish layout tables from tabular data when generating PDF files. It's not hard; it's just not been done yet (just as Word doesn't yet support table row headers).
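
To illustrate the kind of facility meant here, the following is a minimal sketch, not any real product's behavior. It assumes a hypothetical converter that asks the author for a single `is_layout` flag per table. The `Tag` class, the `tag_table` function, and the `header_rows` parameter are all invented for illustration; only the tag names (Table, TR, TH, TD, P) come from the standard PDF structure types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tag:
    """Hypothetical stand-in for one structure element in a tagged PDF."""
    name: str
    text: str = ""
    children: List["Tag"] = field(default_factory=list)


def tag_table(rows, is_layout, header_rows=1):
    """Return structure tags for a table given as a list of rows of cell strings.

    is_layout   -- author-supplied flag (assumed): True if the table only positions content
    header_rows -- number of leading rows treated as column headers (assumed default: 1)
    """
    if is_layout:
        # Layout table: emit no Table/TR/TD structure at all, just the
        # non-empty cells as paragraphs in row-by-row reading order.
        return [Tag("P", text=cell) for row in rows for cell in row if cell.strip()]

    # Data table: keep the row/cell structure and mark header cells as TH.
    table = Tag("Table")
    for index, row in enumerate(rows):
        cell_name = "TH" if index < header_rows else "TD"
        table.children.append(
            Tag("TR", children=[Tag(cell_name, text=cell) for cell in row])
        )
    return [table]
```

With the same rows, `tag_table(rows, is_layout=False)` yields one Table element with TH and TD cells, while `tag_table(rows, is_layout=True)` yields a flat run of P elements, which is the distinction an authoring tool would need to expose to the user.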

> 3. Whilst PDFs *can be* a million times more accessible than they used to
> be (if created properly), they still don't provide the best medium for
> delivering Web content. There are plenty of discussions on that in the
> archives of this discussion list...

I don't want to re-start the "good format for the web" wars unless absolutely necessary! I'll leave it at these hopefully non-controversial points...

1) There are legitimate reasons to publish in PDF.
2) PDF provides a vehicle for making content from ANY source accessible.
3) The original question had to do with solving an accessibility problem in PDF.

Also note that the problem reported is NOT specific to PDF but is in fact an artifact of the authoring tool. As such, the problem also affects Word, HTML, etc., not just PDF.

> My suggestion would be to re-consider why you are using PDF to publish what
> sounds like Web content (as distinct from a document you simply wish to
> share over the Internet) in the first place.

On what basis does it "sound like web content"? The original question had to do with table structure - not exactly a "web content specific" issue.

> Most of the reasons people
> give for this are a little misled (I need people to be able to print it
> etc...) and other reasons like SEO have not even been considered.

I am tempted, but I'm not going there! (on this thread, anyhow)

> If you have a genuine requirement to publish in PDF then to get accessible
> PDFs you need to either:
>
> 1. educate your content creators into applying semantic markup and also
> applying post-conversion QA *and* cleaning up any tag soup/applying missing
> mark-up
>
> 2. have someone apply mark-up to the document professionally either pre
> or post conversion...there are pros and cons to both approaches but both are
> pretty labor intensive.

How is this advice PDF-specific? It seems to be the same advice that would apply to authoring accessible content in any format.

Duff Johnson
Appligent Document Solutions
http://www.appligent.com