WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: PDF language not recognized by screenreaders


From: Andrew Kirkpatrick
Date: Jan 24, 2013 2:27PM


This will certainly help, but it's not quite to the point.

Gijs's point - as I understood it - was that setting the document language in Acrobat did not produce the desired result (i.e., AT using that language). This is, simply, because the document language setting in Acrobat acts on document-level metadata rather than on the content (or on the tags, for that matter).

[AWK] The result, while undesirable, is entirely consistent with the way that PDF and AT are supposed to behave though. InDesign is using a process to set the language on the top-level <document> tag, and everything underneath it uses that language unless it has a different language defined. So the child elements under the <document> tag may specify a different language, and any children of those will follow the language of their parent tag.

It works in the other direction also, and that is where the problem is: the parent of the <document> tag in the language hierarchy is the document catalog's /lang entry. When this is changed by setting the language in the document properties dialog in Acrobat, the document's primary language is set to that language, but then the first tag (which contains all document content as child tags within it) redefines the language again.
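
To make the inheritance concrete, here is a minimal sketch in Python. It is a toy model only: the Tag class and the effective_lang function are hypothetical names made up for illustration, not part of Acrobat, InDesign, or any PDF library. It assumes the lookup order described above, where the nearest tag with an explicit language wins and the catalog entry (spelled /Lang in the PDF specification) is used only when no tag above the content sets one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tag:
    """Hypothetical stand-in for a structure tag; not a real PDF library type."""
    name: str
    lang: Optional[str] = None            # explicit language on this tag, if any
    parent: Optional["Tag"] = None
    children: List["Tag"] = field(default_factory=list)

    def add(self, child: "Tag") -> "Tag":
        """Attach a child tag and return it."""
        child.parent = self
        self.children.append(child)
        return child


def effective_lang(tag: Tag, catalog_lang: Optional[str]) -> Optional[str]:
    """Walk up from the tag; the nearest explicit language wins.

    The catalog language is consulted only if no ancestor sets one.
    """
    node: Optional[Tag] = tag
    while node is not None:
        if node.lang:
            return node.lang
        node = node.parent
    return catalog_lang


# The situation in this thread: the top-level tag carries EN-US,
# and the document properties dialog has been set to French.
document = Tag("Document", lang="en-US")
para = document.add(Tag("P"))

print(effective_lang(para, catalog_lang="fr-FR"))  # en-US: the top-level tag shadows the catalog

document.lang = None                               # remove the language from the top-level tag
print(effective_lang(para, catalog_lang="fr-FR"))  # fr-FR: the catalog setting now applies
```

In this model, changing only the catalog language has no audible effect as long as the top-level tag still carries its own language, which matches the behavior reported earlier in the thread.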

My recommendation is that InDesign not use the <document> tags as the destination for the language (however determined for the document level in InDesign - I'll also be recommending a better way than currently employed) but to use the document catalog /lang entry.

Insofar as Acrobat is a "PDF editor" it would make sense to have the document-language management feature include the ability to (optionally) override existing tag and content-level language settings.

Preferably, the user should not be forced to return to the source InDesign file to address this problem in a reasonable manner.

[AWK] Sure. I don't think that they are, even now. For example:
Suppose I have a document with 100 paragraphs (or 1000, just picking a large number that I wouldn't want to adjust manually and individually) and I want this document to be in French, but InDesign thinks that the document is EN-US. I also have an English paragraph in my document. I make sure that the paragraph styles all reflect the correct language and export to PDF. I test, and per the test in Acrobat the language needs to be defined for the document, so I:
1) Add French as the language in the document properties dialog. This won't change how anything is read yet because the <document> tag has EN-US as the language.
2) Delete the language from the <document> tag. Now the entire document reads in French. This includes the English paragraph: InDesign thought that EN-US was the document language, so it did not define a language for that paragraph. All of the paragraphs which were set to French do have this indicated, which duplicates the document-level setting from step 1, but that's not a problem.
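
The two steps above can be sketched as a small simulation. This is an illustration under stated assumptions, not Acrobat's actual behavior or API: spoken_lang and read_all are hypothetical helper names, and the model assumes a simple fallback chain of paragraph language, then <document> tag language, then catalog language.

```python
from typing import List, Optional, Tuple


def spoken_lang(para_lang: Optional[str],
                document_tag_lang: Optional[str],
                catalog_lang: Optional[str]) -> Optional[str]:
    """First explicit setting wins: paragraph, then <document> tag, then catalog."""
    return para_lang or document_tag_lang or catalog_lang


def read_all(paragraphs: List[Tuple[str, Optional[str]]],
             document_tag_lang: Optional[str],
             catalog_lang: Optional[str]) -> List[Optional[str]]:
    """Return the language each paragraph would be read in under this model."""
    return [spoken_lang(lang, document_tag_lang, catalog_lang)
            for _, lang in paragraphs]


# Exported state from the scenario: French paragraphs are tagged explicitly;
# the English one has no language because it matched the assumed EN-US default.
paragraphs = [("French paragraph", "fr-FR"), ("English paragraph", None)]

# As exported: <document> tag says EN-US, catalog has no language yet.
print(read_all(paragraphs, "en-US", None))     # ['fr-FR', 'en-US']

# Step 1: set French in the document properties (catalog). Nothing changes.
print(read_all(paragraphs, "en-US", "fr-FR"))  # ['fr-FR', 'en-US']

# Step 2: delete the language from the <document> tag.
print(read_all(paragraphs, None, "fr-FR"))     # ['fr-FR', 'fr-FR']
```

In this model the English paragraph ends up being read as French after step 2, so it would need its own explicit language to be read as English again.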

That seems like a process I'd like to avoid, but isn't unreasonable, as repair goes.

Are you thinking about a different use case that is more problematic?

AWK