WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: [WebAim] Whither /TOC and /TOCI in PDF 2.0? PDF accessibility question.

From: Duff Johnson
Date: May 14, 2019 1:59PM


Hi Rick,

Bevi's answer was excellent. I'll provide a little more detail… probably just enough to make you wish you'd never asked! :-)

> This is a question about accessibility and the PDF 2.0 standard.
>
> The PDF 2.0 standard, ISO 32000-2:2017(E), says that /TOC and /TOCI, and several
> other standard structure elements, are no longer 'defined'. (Annex M, page 958.)

We understand that the text at the start of Annex M can be confusing; we will attempt to address this in the forthcoming dated revision of ISO 32000-2.

You are correct that these elements are not defined directly in ISO 32000-2. Instead, ISO 32000-2 references ISO 32000-1 to define these elements (and indeed uses them as the default elements in PDF 2.0).

> OTOH there are many US Government agency documents containing accessibility
> guidelines saying, e.g., entries in tables of contents *must* be tagged /TOCI.
> (Table of contents entries in the PDF version of ISO 32000-2:2017(E) are tagged
> /TOCI. That's presumably because it's a PDF 1.7 document.)

Correct.

> The complete list of previously-defined standard structure elements, *undefined*
> by PDF 2.0, is: Sect, Art, BlockQuote, TOC, TOCI, Index, NonStruct, Private, Quote,
> Note, Reference, BibEntry, Code.

To be slightly amended in the forthcoming "dated revision" of PDF 2.0, but yes.
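
For anyone who wants to check a tag list against the names quoted above, here is a minimal Python sketch. The set simply transcribes that list (which, as noted, may change in the dated revision); the function name and the sample tags are hypothetical, and nothing here reads an actual PDF file.

```python
# Structure element types defined in PDF 1.7 but not defined directly in
# PDF 2.0, transcribed from the list quoted above. Assumption: the list is
# as quoted; the dated revision may amend it slightly.
PDF17_ONLY = frozenset({
    "Sect", "Art", "BlockQuote", "TOC", "TOCI", "Index", "NonStruct",
    "Private", "Quote", "Note", "Reference", "BibEntry", "Code",
})


def pdf17_only_tags(tags):
    """Return the tags that appear on the PDF 1.7-only list.

    Order of first appearance is kept and duplicates are dropped.
    A leading "/" (as in "/TOCI") is ignored.
    """
    found = []
    for tag in tags:
        name = tag.lstrip("/")
        if name in PDF17_ONLY and name not in found:
            found.append(name)
    return found


# Hypothetical tag list, for illustration only.
sample = ["/Document", "/TOC", "/TOCI", "/TOCI", "/P", "/Note"]
print(pdf17_only_tags(sample))  # → ['TOC', 'TOCI', 'Note']
```

Finding one of these tags in a PDF 2.0 file does not indicate an error: as explained below, the PDF 1.7 elements remain available and are the default.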

> Does anyone know why these standard structure elements were 'undefined' in PDF 2.0?
> What is the significance of their removal/undefining/defenestration in PDF 2.0?


PDF 2.0 introduces namespaces to facilitate the use of rich tagsets (DAISY, MathML, etc.) in a PDF context. In this context, the ISO WG decided to simplify the "base" PDF 2.0 tagset while providing clear containment rules for those elements.

PDF 1.7 elements (along with their own containment rules, such as they are) continue to exist. As stated above, they are actually the default in PDF 2.0.

As a technical matter (since you are reading the spec), this aspect is covered in ISO 32000-2:2017, 14.8.6 "Standard structure namespaces".

> And what should be used instead?

The change actually makes it possible to use richer third-party tagsets in PDF semantic structures, that is, tagsets that aren't defined in PDF 2.0 (including the 1.7 set).

Since 1.7 tags are the default, you do not have to "use something else instead" - you are free to use PDF 1.7 structure element types in PDF 2.0 files. PDF/UA-2 will (likely) require that these elements be mapped to PDF 2.0 elements, but this does not imply any loss of information.

> If the PDF version of ISO 32000-2:2017(E) PDF 2.0 standard were to be in Version 2.0
> PDF instead of version 1.7 PDF, would the table of contents entries still
> be tagged /TOCI, or tagged some other way?

It would be up to the author. They could use a TOC/TOCI model (PDF 1.7); they could also use list (L/LI) elements (PDF 2.0), or they could have both (PDF 1.7 elements mapped to PDF 2.0 elements). PDF 2.0 takes no position on this point.
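
To make the third option a little more concrete, here is a rough Python sketch of the idea of mapping PDF 1.7 types to PDF 2.0 types. The TOC to L and TOCI to LI pairing comes from the options just described; the dictionary and function names are hypothetical, and this models neither actual PDF syntax nor anything PDF/UA-2 will require.

```python
# Hypothetical role map from PDF 1.7 structure types to PDF 2.0 types.
# Assumption: only the TOC/TOCI -> L/LI pairing mentioned above is modelled.
ROLE_MAP_17_TO_20 = {"TOC": "L", "TOCI": "LI"}


def resolve_type(tag, role_map=ROLE_MAP_17_TO_20):
    """Return (original, mapped) for a structure type.

    The original type is kept next to its mapping, so nothing is lost.
    Types with no entry in the role map are returned unchanged.
    """
    return tag, role_map.get(tag, tag)


for tag in ["TOC", "TOCI", "P"]:
    original, mapped = resolve_type(tag)
    print(f"{original} -> {mapped}")
# prints:
# TOC -> L
# TOCI -> LI
# P -> P
```

Keeping the original type alongside the mapped one is the point of the sketch: the author's TOC/TOCI tagging survives, and a consumer that only understands the PDF 2.0 set still has something to work with.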

> The answers must be obvious, but I can't find 'em.

It's insufficiently obvious, and this is one of the committee's regrets. Some are now discussing a separate document to explain precisely this subject (in less technical terms!).

> If anyone has time to explain
> this, it would be wonderful, especially if the explanation could be made at a
> level suitable for someone who finds understanding the various ISO PDF standards
> woefully difficult.

Very reasonable ask, and thank you.

As Bevi mentioned, these documents (the ISO standards defining PDF and PDF subsets) are written for PDF software developers, and generally do not include advice for end users. Hopefully the above explanation is of some help.

> Or maybe PDF 2.0 is ignorable for, say, the next ten years in terms of accessibility,
> & most other things??

PDF 2.0 files are beginning to appear in the wild, but I've yet to see tooling for PDF 2.0's tagged PDF features directed towards authors (would be very happy to learn of an example!).

PDF/UA-2 will certainly include explicit instructions for using PDF 1.7 structure elements in a PDF 2.0 context.

Duff.