Re: PDF/UA question about <Figure> / <Caption> hierarchy (tagged pdf)


From: Duff Johnson
Date: May 31, 2019 11:15AM

Hi Rick,

> This has got my head in a spin: is it best practice for a figure or image
> with a caption to have a tagged PDF structure wherein the <Caption> structure
> element is a sub-element of the <Figure> structure element? Or should they
> both be at the same level? (Leaving <Div> grouping aside for now.)

That's a painful subject. The short version…

PDF 1.7 does not prohibit <Caption> enclosed by <Figure>.

BUT no current-generation AT (assistive technology) that I'm aware of supports this construct; they stop when they encounter alternative text (i.e., on the <Figure>) and process no deeper :-(.
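To make that failure mode concrete, here is a rough sketch. The `StructElem` class and `naive_read` function are hypothetical: a toy model of a structure tree plus a reader that, like the AT described above, treats alternative text as replacing everything beneath it. No real AT, PDF library or /Alt handling code is implied; it only compares a <Caption> nested in the <Figure> with one placed as the following sibling.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical toy model of a structure element; not a real PDF library API.
@dataclass
class StructElem:
    tag: str                          # structure type, e.g. "Figure", "Caption"
    text: str = ""                    # stand-in for the element's own content
    alt: Optional[str] = None         # stand-in for the /Alt entry
    kids: List["StructElem"] = field(default_factory=list)


def naive_read(elem: StructElem) -> List[str]:
    """Depth-first read that treats /Alt as replacing the whole subtree,
    i.e. it never descends into the kids of an element that has /Alt."""
    if elem.alt is not None:
        return [elem.alt]
    out = [elem.text] if elem.text else []
    for kid in elem.kids:
        out.extend(naive_read(kid))
    return out


ALT = "Bar chart of sales by region"
CAP = "Figure 1: Sales by region"

# <Caption> enclosed by <Figure>: permitted, but this kind of reader never reaches it
nested = StructElem("Document", kids=[
    StructElem("Figure", alt=ALT, kids=[StructElem("Caption", text=CAP)]),
])

# <Caption> as the sibling immediately following <Figure>
sibling = StructElem("Document", kids=[
    StructElem("Figure", alt=ALT),
    StructElem("Caption", text=CAP),
])

print(naive_read(nested))   # ['Bar chart of sales by region']
print(naive_read(sibling))  # ['Bar chart of sales by region', 'Figure 1: Sales by region']
```

In the nested case the caption text simply never reaches the user, which is why the sibling layout is the safer choice for today's AT.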

PDF 2.0 explicitly allows <Caption> enclosed by <Figure>.

> 'Tagged PDF Best Practice Guide'
> https://www.pdfa.org/wp-content/uploads/2015/12/StructureElementsBestPracticeGuide_2016-01-19.pdf

Funny you should mention it: this document's complete replacement will be published in early June! I'll announce it here when it is...

> gets near to providing the answer, but then veers off (page 6). Presumably
> because ISO 32000-2 itself does not seem to be 100% clear on this. FWIW my reading
> of ISO 32000-2 is that <Caption> *should* be a child of <Figure>. But I can't
> find any confirmatory text or examples.

The guide to which you refer is actually about PDF 1.7, not PDF 2.0. In the forthcoming guide (which is also specific to PDF 1.7), the following text appears in bold:

PDF 1.7 does not specify a mechanism to associate <Figure> structure elements with their <Caption> structure elements, or associate multiple figures together, or apply a caption to multiple figures.
In the context of <Figure> structure elements, it is recommended to locate the <Caption> structure element following the <Figure> structure element, as this practice ensures a reasonable context for the <Caption> is provided to users of relatively basic consumption software.
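Because PDF 1.7 defines no explicit link between a <Figure> and its <Caption>, any software that wants to pair them can only do so by position. The sketch below assumes the recommended order (the <Caption> immediately follows the <Figure> at the same level); the function name `pair_figures_with_captions` and the flat `(tag, content)` representation are hypothetical illustrations, not part of any spec or library.

```python
from typing import List, Optional, Tuple

# Hypothetical flat list of sibling structure elements as (tag, content) pairs.
Sibling = Tuple[str, str]


def pair_figures_with_captions(kids: List[Sibling]) -> List[Tuple[str, Optional[str]]]:
    """Positional heuristic: a <Caption> immediately following a <Figure>
    is assumed to belong to it. PDF 1.7 itself defines no such association."""
    pairs = []
    for i, (tag, content) in enumerate(kids):
        if tag != "Figure":
            continue
        nxt = kids[i + 1] if i + 1 < len(kids) else None
        caption = nxt[1] if nxt is not None and nxt[0] == "Caption" else None
        pairs.append((content, caption))
    return pairs


kids = [
    ("P", "Intro paragraph"),
    ("Figure", "alt: map of Europe"),
    ("Caption", "Figure 1: Member states"),
    ("Figure", "alt: logo"),
    ("P", "Body text"),
]
print(pair_figures_with_captions(kids))
# [('alt: map of Europe', 'Figure 1: Member states'), ('alt: logo', None)]
```

A caption placed before its figure, or one meant for several figures, defeats this kind of heuristic, which is exactly the gap the PDF 1.7 text acknowledges.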

The new guide also provides this nugget (shown in bold):

PDF 2.0 updates the description of <Caption> as follows:

For lists and tables, a <Caption> structure element may be used as defined for the <L> (list) and <Table> structure elements. In addition, a <Caption> may be used for a structure element or several structure elements.

A structure element is understood to be "captioned" when a <Caption> structure element exists as an immediate child of that structure element. The <Caption> shall be the first or the last structure element inside its parent structure element. The number of captions cannot exceed 1.
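The PDF 2.0 rules quoted above are mechanical enough to check. Here is a minimal sketch that tests a single element against them: "captioned" means an immediate <Caption> child, that child must be first or last, and there may be at most one. The `StructElem`, `is_captioned` and `caption_problems` names are hypothetical; the model ignores marked content, attributes and namespaces, and it is not a PDF/UA validator (a real checker would also walk the whole tree).

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical toy structure element: just a type and its child elements.
@dataclass
class StructElem:
    tag: str
    kids: List["StructElem"] = field(default_factory=list)


def is_captioned(elem: StructElem) -> bool:
    """'Captioned' in the PDF 2.0 sense: a <Caption> is an immediate child."""
    return any(k.tag == "Caption" for k in elem.kids)


def caption_problems(elem: StructElem) -> List[str]:
    """Check one element against the quoted rules: at most one immediate
    <Caption> child, and it must be the first or last child."""
    problems = []
    positions = [i for i, k in enumerate(elem.kids) if k.tag == "Caption"]
    if len(positions) > 1:
        problems.append(f"<{elem.tag}> has {len(positions)} <Caption> children")
    last = len(elem.kids) - 1
    for i in positions:
        if i not in (0, last):
            problems.append(f"<Caption> at position {i} of <{elem.tag}> is neither first nor last")
    return problems


fig = StructElem("Figure", [StructElem("Caption")])
group = StructElem("Div", [StructElem("Figure"), StructElem("Figure"), StructElem("Caption")])
bad = StructElem("Div", [StructElem("Figure"), StructElem("Caption"), StructElem("Figure")])

print(is_captioned(fig), caption_problems(fig))  # True []
print(caption_problems(group))                   # [] (one caption for several figures)
print(caption_problems(bad))
# ['<Caption> at position 1 of <Div> is neither first nor last']
```

The `group` case reflects the PDF 2.0 allowance that a single <Caption> may serve several structure elements when they share a parent.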

While captions are often used with figures or formulas, they may be associated with any type of content.

> Checking the tagging in the PDF version of ISO 32000-2, it appears that neither
> table titles nor figure captions are tagged as <Caption>, which I naively expected
> should be the case. Instead they are tagged as <P>, at the same level either
> preceding or following the table or figure. Is this by best practice design, or
> might it be an inadvertency?

The PDF 2.0 document was produced with PDF 1.7 software… so don't look to its tagging for guidance!

> BTW, anyone know of specific, non-Adobe developer forums for discussing PDF/UA tagging?

You say "developer forums"… PDF Association members have access to internal Technical Working Groups, including one for PDF/UA. This is the group that develops the PDF Association's Best Practice Guides.

Hope this helps,