WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Correct PDF/UA tagging structure for Indexes ?

From: Philip Kiff
Date: Jan 2, 2024 1:45PM


Like Duff says, that's a good question!

In the absence of other practical suggestions from the list, here's my
first crack at a tag tree showing how I would probably go about it if I
were scripting an automated method to generate PDF tags from a
well-structured source text:

<H2>
  Index
<H3>
  A
<L>
  <LI>
    <Lbl>
      Artichokes
    <LBody>
      <L>
        <LI>
          <LBody>
            <Link>
              3
            ,
        <LI>
          <LBody>
            <Link>
              15-38
            ,
        <LI>
          <LBody>
            <Link>
              133
      <L>
        <LI>
          <Lbl>
            cooking
          <LBody>
            <L>
              <LI>
                <LBody>
                  <Link>
                    16
                  ,
              <LI>
                <LBody>
                  <Link>
                    31
                  ,
              <LI>
                <LBody>
                  <Link>
                    32
                  ,
              <LI>
                <LBody>
                  <Link>
                    33
        <LI>
          <Lbl>
            in oil
          <LBody>
            <L>
              <LI>
                <LBody>
                  <Link>
                    31
                  ,
              <LI>
                <LBody>
                  <Link>
                    32
        <LI>
          <Lbl>
            in butter
          ....
  <LI>
    <Lbl>
      Artischocken
    ....
<H3>
  B
....

The general idea in the tag tree above is simply to tag each letter as a
heading and then use nested lists for the terms within each letter.

The case of the first item "Artichokes" is tricky because in my tagging
it has *TWO* nested lists at the same level, which may confuse some
readers. The first list is page numbers for Artichokes generally, and
the second list is the list of artichoke sub-categories. Page numbers
are then nested in a further sub-list within each sub-category item.
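
As a rough illustration of how a generator might build that shape from
structured source data, here is a minimal Python sketch. It does not use
any real PDF library: the Entry and Tag classes and the helper functions
are hypothetical names invented for this example, and the output is only
an indented text dump of the tag tree, not actual PDF structure elements.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One index term with its page references and optional sub-entries."""
    term: str
    pages: list[str] = field(default_factory=list)
    children: list["Entry"] = field(default_factory=list)


@dataclass
class Tag:
    """A node in a simplified tag tree: a structure type plus its kids."""
    name: str
    kids: list = field(default_factory=list)  # Tag objects or plain strings


def page_list(pages):
    """Wrap each page in LI > LBody > Link; add a comma after all but the last."""
    items = []
    for i, page in enumerate(pages):
        body = [Tag("Link", [page])]
        if i < len(pages) - 1:
            body.append(",")
        items.append(Tag("LI", [Tag("LBody", body)]))
    return Tag("L", items)


def entry_item(entry):
    """Build LI > (Lbl, LBody); LBody holds a page list and/or a sub-entry list."""
    body = []
    if entry.pages:
        body.append(page_list(entry.pages))
    if entry.children:
        body.append(Tag("L", [entry_item(c) for c in entry.children]))
    return Tag("LI", [Tag("Lbl", [entry.term]), Tag("LBody", body)])


def letter_group(letter, entries):
    """One letter heading followed by the list of top-level entries."""
    return [Tag("H3", [letter]), Tag("L", [entry_item(e) for e in entries])]


def render(node, depth=0):
    """Return the tree as indented lines, one tag or text run per line."""
    pad = "  " * depth
    if isinstance(node, str):
        return [pad + node]
    lines = [f"{pad}<{node.name}>"]
    for kid in node.kids:
        lines.extend(render(kid, depth + 1))
    return lines


# Example: the "Artichokes" entry with two of its sub-categories.
artichokes = Entry("Artichokes", ["3", "15-38", "133"], [
    Entry("cooking", ["16", "31", "32", "33"]),
    Entry("in oil", ["31", "32"]),
])
for tag in letter_group("A", [artichokes]):
    print("\n".join(render(tag)))
```

In this sketch, an entry that has both its own page numbers and
sub-categories ends up with two sibling lists inside its LBody, which is
the "Artichokes" case described above.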

I considered inserting a visually hidden label like "General" for the
first set of page numbers that refer to Artichokes generally, but I don't
think that would necessarily improve usability for actual users, and it
would make the tagging even more complicated than it already is.

I also considered flattening the list somewhat by inserting a visually
hidden phrase in parentheses "(artichokes)" before each sub-category of
artichokes. But I don't think this would make it easier to browse the
list with a screen reader and might just get in the way for some users.

To simplify the tagging, you might decide against nesting the page
numbers within their own lists and simply put all the page numbers for
an item within a single Paragraph tag with page numbers separated by
commas. The extra list level I'm using may be overkill. Though if any
entries include a dozen or more page numbers (which would be common in
an academic text) then putting the page numbers inside their own nested
list would be helpful for some users trying to make sense of a long list
of references. Screen reader software will announce the number of list
items before entering the list, and this allows a user to decide whether
to skip over the list or attempt to navigate it one item at a time - or
to use a different strategy to navigate it.
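
If you wanted an automated generator to make that choice per entry, one
possible approach is a simple threshold. The Python sketch below is only
an assumption about how that might look: the page_refs function, the
nested-tuple representation of tags, and the default cut-off of 12 (taken
from "a dozen or more") are all made up for illustration, and the right
cut-off would need user testing.

```python
def page_refs(pages, list_threshold=12):
    """Return a (tag_name, kids) tuple for one entry's page references.

    Short runs become a single P with comma-separated Links; runs of
    list_threshold or more pages become a nested L with one LI per page.
    """
    if len(pages) < list_threshold:
        kids = []
        for i, page in enumerate(pages):
            if i:
                kids.append(", ")  # separator text between links
            kids.append(("Link", [page]))
        return ("P", kids)

    items = []
    for i, page in enumerate(pages):
        body = [("Link", [page])]
        if i < len(pages) - 1:
            body.append(",")  # trailing comma on all but the last item
        items.append(("LI", [("LBody", body)]))
    return ("L", items)


# Two pages stay in one paragraph; twelve pages get their own list.
print(page_refs(["31", "32"])[0])
print(page_refs([str(n) for n in range(1, 13)])[0])
```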

Having said all that, I've never seen a long index actually marked up
like this. And indeed, I've never personally marked up an index like
this manually either. (It would take hours and hours!) So you would want
to test sample output with screen reader users and other users before
deciding on the final format if you are integrating it into software
that generates PDF output automatically.

The vast majority of indexes that I've seen in the wild aren't even
marked up as lists, and page numbers in most indexes I've seen aren't
even linked to the actual references they cite.

Phil.

Philip Kiff
D4K Communications

On 2024-01-02 12:05 p.m., Rick Davies via WebAIM-Forum wrote:
> Hello all,
>
> The estimable PDF Association has explanatory technical notes about
> how to conform to the
> PDF/UA standard. But those documents don't contain many illustrative
> examples and *no*
> illustrative examples about index markup--I guess they are not
> intended as cookbooks. The
> PDF Association also has several helpful example PDF/UA documents
> contributed by third
> parties--none of them contain indexes. After weeks of searching, I
> have not been able to
> find much or anything about PDF/UA index markup: no credibly tagged
> PDF/UA documents with
> indexes. The old Adobe PDF 1.6 document has a magnificent index, but
> no Index tagging.
> None of the successor ISO PDF standard documents have any indexes 🙁.
>
> So I'm wondering a) if anyone knows where such examples may be found
> and b) how should a
> developer of a PDF output generator produce PDF/UA tagging for the
> following index structure?
> (The objective is that the PDF output-generator should create 'born
> accessible' PDFs,
> without the need for any remediation.)
>
> Rough and ready index example:
> > Index
> >
> ___
> A
> ___
>
> Artichokes                  3, 15-38, 133
>     cooking                16, 31, 32, 33
>     in oil                         31, 32
>     in butter                      31, 33
>     growing
>     hyponetically                   24-27
>     au naturelle                    20-21
>
> Artischocken                   66, 70, 90
>     Kochen                     80, 81, 92
>
> Avocados                       55-58, 133
>
> ...
> ___
> B
> ___
>
> Bananas 65, 65
>
> >
> In the above 'A' and 'B' are index sub-headings, aka group titles. And
> the 'au naturelle',
> 'Artischocken' and 'Kochen' represent index entries in different
> languages.
>
> What would be the complete PDF/UA tagging required to express the
> above index structure,
> including page links? Is there a tagging structure for the above that
> would be compatible
> with all of UA-1, UA-2, PDF 1.7 and PDF 2 ?
>
> OTOH this mailing list "is for anyone interested in discussing *web*
> accessibility issues" so
> perhaps the above question would be better asked somewhere else? The
> PDF Association does
> not seem to have a forum ...
>
> All suggestions and comments very gratefully received 🙂.
>
> Many thanks,
>
> Rick
>