WebAIM - Web Accessibility In Mind

E-mail List Archives

Thread: Whither /TOC and /TOCI in PDF 2.0? PDF accessibility question.

Number of posts in this thread: 4 (In chronological order)

From: Karlen Communications
Date: Tue, May 14 2019 12:33PM
Subject: Whither /TOC and /TOCI in PDF 2.0? PDF accessibility question.

My understanding is that you can still use the "old" Tag Set like TOC and
TOCI but it is going to be mapped to something else like a paragraph, list
or just a link. There will be no clear distinction in the Tags as to whether
you are in a TOC or a paragraph or generic list.

As far as I know, the use of PDF 1.7 tags is not specifically addressed in PDF
2.0, but I may be wrong. I keep hearing that you can use PDF 1.7 tags, PDF 2.0
tags, or both and still be conforming.

Cheers, Karen

-----Original Message-----
From: WebAIM-Forum < = EMAIL ADDRESS REMOVED = > On Behalf Of Rick
Davies
Sent: Tuesday, May 14, 2019 1:38 PM
To: WebAIM Discussion List < = EMAIL ADDRESS REMOVED = >
Subject: [WebAIM] [WebAim] Whither /TOC and /TOCI in PDF 2.0? PDF
accessibility question.

Hello Everyone,

This is a question about accessibility and the PDF 2.0 standard.

The PDF 2.0 standard, ISO 32000-2:2017(E), says that /TOC and /TOCI, and
several other standard structure elements, are no longer 'defined'. (Annex
M, page 958.)

OTOH there are many US Government agency documents containing accessibility
guidelines saying, e.g., entries in tables of contents *must* be tagged
/TOCI.
(Table of contents entries in the PDF version of ISO 32000-2:2017(E) are
tagged /TOCI. That's presumably because it's a PDF 1.7 document.)

The complete list of previously-defined standard structure elements,
*undefined* by PDF 2.0, is: Sect, Art, BlockQuote, TOC, TOCI, Index,
NonStruct, Private, Quote, Note, Reference, BibEntry, Code.

Does anyone know why these standard structure elements were 'undefined' in
PDF 2.0?
What is the significance of their removal/undefining/defenestration in PDF
2.0?
And what should be used instead?

If the PDF version of the ISO 32000-2:2017(E) PDF 2.0 standard were to be in
Version 2.0 PDF instead of version 1.7 PDF, would the table of
contents entries still be tagged /TOCI, or tagged some other way?

The answers must be obvious, but I can't find 'em. If anyone has time to
explain this, it would be wonderful, especially if the explanation could be
made at a level suitable for someone who finds understanding the various ISO
PDF standards woefully difficult.

Or maybe PDF 2.0 is ignorable for, say, the next ten years in terms of
accessibility, & most other things??

Thanks very much ...

Rick

http://webaim.org/discussion/archives

From: chagnon@pubcom.com
Date: Tue, May 14 2019 1:32PM
Subject: Re: [WebAim] Whither /TOC and /TOCI in PDF 2.0? PDF accessibility question.

Speaking as a member of the PDF Association (trade association for the PDF
file format) and a US delegate to the PDF and PDF/UA ISO standards
committees. (Whew! No wonder I'm tired!)

Since these standards are copyrighted by the ISO, please understand that I
can only talk about items that are currently published and not those in
development.

This is a detailed reply about PDF and PDF/UA.

The short summary is at the end: search for "Summary."

Let's clarify all these PDF numbers and standards so you can understand what
they control and who they're intended for.

There's the main PDF standard (ISO 32000) that covers everything about all
kinds of PDF files. PDFs serve hundreds of different purposes beyond the
accessible document files we talk about in this forum. The world uses
engineering PDFs, archival PDFs, legal PDFs, press-printing PDFs,
architectural PDFs, technical drawing PDFs, accounting PDFs, and any other
kind of PDF people can dream up. (This is why the PDF file format isn't going
away anytime soon, regardless of whether you like it or not.)

PDF ISO 32000 was originally published as PDF 1.7 and is now at version 2.0
(what Rick referred to as "PDF 2.0"). The technical standard number is ISO
32000-2 which was released in August 2017. It is published by the ISO at
https://www.iso.org/standard/63534.html

PDF 2.0 doesn't say much about accessibility, other than, if I recall
correctly, a couple of references to the PDF/UA Universal Access standard.

There are several related PDF sub-sets of the main standard for specific
types of PDFs.

For us in accessibility, we use PDF/UA - Universal Access (ISO 14289
https://www.iso.org/standard/64599.html) which defines what is needed to
make a PDF that conforms to the PDF/UA standard and can be accessed by
various computer technologies, including (but not limited to) assistive
technologies.

There are other PDF standards like those for PDF/A (archival documents),
PDF/E (engineering documents), PDF/X (printing and graphics exchange
documents) and more. You can learn about these different standards at the
PDF Association's website, https://www.pdfa.org/resource/

Today's ISO standards are not written for those of us who create content,
remediate PDFs, or export PDFs from MS Office and other programs. Instead, the
standards give software engineers the details they need to program their
software to create standards-conforming PDFs. Some of those companies:
Adobe, Microsoft, Nuance, FoxIt, Oracle, Quicken, Intuit, iText, Axes4, and
other PDF-generating software companies.

For us, the standards as written are hard to interpret and put into
practice. The PDF Association has some helpful resources about putting the
various standards into practice. And then you have everyone's reading of the
ISO tea leaves, and their interpretation and application of the standards,
including my own, Karen McCall's, and HHS's.

Summary:

1.
PDF 2.0 = ISO 32000-2 released in August 2017.
It is only now beginning to be deployed by software manufacturers, and I
don't know of any assistive technology manufacturers who are deploying it at
this time because it doesn't define accessibility at all. It's a global
standard for all PDFs, not specifically for accessible PDFs.
Don't even look at PDF 2.0. It's not what you need.

2.
PDF/UA = ISO 14289 for Universal Access, released in 2014.
Version 1 (PDF/UA-1) is the current standard that describes how to make
accessible PDFs (remember, it's from a programmer's viewpoint but there is
some substance for content creators and remediators).

3.
PDF/UA-2 (version 2) is in development and not yet released by the ISO. It
will be a while before it's completed, accepted by the ISO, published by the
ISO, and then formally adopted by the Access Board under Sec. 508. It will
take a few years for all these steps to be completed.

4.
The Sec. 508 regulation states that WCAG 2.0 and PDF/UA-1 are the appropriate
standards for accessible U.S. federal ICT. It takes a while for the Access
Board to go through the formal federal process of updating
regulations... years, most likely! So the accessibility community will be
working with WCAG 2.0 and PDF/UA-1 for the foreseeable future.

5.
Don't worry about PDF 2.0 & PDF/UA-2.
They are not in play at this time, and they are not required by Sec. 508.
Follow the current PDF 1.7 & PDF/UA-1, not PDF/UA-2.

6.
Rick, your questions (excellent questions!) are being addressed by the
committee. And that's all I can say.
PDF 2.0 and PDF/UA-2 are very different in their approach to PDFs and
accessibility. They're the foundation for the PDF of the future. The committees
are aware that there needs to be a smooth transition from PDF 1.7 & PDF/UA-1
to PDF 2.0 & PDF/UA-2.

7.
Rick wrote: "Or maybe PDF 2.0 is ignorable for, say, the next ten years in
terms of accessibility, & most other things??"
Yes! But sooner than 10 years. My crystal ball says 5 years for
accessibility, and sooner for other advanced PDF technologies.

Most likely we'll see a slow phase-in throughout our industry, starting with
Adobe and Microsoft giving us new tools to create documents to the new
standards, new utilities that convert PDFs to the new standards, new
checkers that check to the new standards, and new AT that can work with
PDFs made to both the current PDF/UA-1 and future PDF/UA-2 standards.

Hope this helps!

-Bevi

- - -
Bevi Chagnon, founder/CEO | = EMAIL ADDRESS REMOVED =
- - -
PubCom: Technologists for Accessible Design + Publishing
consulting . training . development . design . sec. 508 services
Upcoming classes at www.PubCom.com/classes
- - -
Latest blog-newsletter - Accessibility Tips at www.PubCom.com/blog

From: Duff Johnson
Date: Tue, May 14 2019 1:59PM
Subject: Re: [WebAim] Whither /TOC and /TOCI in PDF 2.0? PDF accessibility question.

Hi Rick,

Bevi’s answer was excellent. I’ll provide a little more detail… probably just enough to make you wish you’d never asked! :-)

> This is a question about accessibility and the PDF 2.0 standard.
>
> The PDF 2.0 standard, ISO 32000-2:2017(E), says that /TOC and /TOCI, and several
> other standard structure elements, are no longer 'defined'. (Annex M, page 958.)

We understand that the text at the start of Annex M can be confusing; we will attempt to address this in the forthcoming dated revision of ISO 32000-2.

You are correct that these elements are not defined directly in ISO 32000-2. Instead ISO 32000-2 references ISO 32000-1 to define these elements (and indeed, uses them as the default elements in PDF 2.0).

> OTOH there are many US Government agency documents containing accessibility
> guidelines saying, e.g., entries in tables of contents *must* be tagged /TOCI.
> (Table of contents entries in the PDF version of ISO 32000-2:2017(E) are tagged
> /TOCI. That's presumably because it's a PDF 1.7 document.)

Correct.

> The complete list of previously-defined standard structure elements, *undefined*
> by PDF 2.0, is: Sect, Art, BlockQuote, TOC, TOCI, Index, NonStruct, Private, Quote,
> Note, Reference, BibEntry, Code.

To be slightly amended in the forthcoming “dated revision" of PDF 2.0, but yes.
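
For developers who want to see whether any of this touches their own output, here is a minimal Python sketch. It only checks tag names against the list Rick quoted from Annex M (which, as noted, may be amended). The function name and the sample tag list are hypothetical; collecting the tags from a real PDF is left to whatever tag-tree tooling you already use.

```python
# Element types quoted above as no longer defined directly in ISO 32000-2
# (Annex M). The forthcoming dated revision may amend this list slightly.
PDF17_ONLY_TYPES = frozenset({
    "Sect", "Art", "BlockQuote", "TOC", "TOCI", "Index",
    "NonStruct", "Private", "Quote", "Note", "Reference",
    "BibEntry", "Code",
})


def pdf17_only_tags(tags):
    """Return the sorted, de-duplicated tags that appear in the list above.

    A leading "/" (PDF name syntax) is stripped before comparison.
    """
    return sorted({t.lstrip("/") for t in tags} & PDF17_ONLY_TYPES)


# Hypothetical tag list, e.g. gathered by walking a structure tree.
sample = ["/Document", "/TOC", "/TOCI", "/TOCI", "/H1", "/P", "/Note"]
print(pdf17_only_tags(sample))
```

A non-empty result is not an error: these element types remain usable, as explained below.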

> Does anyone know why these standard structure elements were 'undefined' in PDF 2.0?
> What is the significance of their removal/undefining/defenestration in PDF 2.0?


PDF 2.0 introduces namespaces to facilitate the use of rich tagsets (DAISY, MathML, etc.) in a PDF context. In this context, the ISO WG decided to simplify the “base” PDF 2.0 tagset while providing clear containment rules for those elements.

PDF 1.7 elements (along with their own containment rules, such as they are) continue to exist. As stated above, they are actually the default in PDF 2.0.

As a technical matter (since you are reading the spec), this aspect is covered in ISO 32000-2:2017, 14.8.6 “Standard structure namespaces”.
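
To make the "default" point concrete, here is a rough Python sketch of how a processor might decide which namespace a structure element belongs to. The StructElem class and effective_namespace function are hypothetical illustrations, not any real library's API, and the two namespace names are quoted from memory of 14.8.6, so please verify them against the spec before relying on them.

```python
from dataclasses import dataclass
from typing import Optional

# Namespace names as recalled from ISO 32000-2, 14.8.6 (assumption: verify).
PDF17_NS = "http://iso.org/pdf/ssn"    # PDF 1.7 standard structure namespace
PDF20_NS = "http://iso.org/pdf2/ssn"   # PDF 2.0 standard structure namespace


@dataclass
class StructElem:
    """Hypothetical stand-in for a structure element in a tag tree."""
    type: str
    namespace: Optional[str] = None  # None means no explicit namespace given


def effective_namespace(elem: StructElem) -> str:
    """An element with no explicit namespace falls back to the PDF 1.7 set."""
    return elem.namespace if elem.namespace is not None else PDF17_NS


toci = StructElem("TOCI")                      # no namespace stated
li = StructElem("LI", namespace=PDF20_NS)      # explicitly PDF 2.0
print(effective_namespace(toci))
print(effective_namespace(li))
```

The point of the sketch is only that a bare /TOCI in a PDF 2.0 file is still interpreted as the PDF 1.7 element.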

> And what should be used instead?

The change actually makes it possible to use richer third-party tagsets, ones that aren’t defined in PDF 2.0 (including the 1.7 set), in PDF semantic structures.

Since 1.7 tags are the default, you do not have to “use something else instead" - you are free to use PDF 1.7 structure element types in PDF 2.0 files. PDF/UA-2 will (likely) require that these elements be mapped to PDF 2.0 elements, but this does not imply any loss of information.

> If the PDF version of ISO 32000-2:2017(E) PDF 2.0 standard were to be in Version 2.0
> PDF instead of version 1.7 PDF, would the table of contents entries still
> be tagged /TOCI, or tagged some other way?

It would be up to the author. They could use a TOC/TOCI model (PDF 1.7); they could also use list (L/LI) elements (PDF 2.0), or they could have both (PDF 1.7 elements mapped to PDF 2.0 elements). PDF 2.0 takes no position on this point.
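
The "both" option can be sketched in a few lines of Python. This is only an illustration: the ROLE_MAP pairing of TOC/TOCI with L/LI follows the list option mentioned above and is not a mapping taken from the standard, and the tree, resolve and walk names are hypothetical. Note that the original element type is kept alongside the mapped one, which is why mapping need not lose information.

```python
# Illustrative mapping only (assumption): PDF 1.7 types -> PDF 2.0 list types.
ROLE_MAP = {"TOC": "L", "TOCI": "LI"}


def resolve(tag, role_map=ROLE_MAP):
    """Return (original_type, mapped_type); unmapped types map to themselves."""
    return tag, role_map.get(tag, tag)


# Hypothetical tag tree as (type, children): a TOC with two entries
# and one nested TOC holding a single entry.
toc_tree = ("TOC", [("TOCI", []), ("TOCI", []), ("TOC", [("TOCI", [])])])


def walk(node, depth=0):
    """Yield (depth, original_type, mapped_type) for every node, top down."""
    tag, children = node
    original, mapped = resolve(tag)
    yield depth, original, mapped
    for child in children:
        yield from walk(child, depth + 1)


for depth, original, mapped in walk(toc_tree):
    print("  " * depth + f"{original} -> {mapped}")
```

A consumer that understands TOC/TOCI can use the original type, while one that only knows the PDF 2.0 set can fall back to the mapped type.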

> The answers must be obvious, but I can't find ‘em.

It’s insufficiently obvious, and this is one of the committee's regrets. Some are now discussing a separate document to explain precisely this subject (in less technical terms!).

> If anyone has time to explain
> this, it would be wonderful, especially if the explanation could be made at a
> level suitable for someone who finds understanding the various ISO PDF standards
> woefully difficult.

Very reasonable ask, and thank you.

As Bevi mentioned, these documents (the ISO standards defining PDF and PDF subsets) are written for PDF software developers, and generally do not include advice for end users. Hopefully the above explanation is of some help.

> Or maybe PDF 2.0 is ignorable for, say, the next ten years in terms of accessibility,
> & most other things??

PDF 2.0 files are beginning to appear in the wild, but I’ve yet to see tooling for PDF 2.0’s tagged PDF features directed towards authors (would be very happy to learn of an example!).

PDF/UA-2 will certainly include explicit instructions for using PDF 1.7 structure elements in a PDF 2.0 context.

Duff.

From: Rick Davies
Date: Thu, May 16 2019 11:21AM
Subject: Re: Whither /TOC and /TOCI in PDF 2.0? PDF accessibility question.

Thank you, Karen, Bevi and Duff for your very gracious replies. Hugely appreciated.

My question was from the viewpoint of a software developer trying to do the right
thing in the context of producing PDF/UA-compliant and optimally accessible tagged
PDF output, automagically. Our objective is to remove the need for post-production
PDF remediation, which seems to take so much user time. Passing compliance tests is
not so difficult. The big challenge we have is getting it right in a way that will
make users happy.

I was wondering *why* the /TOC and /TOCI standard structure elements were 'undefined'
in PDF 2.0. (Just like in a crossword puzzle--each word found helps to find other
words.)

I have to go back to school on this, reading and reading over and over, fortified by your kind
responses.

Thanks again,

Rick

--
= EMAIL ADDRESS REMOVED =
Rick Davies, Technical Sales Manager
Datazone Ltd, Tel: +353 64 66 289 64
Palm Gate, Greenane, Killarney, Fax: +353 64 66 289 65
Co. Kerry, Ireland www.miramo.com
