From: Duff Johnson
Date: Jun 14, 2022 2:58PM

Hi Christine,

The design of the Matterhorn Protocol is intentionally open-ended to allow for all manner of implementations. Most implementers use it to underpin software development or to categorize test procedures. It’s not really intended for end users in a remediation context, except at a systematic, workflow sort of level. For example, you could use Matterhorn to verify that your software is capable of addressing each type of check necessary to a file’s conformance with PDF/UA.
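As a rough sketch of that kind of workflow-level check (an illustration only, not part of the Matterhorn Protocol itself), the Python below compares a list of failure conditions against the ones a tool claims to address. The function name, the sample IDs and their Machine/Human labels are assumptions for illustration; the real list should be taken from the published Matterhorn Protocol.

```python
def coverage_gaps(checklist, handled):
    """Return the failure-condition IDs in the checklist that the tool does not address."""
    return sorted(set(checklist) - set(handled))


# Illustrative subset only; the "machine"/"human" labels are assumed here.
checklist = {
    "01-003": "machine",
    "01-005": "machine",
    "01-006": "human",
}

# Hypothetical set of failure conditions the software (or workflow) covers.
handled = {"01-003", "01-006"}

print(coverage_gaps(checklist, handled))
```

Any ID this reports is a check that still needs either a software test or a documented manual step.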

Another option: some use Matterhorn to develop a precise means of identifying and categorizing specific types of errors.

It would certainly be possible to iterate through the tag tree of a PDF and validate each and every tag for conformance with the human check provisions. In practice, this can be done quite quickly once the user is familiar with the concept of PDF tags in general.

It’s up to the user to understand that a list item’s bullet must be contained in an <Lbl> tag, and not in the <LBody> tag (to take one common example).

Matterhorn simply allows you to classify an error (in the above case, 01-006 "The structure type and attributes of a structure element are not semantically appropriate for the structure element") as it relates to PDF/UA-1.
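To make the list example concrete, here is a minimal sketch in Python of walking a tag tree and classifying a finding. It does not read a real PDF: the Tag class, the BULLETS tuple and the function name are hypothetical stand-ins for whatever your tool exposes, and the bullet test is a crude heuristic. Since 01-006 is a human check, the output is best treated as a list of candidates for a person to review, not a verdict.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical in-memory stand-in for a PDF structure element."""
    name: str
    text: str = ""
    kids: list["Tag"] = field(default_factory=list)


# Assumed set of characters treated as list bullets (heuristic only).
BULLETS = ("•", "◦", "-", "*")


def find_misplaced_bullets(tag, path=""):
    """Walk the tree and flag LBody elements whose text starts with a bullet."""
    here = f"{path}/{tag.name}"
    findings = []
    if tag.name == "LBody" and tag.text.lstrip().startswith(BULLETS):
        # Classified under Matterhorn failure condition 01-006.
        findings.append(("01-006", here))
    for kid in tag.kids:
        findings.extend(find_misplaced_bullets(kid, here))
    return findings


tree = Tag("L", kids=[
    Tag("LI", kids=[Tag("Lbl", "•"), Tag("LBody", "First item")]),
    Tag("LI", kids=[Tag("LBody", "• Second item")]),  # bullet in the wrong tag
])

print(find_misplaced_bullets(tree))  # [('01-006', '/L/LI/LBody')]
```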

I hope this information is useful.

You may also find the PDF Association’s other formal advice in this area, the Tagged PDF Best Practice Guide: Syntax, to be useful.



> On Jun 14, 2022, at 12:45, Christine Hogenkamp < <EMAIL REMOVED> > wrote:
> Hello everyone,
> I am working on creating a how-to doc for my organization for how to create
> a PDF that passes PDF/UA, which includes a section on how to check the PDF
> to ensure it has passed. I have downloaded a copy of the Matterhorn
> Protocols which was recommended as a way to help users interpret the PDF/UA
> standards doc which is, in itself, a fairly complex technical document. The
> problem I am having is that I can't seem to find instructions on how this
> checklist should be used.
> For example, this article:
> https://www.pdfa.org/climbing-the-matterhorn-an-introduction-to-the-definitive-algorithm-for-pdfua-conformance/
> "How to Use the Matterhorn Protocol
> The basic approach for implementing the Matterhorn Protocol is to map the
> Failure Conditions to the various tasks implied by the specific
> word-processing, content extraction or other context."
> Perhaps I am just obtuse, but I am having a hard time parsing what this
> actually means in real life terms. How does one "map the failure
> conditions"? Am I meant to open the accessible PDF in question in Acrobat,
> then read through the Matterhorn Protocols while I go through my PDF to
> manually check each Human checklist item to see if I can find any instances
> of failure in the PDF, using the Tags panel? Then use PAC3 to check the
> Machine checklist items?
> It would be very helpful if anyone could point me towards more specific
> step by step instructions for using the Matterhorn Protocols.
> Thank you in advance!