E-mail List Archives


From: Duff Johnson
Date: Jan 2, 2014 3:06PM

Next message: Olaf Drümmer: "Re: JWAS and special characters pronunciation"
Previous message: Chagnon | PubCom: "Re: JWAS and special characters pronunciation"
Next message in Thread: Olaf Drümmer: "Re: JWAS and special characters pronunciation"
Previous message in Thread: Chagnon | PubCom: "Re: JWAS and special characters pronunciation"

> Olaf wrote: "the main reason probably is that HTML needs to be able to
> distinguish between ordered and unordered list, in order to create the
> proper bullets or numbering...In PDF, all is said and done in this regard -
> the bullets or numbers or whatever are already part of the page content..."
>
> I understand, but I still don't think this justifies having 2 different sets
> of tags for different file formats.

Recall that HTML (certainly, prior to HTML 5) is simply dumbed-down SGML, a language that *is* (unlike HTML) eminently capable of representing most any sort of textual STEM content.

When people use SGML, the end-product (ironically enough) is usually a PDF.

Another way to look at it is this: since a lot of the world's information is, and will continue to be, represented in ways other than as web pages, there is more than one important format AT developers need to support.

The big, wide, ever-morphing world of HTML/CSS/JavaScript is clearly one; no one says otherwise. The specific, largely static and otherwise highly constrained world of Tagged PDF, however, is clearly another.

PDF works for STEM publishing precisely because it's completely flexible in terms of representation and highly flexible (if presently less well supported by most AT) in terms of semantics.

It is not a hit on HTML to point out that it's not infrequently inadequate (or the browsers are inadequate, take your pick) for STEM publishing needs. Getting away from such complexities was (and remains) part of HTML's original charm vs. SGML.

> Does either set of code provide a better experience for AT users?
> When reading a PDF, would screen reader users like to hear "bulleted list"
> and "numbered list" rather than just "list" and a bunch of label gibberish
> that often isn't voiced?

When we talk about choosing to support various formats, we are talking about technical aspirations. In this context, I'd say that what AT users need, in principle, is to be able to receive the content the author provided.

That content may be plain text or it may be rich in one or more of many ways - that's simply impossible to circumscribe (especially when we're discussing STEM content).

> If I want authors, editors, and designers who create content to change their
> behavior and make accessible documents, why tell them to use <UL> and <OL>
> when they're making the HTML version of the document, and <L> when they're
> creating a PDF? This makes it less easy, more confusing to the average
> writer.

This sort of question - resolving tags over here vs. attributes over there - is for implementers to solve. If authoring software were done right, the difference would be transparent to users who don't want to know what's going on under the hood.
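To make the "tags vs. attributes" point concrete, here is a minimal sketch of what an export routine in an authoring tool might do, assuming it sees the HTML list element and its `type`/list-style value. The function and table names are hypothetical; the `L` structure type, the `List` attribute owner, and the `ListNumbering` values follow ISO 32000-1 (Table 347), and the dictionaries are a simplified stand-in for real PDF objects.

```python
from typing import Optional

# Hypothetical mapping tables: HTML list markup -> ISO 32000-1 ListNumbering values.
# <ol type="..."> is case-sensitive ("a" vs "A", "i" vs "I"), so keys are not lowercased.
HTML_OL_TYPE_TO_LIST_NUMBERING = {
    "1": "Decimal",
    "a": "LowerAlpha",
    "A": "UpperAlpha",
    "i": "LowerRoman",
    "I": "UpperRoman",
}

# CSS list-style-type keywords for <ul> are case-insensitive.
HTML_UL_STYLE_TO_LIST_NUMBERING = {
    "disc": "Disc",
    "circle": "Circle",
    "square": "Square",
    "none": "None",
}


def html_list_to_pdf(tag: str, list_type: Optional[str] = None) -> dict:
    """Hypothetical helper: turn <ul>/<ol> into a PDF 'L' element with a ListNumbering attribute."""
    tag = tag.lower()
    if tag == "ol":
        numbering = HTML_OL_TYPE_TO_LIST_NUMBERING.get(list_type or "1", "Decimal")
    elif tag == "ul":
        numbering = HTML_UL_STYLE_TO_LIST_NUMBERING.get((list_type or "disc").lower(), "Disc")
    else:
        raise ValueError(f"not an HTML list element: {tag!r}")
    # Both HTML elements collapse into the single PDF structure type "L";
    # the ordered/unordered distinction moves into the attribute dictionary.
    return {"S": "L", "A": {"O": "List", "ListNumbering": numbering}}


print(html_list_to_pdf("ol", "i"))
print(html_list_to_pdf("ul"))
```

The author writes one list either way; whether the distinction ends up in the element name (HTML) or in an attribute (PDF) is the tool's business, not the writer's.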

> Considering that the people who create these documents must create a
> bazillion of them every day in all sorts of file formats, it's better to
> streamline the standards and have all formats use the same set of tags for
> accessibility. Since HTML was addressing accessibility standards long before
> everyone else, they set the standard. It doesn't do any good to ignore that
> standard later down the road and create a different set of rules for PDF.

In my view that's equivalent to stating that all documents should be HTML. Even if it were true in theory (and there are good arguments against it), it's not what we see in practice. The world appears to need PDF - it uses the stuff more and more, as Google Trends continues to make clear...

http://duff-johnson.com/2013/02/22/apparently-pdf-isnt-boring/

> I teach several thousand people a year how to make accessible documents. At
> some point in the training, every student asks why it's the <L> tag in PDF
> while everywhere else it's <UL> and <OL>.
>
> I don't have a good explanation for them.

As Olaf pointed out, check out the definition of the ListNumbering attribute (Table 347) in ISO 32000. PDF enables a rich expression of list labels.
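On the consuming side, the same attribute is enough for AT to say "bulleted list" or "numbered list" instead of just "list". The sketch below is hypothetical (no screen reader is claimed to work this way); it assumes the reader has already pulled the `ListNumbering` value, if any, off the `L` element, and it treats a missing value as `None`, which is the ISO 32000-1 default.

```python
from typing import Optional

# ISO 32000-1 ListNumbering values, grouped by how a listener might want them announced.
UNORDERED = {"Disc", "Circle", "Square"}
ORDERED = {"Decimal", "UpperRoman", "LowerRoman", "UpperAlpha", "LowerAlpha"}


def announce_list(list_numbering: Optional[str], item_count: int) -> str:
    """Hypothetical: the phrase a screen reader could speak when entering an L element."""
    value = list_numbering or "None"  # attribute absent -> default value None
    if value in UNORDERED:
        kind = "bulleted list"
    elif value in ORDERED:
        kind = "numbered list"
    else:
        kind = "list"  # None, or a value this sketch doesn't recognize
    noun = "item" if item_count == 1 else "items"
    return f"{kind}, {item_count} {noun}"


print(announce_list("Disc", 3))
print(announce_list("LowerRoman", 1))
print(announce_list(None, 2))
```

Nothing here requires new tags; the information is already in the file when the author (or the authoring tool) supplies it.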

I'm not saying it's perfect, and I'm not saying AT supports this sort of thing today. But if we're asking what AT developers "should" do, I think the answer is pretty clearly that they "should" go ahead and fully support PDF instead of pretending the world will suddenly decide that final-form documents are no longer important. Really, how likely is that?

On the other hand, we already see major API developers providing advanced support for tagged PDF and PDF/UA, and more desktop applications that help author and consume tagged PDF. PDF 2.0, which we should hopefully see in 2015, will lay the bedrock for advanced implementations utilizing MathML and much more.

It's certainly true that not all implementations follow desirable standards. Ask the vendors for better, and get others to do likewise! That's precisely how this stuff changes.

Duff Johnson

p +1.617.283.4226
e <EMAIL REMOVED>
w http://duff-johnson.com
