WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Math, LaTex, and accessibility

From: Olaf Drümmer
Date: Oct 6, 2014 1:24PM


Hi Jasper,

let me say a few things about PDF before I touch on some of the other topics.


"Math and PDF"
In principle it is possible (and has been since 2001) to encode mathematical formulae into a tagged PDF by simply using MathML tags as PDF custom tags. The problem is that
- there are very few ways to write tagged PDF with mathematical formulae included in the tagging structure
- there are very few tools to take advantage of that (MathML tags in tagged PDF) information

The situation is about to change.

Already in 2009 Ross Moore started from TeX and worked on creating tagged PDF with the mathematical formula structures embedded (see source info and link below). Ross has recently shared sample files with me which prove that this can be done successfully and in high quality - and kind of fully automated. Hand tagging a formula in a tagged PDF is not really viable, even for people with lots of time and patience.

My company, callas software in Berlin/Germany, will release PDF creation technology in early 2015 that will convert HTML containing MathML to tagged PDF (with all math tags included in the tagging structure).

So there are some options, albeit few, that can be used to create tagged PDF with full mathematical formula coverage.

Now, in terms of tools that can take advantage of the extra information, there seem to be two options that are emerging. One is a project by Design Science (the guys behind MathPlayer) and NVDA (the free screen reader project from Australia), who are working on presenting mathematical formulae in a useful manner to NVDA users, who in most cases are low vision or blind users. The other comes from my company, callas software, which has developed an educational tool, pdfGoHTML, that converts tagged PDF to other presentation forms. In an upcoming release scheduled for later this month, pdfGoHTML will be extended by an "Easy Reader" that presents tagged content such that it highlights the current content on screen, displays a plain text equivalent of that content in a separate window (text size user definable), and speaks the text using text to speech. It is available on Mac OS X and requires Adobe Acrobat Pro v9 or newer. It always has been and continues to be free of charge. While not targeting end users who actually have to rely on such features, its aim is to prove that tagged PDF can be very useful. It is our honest hope that makers of tools - whether PDF programs or assistive tools for PDF - pick up the ideas and make their own products more usable, even where it comes to complex document substructures like formulae.

The most exciting aspect of the new Easy Reader feature is that it comes with full MathML support - a special presentation mode for MathML style formulae makes it pretty easy to navigate in the most complex formulae without getting lost. This is a big advantage over other approaches, and helps users build a mental model of formulae. In future versions of the pdfGoHTML Easy Reader software this will be extended to graphical content like diagrams, maps or technical drawings, though not much will become available before summer 2015.

In order to learn more about pdfGoHTML and Easy Reader, check out www.callassoftware.com towards the end of October 2014.

So - the PDF world is far from being perfect in this context, but it is also true that it seems to be improving substantially at this moment.



Now a few words on MathML itself, and other forms of representing math. I believe it is fair to say that Presentation MathML 3.0 has become the lingua franca. It can be converted to LaTeX Math and back, and to some other representations and back. MathML is an official part of HTML5. Support for MathML does not necessarily shine in each and every browser; at the same time, most browsers can be elevated to a very high degree of MathML support by taking advantage of the free MathJax JavaScript library. So if anything is going to stay in terms of representing math in the next ten years, it is probably Presentation MathML. Now it's time for two disclaimers:

- Content MathML could be a much better approach for encoding math, but as it does not at the same time represent how to draw math, it has been avoided by most implementers and users. Support for Content MathML in browsers seems to be patchy. Some of the problems go back to the fact that for the same "abstract" notation of a formula there can be several ways to draw the formula. Just think of a fraction, and of the several ways you could write it down. Especially in instructional / educational material, presentation does matter, so it seems content producers have rather taken to Presentation MathML for more control over the appearance of math. And it has to be asked why people with a disability can't be asked to make use of presentational aspects to interact with a formula just as anybody else does. The fact that some assistive technologies do not present presentational aspects (pun intended) is, as far as I can tell, no excuse.

- I do not know of a readily available tool to make math in HTML pages accessible (unless you count squeezing a spoken formula or LaTeX code into an alt [alternative text] attribute as accessible); for many years there used to be MathPlayer (don't worry - it's still around, but…), though it requires Internet Explorer and no longer runs in the most recent version of Internet Explorer. To the best of my knowledge the situation is as bad for EPUB3. Unless I am completely wrong, this implies that for HTML based content, too, there are not that many useful accessibility options when it comes to math.
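
To make the difference between the two MathML flavours concrete, here is a minimal sketch in Python (standard library only) that builds the fraction one half both ways. The function names are made up for this illustration and do not come from any of the tools mentioned above; the element names (mfrac, mn, apply, divide, cn) are standard MathML.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def presentation_fraction(numerator: str, denominator: str) -> ET.Element:
    """Presentation MathML: describes how the fraction is drawn (mfrac)."""
    math = ET.Element("math", {"xmlns": MATHML_NS})
    mfrac = ET.SubElement(math, "mfrac")
    ET.SubElement(mfrac, "mn").text = numerator
    ET.SubElement(mfrac, "mn").text = denominator
    return math


def content_fraction(numerator: str, denominator: str) -> ET.Element:
    """Content MathML: describes what is meant (a division), not its layout."""
    math = ET.Element("math", {"xmlns": MATHML_NS})
    apply_el = ET.SubElement(math, "apply")
    ET.SubElement(apply_el, "divide")
    ET.SubElement(apply_el, "cn").text = numerator
    ET.SubElement(apply_el, "cn").text = denominator
    return math


# Hypothetical example values: the fraction 1/2 in both encodings.
presentation_xml = ET.tostring(presentation_fraction("1", "2"), encoding="unicode")
content_xml = ET.tostring(content_fraction("1", "2"), encoding="unicode")

print(presentation_xml)
print(content_xml)
```

The presentation version fixes one particular way of drawing the fraction, whereas the content version leaves it to the renderer to choose between a stacked fraction, a slash or a division sign.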


My personal prediction on this background currently is as follows:
- the scientific community will probably use LaTeX (in a way as a TeX based fashion to do Presentation MathML) for a couple of years to come; the LaTeX math expressions per se are pure text (but require mathematical knowledge and familiarity with LaTeX to be useful), and once you get the hang of it, they are reasonably easy to write and read (nobody wants to write MathML by hand! One will almost always need a tool to create MathML; at the same time, a lot of such tools are available free of charge on the web, or can be purchased at reasonable prices, e.g. MathType from Design Science).
- many others are currently struggling with band-aid style tools and techniques, e.g. by putting the spoken text equivalent of a formula in the alt (alternative text) attribute of a tag (whether in HTML, EPUB or PDF).
- smarter approaches (creation, presentation, assistive tools, etc.) are emerging and will probably focus on MathML (LaTeX can be converted into MathML and vice versa), if only because it is an official and reasonably well supported part of HTML5. Decent MathML support in tagged PDF will become readily available/feasible/usable within two to three years for at least some user groups. I have no estimate whether and when this might happen for MathML in HTML5.
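
As an illustration of why LaTeX math counts as readable plain text once the notation is familiar, consider the quadratic formula (an example chosen here, not one taken from the documents under discussion):

```latex
% Quadratic formula written as plain-text LaTeX math
x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}
```

A spoken equivalent of the kind that gets placed in an alt attribute might read "x equals the fraction with numerator minus b plus or minus the square root of b squared minus 4 a c, and denominator 2 a".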


Olaf

Sources:
- Ongoing efforts to generate “tagged PDF” using pdfTeX, Ross Moore, 2009, Mathematics Department, Macquarie University, Sydney, Australia, ross[at]maths.mq.edu.au; downloaded from http://www.tug.org/TUGboat/tb30-2/tb95moore.pdf on Oct 6, 2014



PS: I am looking for people who are interested in exchanging ideas around tagged PDF and accessible math…



On 6 Oct 2014, at 20:08, Jasper Cole < <EMAIL REMOVED> > wrote:

> WebAIM accessibility gurus,
>
> I've been working with LaTex files for online math courses. I'd like to make the documents more accessible, but I'm very uncertain about the best approach.
>
> For starters, it'd be great to generate tagged PDF documents. These may not be perfect, but at least they would be more accessible than the documents are currently. Does anyone know of any method to automatically generate these tags at the time of PDF creation? I've considered remediating the documents after they've already been created, but they're frequently modified and some of our instructors are less willing/able to add the tags themselves.
>
> I also read this previous conversation : http://webaim.org/discussion/mail_thread?thread=4042/ . It says that "teX, LaTeX and MathML are good for mathematics. PDF is completely inaccessible for STEM publications." Does that still hold true, even with modern tagged PDFs? If so, what alternative would you recommend? Should files be generated as HTML pages?
>
> Finally, I also read this conversation : http://webaim.org/discussion/mail_message?id=11637 . It says that "LaTex is presentation oriented which means semantics must be deduced from presentation," whereas "Content MathML does not have this problem." Would you agree with that statement? If so, is MathML the necessary step to provide accessibility? If that's the case, what is the procedure for conversion? From what I've seen, it would have to be a mostly manual process, but perhaps there's another way? We have over 500 documents (many of which are quite long) and getting them all accurately translated will be very challenging.
>
> I've never worked with this type of material before, so any knowledge of LaTex, MathML or Math PDFs is extremely helpful. I will definitely appreciate any feedback.
>
> Thanks so much!
> ---
> Jasper Cole
>