WebAIM - Web Accessibility In Mind

E-mail List Archives

Is it True: No PDFs generated from LaTeX are Accessibly Tagged?

From: Brandon Keith Biggs
Date: Apr 22, 2018 4:12AM


Hello,

I think I just stumbled over one of the most egregious causes of
inaccessible PDFs on the web and I hope I'm missing something:

Pdflatex does not generate accessible PDFs.

This means that no PDF generated with the LaTeX compilers is properly
tagged. It also means that, by default, none of the PDFs from pandoc are
properly tagged either. This has staggering implications in the academic
community, where most PDFs are created with LaTeX.

I really hope I am wrong, but I have been using both pandoc and pdflatex
directly for the last day, trying to get one heading to show. It seems to
be impossible.
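
For anyone who wants to check their own files, here is a minimal sketch of a
first-pass test, not a real accessibility checker. It relies on the fact that
a tagged PDF's document catalog carries a /StructTreeRoot entry. The function
names are my own, and the raw byte search is an assumption of convenience: it
can miss tags stored inside compressed object streams, so a False result is
only a hint that the file deserves a look in a proper checker.

```python
from pathlib import Path


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Crude heuristic: report whether the raw bytes mention /StructTreeRoot.

    Tagged PDFs reference a structure tree root from the document catalog.
    This does not parse the file, so a catalog stored in a compressed
    object stream can produce a false negative.
    """
    return b"/StructTreeRoot" in pdf_bytes


def report(paths):
    """Map each path (hypothetical file names) to the heuristic's verdict."""
    return {str(p): looks_tagged(Path(p).read_bytes()) for p in paths}
```

Calling report() on a folder of saved articles gives a quick list of which
ones are worth opening in a full checker.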

Before I go and make issues on the different LaTeX distribution sites, is
there anyone who knows more about this? If it is true that there is no way
to create accessibly tagged PDFs from pdflatex, I would love either a guide
that describes the code required for proper tagging of PDFs or someone
knowledgeable to comment on the issues that I open.

Here are a few sources that make me fear this problem has not been
addressed yet:

https://umij.wordpress.com/2016/08/11/the-sad-state-of-pdf-accessibility-of-latex-documents/



https://tex.stackexchange.com/questions/261537/a-guide-on-how-to-produce-accessible-pdf-files

http://tug.org/pipermail/accessibility/2016q4/000005.html

https://chi2014.acm.org/authors/generate-a-tagged-pdf#LaTeX



A source that shows the staggering problem this has on the academic
community:

https://www.cs.cmu.edu/~jbigham/pubs/pdfs/2015/accessibleconferences.pdf



Just to speak to the implications this has had on my life:

Over the last 4 months I have been doing a literature review. I have saved
64 articles in PDF. 21 were completely unreadable, so I had to OCR them
with Kurzweil 1000 (a $1000 piece of technology, and I still couldn't read
any tables or math). Out of the others, only 4 were properly tagged.

None of the PDFs were scanned; they all had the text there, but
itjustlookedlikethis.

This means that if any blind person ever wishes to read academic papers,
they are required to have an OCR program on their computer, just to read
PDFs that were probably generated with pdflatex.

The worst part is, people just don't realize how terrible this PDF problem
is. Even my research group and wife, who know better, don't quite
understand that even if their LaTeX is perfect, or their Markdown is
perfect, their only conversion tool is broken. How can it be broken?
Everyone uses it!

If this single tool gets fixed, the ratio of inaccessible to accessible
PDFs being produced will flip overnight.

Thanks,


Brandon Keith Biggs <http://brandonkeithbiggs.com/>