WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Fixing PDF OCR Errors

From: Duff Johnson
Date: Aug 15, 2018 12:22PM


> What's the easiest way to fix PDF OCR errors? For example, I have a signed one-page legal memo that was scanned in after it was signed. I ran OCR and Acrobat says there are no OCR suspects. But in a couple of places the letter O was recognized as the number 0.

Recognition errors are a common problem. The answer really depends on your needs.

If you just want the text to be correct, then why not simply edit the text in the output PDF, replacing the 0 with the O?

If you need to preserve the document's appearance exactly as-converted, or can't edit the text due to a limitation of software (for example), then using ActualText on a <Span> element that encloses just the offending character(s) is a fine solution.
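To make the ActualText suggestion concrete, here is a rough Python sketch that prints what a Span structure element carrying ActualText might look like in PDF syntax. The function names and the MCID value 7 are made up for illustration, and the dictionary is only a fragment: a real structure element also needs a /P (parent) entry and usually /Pg, and in practice a tool such as Acrobat writes all of this for you.

```python
def pdf_literal_string(text: str) -> str:
    """Encode ASCII text as a PDF literal string, escaping \\, ( and )."""
    if not text.isascii():
        raise ValueError("this sketch only handles ASCII replacement text")
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def span_with_actual_text(actual_text: str, mcid: int) -> str:
    """Return a Span structure element fragment carrying /ActualText.

    mcid (hypothetical here) is the marked-content ID of the misrecognized
    character(s) on the page. /P and /Pg are omitted for brevity.
    """
    return (
        "<< /Type /StructElem /S /Span "
        f"/ActualText {pdf_literal_string(actual_text)} "
        f"/K {mcid} >>"
    )


# The scanned glyph was recognized as "0"; readers should get "O" instead.
print(span_with_actual_text("O", 7))
```

The point is that the Span wraps only the marked content holding the wrong character, so the page's appearance is untouched while assistive technology receives the replacement text.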

> I tried going into the tag tree and changing the actual and alt text for that content, which seemed to work when I read it with JAWS. Is there a different way I should be doing this?


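In these cases, use ActualText, not Alt. ActualText supplies exact replacement text for the enclosed content, whereas Alt is an alternate description meant for things like images.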

Duff.