WebAIM - Web Accessibility In Mind

E-mail List Archives

Thread: Fixing PDF OCR Errors

Number of posts in this thread: 4 (In chronological order)

From: Joseph Sherman
Date: Wed, Aug 15 2018 11:08AM
Subject: Fixing PDF OCR Errors

What's the easiest way to fix PDF OCR errors? For example, I have a signed one-page legal memo that was scanned after it was signed. I ran OCR and Acrobat says there are no OCR suspects. But in a couple of places the letter O was recognized as the number 0.

I tried going into the tag tree and changing the actual and alt text for that content, which seemed to work when I read it with JAWS. Is there a different way I should be doing this?


Joseph

From: Duff Johnson
Date: Wed, Aug 15 2018 12:22PM
Subject: Re: Fixing PDF OCR Errors

> What's the easiest way to fix PDF OCR errors? For example, I have a signed one-page legal memo that was scanned after it was signed. I ran OCR and Acrobat says there are no OCR suspects. But in a couple of places the letter O was recognized as the number 0.

Recognition errors are a common problem. The answer really depends on your needs.

If you just want it correct, then why not simply edit the text in the output PDF, replacing the 0 with the O?

If you need to preserve the document's appearance exactly as-converted, or can't edit the text due to a limitation of software (for example), then using ActualText on a <Span> element that encloses just the offending character(s) is a fine solution.

> I tried going into the tag tree and changing the actual and alt text for that content, which seemed to work when I read it with JAWS. Is there a different way I should be doing this?


In these cases, use ActualText, not Alt.
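For readers who want to see what this looks like at the PDF syntax level: besides being set on a structure element in the tag tree, ActualText can also be carried in the property list of a marked-content sequence. The Python sketch below only builds such a fragment as a string, so it is an illustration rather than a tool that edits a real file. The function names are hypothetical, and it assumes the shown text can be written as a simple literal string, which is often not true for the hidden text layer that OCR produces.

```python
def pdf_literal(text: str) -> str:
    """Escape text for use as a PDF literal string: backslash and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def span_with_actual_text(shown: str, actual: str) -> str:
    """Wrap a text-showing operator in a Span carrying replacement text.

    shown  -- the characters as (mis)recognized, left untouched on the page
    actual -- the text that assistive technology should read instead
    """
    return "\n".join([
        f"/Span << /ActualText {pdf_literal(actual)} >> BDC",
        f"  {pdf_literal(shown)} Tj",
        "EMC",
    ])


# The misrecognized digit stays in the content; the letter is what gets read.
print(span_with_actual_text("0", "O"))
```

The Span encloses only the offending character, so the rest of the line is still read from the page content as usual.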

Duff.

From: chagnon@pubcom.com
Date: Wed, Aug 15 2018 12:43PM
Subject: Re: Fixing PDF OCR Errors

Adobe has some good tutorials on this.
http://blogs.adobe.com/acrolaw/2016/03/correcting-ocr-errors/

Keep in mind that you're working with two items:
1) the actual scanned image, which is a graphic of the text, and
2) Acrobat's interpretation of the graphical text as live, editable text.

You need to correct the second, Acrobat's interpretation of the graphical
text. Acrobat will show you the suspects it wasn't sure it interpreted
correctly, but it's a bit more work to correct text that Acrobat believes
is right, such as zeros in place of capital letter O's.
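Because Acrobat does not flag these as suspects, one way to find them is to export the recognized text and scan it yourself. The sketch below is a rough heuristic, not anything Acrobat provides: it flags any token that mixes the digit 0 with letters and proposes a capital O in its place. The function name and sample sentence are made up for illustration, and legitimate strings such as part numbers will show up as false positives, so each hit still needs a human check.

```python
import re

# A token is a run of ASCII letters and digits.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def find_zero_suspects(text):
    """Return (offset, token, guess) for tokens mixing the digit 0 with letters."""
    suspects = []
    for match in TOKEN.finditer(text):
        word = match.group()
        if "0" in word and any(ch.isalpha() for ch in word):
            # Naive guess: every zero in a mixed token was a capital O.
            suspects.append((match.start(), word, word.replace("0", "O")))
    return suspects


sample = "MEM0RANDUM dated 2018: the C0URT 0RDERS payment of 100 dollars."
for offset, word, guess in find_zero_suspects(sample):
    print(offset, word, "->", guess)
```

Purely numeric tokens such as 2018 and 100 are left alone because they contain no letters.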

-Bevi

- - -
Bevi Chagnon, founder/CEO | = EMAIL ADDRESS REMOVED =
- - -
PubCom: Technologists for Accessible Design + Publishing
consulting . training . development . design . sec. 508 services
Upcoming classes at www.PubCom.com/classes
- - -
Latest blog-newsletter - Accessibility Tips at www.PubCom.com/blog

From: Karlen Communications
Date: Wed, Aug 15 2018 1:12PM
Subject: Re: Fixing PDF OCR Errors

This is why I have ABBYY PDF Transformer...aside from its cool name.

It is only $79 USD, and I can open a PDF in it and get better OCR
than with Adobe Acrobat. Since Acrobat X, I've never been able to get "find
suspects" or find mistakes to work with the Adobe Text Recognition tool. It
even says that there are no errors, but when I tag the document, there are no
spaces between words.

I took the same document to PDF Transformer, did the OCR, saved it as PDF
again, opened it in Acrobat and was able to give it correct Tags with no OCR
errors and there were spaces between the words.

I can also open a PDF document in PDF Transformer and send it to Word for
easier reading when I have untagged PDF or poorly tagged PDF.

Cheers, Karen
