WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Fixing OCR issues in PDF with Adobe Acrobat Pro

From: Philip Kiff
Date: May 15, 2021 6:52AM


I haven't worked on a challenging OCR'd PDF in a year or two, but I could
have sworn there was a mode that would let you edit *any* of the OCR'd
text, not just the suspect text, without switching to a replacement font.
The interface was terrible, and the way to switch from editing suspect
text to editing any text was not at all obvious. Hmm... I can't find a
sample of a case where I did this, so maybe I'm misremembering and I
actually used the "actual text" property, which you already indicated
wouldn't meet your needs.

I've never tried the other methods you propose. And yes, it does seem
that Acrobat has an entirely separate set of hidden object layers that it
uses to manage OCR'd text. And I don't think axesPDF QuickFix provides
any access to them, either.
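
For anyone who wants to see what that hidden text looks like outside of
Acrobat, here is a rough Python sketch. It assumes (and this is only an
assumption about how a given file was produced) that the OCR text is
stored the common way, as ordinary text drawn in text rendering mode 3,
which the PDF specification defines as invisible. It also assumes the page
content stream has already been decompressed by some other tool. The
function name and the sample stream are made up for illustration; the
sketch ignores hex strings, TJ arrays, and q/Q graphics-state
save/restore, so treat it as a way to inspect a file, not to repair one.

```python
import re

# Tokens of interest in a decoded (uncompressed) PDF content stream:
#   "<n> Tr"    sets the text rendering mode
#   "(...) Tj"  shows a literal string (backslash escapes allowed)
TOKEN = re.compile(rb"(\d+)\s+Tr\b|\((?:\\.|[^\\()])*\)\s*Tj\b")


def invisible_text(content: bytes) -> list[bytes]:
    """Return literal strings shown while the rendering mode is 3 (invisible)."""
    mode = 0  # the default text rendering mode is 0 (fill)
    hidden = []
    for match in TOKEN.finditer(content):
        if match.group(1) is not None:
            # A "Tr" operator: remember the new rendering mode.
            mode = int(match.group(1))
        elif mode == 3:
            # A "Tj" operator while invisible: keep the bytes between
            # the opening "(" and the final ")".
            raw = match.group(0)
            hidden.append(raw[1:raw.rindex(b")")])
    return hidden


# Hypothetical stream: one misrecognized hidden string, one visible string.
sample = b"BT /F1 12 Tf 3 Tr (Helo wrold) Tj 0 Tr (Visible) Tj ET"
print(invisible_text(sample))  # [b'Helo wrold']
```

If the misrecognized words show up in that output, the hidden layer is
plain invisible text. That matches the workaround of hiding a corrected
text box behind the image, and it suggests a tool that can edit content
streams directly might be able to change the text.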

Phil.

On 2021-05-14 19:26, Jonathan Avila wrote:
> Hi all, I still have not found a great way within Acrobat to address optical character recognition (OCR) errors. The situation is that the text was incorrectly recognized but Acrobat does not perceive the issues as suspect and thus the tools typically in Acrobat to fix OCR suspects are not available. I'm not sure if there is a way to flag the content as suspect somehow - but it seems silly to not allow you to edit any of the OCR text unless it's a suspect.
>
> OCR'd content appears to have hidden objects that represent the text for the tag structure, but this text is not itself editable. While Acrobat has had an edit text option in the last couple of versions that does a good job of letting you edit the visual content in a typeface that looks like the OCR'd text, I am dealing with a document that can't be edited in that way for legal reasons. I need to edit the hidden text.
>
> In addition, hacks like the use of actual text don't work with mobile devices, so that approach is not an option. The only way I have found is to artifact the object, create a new text box, put the text in that, and hide it behind the image. That does work across desktop and mobile assistive technology.
>
> I also played with the preflight option to make OCR text into layers. It does a good job converting the OCR text into a different layer that can be edited. The challenge is then merging or flattening the layers back into one. When I try that I either lose the content in all the tags or I end up with duplicated text on screen even though I have chosen to not display the layer and mark the layer as a reference layer. Has anyone had luck with this method?
>
> Does anyone have any thoughts on how best to edit OCR text in Acrobat when you cannot edit the visual text and OCR suspects are not detected? I've tried axesPDF QuickFix, but it doesn't seem to have any options for this either. I believe some programs like ABBYY FineReader could be used, but my license for that is very old.
>
> Best Regards,
>
> Jonathan