E-mail List Archives
Re: Fixing OCR issues in PDF with Adobe Acrobat Pro
From: Philip Kiff
Date: May 15, 2021 8:04AM
Just a quick follow-up on the Adobe Acrobat Pro DC interface for OCR. I
found a file that I edited last year, and Acrobat Pro does seem to allow
editing the way I remember.
When I open this scanned PDF and have the original image displayed (i.e.
not replacement font but an exact copy of the original image), I can
open up the Scan & OCR Tool, and then select "Recognize Text" in the
toolbar, and there is a checkbox "Review recognized text" that appears
on the left in the sub-toolbar menu that opens below it. When I select
that, initially only suspects appear editable even though I've selected
the checkbox - the suspects are surrounded by red boxes. But on that
screen I can then double-click on an arbitrary piece of text and it will
allow me to change the interpreted text for that snippet by editing the
"image ... recognized as ..." entry for that newly selected box.
It is for sure a terrible interface. And at first it does not actually
seem like you can edit the text: I had to flip back and forth between
several pages before it started to work. But you can edit the text that
way - or at least I can in this PDF.
My interface looks similar to what I see under step 2 under "How to
correct OCR errors" on this page from OneLegal (about whom I know
nothing, but whose page I just found now because they happen to have
instructions that seem to match what I'm seeing):
https://www.onelegal.com/blog/how-to-correct-ocr-errors-using-adobe-acrobat/
Phil.
On 2021-05-15 08:52, Philip Kiff wrote:
> I haven't worked on a challenging OCR'd PDF in a year or two, but I
> could have sworn there was a way to get to a mode that would allow you
> to edit *any* of the OCR'd text, not just the suspect text, without
> switching to a replacement font. The interface was terrible and the
> way to switch from editing suspect text to editing any text was not at
> all obvious. Mmmm....I can't find a sample of a case where I did this,
> so maybe I'm mis-remembering, and I actually used the "actual text"
> property - which you already indicated wouldn't meet your needs.
>
> I've never tried the other methods you propose. And yes, it does seem
> that Acrobat has an entirely separate set of hidden object layers it uses
> to manage OCR'd text. And I don't think axesPDF QuickFix provides any
> access to it, either.
>
> Phil.
>
> On 2021-05-14 19:26, Jonathan Avila wrote:
>> Hi all, I still have not found a great way within Acrobat to address
>> optical character recognition (OCR) errors. The situation is that
>> the text was incorrectly recognized but Acrobat does not perceive the
>> issues as suspect and thus the tools typically in Acrobat to fix OCR
>> suspects are not available. I'm not sure if there is a way to flag
>> the content as suspect somehow - but it seems silly to not allow you
>> to edit any of the OCR text unless it's a suspect.
>>
>> OCR'd content appears to have hidden objects that represent the text
>> for the tags structure but this text is not editable itself. While
>> Acrobat does have an edit text option in the last couple versions
>> that does a good job in allowing you to edit the visual content in a
>> typeface that looks like OCR'd text - I am dealing with a document
>> that can't be edited in that way for legal reasons.  I need to edit
>> the hidden text.
>>
>> In addition, hacks like use of actual text don't work with mobile
>> devices so using that approach is not an option. The only way I have
>> found is to artifact the object and create a new text box - put the
>> text in that and hide it behind the image. That does work across
>> desktop and mobile assistive technology.
>>
>> I also played with the preflight option to make OCR text into
>> layers. It does a good job converting the OCR text into a different
>> layer that can be edited. The challenge is then merging or
>> flattening the layers back into one. When I try that I either lose
>> the content in all the tags or I end up with duplicated text on
>> screen even though I have chosen to not display the layer and mark
>> the layer as a reference layer. Has anyone had luck with this method?
>>
>> Does anyone have any thoughts on how best to edit OCR text in Acrobat
>> when you cannot edit the visual text and OCR suspects are not
>> detected?  I've tried axesPDF QuickFix but it doesn't seem to have
>> any options for this either. I believe some programs like ABBYY
>> FineReader could be used but my license for that is very old.
>>
>> Best Regards,
>>
>> Jonathan