E-mail List Archives
Does pdfGoHTML not recognize Actual Text?
From: Jonathan Metz
Date: Jul 18, 2013 8:46AM
- Next message: Duff Johnson: "Re: Does pdfGoHTML not recognize Actual Text?"
- Previous message: Jonathan Metz: "Re: NVDA, Acrobat, Reader, or User issue with OCRed PDF"
- Next message in Thread: Duff Johnson: "Re: Does pdfGoHTML not recognize Actual Text?"
- Previous message in Thread: None
- View all messages in this Thread
I feel like an idiot for not having run pdfGoHTML sooner. At least that
way I’d have known that the first page was somehow getting hidden from
everything! I ran it and it wasn’t picking up the two culprit paragraphs.
Redoing the OCR proved successful, to a point.
It appears as though pdfGoHTML isn’t picking up the “Actual Text” of
the tag. I tried multiple approaches. If I use Actual Text at all, the
content is completely hidden. I’ve gone so far as to test what would
happen if I just made some of the text an image and applied Actual
Text to it. However, if I use Alternate Text instead, it shows up in the
conversion.
At this point I’m just trying to work out whether pdfGoHTML has this
problem or whether the file is still screwy. I tried to see if there was a
feature request option on Callas, but I couldn’t figure out where to look
for that for free software. It would be cool if it replaced the
erroneous content that is tagged with Actual Text with the actual text
that’s supposed to be read. Of course, it might already do that, and I’ve
simply still got a bad file.
Any thoughts?
Jonathan