E-mail List Archives

Number of posts in this thread: 5 (In chronological order)

From: Jonathan Metz
Date: Jul 18, 2013 8:46AM
Subject: Does pdfGoHTML not recognize Actual Text?

I feel like an idiot for not having done pdfGoHTML sooner. At least that
way I’d know that the first page was somehow getting hidden from
everything! I ran it and it wasn’t picking up the two culprit paragraphs.
Redoing the OCR proved successful, to a point.

It appears as though the pdfGoHTML isn’t picking up the “Actual Text” of
the tag. I tried multiple approaches. If I use Actual Text at all, the
content is completely hidden. I’ve gone so far as to test what would
happen if I just made some of the text an image and applied the Actual
Text. However, if I use Alternate text, it shows up in the conversion.

At this point I’m just trying to deduce if pdfGoHTML is having this
problem, or if the file is still screwy. I tried to see if there was a
feature request option on Callas, but I couldn’t figure out where to look
for that regarding free software. It would be cool if it replaced the
error content that is tagged with Actual Text as the actual text that’s
supposed to be read. Of course, it might still do that but I’ve still got
a bad file regardless.

Any thoughts?

Jonathan

From: Duff Johnson
Date: Jul 18, 2013 8:57AM
Subject: Re: Does pdfGoHTML not recognize Actual Text?

> I feel like an idiot for not having done pdfGoHTML sooner.

:)

> At least that
> way I'd know that the first page was somehow getting hidden from
> everything!

It's great for a "quick sanity check" on the file.

> I ran it and it wasn't picking up the two culprit paragraphs.
> Redoing the OCR proved successful, to a point.

> It appears as though the pdfGoHTML isn't picking up the "Actual Text" of
> the tag. I tried multiple approaches. If I use Actual Text at all, the
> content is completely hidden. I've gone so far as to test what would
> happen if I just made some of the text an image and applied the Actual
> Text. However, if I use Alternate text, it shows up in the conversion.

What happened when you used Acrobat to export the file to HTML, as Olaf had suggested? Compare that result to pdfGoHTML...

> At this point I'm just trying to deduce if pdfGoHTML is having this
> problem, or if the file is still screwy.

Try the test I mentioned above.

Duff.

From: Olaf Drümmer
Date: Jul 18, 2013 9:36AM
Subject: Re: Does pdfGoHTML not recognize Actual Text?

Just for the record:

On 18 Jul 2013 at 16:57, Duff Johnson < = EMAIL ADDRESS REMOVED = > wrote:

> What happened when you used Acrobat to export the file to HTML, as Olaf had suggested? Compare that result to pdfGoHTML...

I did not suggest HTML export - I suggested 'accessible text' export. To the best of my knowledge the HTML exported in Acrobat is not useful in this context.

Olaf

From: Duff Johnson
Date: Jul 18, 2013 9:39AM
Subject: Re: Does pdfGoHTML not recognize Actual Text?

> I did not suggest HTML export - I suggested 'accessible text' export. To the best of my knowledge the HTML exported in Acrobat is not useful in this context.

My mistake - thanks for the correction.

I believe the HTML export uses tags (it certainly did once upon a time), but I haven't checked the latest implementation.

Duff.

From: Olaf Drümmer
Date: Jul 18, 2013 9:39AM
Subject: Re: Does pdfGoHTML not recognize Actual Text?

Hi Jonathan,

the easiest way to let callas software know about issues or feature requests - whether for our commercial products or the free callas pdfGoHTML - would be an email to
= EMAIL ADDRESS REMOVED =
[we usually have a turnaround time of less than 24 hours for the first substantial answer].

If you can, could you send me directly the file you are struggling with (or a sample file that shows the ActualText issue)?

In principle ActualText should work, but maybe pdfGoHTML is missing some aspect.

Thanks,

Olaf

On 18 Jul 2013 at 16:46, Jonathan Metz < = EMAIL ADDRESS REMOVED = > wrote:

> I feel like an idiot for not having done pdfGoHTML sooner. At least that
> way I'd know that the first page was somehow getting hidden from
> everything! I ran it and it wasn't picking up the two culprit paragraphs.
> Redoing the OCR proved successful, to a point.
>
> It appears as though the pdfGoHTML isn't picking up the "Actual Text" of
> the tag. I tried multiple approaches. If I use Actual Text at all, the
> content is completely hidden. I've gone so far as to test what would
> happen if I just made some of the text an image and applied the Actual
> Text. However, if I use Alternate text, it shows up in the conversion.
>
> At this point I'm just trying to deduce if pdfGoHTML is having this
> problem, or if the file is still screwy. I tried to see if there was a
> feature request option on Callas, but I couldn't figure out where to look
> for that regarding free software. It would be cool if it replaced the
> error content that is tagged with Actual Text as the actual text that's
> supposed to be read. Of course, it might still do that but I've still got
> a bad file regardless.
>
> Any thoughts?
>
> Jonathan
>
>
>
>
> On 7/17/13 9:56 PM, "Jonathan Metz" < = EMAIL ADDRESS REMOVED = > wrote:
>
>> Thanks for the response, Olaf.
>>
>>
>> Yes, I forgot to mention that Acrobat crashes too. I haven't installed
>> pdfGoHTML on this computer yet, but a good idea nonetheless.
>>
>> What's the name of that other PDF reader that works with NVDA? I just can't
>> remember the name and want to give that a whirl.
>>
>> Should I just try OCRing that page that's causing me trouble and see if
>> that helps any?
>>
>> Thanks,
>> Jonathan
>>
>> On 7/17/13 6:10 PM, "Olaf Drümmer" < = EMAIL ADDRESS REMOVED = > wrote:
>>
>>> Hi Jonathan,
>>>
>>> On 17 Jul 2013 at 20:35, Jonathan Metz
>>> < = EMAIL ADDRESS REMOVED = > wrote:
>>>
>>>> When I use Adobe's 'Read Out Loud' feature, Acrobat force closes.
>>>
>>> that looks like the PDF might have syntactical problems.
>>>
>>> Could you also try to
>>> - use Acrobat Pro and save as accessible text - what do you get?
>>> - use callas pdfGoHTML - what do you get?
>>>
>>> Background info: it's more or less the same engine that is working inside
>>> Adobe Reader and Adobe Acrobat for any of the above, and also for how
>>> NVDA gets access to the PDF file's content. If NVDA doesn't give you
>>> much, it might be because Adobe Reader is struggling and does not give
>>> much to NVDA to begin with.
>>>
>>> Olaf
>>>