E-mail List Archives

From: Philip Kiff
Date: Feb 3, 2020 4:56AM

Next message: R.U. Steinberg: "Re: Acrobat and JavaScript question"
Previous message: Philip Kiff: "Re: Generating PDF's from HTML"
Next message in Thread: R.U. Steinberg: "Re: Acrobat and JavaScript question"
Previous message in Thread: R.U. Steinberg: "Acrobat and JavaScript question"
View all messages in this Thread

I don't have a JavaScript suggestion for your problem.

To pull out just a plain text copy of the tagged content, however, you
might try the "Preview" function of the PDF Accessibility Checker (PAC)
tool:
https://pdf-aktuell.ch/pa/language/en/pdf-accessibility-checker-version-3-pac-3/

This tool has a function that lets you "preview" a simplified version
of the tagged content as it might be heard by a screen reader user.
There is a checkbox that lets you display that preview with or without
semantic structures. And it appears that you can copy and paste the
screen output of that tool.

Alternatively, you might try converting the PDF to HTML first. There is
a (currently free) Acrobat plug-in from Callas software called pdfGoHTML:
https://www.callassoftware.com/en/products/pdfgohtml
I *think* that this software focuses on the tagged portions of the PDF,
rather than on the visual "content" that appears in the various objects.
I might be wrong, though.
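
If the HTML route does work, the exported markup could be reduced to
plain text and checked against a word list with a short script. Below is
a minimal sketch in Python using only the standard library. It assumes
the HTML export really does reflect the tag content (which I have not
confirmed), and the function names, the sample markup, and the tiny word
list are all hypothetical placeholders.

```python
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def unknown_words(text, known_words):
    """Return the sorted set of words whose lowercase form is not in known_words."""
    words = re.findall(r"[A-Za-z']+", text)
    return sorted({w for w in words if w.lower() not in known_words})


if __name__ == "__main__":
    # Hypothetical stand-in for an exported HTML file; in practice the
    # string would be read from the file produced by the conversion.
    sample = "<html><body><p>Certifcate of Completion</p></body></html>"
    # Hypothetical word list; a real dictionary file would replace this.
    known = {"certificate", "of", "completion"}
    print(unknown_words(html_to_text(sample), known))  # prints ['Certifcate']
```

The word-list check is only a stand-in for a real spell checker; pasting
the extracted text into MS Word, as you describe, would do the same job.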

Phil.

Philip Kiff
D4K Communications

On 2020-01-31 22:34, R.U. Steinberg wrote:
> I have searched acrobatusers.com, adobe.com, and other sites and come up
> empty. I have several 20 page PDF files that were created by a third party
> designer.
>
> It was bad enough that it was exported from Adobe InDesign without tags and
> I had to add them manually. Now I've learned that some of the tags have
> spelling errors in them. Visually, a word may look like "Certificate" but
> if you were to listen to it with JAWS, the word is pronounced "Certifcate"
> (missing a letter "i") because the word in the tag is spelled wrong.
>
> I know I can manually copy the contents of each tag in the tag tree one at
> a time (right click on a paragraph in the tag tree and select "Copy
> Contents to Clipboard"), paste into something like MS Word and run a spell
> check. Then once I find a spelling error I can use the "Actual Text" field in
> the tag properties to spell words correctly. I also know a little
> JavaScript and have tried to make sense of the Adobe manuals, but have had
> no luck. I'm hoping that everything in the tag tree is some sort of
> "object" but can't find the reference in the manuals.
>
> What I'd like is some sort of a batch process that can run through the
> whole file and export the content of all the tags into plain text or
> something like that. Not export the PDF as plain text, but the content in
> the tags as plain text. I hope that makes sense.
