E-mail List Archives
Number of posts in this thread: 3 (In chronological order)
From: R.U. Steinberg
Date: Jan 31, 2020 8:34PM
Subject: Acrobat and JavaScript question
I have searched acrobatusers.com, adobe.com, and other sites and come up
empty. I have several 20-page PDF files that were created by a third-party
designer.
It was bad enough that it was exported from Adobe InDesign without tags and
I had to add them manually. Now I've learned that some of the tags have
spelling errors in them. Visually, a word may look like "Certificate" but
if you were to listen to it with JAWS, the word is pronounced "Certifcate"
(missing a letter "i") because the word in the tag is spelled wrong.
I know I can manually copy the contents of each tag in the tag tree one at
a time (right click on a paragraph in the tag tree and select "Copy
Contents to Clipboard"), paste into something like MS Word and run a spell
check. Then once I find a spelling error I can use the "Actual Text" field in
the tag properties to spell words correctly. I also know a little
JavaScript and have tried to make sense of the Adobe manuals, but have had
no luck. I'm hoping that everything in the tag tree is some sort of
"object" but can't find the reference in the manuals.
What I'd like is some sort of a batch process that can run through the
whole file and export the content of all the tags into plain text or
something like that. Not export the PDF as plain text, but the content in
the tags as plain text. I hope that makes sense.
From: Philip Kiff
Date: Feb 3, 2020 4:56AM
Subject: Re: Acrobat and JavaScript question
I don't have a JavaScript suggestion for your problem.
To pull out just a plain text copy of the tagged content, however, you 
might try the "Preview" function of the PDF Accessibility Checker (PAC) 
tool:
https://pdf-aktuell.ch/pa/language/en/pdf-accessibility-checker-version-3-pac-3/ 
This tool has a function that allows you to "preview" a simplified
version of the tagged content as it might be heard by a screen
reader user. There is a checkbox that allows you to display that
preview with or without semantic structures. And it appears that you can
copy and paste the screen output of that tool.
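
Once the preview text has been copied out, the spell-check step could be scripted rather than done in MS Word. The following is only a minimal sketch in Python, not something PAC provides: the function name `find_unknown_words` and the tiny `KNOWN` word list are hypothetical, and a real run would load a full dictionary file instead.

```python
import re

def find_unknown_words(text, known_words):
    """Return words from text that are missing from known_words.

    Comparison is case-insensitive; each unknown word is reported once,
    in the order it first appears.
    """
    seen = set()
    unknown = []
    for word in re.findall(r"[A-Za-z']+", text):
        key = word.lower().strip("'")
        if key and key not in known_words and key not in seen:
            seen.add(key)
            unknown.append(word)
    return unknown

# Hypothetical word list for illustration; load a real dictionary in practice.
KNOWN = {"certificate", "of", "completion"}

# Text as it might be pasted from the preview window (assumed sample).
pasted = "Certifcate of Completion"
print(find_unknown_words(pasted, KNOWN))
```

Any word the script flags would still have to be corrected by hand through the "Actual Text" field in the tag properties.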
Alternatively, you might try converting the PDF to HTML first. There is 
a (currently free) Acrobat plug-in from Callas software called pdfGoHTML:
https://www.callassoftware.com/en/products/pdfgohtml
I *think* that this software focuses on the tagged portions of the PDF, 
rather than on the visual "content" that appears in the various objects. 
I might be wrong, though.
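
If the HTML route works, the exported markup can be reduced to plain text with the Python standard library before spell checking. This is a rough sketch under assumptions: it treats the converter's output as ordinary HTML held in a string, and the names `TextExtractor` and `html_to_text` are made up for illustration rather than taken from pdfGoHTML.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)

# Assumed sample markup standing in for a converted PDF page.
sample = "<html><body><h1>Certifcate</h1><p>of Completion</p></body></html>"
print(html_to_text(sample))
```

The resulting text reflects whatever the converter emitted, so it is only as faithful to the tag tree as the conversion itself.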
Phil.
Philip Kiff
D4K Communications
From: R.U. Steinberg
Date: Feb 3, 2020 5:42AM
Subject: Re: Acrobat and JavaScript question
Thank you, Phil! That sounds like a good plan.
