WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Acrobat and JavaScript question


From: R.U. Steinberg
Date: Feb 3, 2020 5:42AM

Thank you, Phil! That sounds like a good plan.

On Mon, Feb 3, 2020 at 5:56 AM Philip Kiff < <EMAIL REMOVED> > wrote:

> I don't have a JavaScript suggestion for your problem.
> To pull out just a plain text copy of the tagged content, however, you
> might try the "Preview" function of the PDF Accessibility Checker (PAC)
> tool:
> https://pdf-aktuell.ch/pa/language/en/pdf-accessibility-checker-version-3-pac-3/
> This tool has a function that allows you to "preview" a simplified
> version of the tagged content as it might be heard by a screen
> reader user. There is a checkbox that lets you display that
> preview with or without semantic structures. And it appears that you can
> copy and paste the screen output of that tool.
> Alternatively, you might try converting the PDF to HTML first. There is
> a (currently free) Acrobat plug-in from Callas software called pdfGoHTML:
> https://www.callassoftware.com/en/products/pdfgohtml
> I *think* that this software focuses on the tagged portions of the PDF,
> rather than on the visual "content" that appears in the various objects.
> I might be wrong, though.
> Phil.
> Philip Kiff
> D4K Communications
> On 2020-01-31 22:34, R.U. Steinberg wrote:
> > I have searched acrobatusers.com, adobe.com, and other sites and come up
> > empty. I have several 20-page PDF files that were created by a third-party
> > designer.
> >
> > It was bad enough that they were exported from Adobe InDesign without tags
> > and I had to add them manually. Now I've learned that some of the tags have
> > spelling errors in them. Visually, a word may look like "Certificate" but
> > if you were to listen to it with JAWS, the word is pronounced "Certifcate"
> > (missing a letter "i") because the word in the tag is spelled wrong.
> >
> > I know I can manually copy the contents of each tag in the tag tree one at
> > a time (right click on a paragraph in the tag tree and select "Copy
> > Contents to Clipboard"), paste into something like MS Word and run a spell
> > check. Then once I find a spelling error I can use the "Actual Text" field
> > in the tag properties to spell words correctly. I also know a little
> > JavaScript and have tried to make sense of the Adobe manuals, but have had
> > no luck. I'm hoping that everything in the tag tree is some sort of
> > "object" but can't find the reference in the manuals.
> >
> > What I'd like is some sort of a batch process that can run through the
> > whole file and export the content of all the tags into plain text or
> > something like that. Not export the PDF as plain text, but the content in
> > the tags as plain text. I hope that makes sense.
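
A minimal sketch of the HTML route suggested above, assuming the PDF has already been converted to HTML (for example with pdfGoHTML) and that the converted file reflects the tagged content. It strips the markup to plain text and lists words missing from a word list. The names `TextExtractor`, `html_to_text`, and `unknown_words` are hypothetical helpers written for this illustration and are not part of Acrobat, PAC, or pdfGoHTML. The word list is a stand-in for a real dictionary or a spell check in MS Word.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace and ignore empty text nodes.
        text = " ".join(data.split())
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the text of each HTML text node, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def unknown_words(text, known_words):
    """Return the sorted set of words not found in known_words (lowercase)."""
    words = re.findall(r"[A-Za-z']+", text)
    return sorted({w for w in words if w.lower() not in known_words})
```

To try it, read the converted file into a string, pass it to `html_to_text`, and either paste the result into a word processor for spell checking or pass it to `unknown_words` with a set of lowercase dictionary words. Any misspelling it finds still has to be corrected in the PDF itself, for instance through the "Actual Text" field mentioned in the original question.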