WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: PDF and searchable text for scanned documents

From: Steve Green
Date: Sep 29, 2020 9:31AM


I have encountered this several times, but I do not know what causes it. We use the axesPDF QuickFix tool to view and modify the mapping between the glyphs and the underlying Unicode characters, but we usually only need to fix one or two incorrect mappings. I guess you could go through all the mappings for all the fonts and replace the Unicode characters with the ones you want, but that sounds like a lot of work. There may be other ways to do it more efficiently.
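To illustrate what fixing one or two incorrect mappings means, here is a minimal sketch of a glyph-to-Unicode table. It is only a toy model: the dictionary, the glyph codes and the function name are hypothetical, and it does not reproduce the real PDF mapping format or anything axesPDF QuickFix does internally.

```python
# Toy model of a font's glyph-to-Unicode table (hypothetical data, not a real PDF structure).
to_unicode = {
    0x01: "H",
    0x02: "e",
    0x03: "1",  # wrong: this glyph is drawn on the page as a lowercase "l"
    0x04: "o",
}


def extract_text(glyph_codes, mapping):
    """Return the text that search or copy would see for these glyph codes."""
    # Unmapped glyphs become U+FFFD so that gaps stay visible in the result.
    return "".join(mapping.get(code, "\ufffd") for code in glyph_codes)


page_glyphs = [0x01, 0x02, 0x03, 0x03, 0x04]  # rendered visually as "Hello"

before = extract_text(page_glyphs, to_unicode)
to_unicode[0x03] = "l"  # correct the single bad mapping
after = extract_text(page_glyphs, to_unicode)
print(before, "->", after)
```

The page looks the same before and after the fix, because the glyphs themselves do not change. Only the text seen by search, copy and paste, and assistive technology changes.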

Remember that Acrobat's Accessibility Check only runs a small number of very simple tests. Passing it tells you almost nothing about the document's accessibility, other than that it is probably not as terrible as it might have been.

What application was the document authored in?

Steve Green
Managing Director
Test Partners Ltd


-----Original Message-----
From: WebAIM-Forum < <EMAIL REMOVED> > On Behalf Of Jackson, Derek J
Sent: 29 September 2020 15:24
To: <EMAIL REMOVED>
Subject: [WebAIM] PDF and searchable text for scanned documents

Hello,

I have a remediated scanned document that passes both Adobe's Accessibility Check and PAC3. However, the underlying text does not correspond to the visible text. For example, the content container for a paragraph contains text like " =X's6- H -R, $E F I A*'a", which corresponds to an area of the PDF that is unrelated to the paragraph. All of the paragraph tags, however, use the "Actual Text" field to provide the actual text of the paragraph. The consequence is that a screen reader will read the paragraph correctly, but the document is not searchable, and copy and paste is not practical. So I am wondering whether this is a case where a document meets the accessibility requirements but is still not functionally accessible, or whether something in PDF/UA addresses this issue. I have looked through the PDF/UA spec and am not seeing anything, but I readily admit that some of the technical jargon and details are beyond me.
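A mismatch like the one described above could be flagged with a simple similarity check. This is a rough sketch, assuming both strings have already been pulled out of the PDF with whatever tool you prefer. The function name, the sample "Actual Text" value and the 0.8 threshold are all hypothetical choices for illustration.

```python
from difflib import SequenceMatcher


def text_layer_matches(content_text, actual_text, threshold=0.8):
    """Return True when the text layer is reasonably close to the Actual Text value."""
    # Normalise whitespace and case so layout differences do not count as mismatches.
    a = " ".join(content_text.split()).lower()
    b = " ".join(actual_text.split()).lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold


# The garbled string is taken from the message; the Actual Text value is invented.
garbled = " =X's6- H -R, $E F I A*'a"
actual = "This is the first paragraph of the scanned report."

if not text_layer_matches(garbled, actual):
    print("Text layer does not match Actual Text; search and copy will fail here.")
```

A paragraph that fails this check may still be read correctly by a screen reader, which is why it can slip past the automated checkers.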

Thanks for the continued help!
Derek

—

Derek Jackson

Digital Accessibility Developer | Digital Accessibility Services
Harvard University Information Technology
1430 Massachusetts Ave, 4th Floor
Cambridge, MA 02138

he/him/his