WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: PDF and searchable text for scanned documents

From: Jonathan Avila
Date: Sep 29, 2020 11:21AM


Seems like this is an incorrect use of the ActualText property.

Jonathan

-----Original Message-----
From: WebAIM-Forum < <EMAIL REMOVED> > On Behalf Of Jackson, Derek J
Sent: Tuesday, September 29, 2020 10:24 AM
To: <EMAIL REMOVED>
Subject: [WebAIM] PDF and searchable text for scanned documents

Hello,

I have a remediated scanned document, and it passes Adobe's Accessibility Check and PAC3. However, the underlying text does not correspond to the visible text. For example, the content container for a paragraph contains text like " =X's6- H -R, $E F I A*'a" that corresponds to an area of the PDF unrelated to the paragraph. All of the paragraph tags, however, use the "Actual Text" field to provide the actual text of the paragraph. The consequence is that a screen reader will read the paragraph correctly, but the document is not searchable, and copy and paste is not practical. So I am wondering: is this an instance where a document meets the accessibility requirements but is still not functionally accessible, or is there something in PDF/UA that addresses this issue? I have looked through the PDF/UA spec and am not seeing anything, but I readily admit that some of the technical jargon and details are beyond me.

Thanks for the continued help!
Derek

—

Derek Jackson

Digital Accessibility Developer | Digital Accessibility Services
Harvard University Information Technology
1430 Massachusetts Ave, 4th Floor
Cambridge, MA 02138

he/him/his
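
The mismatch described in the original message, where a tag's Actual Text reads correctly but the text in its content container is garbage, can be flagged with a rough similarity check. The Python sketch below is only an illustration under stated assumptions: it assumes the two strings for each tag have already been extracted with some PDF tool (extraction is not shown), and the function names, the sample paragraph, and the 0.8 threshold are all hypothetical choices rather than anything defined by PDF/UA, Acrobat, or PAC.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return " ".join(text.lower().split())


def similarity(actual_text: str, content_text: str) -> float:
    """Return a 0..1 ratio describing how closely the two strings match."""
    matcher = SequenceMatcher(None, normalize(actual_text), normalize(content_text))
    return matcher.ratio()


def is_suspicious(actual_text: str, content_text: str, threshold: float = 0.8) -> bool:
    """Flag a tag whose Actual Text differs sharply from its underlying text.

    The threshold is an arbitrary assumption; tune it on real documents.
    """
    return similarity(actual_text, content_text) < threshold


# The garbled string is the example quoted in the message above.
garbled = "=X's6- H -R, $E F I A*'a"
# Hypothetical stand-in for the paragraph text supplied through Actual Text.
paragraph = "Hypothetical paragraph text supplied through Actual Text."

print(is_suspicious(paragraph, garbled))
print(is_suspicious(paragraph, "Hypothetical  paragraph text supplied through Actual Text."))
```

A check like this would only point to tags worth inspecting by hand. It does not say whether the document conforms to PDF/UA, and a legitimate use of Actual Text (for example, expanding a ligature or a drop cap) could also score below the threshold.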