WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: PDF and searchable text for scanned documents


From: Jackson, Derek J
Date: Sep 29, 2020 11:41AM

I thought the same, Jonathan. I understood ActualText to be intended for very small amounts of text, not entire paragraphs. But I may have arrived at that assumption from my own experience rather than from what the ActualText property actually requires.

Steve, maybe what I have is not a technical error but something that a manual check should reveal as an accessibility error? I cannot share the document because it is not mine to distribute, but thank you for the offer.

Thank you again,

On 9/29/20, 1:21 PM, "WebAIM-Forum on behalf of Jonathan Avila" < <EMAIL REMOVED> on behalf of <EMAIL REMOVED> > wrote:

Seems like this is an incorrect use of the actualText property.


-----Original Message-----
From: WebAIM-Forum < <EMAIL REMOVED> > On Behalf Of Jackson, Derek J
Sent: Tuesday, September 29, 2020 10:24 AM
Subject: [WebAIM] PDF and searchable text for scanned documents



I have a remediated scanned document that passes Adobe's Accessibility Check and PAC3. However, the underlying text does not correspond to the visible text. For example, the content container for a paragraph contains text like " =X's6- H -R, $E F I A*'a" that corresponds to an area of the PDF unrelated to the paragraph. Meanwhile, every paragraph tag uses the "Actual Text" field to provide the real text of the paragraph. The consequence is that a screen reader reads the paragraph correctly, but the document is not searchable and copy and paste is not practical. So I am wondering whether this is a case of a document that meets the accessibility requirements but is still not functionally accessible, or whether something in PDF/UA addresses this issue. I have looked through the PDF/UA spec and have not found anything, but I readily admit that some of the technical jargon and details are beyond me.
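As a rough illustration of the kind of manual check discussed in this thread, the sketch below compares a paragraph's ActualText with the text actually stored in the content stream (the text that search and copy/paste rely on). It assumes both strings have already been pulled out of the PDF by some other tool; the function names and the 0.8 similarity threshold are hypothetical, not taken from PDF/UA, Acrobat, or PAC3.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so layout differences don't count."""
    return " ".join(text.lower().split())


def text_layer_matches(actual_text: str, extracted_text: str, threshold: float = 0.8) -> bool:
    """Return True when the extracted (searchable) text is close to the ActualText.

    `threshold` is an arbitrary cut-off chosen for illustration (hypothetical),
    not a value defined by PDF/UA or any accessibility checker.
    """
    ratio = SequenceMatcher(None, normalize(actual_text), normalize(extracted_text)).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    # Hypothetical paragraph: what a screen reader announces via ActualText...
    actual = "The committee will meet on the first Monday of each month."
    # ...versus garbled OCR text stored in the content stream.
    garbled = " =X's6- H -R, $E F I A*'a"
    print(text_layer_matches(actual, garbled))  # False: the text layer does not match
    print(text_layer_matches(actual, "The committee will meet on the first  Monday of each month."))  # True
```

A check like this would flag exactly the situation described above: the document passes automated tests because ActualText is present, yet the text layer that search and copy depend on bears no resemblance to it.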

Thanks for the continued help!


Derek Jackson

Digital Accessibility Developer | Digital Accessibility Services | Harvard University Information Technology
1430 Massachusetts Ave, 4th Floor
Cambridge, MA 02138