WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: are OCR files bigger?

From: Swift, Daniel P.
Date: Feb 18, 2020 8:37AM


I just did a test of a 64-page PDF which originated from PNG files. It went from around 6.8 MB before OCR to 8.6 MB after. Obviously, mileage will vary.
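
For a rough sense of scale, the two sizes above can be turned into a percentage and a per-page figure. This is only a back-of-the-envelope sketch: the function name `ocr_overhead` is made up for illustration, it assumes 1 MB = 1000 KB, and it reflects this one test file rather than OCR output in general.

```python
def ocr_overhead(before_mb: float, after_mb: float, pages: int) -> tuple[float, float]:
    """Return (percent increase, added KB per page) for one OCR'ed file.

    Assumes 1 MB = 1000 KB; the inputs are the rounded sizes quoted above.
    """
    added_mb = after_mb - before_mb
    percent_increase = added_mb / before_mb * 100
    added_kb_per_page = added_mb * 1000 / pages
    return percent_increase, added_kb_per_page


# Figures from the single test described above: 64 pages, 6.8 MB -> 8.6 MB.
pct, kb_per_page = ocr_overhead(6.8, 8.6, 64)
print(f"Size increase: {pct:.1f}%")
print(f"Added text layer: about {kb_per_page:.1f} KB per page")
```

In this one case the OCR layer added roughly a quarter to the file size, which is a few tens of kilobytes per page.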

Dan Swift
Senior Web Specialist
University Communications and Marketing
West Chester University
610.738.0589

From: WebAIM-Forum [mailto: <EMAIL REMOVED> ] On Behalf Of Colin Osterhout
Sent: Thursday, February 13, 2020 7:26 PM
To: WebAIM Discussion List < <EMAIL REMOVED> >
Subject: Re: [WebAIM] are OCR files bigger?

I believe OCR increases the size of the scan a little if it simply adds an
invisible layer of text over the scanned image, as opposed to vectorizing
the underlying source
<https://blogs.adobe.com/acrolaw/2009/05/better_pdf_ocr_clearscan_is_smal/>.
Even if the searchable-image flavor of OCR is all that the source device
offers, the gain in functionality and accessibility would be well worth
this nominal increase in my book. If these devices are to be used in an
instructional context, I can't imagine a worse experience as a student than
having to sift through pages of a scanned source searching for words or
phrases, trying to annotate the resulting image-only PDF, trying to copy
excerpts for block quotes, etc.

On Thu, Feb 13, 2020 at 3:07 PM Lucy GRECO < <EMAIL REMOVED> > wrote:

> Hello: I have been asked to help with an RFP for printers, scanners, and
> photocopiers. I asked that the scanners and copiers have the OCR feature
> turned on by default and got a big pushback. The response is that OCR files
> are larger, so large that they can't be emailed and therefore would have to
> be broken apart. Is this true? I always thought OCR files were smaller.
> One other pushback was that they are scanning spreadsheets, and if they
> were OCR'ed it would break the spreadsheets. Let's not talk about why they
> are not leaving the spreadsheets in electronic format; I don't want to go
> there, it's just maddening. Does anyone have data on how valid this
> pushback might or might not be? Thanks, Lucy
> Lucia Greco
> Web Accessibility Evangelist
> IST - Architecture, Platforms, and Integration
> University of California, Berkeley
> (510) 289-6008 skype: lucia1-greco
> http://webaccess.berkeley.edu
> Follow me on twitter @accessaces


--
Colin Osterhout
Website Coordinator, University of Alaska Southeast
<EMAIL REMOVED>