WebAIM - Web Accessibility In Mind

E-mail List Archives

Thread: are OCR files bigger?

Number of posts in this thread: 4 (In chronological order)

From: Lucy GRECO
Date: Thu, Feb 13 2020 5:07PM
Subject: are OCR files bigger?

Hello: I have been asked to help with an RFP for printers, scanners, and
photocopiers. I asked that the scanners and copiers have the OCR feature
turned on by default and got a big pushback. The response is that OCR files
are larger, so large that they can't be emailed and would therefore have to
be broken apart. Is this true? I always thought OCR files were smaller.
One other pushback was that they are scanning spreadsheets, and if they
were OCR'ed it would break the spreadsheets. Let's not talk about why they
are not leaving the spreadsheets in electronic format; I don't want to go
there, it's just maddening. Does anyone have data on how valid this pushback
might or might not be? Thanks, Lucy
Lucia Greco
Web Accessibility Evangelist
IST - Architecture, Platforms, and Integration
University of California, Berkeley
(510) 289-6008 skype: lucia1-greco
http://webaccess.berkeley.edu
Follow me on twitter @accessaces

From: Colin Osterhout
Date: Thu, Feb 13 2020 5:26PM
Subject: Re: are OCR files bigger?

I believe OCR increases the size of the scan a little if the output is simply
an invisible layer of text over the scanned image, as opposed to vectorizing
the underlying source
<https://blogs.adobe.com/acrolaw/2009/05/better_pdf_ocr_clearscan_is_smal/>.
Even if the device can only produce the "searchable image" flavor of OCR, the
gain in functionality and accessibility would, in my book, far outweigh this
nominal increase. If these devices are to be used in an instructional
context, I can't imagine a worse experience as a student than having to sift
through pages of a scanned source searching for words or phrases, trying to
annotate the resulting image-only PDF, trying to copy excerpts for block
quotes, etc.

--
Colin Osterhout
Website Coordinator, University of Alaska Southeast
= EMAIL ADDRESS REMOVED =

From: Swift, Daniel P.
Date: Tue, Feb 18 2020 8:37AM
Subject: Re: are OCR files bigger?

I just tested a 64-page PDF created from PNG files. It went from around 6.8 MB before OCR to 8.6 MB after. Obviously, mileage will vary.
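To put those figures in proportion, a quick calculation using the numbers from this test. Reading "megs" as MB, and using 1 MB = 1000 KB for the per-page figure, are assumptions; the function name is illustrative.

```python
def ocr_growth(before_mb, after_mb, pages):
    """Return (percent growth, MB added per page) for one OCR pass."""
    added = after_mb - before_mb
    return 100 * added / before_mb, added / pages


# Figures from the test above: 6.8 MB before OCR, 8.6 MB after, 64 pages.
pct, per_page = ocr_growth(6.8, 8.6, 64)
# Assumes 1 MB = 1000 KB for the per-page figure.
print(f"{pct:.1f}% larger, about {per_page * 1000:.0f} KB added per page")
# prints "26.5% larger, about 28 KB added per page"
```

Looking at the per-page overhead, rather than only the total, makes it easier to judge whether a long scan would actually hit an email attachment limit.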

Dan Swift
Senior Web Specialist
University Communications and Marketing
West Chester University
610.738.0589

From: Duff Johnson
Date: Tue, Feb 18 2020 8:57AM
Subject: Re: are OCR files bigger?

> Hello: I have been asked to help with an RFP for printers, scanners, and
> photocopiers. I asked that the scanners and copiers have the OCR feature
> turned on by default and got a big pushback. The response is that OCR files
> are larger, so large that they can't be emailed and would therefore have to
> be broken apart. Is this true?

It is NOT generically true.

OCR processes vary, as does image-handling post-OCR. Some OCR processes can result in a reduced file-size (e.g., if JBIG2 compression is used, or formatted text replaces the image), some can result in large sizes for a variety of reasons (usually relating to other-than-ideal OCR software).

If files are expanding massively in size (say, by more than 10 to 20%), then the choice of software or software settings is almost always to blame, not the mere fact of performing OCR.
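This rule of thumb could be turned into a simple acceptance check, for example when comparing vendors' test scans during an RFP evaluation. This is a sketch: the function name, the 20% default (the upper end of the 10 to 20% range above), and the sample sizes are hypothetical.

```python
def needs_settings_review(before_bytes, after_bytes, threshold=0.20):
    """Return True if OCR grew the file by more than `threshold` (a fraction).

    The 0.20 default is the upper end of a 10-20% rule of thumb; it is a
    heuristic for spotting poor OCR software or settings, not a standard.
    """
    if before_bytes <= 0:
        raise ValueError("before_bytes must be positive")
    growth = (after_bytes - before_bytes) / before_bytes
    return growth > threshold


if __name__ == "__main__":
    # Hypothetical test scans: (size before OCR, size after OCR) in bytes.
    samples = {"vendor_a": (5_000_000, 5_400_000), "vendor_b": (5_000_000, 9_000_000)}
    for name, (before, after) in samples.items():
        print(name, "review settings" if needs_settings_review(before, after) else "ok")
```

In practice the byte counts could come from `os.path.getsize` on the same scan saved with and without OCR; a file that shrinks (for example with JBIG2 compression) simply passes.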

> I always thought OCR files were smaller.

It all depends on software choices and settings.

> One other pushback was that they are scanning spreadsheets, and if they
> were OCR'ed it would break the spreadsheets.

Well, scanned spreadsheets aren’t getting any more accessible if they are left as raster images without OCR!

Yes, it might mean significant work to make a scanned spreadsheet accessible, but it's certainly not impossible.
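One piece of that work is recovering the table's row and column order from OCR output. A minimal sketch, assuming the OCR engine reports each word as a (text, x, y) tuple with y measured down from the top of the page (a hypothetical format) and that rows are separated by more than a few points: it groups words into rows by vertical position, orders each row left to right, and writes CSV. The function names and the tolerance are illustrative.

```python
import csv
import io


def boxes_to_rows(boxes, row_tolerance=5):
    """Group OCR word boxes into table rows, each ordered left to right.

    A word joins the current row if its y is within `row_tolerance` of the
    y of that row's first word; otherwise it starts a new row.
    """
    rows = []  # list of (row_y, [(x, text), ...])
    for text, x, y in sorted(boxes, key=lambda b: (b[2], b[1])):
        if rows and abs(y - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append((y, [(x, text)]))
    # Sort each row's cells by x and keep only the text.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


def rows_to_csv(rows):
    """Serialize rows with the csv module (default "\r\n" line endings)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical OCR output for a two-row table with slightly uneven baselines.
    boxes = [("Item", 10, 100), ("Cost", 120, 101), ("Paper", 10, 130), ("4.50", 121, 129)]
    print(rows_to_csv(boxes_to_rows(boxes)), end="")
```

Real spreadsheets with empty or merged cells need more than this, and an accessible PDF would still need its table structure tagged, but it shows that the text is recoverable once OCR has been run.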

Duff.