Re: Locking Document Content


From: Terence de Giere
Date: Dec 24, 2003 8:12AM


It would seem that allowing only the setting that lets assistive
technology extract content would be the best PDF (Portable Document
Format) solution. It blocks the most easily used content extraction
methods and opens only a channel that is unfamiliar to most users and
not readily usable for copying. One cannot save such a PDF as text, but
it can be read out as speech by assistive technology.

The main glory of PDF is printing, especially when the application that
produced the document, such as a high-end desktop publishing program,
is not available. Documents formatted for print are not ideal for
viewing or reading on screen. A properly constructed alternative such as an HTML
version is always more usable and accessible on screen. Note that the
World Wide Web Consortium makes their official web recommendations
available on the Internet in HTML format while often providing
alternatives such as PDF for printing. The official document version for
this organization is always the HTML version at a specific location on
their site, and their site is exposed to the full Internet.

People like creating PDF files because it is a no-brainer, but PDF is a
clumsy, usability-impaired format for the web. For maximum accessibility
on screen, Adobe recommends a horizontal rectangle of 4 by 3 inches
(10.2 by 7.6 centimeters). Now how many people are going to
format their Word files in that shape? That format helps both visual and
non-visual users by keeping the content of each page fully on the screen
for the great majority of screen sizes. Even this is usually not as
readable as a good, properly tagged HTML file.
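
As a rough illustration of why the page shape matters, the sketch below
computes the largest zoom at which a whole page fits on a screen. It
assumes that zoom 1.0 means one PDF point (1/72 inch) per pixel, that the
viewer has the entire 800 by 600 screen to itself, and that the comparison
page is US Letter; the function name fit_zoom is made up for this example.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def fit_zoom(page_in, screen_px):
    """Largest zoom (1.0 = one point per pixel) at which the whole page fits."""
    page_w, page_h = (side * POINTS_PER_INCH for side in page_in)
    screen_w, screen_h = screen_px
    return min(screen_w / page_w, screen_h / page_h)


screen = (800, 600)  # assumed viewport, ignoring toolbars and window borders

print(round(fit_zoom((4, 3), screen), 2))     # 4 by 3 inch landscape page: 2.78
print(round(fit_zoom((8.5, 11), screen), 2))  # US Letter portrait page: 0.76
```

Under these assumptions the 4 by 3 inch page can be enlarged to almost
three times its nominal size and still show in full, while a Letter page
has to shrink to about three quarters of its size before the whole page
is visible.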

While setting security on a PDF file to allow screen reader users to
access the content will still prevent copying and pasting text, screen
reader access might allow printing to an embosser into Braille because
the embosser would be operating through the screen reader technology. As
far as I have been able to find out, no visual text printing system from
screen readers has been developed. I doubt most nefariously minded users
will think of using this Braille method to get the text of a protected
PDF; after all, accessibility is still a dim bulb in most web and
organizational environments.
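
The setting discussed here can be pictured as a set of permission flags.
The sketch below assumes the bit numbering of the standard security
handler in the PDF Reference (version 1.4 and later), where bit 3 governs
printing, bit 5 governs copying and extraction, and bit 10 governs
extraction for accessibility. The constant and function names are made up
for this example, and the reserved bits that a real permissions value
carries are ignored.

```python
# Permission bits as numbered in the PDF Reference (1-based bit positions).
# Names are illustrative only; they do not come from any library.
PRINT = 1 << 2          # bit 3: print the document
COPY_EXTRACT = 1 << 4   # bit 5: copy or otherwise extract text and graphics
ACCESSIBILITY = 1 << 9  # bit 10: extract text and graphics for accessibility


def describe(permissions: int) -> dict:
    """Report which of the three capabilities a permission mask grants."""
    return {
        "print": bool(permissions & PRINT),
        "copy_extract": bool(permissions & COPY_EXTRACT),
        "accessibility": bool(permissions & ACCESSIBILITY),
    }


# Grant screen reader access only; copying and printing stay off.
screen_reader_only = ACCESSIBILITY
print(describe(screen_reader_only))
# {'print': False, 'copy_extract': False, 'accessibility': True}
```

These flags are requests that a conforming viewer is expected to honor,
which is why methods that work outside the viewer are not stopped by
them.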

Even when access by assistive technology is prevented, there is still a
fairly simple way to get the text out of the document short of manual
transcription or using software to crack the security, although it does
take a bit more time and effort than copy and paste. Screen capture of
the page in Acrobat Reader is possible even if text access or screen
reader access is disallowed. With a large monitor that shows the whole
page, one can paste the capture into an application like Photoshop, and
save it in an optical character recognition (OCR) compatible image
format, and then process it with OCR software.

This is possible because the screen capture process is a function that
lies outside the Acrobat applications. The captured text image is clean,
so there should be few errors in converting the image to text with OCR.
In either case, whether by manual transcription or by screen capture to
an image and subsequent conversion to text, the person desiring to make
a fraudulent version of the document will still have to reconstruct the
appearance of the original format in a word processor or other
application, and then convert that to PDF.

There is a piece of software called the Kurzweil Virtual Printer that
uses OCR to transform documents. As I have not used this technology, it
is not entirely clear from the specifications or posts on the Internet
whether its ability to read PDF files directly depends on its ability to
scan the screen and use OCR, or to use the print function in Acrobat to
get at the information by printing to an OCR compatible image format; if
the latter, electronic documents secured from printing would not be
accessible to this technology.

Security is as much a matter of trust and human behavior as it is of
technology. The object is to put up enough barriers to illegitimate use
that all but the most dedicated scoundrels will think it is not worth
the effort. Private PDF documents can be password protected for opening,
but documents for public consumption surely cannot, unless you want to
set up a system for creating and distributing individual passwords, and
even then anyone can give away a password. If official
versions of documents are always located on a particular server,
reasonable diligence at the server can prevent most security problems.
Employees of the system just need to know that that is the only location
to get the official version.

Another method is to use the PDF e-book format Adobe owns which will
automatically provide similar protection by restricting the file to a
particular hard drive via electronic licensing. This requires Adobe's
Content Server and Acrobat Reader 6.0. Acrobat Reader 6.0 replaces the
wretched former eBook reader, which had a poor interface even for
non-disabled users, and was considered completely inaccessible.
According to Adobe, if the publisher has activated the read out loud
feature, an eBook can be read out by the computer. This system is
restricted to a PC or a Macintosh running OS X, with the most recent
Acrobat version. Such files can also be read on Palm handhelds, but I do not know
how much progress has been made on enabling these devices to convert
content to speech.

However, that adds another layer of hassle to get a file. I just tried
to download a sample eBook from a commercial vendor that allowed one to
go through the permissions and transaction process at zero cost, and
the process failed. I was using the Mozilla browser and Acrobat Reader
6.0. I then tried Internet Explorer and the process also failed. The web
site had a pop-up window that showed in Internet Explorer but was
suppressed by my settings in Mozilla; in either case the sample eBook
still did not download. That was without even trying to do it with a
screen reader.

Information published by Adobe on the Adobe Content Server follows.

Adobe Content Server is a Web-based product for packaging and
distributing eBooks and other media. The latest version is 3.0; this
version is compatible with the Acrobat 6.0 family. Adobe Content Server
has the following capabilities:

* Encrypt PDF eBooks using Adobe DRM or PDF Merchant technology
* With Adobe DRM protection, set permissions for printing and
  copying all or portions of eBooks and for reading eBooks aloud
* With Adobe DRM protection, set a fixed expiration time for an
  eBook or expiration after a specified amount of reading time
* Manage information about online bookstores, libraries, and
  distribution vendors
* Deploy eBook content files to servers on the Web
* Fulfill eBook vouchers, containing decryption keys and
  permissions, for eBooks purchased from bookstores or lent by
  online libraries
* Distribute eBooks to clients and procure eBooks from vendors who
  also use Adobe Content Server

If disabled users cannot access these policies in PDF format for the
California State University System, should they be bound to follow them?
Perhaps the Chancellor could hire a live person interpreter for each
visually impaired user to read such policies to them.

To summarize: as others on this thread have pointed out, PDF security
can be cracked in various ways that are easier to use than assistive
technology, and a document can be fraudulently reconstructed in various
ways. Allowing assistive technology access in a secure PDF file is
therefore not significantly less secure, and is considerably more time
and cost effective than using more complicated methods to restrict
access. The typical avenues to transform
the content from that point on are quite restricted, ending in an active
Braille display, embossed Braille, or speech. Properly tagged HTML is
best for assistive technology, and everybody else too. The policies of a
university are not quite in the same class as military secrets or
industrial secrets. The Chancellor's office needs to relax a bit.

Terence de Giere

