E-mail List Archives
Thread: Semantics for Indicating Accessible Version of Files
Number of posts in this thread: 16 (In chronological order)
From: Randy Pearson
Date: Tue, Jan 27 2009 9:30AM
Subject: Semantics for Indicating Accessible Version of Files
Hello,
We are working with a website that has myriad old PDF files that are scanned
PDF images, and obviously not accessible. Over time, we will be adding
accessible alternatives, but also keeping the originals.
Our question is, what is the proper semantic technique for indicating that
one of these files is the accessible version of the other? The site pages
currently list files in an HTML table, with one row per file (columns for
the hyperlinked file, file size, etc.). We can add the accessible file in a
new row right after the scanned file, but how should we indicate the
relationship?
Thanks in advance.
- Randy
From: ben morrison
Date: Tue, Jan 27 2009 9:40AM
Subject: Re: Semantics for Indicating Accessible Version of Files
On Tue, Jan 27, 2009 at 4:25 PM, Randy Pearson wrote:
> Hello,
>
> We are working with a website that has myriad old PDF files that are scanned
> PDF images, and obviously not accessible. Over time, we will be adding
> accessible alternatives, but also keeping the originals.
>
> Our question is, what is the proper semantic technique for indicating that
> one of these files is the accessible version of the other? The site pages
> currently list files in an HTML table, with one row per file (columns for
> the hyperlinked file, file size, etc.). We can add the accessible file in a
> new row right after the scanned file, but how should we indicate the
> relationship?
Using proper table markup should fix this for you.
Use a table caption and summary to indicate that accessible versions may be
available, and use table headers with scope to identify the columns.
Lots of good info here:
http://www.webaim.org/techniques/tables/
Ben
--
Ben Morrison
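As a rough illustration of the markup Ben describes, here is a minimal Python sketch that emits a file table with a caption and scoped headers. The function name, the dictionary keys (`name`, `href`, `kind`, `size`), and the column labels are assumptions made for this example; they do not come from the thread or from any particular site.

```python
from html import escape


def render_file_table(caption, files):
    """Return an HTML table string with a caption and scoped headers.

    `files` is a list of dicts with hypothetical keys:
    name, href, kind, size.
    """
    # Column headers carry scope="col" so each data cell is tied to its column.
    head = "".join(
        f'<th scope="col">{escape(label)}</th>'
        for label in ("File", "Version", "Size")
    )
    rows = []
    for f in files:
        link = f'<a href="{escape(f["href"], quote=True)}">{escape(f["name"])}</a>'
        rows.append(
            "<tr>"
            # The file name acts as the row header.
            f'<th scope="row">{link}</th>'
            f'<td>{escape(f["kind"])}</td>'
            f'<td>{escape(f["size"])}</td>'
            "</tr>"
        )
    return (
        "<table>\n"
        f"<caption>{escape(caption)}</caption>\n"
        f"<thead><tr>{head}</tr></thead>\n"
        "<tbody>\n" + "\n".join(rows) + "\n</tbody>\n"
        "</table>"
    )
```

The "Version" column is one possible place to state in words whether a row is the scanned original or the accessible alternative.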
From: Patrick H. Lauke
Date: Tue, Jan 27 2009 5:35PM
Subject: Re: Semantics for Indicating Accessible Version of Files
Randy Pearson wrote:
> Hello,
>
> We are working with a website that has myriad old PDF files that are scanned
> PDF images, and obviously not accessible. Over time, we will be adding
> accessible alternatives, but also keeping the originals.
>
> Our question is, what is the proper semantic technique for indicating that
> one of these files is the accessible version of the other? The site pages
> currently list files in an HTML table, with one row per file (columns for
> the hyperlinked file, file size, etc.). We can add the accessible file in a
> new row right after the scanned file, but how should we indicate the
> relationship?
Why not simply replace the inaccessible version with the accessible one?
Or is there a significant difference between the two beyond the
accessibility work that was done on them?
P
--
Patrick H. Lauke
From: Randy Pearson
Date: Wed, Jan 28 2009 6:20AM
Subject: Re: Semantics for Indicating Accessible Version of Files
Short answer: management decision.
Longer answer: (1) original may have legal implications, (2) original shows signature, (3) concern of whether text translation is 100% accurate, etc.
-rp
From: Moore, Michael
Date: Wed, Jan 28 2009 8:10AM
Subject: Re: Semantics for Indicating Accessible Version of Files
How about "picture of document" and "document"? <grin/>
From: Cliff Tyllick
Date: Wed, Jan 28 2009 8:50AM
Subject: Re: Semantics for Indicating Accessible Version of Files
I commiserate, because I see the same thing coming down the road for my agency.
How about "signed copy" (or "as submitted," if it's something your agency received) and "accessible version"?
If it's a page that lists many such files, how about a properly marked up table, with the two versions in separate columns?
You know what would be a really great answer? To be able to marry the image layer of one PDF with the text layer of another. That way the accessible text layer could be added to the signed image layer, and we could post just one file. But I'm not a programmer, so I have no idea what it would entail to make that possible.
From: Moore, Michael
Date: Wed, Jan 28 2009 9:35AM
Subject: Re: Semantics for Indicating Accessible Version of Files
<Cliff>
You know what would be a really great answer? To be able to marry the
image layer of one PDF with the text layer of another. That way the
accessible text layer could be added to the signed image layer, and we
could post just one file. But I'm not a programmer, so I have no idea
what it would entail to make that possible.
</Cliff>
<Mike>
Cliff,
The scenario that you describe is essentially what happens when you use
Adobe Acrobat Professional to create an accessible document from a scanned
version. You can run OCR over the original document and generate a
tagged PDF. It is usually necessary to perform a bit of cleanup on the
converted document because the OCR is not 100% accurate. Images,
signatures, and other items will also need to be tagged as graphics or
artifacts. If tagged as graphics, they will require alternative text.
Artifacts act like background images in HTML.
</Mike>
From: Cliff Tyllick
Date: Wed, Jan 28 2009 10:15AM
Subject: Re: Semantics for Indicating Accessible Version of Files
Yeah, but if I already have an accessible version of the original electronic document that was used to print the copies that were signed, why can't I just scan the signed copy (or even just its signature page) and marry that image with the original text layer?
Then there would be no concerns with the accuracy of OCR. I could just tag the signatures and any other new marks with appropriate alt text.
In other words, perhaps an option under OCR could be "Copy text layer from..."
Or the main function could be "Add text layer..." and the two options beneath it could be "Run OCR" and "Copy from..."
It sounds like you're saying there wouldn't be a significant barrier to making this possible. If it were possible, then the abilities of the software would meet the needs of the workplace. No one would have to review the scanned + OCR version of a 150-page contract to ensure that the OCR hadn't mistaken a one for an "ell" or a zero for an "oh" anywhere within it.
From: Moore, Michael
Date: Wed, Jan 28 2009 10:45AM
Subject: Re: Semantics for Indicating Accessible Version of Files
<Cliff>
Yeah, but if I already have an accessible version of the original
electronic document that was used to print the copies that were signed,
why can't I just scan the signed copy (or even just its signature page)
and marry that image with the original text layer?
</Cliff>
<Mike>
What we have done, when we have access to the original accessible
electronic document, is to scan the signed document, remove just the
signature, and paste it as an image into the accessible version, with
appropriate alt text like "Signed by President Barack Obama."
If the organization has a legal obligation to keep an original signed
copy of a document, I am not sure that a scanned version would meet
those requirements anyway. The file should probably contain a printed
paper copy, possibly even notarized.
The more frequent case is for a document like a policy notification,
announcement, or PR piece from a senior manager in the organization to be
posted. In this case an image of the signature in an accessible HTML or
PDF file is a better alternative than posting a scanned version of the
document that includes the scanned signature.
I think that many folks feel that the fully scanned version somehow
provides more legitimacy or immutability than an electronic, text-based
version with a scanned signature. These people should take a short course
in Photoshop and get over it.
</Mike>
From: Moore, Michael
Date: Wed, Jan 28 2009 11:00AM
Subject: Re: Semantics for Indicating Accessible Version of Files
<Cliff>
Good point. So now I need to get a class in Photoshop worked into the
big cheese's performance plan. (grin)
</Cliff>
<Mike>
Tell me how that works out for you. <grin/>
</Mike>
From: Cliff Tyllick
Date: Wed, Jan 28 2009 11:05AM
Subject: Re: Semantics for Indicating Accessible Version of Files
<Mike>
I think that many folks feel that the fully scanned version somehow
provides more legitimacy or immutability than an electronic, text based,
version with a scanned signature. These people should take a short course
in Photoshop and get over it.
</Mike>
<Cliff>
Good point. So now I need to get a class in Photoshop worked into the big cheese's performance plan. (grin)
</Cliff>
From: Randy Pearson
Date: Wed, Jan 28 2009 3:15PM
Subject: Re: Semantics for Indicating Accessible Version of Files
>> If it's a page that lists many such files, how about a properly
>> marked up table, with the two versions in separate columns?
We started down that path, but then didn't like it. Hence pausing for this
post. ;) What we did not like was that the table might already have four
columns (date, size, type, name). If you added the accessible version to the
same row, you would really need a matching set of columns, since date, size,
and type apply separately to that file as well. That approach felt both
structurally and visually wrong.
>> How about "signed copy" (or "as submitted," if it's something
>> your agency received) and "accessible version"?
Good idea. We're already en route to establishing file naming conventions,
wherein one can (hopefully) grok something from the names (e.g., one might
have an extra "_text" appended to the stem of the name). But your idea
sounds like a good one. Perhaps add a "notes" column to the right that
includes this. In fact that could provide an avenue to point to the other
file also.
-- Randy
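To make the naming-convention idea concrete, here is a small Python sketch that pairs each original file with an accessible counterpart whose stem ends in "_text". The function name and the exact matching rule are assumptions for illustration only; the thread mentions the "_text" suffix merely as one possible convention.

```python
from pathlib import PurePosixPath


def pair_versions(filenames, suffix="_text"):
    """Map each original file name to its accessible counterpart, or None.

    Assumes the accessible copy shares the original's stem plus `suffix`,
    e.g. "memo.pdf" and "memo_text.pdf" (a hypothetical convention).
    """
    names = set(filenames)
    pairs = {}
    for name in sorted(names):
        path = PurePosixPath(name)
        if path.stem.endswith(suffix):
            # Accessible copies are not keys; they appear only as values.
            continue
        candidate = str(path.with_name(f"{path.stem}{suffix}{path.suffix}"))
        pairs[name] = candidate if candidate in names else None
    return pairs
```

A listing page could use the resulting mapping to fill in a "notes" column, and a `None` value flags a scanned file that still lacks an accessible alternative.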
From: Cliff Tyllick
Date: Wed, Jan 28 2009 4:45PM
Subject: Re: Semantics for Indicating Accessible Version of Files
Cliff asked:
How about "signed copy" (or "as submitted," if it's something your agency received) and "accessible version"?
Randy responded:
Good idea. We're already en route to establishing file naming conventions, wherein one can (hopefully) grok something from the names (e.g., one might have an extra "_text" appended to the stem of the name). But your idea sounds like a good one. Perhaps add a "notes" column to the right that includes this. In fact that could provide an avenue to point to the other file also.
Now Cliff adds:
If you follow that route, might I suggest that you put the accessible file *first* and refer to the official, signed copy in the note? That way everyone who needs the accessible version (presumably including people using PDAs) can get to the information as quickly as possible. The "Date" column could still show the date associated with the original document. Its heading could be something like "Date Signed," for example.
From: Chris Hoffman
Date: Wed, Jan 28 2009 5:35PM
Subject: Re: Semantics for Indicating Accessible Version of Files
On Wed, Jan 28, 2009 at 5:12 PM, Randy Pearson wrote:
> What we did not like was the table might already have 4 columns to
> include date, size, type, name. If you added the accessible version to the
> same row, then really you need 4 similar columns, as date, size, type apply
> separately to those files also. That approach felt both structurally and
> visually wrong.
How about a single additional column called, e.g., "Alternate
Version", that in an accessible row contains a link to the <tr> with
the original/submitted version, and in an original/submitted row
contains a link to the <tr> with the accessible version?
Chris
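One way to picture the "Alternate Version" column is the following Python sketch, which renders two rows per document and gives each row an `id` so the other row can link to it. The function name, the id scheme (`-accessible` and `-original`), and the link text are assumptions for this example. It also places the accessible row first, as suggested earlier in the thread.

```python
from html import escape


def render_linked_rows(documents):
    """Render two <tr> elements per document, each linking to the other.

    `documents` is a list of (doc_id, accessible_name, original_name)
    tuples; the tuple layout is hypothetical.
    """
    rows = []
    for doc_id, accessible_name, original_name in documents:
        safe_id = escape(doc_id, quote=True)
        acc_id = f"{safe_id}-accessible"
        orig_id = f"{safe_id}-original"
        # Accessible version first, pointing at the signed original's row.
        rows.append(
            f'<tr id="{acc_id}">'
            f'<th scope="row">{escape(accessible_name)}</th>'
            f'<td><a href="#{orig_id}">Signed original</a></td>'
            "</tr>"
        )
        # Original row points back at the accessible version's row.
        rows.append(
            f'<tr id="{orig_id}">'
            f'<th scope="row">{escape(original_name)}</th>'
            f'<td><a href="#{acc_id}">Accessible version</a></td>'
            "</tr>"
        )
    return "\n".join(rows)
```

In a real page the first cell would hold the hyperlink to the file itself, and the date, size, and type columns would sit alongside it; they are left out here to keep the cross-linking visible.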
From: Cliff Tyllick
Date: Thu, Jan 29 2009 12:55PM
Subject: Re: Semantics for Indicating Accessible Version of Files
A colleague suggested a different approach that is at least worth considering when the accessible file is completely reliable (that is, when its text layer wasn't obtained by OCR). In other words, this approach is not helpful if the only existing record is a scanned copy, but it could be useful for future documents---especially if you're not comfortable inserting images or don't have Photoshop.
Here's my colleague's idea: Why not post the accessible version and append to it the scanned version of only the signature page? Alt text could identify that page as "signed version of page x" (or whatever else makes sense).
If more than one page of the "official" document has significant handwritten marks, each such page could be scanned, marked with appropriate alt text, and added to the end. There would be no need to do OCR on the scanned pages, because that content would be available in the main part of the document.
You might even be able to add a section heading such as "Scanned Copies of Handwritten Notes and Signatures." That way, the images are easy to find for anyone who wants them.
Cliff
From: Rakesh Chowdary Paladugula
Date: Thu, Jan 29 2009 11:15PM
Subject: Re: Semantics for Indicating Accessible Version of Files
My view is: why can't we do what Google does?
If a PDF file is listed in search results, Google automatically
provides an HTML version of the file.
I don't know more about it, as I am just a web accessibility tester.
Thanks & regards
--
Rakesh Paladugula
Web accessibility tester & developer
Iridiuminteractive soft ware solutions
Mob:9966346422
Website:http://rakeshpaladugula.blog.co.in