E-mail List Archives
Thread: Word To PDF Accessibility: What Does and Does Not Transmit From Word To PDF
Number of posts in this thread: 12 (In chronological order)
From: Jim Homme
Date: Tue, Jun 18 2019 8:15AM
Subject: Word To PDF Accessibility: What Does and Does Not Transmit From Word To PDF
Hi,
I'm starting to keep track of what accessibility fixes in Word need to be redone when going to PDF. Does anyone already have notes on this? It looks to me as though it matters what process you use to make the PDF after you make the Word doc accessible. We have both the Adobe and the Microsoft PDF generation facilities available to us here.
Thanks.
Jim
==========
Jim Homme
Digital Accessibility
Bender Consulting Services
412-787-8567
https://www.benderconsult.com/our%20services/hightest-accessible-technology-solutions
From: Karlen Communications
Date: Tue, Jun 18 2019 9:35AM
Subject: Re: Word To PDF Accessibility: What Does and Does Not Transmit From Word To PDF
Currently Acrobat, Microsoft, Foxit Phantom for Business and Nuance PowerPDF
Advanced are creating horrid Tables of Content. I've recently tested all of
these applications with a sample accessible Word document. If you go by the
PAC 3 checker, you have to add Alt Text to every TOCI item that has a <Link>
Tag which means that in a list of links, you no longer have access to the
page numbers.
If you don't add Alt Text to EVERY link in a TOC, at the present time, all
applications mentioned above are truncating a TOC so that you hear something
like "List item, Link, Introduction, dot, dot, dot" and if you use Ctrl +
Down Arrow or just the Down Arrow you hear "link, dot, dot, dot, page
number." This is also how a TOC appears in a list of links...in pieces.
If you are using the Word 365 subscription desktop applications, you can now
mark images as decorative in Word. These do transfer as Artifacts in a PDF
document, but the setting is not backward compatible if you send the
document to someone who has an earlier version of Word than you do.
Likewise, in recent versions of Word, you can use the Table Tools, Design
Ribbon to identify the Header Row and First Column as headers, but that too
is not backward compatible with other versions of Word, and I've found the
correct tagging of the TH Tag to be something that needs checking.
Footnotes, Endnotes and Cross-references in all of the above mentioned
applications are a mess once an accessible Word document is converted to
tagged PDF. All of the reference information gets lumped together under one
Tag and it is all read at once when you encounter the first Footnote or
Endnote on a page in a PDF document. In Word, these are accessible using
keyboard commands to move between Footnotes and Endnotes and the main body
of the document where you found each Footnote or Endnote. (I'm using the
JAWS screen reader but there are Ribbon Commands to move back and forth as
well).
With any images, you will always get a Bounding Box issue if you use PAC 3
on a PDF from an accessible Word document. While not having much to do with
the accessibility of the PDF, the only solution, other than using a third
party remediation tool, is to make the images Artifacts, then Tag them again
and add the Alt Text...again.
The Title Style, which should have been used for the title of the document,
will need to be made an <H1> Tag. In older versions of Word this happened
automatically, and I noticed in the last few weeks that it is back again,
but there were a few years when the Title Style became a <P> Tag in a PDF.
If you choose to embed the fonts in the Word document, often Acrobat Pro DC
will tell you the fonts need to be embedded again. If you correctly use the
Paragraph dialog to add space before or after text like Headings, you will
see that something called ArialMT has been added to your document for those
spaces in the PDF document and it gets flagged by the Acrobat "verify PDF/UA
conformance" tool. Also note that you can use the "Analyze and fix" tool in
Acrobat to fix any syntax errors and add the PDF/UA identifier to your PDF
document. Because this tool doesn't check for the unique IDs in table cells
or Alt Text on images or links, you will still fail PAC 3 but have the
PDF/UA identifier on your document, and I can't find a way to remove it.
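
For readers who want to see whether a file carries the PDF/UA identifier,
here is a hedged Python sketch. It assumes the document's XMP metadata
packet has already been extracted to a string by some other means (the
extraction step is not shown), and that the identifier is stored as the
pdfuaid:part property. The function name pdfua_part and the sample packet
are illustrative. Finding the identifier shows only that conformance is
claimed, not that the file would pass PAC 3.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace used for the PDF/UA identification schema in XMP.
PDFUAID_NS = "http://www.aiim.org/pdfua/ns/id/"


def pdfua_part(xmp_packet: str) -> Optional[str]:
    """Return the pdfuaid:part value from an XMP packet, or None."""
    root = ET.fromstring(xmp_packet)
    key = f"{{{PDFUAID_NS}}}part"
    for elem in root.iter():
        if elem.tag == key:        # property written as a child element
            return (elem.text or "").strip()
        if key in elem.attrib:     # property written as an attribute
            return elem.attrib[key].strip()
    return None


# Illustrative packet only; real packets carry many more properties.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/">
      <pdfuaid:part>1</pdfuaid:part>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(pdfua_part(SAMPLE_XMP))
```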
Those are some of the things I find. It really depends on the version of
Word you have, what you need to take a closer look at and what tools you use
in Acrobat Pro DC.
Oh, within the past year, the language has correctly been identified as
plain "English" in the Advanced tab of the Document Properties dialog
instead of localized languages which force the use of a different
voice/pronunciation of words. So now, and I think it is in both Acrobat Pro
DC and the Microsoft tagging tool, the language is generic which means we
can use the synthesized voice we are used to.
Microsoft adds the <Document> Tag at the top of the Tags Tree, Acrobat Pro
DC still does not. Microsoft has the initial view set to Document Title,
Acrobat does not.
I've noticed recently/past few weeks that I have to go and check the Tab
Order if the document has links and/or form controls. For years this was set
by default to "use document structure" but now it sometimes jogs out of that
mode and I have to reset it. Not a consistent issue, but don't be surprised
if you see it.
Just thinking off the top of my head. There are probably things I've missed.
Cheers, Karen
From: Philip Kiff
Date: Tue, Jun 18 2019 9:41AM
Subject: Re: Word To PDF Accessibility: What Does and Does Not Transmit From Word To PDF
I was just writing up my own list when I received Karen's helpful list.
It looks like there's quite a bit of overlap in items we flagged, but
she's helpfully separated out which items are affected by MS Word vs
Acrobat, which I have not. Am sending this along anyways.
As Karen notes, one challenge is that the list of items needing
additional touch-ups can change with different versions of Word -
including especially different minor updates to Office 365 and Acrobat
Pro DC.
I don't have a proper list of all the items that MS Word fails to
convert properly. These days I use the AxesPDF for Word plugin to
generate much cleaner PDFs from Word in order to save some time.
But here's a quick list of significant ones compiled from some notes
I've made.
Figures
- reading order sometimes incorrect - especially if figures are not inline
- even for inline figures, "content order" doesn't match "tag order"
Tables
- usually need to add scope to headers
- grid lines usually not artifacted, sometimes placed in stray <P> tags
as child of <TR> or <TD>
- repeated header rows not artifacted on subsequent pages
- single table sometimes broken into multiple tables when it spans a
page break
Lists
- bullets and list numbers not placed within <Lbl> tag
- nested lists often break when sub-list spans a page
- individual list item breaks list structure when <LI> spans a page break
- list items containing a paragraph break need to be fixed and re-nested
so <P> is within <LBody>
Table of Contents
- sometimes need to move heading out of ToC structure
- sometimes need to merge multiple links into single links
- sometimes need to artifact or otherwise deal with dot leaders
- sometimes need to fix nested ToCs
Footnotes
- footnote body <Note> tags need to be reformatted so that they don't
contain <P>'s
- sometimes need to perform heroic reconstruction when footnote bodies
span page breaks
- to meet PDF/UA, you need to add note ID's
Tag Structure
- root tag sometimes needs to be changed to <Document>
- untagged content needs to be artifacted
Metadata
- title should be set to appear in title bar of window
- multiple authors need to be re-separated by removing quotation marks
That's not comprehensive. That's just a quick list based on notes at hand!
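
A few of the structural items in the list above can be expressed as simple
checks. The Python sketch below is only an illustration: it assumes a
hypothetical, simplified in-memory tag tree (the Node class) rather than a
real PDF, and it covers just the root tag, <Lbl> in list items, <P> inside
<Note>, note IDs, and header scope. Reading the tags out of an actual file
is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical, simplified stand-in for one PDF structure element."""
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def check_tree(root: Node) -> List[str]:
    """Return human-readable findings for a few items from the list."""
    findings = []
    if root.tag != "Document":
        findings.append(f"root tag is <{root.tag}>, expected <Document>")

    def walk(node: Node, path: str) -> None:
        here = f"{path}/{node.tag}"
        child_tags = [c.tag for c in node.children]
        if node.tag == "LI":
            if "Lbl" not in child_tags:
                findings.append(f"{here}: no <Lbl> for bullet or number")
            if "P" in child_tags:
                findings.append(f"{here}: <P> should sit inside <LBody>")
        if node.tag == "Note":
            if "P" in child_tags:
                findings.append(f"{here}: <Note> contains <P>")
            if "ID" not in node.attrs:
                findings.append(f"{here}: <Note> has no ID")
        if node.tag == "TH" and "Scope" not in node.attrs:
            findings.append(f"{here}: <TH> has no Scope")
        for child in node.children:
            walk(child, here)

    walk(root, "")
    return findings


# Small made-up tree that trips each check once.
sample = Node("Sect", children=[
    Node("L", children=[Node("LI", children=[Node("LBody")])]),
    Node("Table", children=[Node("TR", children=[Node("TH")])]),
    Node("Note", children=[Node("P")]),
])
for finding in check_tree(sample):
    print(finding)
```

Items such as reading order, artifacted grid lines, and structures broken
across page breaks need the page content as well as the tags, so they are
left out of this sketch.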
Phil.
Philip Kiff
D4K Communications
From: chagnon
Date: Tue, Jun 18 2019 12:10PM
Subject: Re: Word To PDF Accessibility: What Does and Does Not Transmit From Word To PDF
The list is so-o-o long!
We're seeing a regression in accessible PDFs from any source program using
any of the available tools, not just Adobe's PDF Maker plug-in, Microsoft's
built-in converter, Foxit, Nuance Power PDF, or Axes4 PDF's utility.
The biggest problems are with Tables of Content, footnotes, and regular
tables.
Part of this comes from the development of the forthcoming PDF UA-2
standards, which are not yet completed nor published by the ISO. But
companies are beginning to build UA-2 "stuff" into their programs now, in
today's versions. Essentially, they are jumping ahead of the standard and
consequently, no A T knows how to process this stuff and present it to the
human being using the A T.
The main questions this community needs to ask and get answers for:
1. Why are we going backwards? Why are PDFs exported from Word today, with
the latest software releases, less accessible than they were 2 years ago?
2. Why do different companies interpret the standards differently? Aren't
standards supposed to, well, standardize things?
3. And why are companies programming for UA-2 before it's ready and
released? Or even the law? Sec. 508 requires only PDF UA-1, not UA-2.
Any answers?
-Bevi
- - -
Bevi Chagnon, founder/CEO | = EMAIL ADDRESS REMOVED =
- - -
PubCom: Technologists for Accessible Design + Publishing
consulting . training . development . design . sec. 508 services
Upcoming classes at www.PubCom.com/classes
- - -
Latest blog-newsletter - Accessibility Tips at www.PubCom.com/blog
From: Karlen Communications
Date: Tue, Jun 18 2019 12:56PM
Subject: Re: Word To PDF Accessibility: What Does and Does Not Transmit From Word To PDF
Good questions. Who do we ask...the individual developers or the ISO
committee developing the technical specifications?
If everyone is mis-interpreting the standards it stands to reason that the
standards aren't clear. We should be able to open a PDF document and have a
consistent experience if it has been tagged. Even if it is just tagged,
there are elements that would be consistently accessible before remediation.
As I stated in my comparison of the Tables of Content tagging, a TOC was
tagged so that we could read a TOCI item and be clear as to the topic and
page number in October 2018 from an accessible Word document, now we can't.
If developers are trying to implement a standard not yet published or
adopted by legislation, how can they code for a moving target? And why?
And...I never thought I'd be saying this...if this is the future of PDF
"accessibility" can we afford to spend time making PDF accessible or should
we look at other file formats that are not going backward in accessibility
support? We shouldn't have to pay a large remediation service provider to
make our PDF documents accessible or spend hours ourselves to make PDF
accessible if we start with accessible content to begin with.
Should our focus be on tools that convert PDF to readable and understandable
content bypassing security settings? We may lose the rich layout but at
least we would have access to the content we need when we need it.
Apologies for sounding so frustrated, I've been testing documents in the
various applications for a week now and am ready to just stop reading PDF.
Time for a break! Will be working in Word for the next week or so....
Cheers, Karen
From: Duff Johnson
Date: Tue, Jun 18 2019 7:14PM
Subject: Re: Word To PDF Accessibility: What Does and Does Not Transmit From Word To PDF
> We're seeing a regression in accessible PDFs from any source program using
> any of the available tools, not just Adobe's PDF Maker plug-in, Microsoft's
> built-in converter, Foxit, Nuance Power PDF, or Axes4 PDF's utility.
> The biggest problems are with Tables of Content, footnotes, and regular
> tables.
Can you describe the problems you are encountering now in detail?
> Part of this comes from the development of the forthcoming PDF UA-2
> standards, which are not yet completed nor published by the ISO. But
> companies are beginning to build UA-2 "stuff" into their programs now, in
> today's versions.
Really? Can you provide examples?
> Essentially, they are jumping ahead of the standard and
> consequently, no A T knows how to process this stuff and present it to the
> human being using the A T.
Most AT hasn't caught up with ISO 32000-1 (2008) yet… so AT not handling ISO 32000-2 (2017) doesn't really surprise me...
> The main questions this community needs to ask and get answers for:
>
> 1. Why are we going backwards? Why are PDFs exported from Word today, with
> the latest software releases, less accessible than they were 2 years ago?
Examples, please. What exactly are they doing wrong today?
> 2. Why do different companies interpret the standards differently?
It's a human thing. :-)
> Aren't
> standards supposed to, well, standardize things?
Let's first ascertain that this is, in fact, happening...
> 3. And why are companies programming for UA-2 before it's ready and
> released? Or even the law? Sec. 508 requires only PDF UA-1, not UA-2.
>
> Any answers?
PDF/UA-2 doesn't really exist yet. It's hard for me to believe that anyone has attempted to implement it at this stage.
If they are writing PDF 2.0 tag structures ok… but then there's the rest of the toolchain (accessibility APIs and AT) needed for consumption. It seems unlikely that Adobe (to take one example) would implement features that its own software could not consume… but maybe.
Examples would be good!
Duff.
From: Karlen Communications
Date: Wed, Jun 19 2019 4:27AM
Subject: Re: Word To PDF Accessibility: What Does and Does Not Transmit From Word To PDF
Duff:
I've started documenting some of the "crappy" tagging we're seeing from all applications:
https://www.karlencommunications.com/DocumentRemediation.htm
I'll send you as many examples as you need. Am currently working on how Footnotes/Endnotes are being lumped together and read as a single unit when you land on the first one in a document. Am seeing if this tagging behaviour is unique to the Microsoft and Acrobat Pro DC tagging tools or if Foxit Phantom for Business and Nuance PowerPDF also produce this type of accessibility barrier.
For the scanned documents, I can send you all the sample files off list as there are about 15 of them and they won't fit in an e-mail, even attached to the PDF document about scanned documents.
Philip and I have chronicled the problems in response to the original post with this subject line. He did list some of the things I missed, so merging the lists will give you a good idea of the problems we are encountering.
Cheers, Karen
From: Duff Johnson
Date: Wed, Jun 19 2019 8:14AM
Subject: Re: Word To PDF Accessibility: What Does and Does Not Transmit From Word To PDF
I'm not against naming & shaming vendors that do the wrong thing… so long as the problem can be clearly and convincingly demonstrated.
Indeed, this industry (and its users) suffers because the very same users (remediators, customers, AT users) are mostly (with the very honorable exception of Karen and some others) willing to put up with whatever dogfood the software vendors are serving. This is no less true for AT vendors than it is for the big companies.
> I've started documenting some of the "crappy" tagging we're seeing from all applications:
> https://www.karlencommunications.com/DocumentRemediation.htm
Once I am done with stuff on my immediate plate I will be happy to look at this.
> I'll send you as many examples as you need. Am currently working on how Footnotes/Endnotes are being lumped together and read as a single unit when you land on the first one in a document.
So it's tagged that way? That is bizarre, and no standard suggests that such would be correct.
> Am seeing if this tagging behaviour is unique to the Microsoft and Acrobat Pro DC tagging tools or if Foxit Phantom for Business and Nuance PowerPDF also produce this type of accessibility barrier.
As always, I will have two questions...
1) Is the content tagged correctly (i.e., per the document's semantics / PDF/UA), and
2) Is the interpreting software getting it wrong (i.e., even if it's correctly tagged)
As you know, I insist on distinguishing between these two questions, as (in my view) it's the only way to determine who needs to do what.
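
The two questions above amount to a small decision procedure. The Python
sketch below is one possible reading of it, not a formal method: the names
Owner and who_needs_to_act are hypothetical, and the ordering (tags first,
then the reading software) is an assumption drawn from the order in which
the questions are asked.

```python
from enum import Enum


class Owner(Enum):
    PRODUCER = "authoring or conversion tool"
    CONSUMER = "reader, accessibility API, or AT"
    NONE = "no fix needed"


def who_needs_to_act(tagged_correctly: bool, read_correctly: bool) -> Owner:
    """Apply the two questions in order: tagging first, then interpretation."""
    if not tagged_correctly:
        # With wrong tags, the reading software cannot be judged fairly yet,
        # so fix the tags and retest before blaming the consumer side.
        return Owner.PRODUCER
    if not read_correctly:
        return Owner.CONSUMER
    return Owner.NONE


print(who_needs_to_act(tagged_correctly=True, read_correctly=False).value)
```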
> For the scanned documents, I can send you all the sample files off list as there are about 15 of them and they won't fit in an e-mail, even attached to the PDF document about scanned documents.
Links are fine….
> Philip and I have chronicled the problems in response to the original post with this subject line. He did list some of the things I missed, so merging the lists will give you a good idea of the problems we are encountering.
OK
Duff.
From: chagnon
Date: Wed, Jun 19 2019 11:56AM
Subject: Re: Word To PDF Accessibility: What Does and Does Not Transmit From Word To PDF
Duff, here's a great example, from the Adobe open forums, of the problems we're seeing.
https://forums.adobe.com/message/11130092
Having been through similar problems in our shop, I'm leaning toward it being software problems with any combination of these programs:
- The source (authoring) program, such as MS Word or Adobe InDesign.
- The conversion utility that created the PDF, such as PDF Maker (Acrobat Ribbon) or MS export utility.
- Acrobat DC Pro, which she's using to check and correct the PDF.
- The screen reader software programs.
It's like looking for a needle buried in a haystack.
-Bevi
- - -
Bevi Chagnon, founder/CEO | = EMAIL ADDRESS REMOVED =
- - -
PubCom: Technologists for Accessible Design + Publishing
consulting . training . development . design . sec. 508 services
Upcoming classes at www.PubCom.com/classes
- - -
Latest blog-newsletter - Accessibility Tips at www.PubCom.com/blog
From: David Engebretson Jr.
Date: Wed, Jun 19 2019 1:36PM
Subject: Re: Word To PDF Accessibility: What Does and Does Not Transmit From Word To PDF
Yah, it takes a knowledge of parametric equations to find the needle sometimes.
From: Sarah Ferguson
Date: Thu, Jun 20 2019 12:22PM
Subject: Re: Word To PDF Accessibility: What Does and Does Not Transmit From Word To PDF
I have also noticed an issue with headings in Word not showing up
consistently in PDFs like they used to. It has to do with versions, because
if I use Word on my PC, it works every time. If I use my Macbook and use
the Adobe tab in Word instead of the Save as PDF route, it works. On
our Mac desktops, it doesn't work. Every computer is using Office 2019
professional, but they have slight differences for platform or 32/64 bit.
I've complained about it for maybe 2 years now.
Sarah Ferguson
Web Accessibility Specialist
From: Duff Johnson
Date: Sun, Jun 30 2019 2:30PM
Subject: Re: Word To PDF Accessibility: What Does and Does Not Transmit From Word To PDF
> Duff, here's a great example, from the Adobe open forums, of the problems we're seeing.
> https://forums.adobe.com/message/11130092
I finally got to reading this. What a mess.
> Having been through similar problems in our shop, I'm leaning toward it being software problems with any combination of these programs:
>
> - The source (authoring) program, such as MS Word or Adobe InDesign.
>
> - The conversion utility that created the PDF, such as PDF Maker (Acrobat Ribbon) or MS export utility.
>
> - Acrobat DC Pro, which she's using to check and correct the PDF.
>
> - The screen reader software programs.
>
> It's like looking for a needle buried in a haystack.
There are several issues here.
- Authoring software (in the 2nd example) is making PDFs without word-spacing (not mentioned in the writeup, which makes me wonder how they are testing, since they didn't mention this problem).
- The author misunderstands the structure elements they are using… for example, they are wrapping a sentence of text in an <Lbl> element, which is just dead wrong regardless of their claim that the file conforms to PDF/UA. This isn't the right way to use a <Lbl> element.
- But it's also true that the reader implementations (accessibility API and/or AT) are letting them down as well… that "link" shouldn't be there, for one thing.
Philip Kiff's comment is accurate (so far as I can tell).
Duff.