WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: Word To PDF Accessibility: What Does and Does Not Transmit From Word To PDF

From: Philip Kiff
Date: Jun 18, 2019 9:41AM


I was just writing up my own list when I received Karen's helpful list.

It looks like there's quite a bit of overlap in items we flagged, but
she's helpfully separated out which items are affected by MS Word vs
Acrobat, which I have not. Am sending this along anyways.

As Karen notes, one challenge is that the list of items needing
additional touch-ups can change between versions of Word, and
especially between minor updates to Office 365 and Acrobat Pro DC.

I don't have a proper list of all the items that MS Word fails to
convert properly. These days I use the AxesPDF for Word plugin to
generate much cleaner PDFs from Word in order to save some time.

But here's a quick list of significant ones compiled from some notes
I've made.

Figures
- reading order sometimes incorrect - especially if figures are not inline
- even for inline figures, "content order" doesn't match "tag order"

Tables
- usually need to add scope to headers
- grid lines usually not artifacted, sometimes placed in stray <P> tags
as child of <TR> or <TD>
- repeated header rows not artifacted on subsequent pages
- single table sometimes broken into multiple tables when it spans a
page break
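
As a rough illustration of the header scope item above, here is a minimal
Python sketch. It assumes the tag tree has already been read into a simple
in-memory structure; the Tag class and headers_missing_scope function are
hypothetical names for this example, not part of Acrobat, Word, or any PDF
library.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical stand-in for one node of a PDF tag tree."""
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def headers_missing_scope(node, path=""):
    """Return the tree paths of all <TH> tags without a usable Scope."""
    here = f"{path}/{node.name}"
    missing = []
    # PDF table attributes allow Scope values of Row, Column, or Both.
    if node.name == "TH" and node.attrs.get("Scope") not in ("Row", "Column", "Both"):
        missing.append(here)
    for child in node.children:
        missing.extend(headers_missing_scope(child, here))
    return missing


# Made-up sample table: the second header cell in row one has no Scope.
table = Tag("Table", children=[
    Tag("TR", children=[Tag("TH", {"Scope": "Column"}), Tag("TH")]),
    Tag("TR", children=[Tag("TH", {"Scope": "Row"}), Tag("TD")]),
])

print(headers_missing_scope(table))
```

A check like this only finds the headers that need attention; the scope
itself still has to be set in whatever remediation tool you use.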

Lists
- bullets and list numbers not placed within <Lbl> tag
- nested lists often break when sub-list spans a page
- individual list item breaks list structure when <LI> spans a page break
- list items containing a paragraph break need to be fixed and re-nested
so <P> is within <LBody>
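
The list items above could be checked along these lines. This is only a
sketch: it assumes each tag is represented as a (name, children) pair, which
is a made-up representation for illustration, and list_problems is a
hypothetical helper name.

```python
def list_problems(tag, path=""):
    """Report <LI> tags that lack <Lbl>/<LBody> or hold content outside <LBody>."""
    name, children = tag
    here = f"{path}/{name}"
    problems = []
    if name == "LI":
        kinds = [child_name for child_name, _ in children]
        if "Lbl" not in kinds:
            problems.append(f"{here}: bullet or number is not in an <Lbl> tag")
        if "LBody" not in kinds:
            problems.append(f"{here}: no <LBody> tag")
        for kind in kinds:
            if kind not in ("Lbl", "LBody"):
                problems.append(f"{here}: <{kind}> should be nested inside <LBody>")
    for child in children:
        problems.extend(list_problems(child, here))
    return problems


# Made-up sample: the second item has no <Lbl> and a stray <P> beside <LBody>.
sample = ("L", [
    ("LI", [("Lbl", []), ("LBody", [("P", [])])]),
    ("LI", [("LBody", []), ("P", [])]),
])

for problem in list_problems(sample):
    print(problem)
```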

Table of Contents
- sometimes need to move heading out of ToC structure
- sometimes need to merge multiple links into single links
- sometimes need to artifact or otherwise deal with dot leaders
- sometimes need to fix nested ToCs

Footnotes
- footnote body <Note> tags need to be reformatted so that they don't
contain <P>'s
- sometimes need to perform heroic reconstruction when footnote bodies
span page breaks
- to meet PDF/UA, you need to add note IDs
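
A sketch of how the note ID and <P> items above might be checked, assuming
each <Note> tag has been reduced to a (note_id, child_tag_names) pair. That
representation and the note_problems name are assumptions made for this
example only.

```python
from collections import Counter


def note_problems(notes):
    """notes: list of (note_id, child_tag_names) pairs, one per <Note> tag."""
    problems = []
    # Count how often each non-empty ID occurs so duplicates can be flagged.
    id_counts = Counter(note_id for note_id, _ in notes if note_id)
    for index, (note_id, child_names) in enumerate(notes, start=1):
        if not note_id:
            problems.append(f"Note {index}: missing ID")
        elif id_counts[note_id] > 1:
            problems.append(f"Note {index}: duplicate ID {note_id!r}")
        if "P" in child_names:
            problems.append(f"Note {index}: contains <P>")
    return problems


# Made-up sample: one note without an ID and two notes sharing an ID.
notes = [("fn1", []), (None, ["P"]), ("fn1", [])]

for problem in note_problems(notes):
    print(problem)
```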

Tag Structure
- root tag sometimes needs to be changed to <Document>
- untagged content needs to be artifacted

Metadata
- title should be set to appear in title bar of window
- multiple authors need to be re-separated by removing quotation marks

That's not comprehensive. That's just a quick list based on notes at hand!

Phil.

Philip Kiff
D4K Communications


On 2019-06-18 11:35, Karlen Communications wrote:
> Currently Acrobat, Microsoft, Foxit Phantom for Business and Nuance PowerPDF
> Advanced are creating horrid Tables of Content. I've recently tested all of
> these applications with a sample accessible Word document. If you go by the
> PAC 3 checker, you have to add Alt Text to every TOCI item that has a <Link>
> Tag which means that in a list of links, you no longer have access to the
> page numbers.
>
> If you don't add Alt Text to EVERY link in a TOC, at the present time, all
> applications mentioned above are truncating a TOC so that you hear something
> like "List item, Link, Introduction, dot, dot, dot" and if you use Ctrl +
> Down Arrow or just the Down Arrow you hear "link, dot, dot, dot, page
> number." This is also how a TOC appears in a list of links...in pieces.
>
> If you are using Word 365 subscription desktop applications you can now
> indicate in Word that images are decorative/Artifacts. These do transfer as
> Artifacts to a PDF document, but the setting is not backward compatible if
> you send the document to someone who has a different or earlier version of
> Word than you do.
>
> Likewise, in recent versions of Word, you can use the Table Tools, Design
> Ribbon to identify the first row and first column as headers, but that too
> is not backward compatible with other versions of Word, and I've found the
> correct tagging of the TH Tag to be something that needs checking.
>
> Footnotes, Endnotes and Cross-references in all of the above mentioned
> applications are a mess once an accessible Word document is converted to
> tagged PDF. All of the reference information gets lumped together under one
> Tag and it is all read at once when you encounter the first Footnote or
> Endnote on a page in a PDF document. In Word, these are accessible using
> keyboard commands to move between Footnotes and Endnotes and the main body
> of the document where you found each Footnote or Endnote. (I'm using the
> JAWS screen reader but there are Ribbon Commands to move back and forth as
> well).
>
> With any images, you will always get a Bounding Box issue if you use PAC 3
> on a PDF from an accessible Word document. While not having much to do with
> the accessibility of the PDF, the only solution, other than using a third
> party remediation tool, is to make the images Artifacts, then Tag them again
> and add the Alt Text...again.
>
> The Title Style, which should have been used for the title of the document,
> will need to be made an <H1> Tag. In older versions of Word this was in
> place, and I noticed in the last few weeks that it is back again, but there
> were a few years when the Title Style was a <P> Tag in a PDF.
>
> If you choose to embed the fonts in the Word document, often Acrobat Pro DC
> will tell you the fonts need to be embedded again. If you correctly use the
> Paragraph dialog to add space before or after text like Headings, you will
> see that something called ArialMT has been added to your document for those
> spaces in the PDF document and it gets flagged by the Acrobat "verify PDF/UA
> conformance" tool. Also note that you can use the "Analyze and fix" tool in
> Acrobat to fix any syntax errors and add the PDF/UA identifier to your PDF
> document. However, since this tool doesn't check for the unique IDs in table
> cells or Alt Text on images or links, you will still fail PAC 3 but have the
> PDF/UA identifier on your document, and I can't find a way to remove it.
>
> Those are some of the things I find. It really depends on the version of
> Word you have, what you need to take a closer look at and what tools you use
> in Acrobat Pro DC.
>
> Oh, within the past year, the language has correctly been identified as
> plain "English" in the Advanced tab of the Document Properties dialog
> instead of localized languages which force the use of a different
> voice/pronunciation of words. So now, and I think it is in both Acrobat Pro
> DC and the Microsoft tagging tool, the language is generic which means we
> can use the synthesized voice we are used to.
>
> Microsoft adds the <Document> Tag at the top of the Tags Tree, Acrobat Pro
> DC still does not. Microsoft has the initial view set to Document Title,
> Acrobat does not.
>
> I've noticed recently/past few weeks that I have to go and check the Tab
> Order if the document has links and/or form controls. For years this was set
> by default to "use document structure" but now it sometimes jogs out of that
> mode and I have to reset it. Not a consistent issue, but don't be surprised
> if you see it.
>
> Just thinking off the top of my head. There are probably things I've missed.
>
> Cheers, Karen
>
>
>
> -----Original Message-----
> From: WebAIM-Forum < <EMAIL REMOVED> > On Behalf Of Jim
> Homme
> Sent: Tuesday, June 18, 2019 10:15 AM
> To: WebAIM Discussion List < <EMAIL REMOVED> >
> Subject: [WebAIM] Word To PDF Accessibility: What Does and Does Not Transmit
> From Word To PDF
>
> Hi,
> I'm starting to keep track of what accessibility fixes in Word need to be
> redone when going to PDF. Does anyone already have notes on this? It looks
> to me as though it matters what process you use to make the PDF after you
> make the Word doc accessible. We have both the Adobe and the Microsoft PDF
> generation facilities available to us here.
>
> Thanks.
>
> Jim
>
>
>
> ==========
> Jim Homme
> Digital Accessibility
> Bender Consulting Services
> 412-787-8567
> https://www.benderconsult.com/our%20services/hightest-accessible-technology-solutions