E-mail List Archives
Thread: Word Documents header and footer areas not exported to PDF tag tree
Number of posts in this thread: 12 (In chronological order)
From: Jon Brundage
Date: Thu, Dec 12 2024 9:29AM
Subject: Word Documents header and footer areas not exported to PDF tag tree
Hello-
I have Word documents used as source content for PDFs. When I convert to PDF,
any content in the header/footer areas in Word does not get included in the
tag tree. I've looked for commands to make the headers and footers be
included in the PDF tag tree, but so far no luck.
Does anyone know how to ensure that all content in Word documents is ported to
the PDF tag tree?
Thanks,
Jon
From: Hayman, Douglass
Date: Thu, Dec 12 2024 9:41AM
Subject: Re: - Word Documents header and footer areas not exported to PDF tag tree
Jon,
From what I've seen/heard, we tend to exclude those in our PDFs when remediating them. We might select just the first instance of something in the header if it provides something meaningful that is otherwise missing. But we'd spare the user hearing "Acme Corporation logo" read as each new page loads; the first one may be OK to have, but subsequent announcements of that header content would be a distraction.
Likewise, as I understand the process, we'd exclude the footer info and instead rely upon, say, the page numbers of the PDF itself. I've seen PDFs that were a compilation of documents in the case of a public records request. The first document might be numbered 1-5 in the footer and the second one 1-20. If those were not artifacted but announced, the user of assistive technology might hear in sequence "page 4, page 5, page 1, page 2" as it went from one document to the next, now combined in a single PDF.
Doug Hayman
IT Accessibility Coordinator
Information Technology
Olympic College
= EMAIL ADDRESS REMOVED =
(360) 475-7632
From: Markus Erle
Date: Thu, Dec 12 2024 9:55AM
Subject: Re: Word Documents header and footer areas not exported to PDF tag tree
Hi Jon,
Word does not give you the option to mark running headers or footers so that they are added as relevant content to the PDF tag tree.
You need a special conversion tool for that, for example axesWord.
https://support.axes4.com/hc/en-us/articles/7371830232082-Headers-and-footers
Markus
Markus Erle
Co-Founder & CEO
Tel.: +49 7071 549 89 24
From: Philip Kiff
Date: Thu, Dec 12 2024 9:58AM
Subject: Re: - Word Documents header and footer areas not exported to PDF tag tree
As Doug explains, this is by design.
Content that is significant to a document's intended meaning should not
be included *only* in headers and footers, which are considered
supplementary and repetitive information.
In addition to the reasons Doug explains, including such content in the
regular page stream may also interrupt the flow of text for some
assistive technology, by inserting the footer or header in between a
single paragraph or list that is intended to flow from one page to the
next.
A document's title, date, and other common information placed in
headers/footers should appear elsewhere in the document as standard text
that won't be artifacted when converted to PDF.
If there is significant content in a header or footer that you want to
include once and only once, then one strategy in Word is to start the
repeated header/footer on the second page, and then on the first page insert a
duplicate of the header/footer that is not actually placed in the header
or footer area, but is instead manually positioned to appear in the
exact same location. You can manually place such content by modifying
the top/bottom margins for the first page, or by using a text box and
applying absolute positioning. In the latter case, you would want to
"anchor" the text box in the place where you want the content to be read
by assistive technology.
Phil.
Philip Kiff
D4K Communications
From: Karen McCall
Date: Thu, Dec 12 2024 10:21AM
Subject: Re: Word Documents header and footer areas not exported to PDF tag tree
Page Header and Footer information from Word has to be manually tagged. It is not tagged automatically.
For those of us using screen readers, having to listen to the repetitive text at the top and bottom of each page is annoying at best and interferes with our ability to comprehend the content at worst. Imagine a paragraph that spans two pages interrupted in the middle by hearing the Page Header and Footer information. People who don't use screen readers aren't forced to read Page Headers and Footers, and as a screen reader user, I don't want to be forced to read them either.
I recommend that if information is important for the understanding of the content, it be put somewhere in the content of the document, especially contact information or information that indicates the content is authorized by the organization. One technique is to put the logo in the Page Header or Footer and the text somewhere on the first page. It doesn't have to be a heading, just an acknowledgement that the document is an official document.
For page numbers, PDFs use Page Labels, which solve the problem of interrupting paragraphs, lists or tables with page number information. Page Labels are used by everyone to ensure they can get to the correct page, and both JAWS and NVDA have keyboard commands for reading Page Labels.
So, Page Header and Footer information is not tagged by design.
Hope this helps.
Cheers, Karen
From: Jon Brundage
Date: Thu, Dec 12 2024 10:29AM
Subject: Re: Word Documents header and footer areas not exported to PDF tag tree
Thank you so much, Karen
From: Dana McMullen
Date: Thu, Dec 12 2024 11:53AM
Subject: Re: Word Documents header and footer areas not exported to PDF tag tree
Hello everyone,
This is my first time participating in this community, and I'm excited to
finally contribute. I wanted to join this conversation about PDF conversion
and share some insights regarding why headers and footers are often
considered "PDF artifacts" and are hidden from assistive technology. This
practice is beneficial for users with disabilities.
To clarify, I assume that when you refer to these elements, you mean that
they are:
1. Not part of the main content (like paragraph headers)
2. Intended solely for layout purposes.
A PDF artifact is defined as an element in a PDF document that:
1. Is not intended to be part of the main content,
2. Is typically used for layout purposes, and
3. Is usually ignored by screen readers and other assistive technologies.
Artifacts can include:
- Page Numbers - Often treated as layout elements rather than content.
- Watermarks - Decorative or security features not intended for content
reading.
- Background Graphics - Images or patterns used solely for design
purposes.
By skipping these types of elements, we ensure that users with disabilities
receive only relevant content read to them. This significantly enhances both
accessibility and usability.
===
Dana McMullen
Web Developer | Lead Accessibility Specialist | Tester | Consultant
Remote Digital Office Practitioner
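To make the artifact idea in the message above a little more concrete: inside a PDF page's content stream, artifacts are marked-content sequences tagged /Artifact, and running headers and footers can carry /Type /Pagination with /Subtype /Header or /Footer. The Python sketch below works on a hand-written toy content stream (an assumption for illustration, not real exporter output) and separates the text inside artifact spans from the text a tag-aware reader would still reach. It is deliberately naive: it assumes uncompressed streams, no nested marked content, and no escaped parentheses, so treat it as an illustration rather than a PDF parser.

```python
import re

# Toy page content stream, written by hand for illustration only.
CONTENT_STREAM = """\
/Artifact <</Type /Pagination /Subtype /Header>> BDC
BT /F1 10 Tf 72 760 Td (Acme Corporation Annual Report) Tj ET
EMC
/P <</MCID 0>> BDC
BT /F1 12 Tf 72 700 Td (Revenue grew in every region.) Tj ET
EMC
/Artifact <</Type /Pagination /Subtype /Footer>> BDC
BT /F1 10 Tf 72 30 Td (Page 4) Tj ET
EMC
"""

# An artifact span runs from "/Artifact ... BDC" (or BMC) to the next EMC.
# Assumption: marked-content sequences are not nested in this toy input.
ARTIFACT_SPAN = re.compile(r"/Artifact\b.*?\b(?:BDC|BMC)\b.*?\bEMC\b", re.DOTALL)

# Text shown with the Tj operator; assumes no escaped or nested parentheses.
SHOWN_TEXT = re.compile(r"\(([^()]*)\)\s*Tj")


def tagged_text(stream: str) -> list[str]:
    """Return text strings that sit outside any artifact span."""
    without_artifacts = ARTIFACT_SPAN.sub("", stream)
    return SHOWN_TEXT.findall(without_artifacts)


def artifact_text(stream: str) -> list[str]:
    """Return text strings that sit inside artifact spans."""
    found = []
    for span in ARTIFACT_SPAN.finditer(stream):
        found.extend(SHOWN_TEXT.findall(span.group(0)))
    return found


print(tagged_text(CONTENT_STREAM))
print(artifact_text(CONTENT_STREAM))
```

In this toy page the header and the footer page number land in the artifact list, and only the body paragraph remains as content for assistive technology.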
From: Duff Johnson
Date: Thu, Dec 12 2024 12:00PM
Subject: Re: Word Documents header and footer areas not exported to PDF tag tree
I want to add something to what Karen has very rightly said.
> On Dec 12, 2024, at 12:21, Karen McCall < = EMAIL ADDRESS REMOVED = > wrote:
<snip>
> For page numbers, PDFs use Page Labels, which solve the problem of interrupting paragraphs, lists or tables with page number information. Page Labels are used by everyone to ensure they can get to the correct page, and both JAWS and NVDA have keyboard commands for reading Page Labels.
100% true!
BUT… page labels are mishandled by authors (and their software) FAR TOO OFTEN. So much so that the US Court of Appeals imposed its own requirement for correct page labels on legal submissions(!)
For those interested in learning more about this critical aspect of PDF navigation, please see the PDF Association CTO’s authoritative article on the subject, which offers a ton of guidance on getting this vital (for accessibility) feature right:
https://pdfa.org/pdf-ux-page-labels/
Duff Johnson
PDF Association
pdfa.org
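For readers who want to see how the page labels discussed above are derived: a PDF stores label ranges, each starting at a zero-based page index with an optional numbering style (/S: D, R, r, A, a), an optional prefix (/P), and an optional starting number (/St, default 1). The Python sketch below computes labels from such ranges. The LabelRange class and function names are hypothetical helpers invented for this illustration, not part of any PDF library, and the "A-" and "B-" prefixes are an assumed way of labeling the two combined documents (5 pages and 20 pages) from the public-records example earlier in the thread.

```python
from dataclasses import dataclass

ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]


def to_roman(number: int) -> str:
    """Uppercase roman numeral for a positive integer."""
    parts = []
    for value, symbol in ROMAN:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def to_letters(number: int) -> str:
    """A..Z for 1..26, then AA..ZZ for 27..52, and so on."""
    letter = chr(ord("A") + (number - 1) % 26)
    return letter * ((number - 1) // 26 + 1)


@dataclass
class LabelRange:
    """Hypothetical stand-in for one page label range."""
    start_index: int        # zero-based index of the first page in the range
    style: str = "D"        # /S: "D", "R", "r", "A", "a", or "" for none
    prefix: str = ""        # /P
    first_number: int = 1   # /St


def format_number(style: str, number: int) -> str:
    if style == "D":
        return str(number)
    if style == "R":
        return to_roman(number)
    if style == "r":
        return to_roman(number).lower()
    if style == "A":
        return to_letters(number)
    if style == "a":
        return to_letters(number).lower()
    return ""  # no style: the label consists of the prefix only


def page_labels(page_count: int, ranges: list) -> list:
    """Compute one label per page from a list of LabelRange entries."""
    ordered = sorted(ranges, key=lambda r: r.start_index)
    if not ordered or ordered[0].start_index != 0:
        raise ValueError("the first label range must start at page index 0")
    labels = []
    for index in range(page_count):
        # The applicable range is the last one starting at or before this page.
        current = [r for r in ordered if r.start_index <= index][-1]
        number = current.first_number + (index - current.start_index)
        labels.append(current.prefix + format_number(current.style, number))
    return labels


combined = page_labels(25, [LabelRange(0, "D", "A-"), LabelRange(5, "D", "B-")])
print(combined[3:7])  # ['A-4', 'A-5', 'B-1', 'B-2']
```

With distinct prefixes per source document, the transition that would otherwise be heard as "page 5, page 1" becomes unambiguous, while the visible footer numbers can stay artifacted.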
From: David Engebretson Jr.
Date: Thu, Dec 12 2024 2:07PM
Subject: Re: - Word Documents header and footer areas not exported to PDF tag tree
Yes, I've seen this issue. It's a bummer, eh?
We've worked around it by including the header/footer information in the main section of the document. We just pulled the header/footer information into the main document and all is well. It just doesn't repeat in the document on every page...
Also, when you are saving the document, do you use the File-Save dialog and change the filetype to PDF? That's what I do to avoid the cumbersome Adobe software involved in the Save to PDF plugin offered by Adobe.
Also, be sure your PDF "Options" are set properly; then you _should_ be golden.
Cheers,
David
From: Dean.Vasile
Date: Thu, Dec 12 2024 2:52PM
Subject: Re: - Word Documents header and footer areas not exported to PDF tag tree
I do have to say, as a JAWS user,
I have had two OCR PDF documents that were not formatted correctly,
and then had to hear the header and the footer on every single page.
So for me that was definitely a bit of a nuisance.
Dean Vasile
IAAP, CPACC
= EMAIL ADDRESS REMOVED =
617-799-1162
From: Ryan E. Benson
Date: Fri, Dec 13 2024 7:22AM
Subject: Re: Word Documents header and footer areas not exported to PDF tag tree
Welcome Dana,
Any content can be marked up as an artifact, and it is up to the
content author or the person doing the remediation to make that
decision. For example, in Word, to make the top row a header, you have
to select the "repeat header row" option regardless of whether the table
spans multiple pages. If the table does span multiple pages, the
repeated text should be marked as an artifact because it can
create confusion/distraction, as Karen said in her comments.
--
Ryan E. Benson
From: Dana McMullen
Date: Fri, Dec 13 2024 9:10AM
Subject: Re: Word Documents header and footer areas not exported to PDF tag tree
Hello Ryan,
To clarify my response, I was responding to what happens during the
conversion process from Word to PDF, and why these header/footer areas
convert from Word to PDF as artifacts instead of content that is available
to assistive technology.
But I am aware that we have the ability to change our markup once we edit
our PDFs. That didn't seem to be the focus of the original question, unless
I'm just not reading it correctly.
Respectfully,
Dana