WebAIM - Web Accessibility In Mind

E-mail List Archives

Thread: (PDF) missing white space between words

Number of posts in this thread: 4 (In chronological order)

From: Birkir R. Gunnarsson
Date: Mon, Apr 27 2020 3:21PM
Subject: (PDF) missing white space between words

Gang

One problem I keep having with PDF files that have been remediated by
various sources is the merging of two words into one.
This seems to happen fairly randomly, at least non-visually.
Why this behavior?
Could it be due to reading order tagging of text presented in a column
layout where there is no space between the last word in one column and
the first word in the next?
Any words from the wise (at least wiser than I, which isn't hard)
would be helpful.



--
Work hard. Have fun. Make history.

From: Dona Patrick
Date: Mon, Apr 27 2020 3:28PM
Subject: Re: (PDF) missing white space between words

I see that often and it's really frustrating. One thing that I know causes
it is when the author has used a soft-return and not put a space after the
last word in a line.

Otherwise I am stumped as well.

Dona

From: Philip Kiff
Date: Mon, Apr 27 2020 3:47PM
Subject: Re: (PDF) missing white space between words

I'm not sure of all the possible reasons why this happens, but I have
run into this numerous times and I have a couple of theories.

Generally speaking, I notice this happening almost exclusively in files
that were not created with the "tagged PDF" feature of whatever software
generated the PDF.

It can definitely happen, for example, with current versions of Adobe
Illustrator and older versions of InDesign: text will often be generated
in separate chunks, one chunk per line (these are not placed in
"containers"). When you attempt to tag a series of such text chunks
manually, the individual chunks may not have a blank space at the end of
the line, so your paragraph ends up with the last word of one line
merged with the first word of the next.
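
To illustrate the chunk problem, here is a minimal Python sketch. It
assumes each visual line has already been extracted as a separate string;
the function name join_chunks and the sample strings are hypothetical and
not part of any PDF tool. It adds a space between chunks only when neither
side already has one.

```python
def join_chunks(chunks):
    """Join per-line text chunks, adding a space only where one is missing."""
    parts = []
    for chunk in chunks:
        if not chunk:
            continue  # skip empty chunks so indexing below is safe
        # Add a separator only if neither neighbour supplies whitespace.
        if parts and not parts[-1][-1].isspace() and not chunk[0].isspace():
            parts.append(" ")
        parts.append(chunk)
    return "".join(parts)


# Hypothetical line chunks: the first has no trailing space, the second does.
lines = ["One problem I keep", "having with PDF files ", "is merged words."]
print("".join(lines))      # naive concatenation merges "keep" and "having"
print(join_chunks(lines))  # a space is added only where it was missing
```

This sketch does not handle words hyphenated across a line break, which
would need to be rejoined without a space.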

I have also seen space issues appear in PDFs after you use the
"auto-tag" feature in Adobe Acrobat Pro. In some files, using this
feature results in all the spaces on a page being collected together in
a single artifact tag at the end of a physical page. It's unclear to me
why this happens, but I think again it is related to weird ways that
some source software generates PDF structures. Most of the time, these
autotagged files have spaces between words that appear just fine despite
an entire other set of "artifacted" spaces. I've wondered if this is
some weird formatting side-effect of some software when text has custom
line-spacing or kerning between text, but I have no idea really.

Shadow effects and outlined text can also sometimes produce weird
effects with duplicate text. If you are not careful when artifacting the
duplicate text snippets (usually line by line), you can end up with text
missing spaces: usually only one of the duplicate text pieces actually
has spaces between the words, so you have to keep that one and artifact
the other.

Also, note that some PDF remediation software has features that allow
you to insert spaces back in between words in documents that are missing
them.
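
If you have no such tool, a rough way to find candidates for review is to
extract the text and flag tokens that are not known words but split
cleanly into two known words. The Python sketch below is only an
illustration of that idea, not how any remediation product works; the
function name find_merged_words and the tiny vocabulary are hypothetical,
and a real check would need a full word list and would still produce
false positives.

```python
import re


def find_merged_words(text, vocabulary):
    """Return (token, left, right) for tokens that are not known words
    but split into two known words."""
    vocab = {w.lower() for w in vocabulary}
    hits = []
    for token in re.findall(r"[A-Za-z]+", text):
        word = token.lower()
        if word in vocab:
            continue  # a known word is not a merge candidate
        # Try every split point; report the first split that works.
        for i in range(1, len(word)):
            left, right = word[:i], word[i:]
            if left in vocab and right in vocab:
                hits.append((token, left, right))
                break
    return hits


# Hypothetical vocabulary and extracted text for demonstration only.
vocab = {"the", "last", "word", "in", "one", "column", "and", "first"}
sample = "the last word in one columnand the first word"
print(find_merged_words(sample, vocab))
# prints [('columnand', 'column', 'and')]
```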

Phil.

Philip Kiff
D4K Communications

From: Andrews, David B (DEED)
Date: Thu, Apr 30 2020 2:22PM
Subject: Re: (PDF) missing white space between words

I also do not know the cause. I do not see it as often as I used to; a few years back, it was quite common. As I recall, you can sometimes get it to correct itself by changing the reading order.

Dave


