WebAIM - Web Accessibility In Mind

E-mail List Archives

Thread: Screen reader reading words as run-on

Number of posts in this thread: 13 (In chronological order)

From: Alan Zaitchik
Date: Mon, May 02 2022 4:51PM
Subject: Screen reader reading words as run-on

Listening to a PDF document using NVDA (and then JAWS), I hear certain words as “run on”; e.g., the words “in each” are pronounced as if they were one word, “ineach”, pronounced as “in-e-ack”. (JAWS handles this example OK but runs on other words.) Looking at the Content panel in Acrobat, it seems that the words are discrete with white space between them. Neither Acrobat nor PAC3 complains about a missing Unicode mapping or anything else.
Any suggestions?
Thanks,
Alan

From: Vaibhav Saraf
Date: Mon, May 02 2022 5:26PM
Subject: Re: Screen reader reading words as run-on

Hi Alan,

I have observed this quite often as a user, particularly with PDFs designed
in InDesign. As far as I know, this isn't a fault that automated checkers
can catch.

If you are asking from a remediation point of view, then you would need to
use the "actual text" property to make this work. I remember suggesting this
to one of my contacts; it was hard for them to find, but ultimately it worked out.

Thanks,
Vaibhav




From: Steve Green
Date: Mon, May 02 2022 5:40PM
Subject: Re: Screen reader reading words as run-on

I had this on a document last week. I found that if I used Acrobat's "Edit text & images" feature to make any change in a text frame, such as adding one character and deleting it, screen readers then read all the text in that frame properly.

Unfortunately, editing a text frame means you have to re-tag it, but that only takes a few minutes and it's a cleaner solution than using Actual Text.

Steve Green
Managing Director
Test Partners Ltd



From: Duff Johnson
Date: Mon, May 02 2022 5:51PM
Subject: Re: Screen reader reading words as run-on

Can the file be provided for inspection?

Duff.


From: chagnon@pubcom.com
Date: Tue, May 03 2022 12:18AM
Subject: Re: Screen reader reading words as run-on

We've seen (or heard) this mispronunciation, and it's usually caused by one of the following:

- The content creator used a "manual line break" (aka Shift + Enter) to force text to wrap to the next line without creating a new paragraph or <P> tag. Graphic designers often do this in desktop publishing programs like InDesign. The solutions: avoid forced line breaks within paragraphs by using other methods to wrap the text, or add a space before the line break. Both are hidden characters, so designers often don't see this problem in their layouts.

- The content author used an unusual space character, such as a non-breaking space (Unicode 00A0), figure space, hair space, thin space, quarter space, third space, punctuation space, or flush space. InDesign is a professional-grade typesetting program as well as a design and layout program, so it has many more types of typesetting spaces than other programs. Sometimes these are not converted to normal spaces (Unicode 0020) when the PDF is exported, or are not correctly interpreted by the assistive technology. This is a problem that must be addressed by all the players in the accessibility industry.

- For some reason, some OCR software skips the spaces when a scanned document is OCR'd. This is very common with Adobe Acrobat's built-in OCR utility, but given that this was from Adobe InDesign, there should be no need to OCR anything, unless the designer exported a Press / Print PDF rather than an accessible tagged PDF. In that situation, the remediator might have to run OCR on the content to make the text live so it can be tagged.

- Sometimes the AT just doesn't acknowledge that the space is there. We have no idea why. JAWS and NVDA should process them correctly.
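
One way to check for the unusual space characters listed above is to scan text that has been copied or extracted from the PDF. The following is a minimal sketch in Python, assuming the text is already available as a string. The function name `unusual_spaces` and the sample string are made up for illustration, and how the text gets out of the PDF is not covered here.

```python
import unicodedata

def unusual_spaces(text):
    """Return (index, code point, Unicode name) for each space separator other than U+0020."""
    found = []
    for i, ch in enumerate(text):
        # General category "Zs" covers space separators such as U+00A0 and U+2009.
        if unicodedata.category(ch) == "Zs" and ch != " ":
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return found

# Hypothetical sample containing a no-break space and a thin space.
sample = "in\u00a0each case, in\u2009each column"
for hit in unusual_spaces(sample):
    print(hit)
```

Text made up only of ordinary U+0020 spaces yields an empty list, so any hit points at a character worth checking in the source layout.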

—Bevi

— — —
Bevi Chagnon | Designer, Accessibility Technician | = EMAIL ADDRESS REMOVED =
— — —
PubCom: Technologists for Accessible Design + Publishing

From: Karen McCall
Date: Tue, May 03 2022 3:48AM
Subject: Re: Screen reader reading words as run-on

One serious note about too much Actual Text attribute!

If you are using a Text-to-Speech tool, such as Read&Write, you will need to switch from the regular reading in PDF to using "Screenshot" reader. Also, for both screen readers and some Text-to-Speech tools with the ability to highlight as you read, you lose the ability to have the text highlighted. This is an accessibility barrier for those that combine Text-to-Speech or screen reading with screen magnification.

The Actual Text attribute does not let you add semantic structure such as headings, lists, tables and so forth.

The better solution is to run the PDF through an OCR program like ABBYY FineReader or OmniPage Pro and fix the spacing. Most of the time, with the newer versions of FineReader, just opening the PDF in FineReader and letting it automatically "figure it out" removes this problem. You can now also tag a PDF from FineReader. You still have to do some touch-up on the result, but people can at least read the PDF without relying on an attribute for most of the text.

Cheers, Karen


From: Karen McCall
Date: Tue, May 03 2022 3:50AM
Subject: Re: Screen reader reading words as run-on

I also see this when using Adobe's on-board OCR tool, yet when I simply open the document in FineReader, save it as a searchable PDF, and try reading it again, it reads as it should. I avoid using the Actual Text attribute as much as I can because of the accessibility barriers.

Cheers, Karen


From: Duff Johnson
Date: Tue, May 03 2022 4:02AM
Subject: Re: Screen reader reading words as run-on

> On May 3, 2022, at 05:49, Karen McCall < = EMAIL ADDRESS REMOVED = > wrote:
>
> The Actual Text attribute does not let you add semantic structure such as headings, lists, tables and so forth.

This is not accurate. Any graphics object on a PDF page may have ActualText assigned irrespective of semantic tagging.

ActualText is typically applied via a <Span> element, which itself may be contained in headings, lists, etc. ActualText can also be assigned to marked content.

Duff.

From: Karen McCall
Date: Tue, May 03 2022 6:23AM
Subject: Re: Screen reader reading words as run-on

All might be true, but what I see is an entire scanned page with Actual Text. And this is coming from remediation services that should know better.

What I said about the highlight not following text in either an alt attribute or an Actual Text attribute does hold. The best path is to use an OCR tool so that scanned material, or documents with no spaces between words or with spaces between every character in a word, can be remediated to be accessible and usable.

Cheers, Karen


From: Andrews, David B (DEED)
Date: Tue, May 03 2022 7:18AM
Subject: Re: Screen reader reading words as run-on

This problem has been around for years! And the cause isn't clear. Visually, everything seems ok. If you review the screen with the screen reader on, the words seem run together. Often if you change the reading order, this clears it up! Usually the default order is top to bottom, left to right, and you can generally change it to infer reading order from document, with no ill effect.

Dave




From: Duff Johnson
Date: Tue, May 03 2022 7:20AM
Subject: Re: Screen reader reading words as run-on

> All might be true, but what I see is an entire scanned page with Actual Text. And this is coming from remediation services that should know better.

Bad tagging is bad tagging… :-(

But let's not toss the ActualText baby with the bad tagging bathwater… :-)

Duff.

From: Alan Zaitchik
Date: Fri, May 06 2022 6:33AM
Subject: Re: More info about Screen reader reading words as run-on

More data about this:
In PAC3's Logical Structure display I see the words separated by spaces. In its Screen Reader Preview I see the words joined together without spaces.
Retesting with both JAWS and NVDA, I hear no problems with the readout for many instances of the putative problem, even after slowing the speech rate to the slowest possible, but I have no reason to believe this is always the case. Other users have reported the problem with NVDA, and I myself heard it last week with both NVDA and JAWS, although I cannot find the exact location in the document to retest. Note, too, that the document is created by a third-party vendor using PDFLib and Java, not through manual editing. We're trying to get them to look at the issue, of course, but I would like to have more information for them.
THUS: Is there an easy way to see the encoding of specific white space characters in the document?
Thank you,
Alan
By the way, all the above testing was with the latest or very recent releases of NVDA and JAWS (2021, 2022) and Acrobat DC.
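
For the question about seeing which white space characters sit between two words, one rough approach is to copy or extract the text and print the code points found between the words. The following is a minimal sketch in Python, assuming the page text is already in a string. `gaps_between` is a hypothetical helper and the sample strings are made up. The extraction step may itself add or drop spaces, so the result is only a hint about what the PDF actually contains.

```python
import re

def gaps_between(text, left, right):
    """Return the code points found between `left` and `right`, one list per match."""
    # \W*? matches any run of non-word characters, including none at all.
    pattern = re.compile(re.escape(left) + r"(\W*?)" + re.escape(right))
    return [[f"U+{ord(ch):04X}" for ch in m.group(1)]
            for m in pattern.finditer(text)]

# Hypothetical samples: a normal space, a no-break space, a line break, and no gap.
for sample in ["in each", "in\u00a0each", "in\neach", "ineach"]:
    print(repr(sample), gaps_between(sample, "in", "each"))
```

An empty inner list means the two words are adjacent with nothing between them, which is the case a screen reader would be expected to read as one word.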


From: David Farough
Date: Fri, May 06 2022 6:49AM
Subject: Re: More info about Screen reader reading words as run-on

This is interesting!
Perhaps some of the variability in results with JAWS and NVDA could be explained by the speech synthesizer in use.
So many different possibilities here.
