WebAIM - Web Accessibility In Mind

E-mail List Archives

Re: JWAS and special characters pronunciation

From: Jukka K. Korpela
Date: Jan 2, 2014 12:51AM


2014-01-02 2:44, Chagnon | PubCom wrote:

> Birkir wrote: " What all screen readers should uniformly support is
> to announce a character differently when put inside a span, it should
> not take a Block level element to get that done."
>
> Whatever solution is developed by the industry, it needs to also work
> for non-HTML documents, such as MS Word, PowerPoint, and Acrobat
> PDFs.

That would be nice, but strings like "p110α" are a real challenge to
software that tries to pronounce them properly. Knowing the context, you
might think it self-evident that the letter alpha is to be read as
"alpha", pronounced by the rules of the enclosing language. But in
general, a string consisting of Latin letters, digits, and Greek letters
might have the last part to be read as Greek text. And then reading
alpha as "ah" is actually natural.

It is difficult to formulate a good algorithm for dealing with strings
like that. And I suppose they are relatively rare. Strings like
"α-Tocopherol" are more common. I wonder if they are handled properly;
at least there is the hyphen, which suggests some kind of (morpheme)
boundary.
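
To make the difficulty concrete, here is a minimal Python sketch that
splits such strings into runs of similar characters. It assumes, crudely,
that the first word of a character's Unicode name hints at its script;
rough_script and split_runs are made-up names for illustration only.
Finding the runs is the easy part. Deciding whether the Greek run is a
symbol name or Greek text is the part that has no obvious algorithm.

```python
import unicodedata
from itertools import groupby


def rough_script(ch):
    # Crude assumption: the first word of the Unicode name ("LATIN",
    # "GREEK", "DIGIT", ...) is a usable stand-in for the script.
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def split_runs(text):
    # Group consecutive characters that share the same rough script label.
    return [(label, "".join(chars))
            for label, chars in groupby(text, key=rough_script)]


print(split_runs("p110\u03b1"))         # Latin, digits, Greek alpha
print(split_runs("\u03b1-Tocopherol"))  # Greek alpha, hyphen-minus, Latin
```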

> There are different Unicode characters for the Greek letter pi used
> in written material (Unicode 03C0) versus the mathematical symbol pi
> used in formulae (Unicode 03D6), although both appear visually the
> same to the human eye.

No, Unicode 03D6 is GREEK PI SYMBOL "ϖ", which looks more or less like
small omega. It is used as a technical symbol, but I don't think any
standard assigns any specific meaning to it; it has various meanings in
different fields of physics, and it is not replaceable by the normal
letter pi. The common mathematical constant 3.1416... is denoted by the
normal Greek letter small pi.
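
The two code points are easy to check, for example with Python's
standard unicodedata module. This is just a quick lookup; describe is a
made-up helper name.

```python
import unicodedata


def describe(ch):
    # Return the code point in U+XXXX form followed by the Unicode name.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe("\u03c0"))  # U+03C0 GREEK SMALL LETTER PI
print(describe("\u03d6"))  # U+03D6 GREEK PI SYMBOL
```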

> Similar for all sorts of dashes; the hyphen has about 12 variations
> but the normal hyphen is Unicode 2010, the mathematical minus sign
> Unicode 2212, the en-dash is Unicode 2013, and the em-dash is Unicode
> 2014. Each of these glyphs has a different purpose in language and
> technical documents.

And they are widely confused with each other.
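
For reference, a small Python sketch that lists the characters in
question with their Unicode names and general categories. I have added
the ASCII hyphen-minus (002D) to the list, since that is what people
mostly type. The helper name dash_info is made up for illustration.

```python
import unicodedata

# Code points written as escapes so that none of them can be silently
# replaced by a look-alike when this text is copied around.
DASHES = ["\u002d", "\u2010", "\u2013", "\u2014", "\u2212"]


def dash_info(ch):
    # (Unicode name, general category); "Pd" is dash punctuation,
    # "Sm" is a mathematical symbol.
    return unicodedata.name(ch), unicodedata.category(ch)


for ch in DASHES:
    name, category = dash_info(ch)
    print(f"U+{ord(ch):04X} {category} {name}")
```

Only the minus sign is classified as a mathematical symbol; the rest are
all dash punctuation, which may be part of the reason they get mixed up.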

> It would help if we and the industry could develop standards for how
> these variations will be voiced and treated by AT. One solution is
> for screen readers to pick up the Unicode name from the character
> when they encounter it.

That would be a wrong move, in general. First, Unicode names have not
been designed for such use. They are symbolic identifiers for
characters. Second, they consist of English or anglicized words and are
quite unsuitable when the text language is not English. Speech synthesis
might need to fall back to saying the Unicode name when there is no
other useful information, but it's really just a fallback.

For example, "+" should normally be read as "plus" in English, not as
its Unicode name "plus sign". Speech synthesizers really need tables of
names of (or pronunciations for) special characters. And the names are
something that various language communities should define, and register
somewhere, so that software vendors can pick them up. (Unfortunately,
CLDR, the Common Locale Data Repository, though it provides localized
names for many things, does not address names of characters yet.)
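
As a rough sketch of what I mean, in Python: a per-language table of
spoken names is consulted first, and the Unicode name is used only when
the table has nothing. The table contents and the names SPOKEN_NAMES and
spoken_name are purely illustrative assumptions; no such registry exists
as far as I know.

```python
import unicodedata

# Hypothetical per-language tables. In a real system these would come
# from data maintained by each language community, not be hard-coded.
SPOKEN_NAMES = {
    "en": {"+": "plus", "\u2212": "minus"},
    "fi": {"+": "plus", "\u2212": "miinus"},
}


def spoken_name(ch, lang):
    # Prefer the language-specific name; fall back to the (English)
    # Unicode name only when the table has no entry.
    table = SPOKEN_NAMES.get(lang, {})
    if ch in table:
        return table[ch]
    return unicodedata.name(ch, "unknown character").lower()


print(spoken_name("+", "en"))       # plus
print(spoken_name("\u2212", "fi"))  # miinus
print(spoken_name("\u03d6", "fi"))  # greek pi symbol (English fallback)
```

The last call shows the weakness of the fallback: a Finnish listener
gets an English phrase.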

> In a series of technical documents we just completed for a client,
> plus and minus signs peppered the narrative, as in "adults 21+" and
> "a co-efficient of −.125". Our screen reader testers didn't even
> know that the characters were there and so they misread a great deal
> of the information in the documents. AT users shouldn't have to play
> mindreader and figure out that they have to force their technology to
> voice individual characters: instead, we the document creators, need
> a way to signal all technologies to voice the character with its
> Unicode name.

I think you would want "−.125" to be read as "minus point one two five"
rather than "minus sign full stop digit one digit two digit five".
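
The difference is easy to demonstrate. The following Python sketch reads
the same string both ways: once by Unicode names, once from a small
pronunciation table. The table and the function names are illustrative
assumptions, and the table covers only what this one example needs.

```python
import unicodedata

# Illustrative English table: signs, the decimal point, and digits.
WORDS = {"\u2212": "minus", "+": "plus", ".": "point",
         "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
         "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}


def read_by_unicode_names(text):
    # What you get if the synthesizer just speaks each Unicode name.
    return " ".join(unicodedata.name(ch).lower() for ch in text)


def read_by_table(text):
    # What you get from a pronunciation table; raises KeyError for
    # characters the table does not cover.
    return " ".join(WORDS[ch] for ch in text)


print(read_by_unicode_names("\u2212.125"))
# minus sign full stop digit one digit two digit five
print(read_by_table("\u2212.125"))
# minus point one two five
```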

In practice, at least on web pages, we encounter "-.125" much more often
than "−.125". It's difficult to say how the ASCII hyphen, or
"hyphen-minus" to use the Unicode name, should be pronounced in
different contexts. Probably it should be read as "hyphen" when it is
not apparently part of a hyphenated word or a standalone symbol
surrounded by spaces (in which case it should probably be treated as a
punctuation dash).
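
A literal reading of that rule could be sketched in Python as follows.
This is only one interpretation of the paragraph above, with a made-up
function name and made-up labels, and it looks at nothing but the two
neighboring characters.

```python
def classify_hyphen_minus(text, i):
    # Classify the hyphen-minus at index i by its immediate neighbors.
    if text[i] != "-":
        raise ValueError("character at index i is not a hyphen-minus")
    before = text[i - 1] if i > 0 else ""
    after = text[i + 1] if i + 1 < len(text) else ""
    if before.isalpha() and after.isalpha():
        return "word-hyphen"   # inside a hyphenated word
    if before.isspace() and after.isspace():
        return "dash"          # standalone, surrounded by spaces
    return "hyphen"            # everything else: say "hyphen"


print(classify_hyphen_minus("co-efficient", 2))  # word-hyphen
print(classify_hyphen_minus("yes - no", 4))      # dash
print(classify_hyphen_minus("-.125", 0))         # hyphen
```

Note that under this reading the leading character of "-.125" ends up in
the last branch, which shows how much context a good rule would really
need.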

Yucca