WebAIM - Web Accessibility In Mind

E-mail List Archives

Thread: Lang attribute and "old" latin

Number of posts in this thread: 9 (In chronological order)

From: John Foliot - Stanford Online Accessibility Program
Date: Thu, Apr 24 2008 6:10PM
Subject: Lang attribute and "old" latin

All,

As far as I know, current screen reading technology only supports a limited
number of languages.

I am in the process of reviewing a number of web documents that feature, in
part, a fair bit of "old Latin" (circa 13th century - it's a cool academic
project). At any rate, W3C guidance states "Clearly identify changes in the
natural language of a document's text and any text equivalents (e.g.,
captions)." *AND* the ISO code for Latin is either "LA" (ISO 639-1) or "LAT"
(ISO 639-2) so clearly this *CAN* be done.

As well, Wikipedia suggests that "Screen readers without Unicode support
will read a character outside Latin-1 as a question mark, and even in the
latest version of JAWS, the most popular screen reader, Unicode characters
are very difficult to read." (Is this true? I was not aware of this. The
document often uses þ throughout this old Latin text - is this going
to be an issue?)

The question is, is there any real advantage gained by adding this
information (lang="lat") to the content? It is/would be a huge undertaking,
and if *not* done is pedantically/dogmatically wrong (fails WCAG P1 4.1),
however I am at a loss to explain any real value in doing it to the client
as at the end of the day I cannot myself find a "real justification" that
would improve the accessibility of the document.

Thoughts, arguments (either side) and other support gratefully accepted.

Cheers!

JF

From: Aaron Cannon
Date: Thu, Apr 24 2008 7:20PM
Subject: Re: Lang attribute and "old" latin

Just a small correction. JAWS will read many Unicode characters. However, it requires a somewhat advanced installation first. There are instructions on the Freedom Scientific web site.

Aaron

>>> "John Foliot - Stanford Online Accessibility Program" < = EMAIL ADDRESS REMOVED = > 4/24/2008 7:04 PM >>>

All,

As far as I know, current screen reading technology only supports a limited
number of languages.

I am in the process of reviewing a number of web documents that feature, in
part, a fair bit of "old Latin" (circa 13th century - it's a cool academic
project). At any rate, W3C guidance states "Clearly identify changes in the
natural language of a document's text and any text equivalents (e.g.,
captions)." *AND* the ISO code for Latin is either "LA" (ISO 639-1) or "LAT"
(ISO 639-2) so clearly this *CAN* be done.

As well, wikipedia suggests that "Screen readers without Unicode support
will read a character outside Latin-1 as a question mark, and even in the
latest version of JAWS, the most popular screen reader, Unicode characters
are very difficult to read." (Is this true, I was not aware of this. The
document often uses þ throughout this old Latin text - is this going
to be an issue?)

The question is, is there any real advantage gained by adding this
information (lang="lat") to the content? It is/would be a huge undertaking,
and if *not* done is pedantically/dogmatically wrong (fails WCAG P1 4.1),
however I am at a loss to explain any real value in doing it to the client
as at the end of the day I cannot myself find a "real justification" that
would improve the accessibility of the document.

Thoughts, arguments (either side) and other support gratefully accepted.

Cheers!

JF

From: Patrick Burke
Date: Thu, Apr 24 2008 7:40PM
Subject: Re: Lang attribute and "old" latin

Hi John,

Some comments added below.
At 05:04 PM 4/24/2008, John Foliot - Stanford Online Accessibility Program
wrote:
>All,
>
>As far as I know, current screen reading technology only supports a limited
>number of languages.
>
>I am in the process of reviewing a number of web documents that feature, in
>part, a fair bit of "old Latin" (circa 13th century - it's a cool academic
>project). At any rate, W3C guidance states "Clearly identify changes in the
>natural language of a document's text and any text equivalents (e.g.,
>captions)." *AND* the ISO code for Latin is either "LA" (ISO 639-1) or "LAT"
>(ISO 639-2) so clearly this *CAN* be done.
>
>As well, wikipedia suggests that "Screen readers without Unicode support
>will read a character outside Latin-1 as a question mark,

I have certainly encountered this, even with JAWS 9.


>and even in the
>latest version of JAWS, the most popular screen reader, Unicode characters
>are very difficult to read." (Is this true, I was not aware of this.

I think it depends whether the speech synthesizer has a mapping for the
sound of the symbol in question. Some Hebrew/Arabic/Farsi letters will be
spoken individually (as the letter name), while some aren't spoken at all.

JAWS officially added Unicode support in version 7. At that point several
languages became readable with a Braille display (Russian & Greek), though
speech output doesn't do much with them. (Some symbols still don't come
through in the braille rendering.)

>The document often uses þ throughout this old Latin text - is this going
>to be an issue?)

Thorn is understood & spoken even by JAWS 4.51. However, just the letter
name is spoken (þing = "thorning", not "thing").

>The question is, is there any real advantage gained by adding this
>information (lang="lat") to the content? It is/would be a huge undertaking,
>and if *not* done is pedantically/dogmatically wrong (fails WCAG P1 4.1),
>however I am at a loss to explain any real value in doing it to the client
>as at the end of the day I cannot myself find a "real justification" that
>would improve the accessibility of the document.

If nothing else, it would help *greatly* if a braille translation had to be
done. If the text switches back and forth from Latin to (Modern) English,
it would be a huge timesaver to search-&-replace for the language code
changes. (The braille translation could be done all in Grade I, but then
the English would seem clunky to proficient braille readers, imho.)


Just my 2 denarii,

Patrick


>Thoughts, arguments (either side) and other support gratefully accepted.
>
>Cheers!
>
>JF

From: Jukka K. Korpela
Date: Fri, Apr 25 2008 12:20AM
Subject: Re: Lang attribute and "old" latin

John Foliot - Stanford Online Accessibility Program wrote:

> As far as I know, current screen reading technology only supports a
> limited number of languages.

Rather limited, I'm afraid. Moreover, support for language switching on
the basis of language markup (lang or xml:lang attributes) is much more
limited.

In practical terms, using language markup at the top level (<html> or
<body> element) is a good move: it takes a very small effort, and it
helps some people. (But then it should be _correct_. It often isn't, so
e.g. Google does not use the information.)

Using language markup at other markup levels, e.g. for individual
paragraphs or even words, is rather pointless, sad to say. There isn't
much support worth mentioning. (I use it, but mostly as a matter of
principle, or habit, and not very consistently. Many W3C pages,
including pages that declare that it should be used, don't use it. Most
web pages don't even make a try, so what motivation is there for
software developers to support it?)

That's the big picture. In details, there's a lot that could be said,
especially about the problems, but this doesn't seem to be an
interesting topic to most people. However, mostly for "academic"
interest, I'll comment on your specific issues:

> I am in the process of reviewing a number of web documents that
> feature, in part, a fair bit of "old Latin" (circa 13th century -
> it's a cool academic project).

I took "old" Latin as referring to pre-classic Latin... Anyway, there's
no useful standardized way to distinguish between different forms of
Latin in language codes. You could use country codes, e.g. "la-GB" to
refer to Latin as used in the United Kingdom, but this would be
anachronistic for 13th century language and also useless.

> At any rate, W3C guidance states
> "Clearly identify changes in the natural language of a document's
> text and any text equivalents (e.g., captions)."

I'm afraid nobody, including the W3C, takes that seriously. It's just
too much trouble with little if any tangible benefit. It's based on
theoretical ideas - largely, law, poorly analyzed ideas - on the
_possible_ usefulness of language markup, rather than actual experience.

> *AND* the ISO code
> for Latin is either "LA" (ISO 639-1) or "LAT" (ISO 639-2) so clearly
> this *CAN* be done.

The technically correct language code for use in markup is "la", with
lowercase as the recommended spelling. HTML and XML specifications refer
to specifications that mandate the use of two-letter codes for languages
that have one.
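
As a rough illustration of the point above (a sketch only, not taken from the documents under review): the page-level language goes on the html element, and each Latin run carries lang="la" rather than "lat". The page title and the Latin sample phrases are placeholders chosen for the example.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>Charter transcription (placeholder title)</title>
</head>
<body>
  <!-- English commentary inherits lang="en" from the html element -->
  <p>The charter opens with the usual invocation:</p>

  <!-- A whole paragraph in Latin: two-letter code "la", lowercase -->
  <p lang="la">In nomine Domini, amen.</p>

  <!-- A Latin phrase inside an English sentence -->
  <p>The address clause begins
     <span lang="la">sciant presentes et futuri</span>
     and then names the parties.</p>
</body>
</html>
```

Text without its own lang attribute takes the language of the nearest ancestor that has one, so only the switches away from the page language need marking.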

> As well, wikipedia suggests that "Screen readers without Unicode
> support will read a character outside Latin-1 as a question mark,

Character support is a different issue and should not depend on language
markup, and mostly doesn't.

Generally, in special software like screen readers or specialized
browsers, we should expect character support to be more restricted than
in common modern browsers. Even Latin-1 isn't as safe as in "normal"
browsing. For example, what would a screen reader do upon encountering a
special character like " ¶"? Would it recognize it as having a special
meaning (paragraph separator) and make a pause? Hardly. It probably
spells it out. This might mean saying "pilcrow sign", perhaps
independently of language being used (since character names aren't
widely localized - most characters don't even _have_ a name in most
languages), which might be complete gibberish even to people who
understand normal English.

> The question is, is there any real advantage gained by adding this
> information (lang="lat") to the content?

Very little if at all. But if used, it should be lang="la".

> I am at a loss to explain any real value
> in doing it to the client as at the end of the day I cannot myself
> find a "real justification" that would improve the accessibility of
> the document.

The best explanation that I could use (if someone offered to pay me for
adding such markup and I needed to soup up "internal" and "moral"
motivation) is the following (and it's lame, so this tells a lot):

If a user opens your HTML page in a word processor like Microsoft Word,
it will use the language markup, and this can be relevant when spelling
checks are "on", i.e. words classified as misspelled are highlighted.
Declaring Latin words as Latin prevents the program from applying
English spelling rules to them. (The copy of Word I just tested seems to
be Latin-ignorant. That is, it recognizes the words being in Latin but
does not flag anything as misspelled and does not even hyphenate Latin
words. But even this is probably better than treating them as English or
some other language.)

On some browsers, like Firefox, the user can right-click on a word and
get information about its language. Sometimes it is useful to know that
a word is Latin. (But what are the odds that a user knows about such
functionality?)

Style sheets, either page or user style sheets, could be used to style
words in a particular language as different from others, using a
selector like [lang="la"] or :lang(la). However, this does not work e.g.
on IE 6, which does not recognize such selectors.
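
A minimal sketch of the two selectors mentioned above, assuming a page that marks Latin with lang="la". The colour and font choices are arbitrary, and the Latin words are placeholder text.

```html
<style type="text/css">
  /* Matches any element whose language is Latin, including elements
     that inherit it from an ancestor and subtags such as la-GB */
  :lang(la) { font-style: italic; }

  /* Matches only elements that themselves carry exactly lang="la" */
  [lang="la"] { color: #633; }
</style>

<p>English text with <span lang="la">verba Latina</span> inline.</p>

<!-- The em element has no lang attribute of its own: :lang(la) still
     matches it through inheritance, [lang="la"] does not -->
<p lang="la">In nomine <em>Domini</em>, amen.</p>
```

The difference matters mainly for nested markup: the pseudo-class follows the language of the content, while the attribute selector only sees the attribute as written.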

Moreover, some day some browsers or other software could make real use
of the markup.

Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

From: Benjamin Hawkes-Lewis
Date: Fri, Apr 25 2008 2:00AM
Subject: Re: Lang attribute and "old" latin

John Foliot - Stanford Online Accessibility Program wrote:
> As far as I know, current screen reading technology only supports a limited
> number of languages.

[snip]

> The question is, is there any real advantage gained by adding this
> information (lang="lat") to the content?

Even if there were no speech synthesis available for a language, screen
readers like JAWS can announce language changes and users can associate
particular voice configurations with particular languages.

As it happens, it looks like Classical Latin is among the MBROLA voices:

http://tcts.fpms.ac.be/synthesis/mbrola.html

It is therefore (at least theoretically) usable with at least some
screen readers and text-to-speech software, e.g. NVDA, FreeTTS (used by
FireVox), and Emacspeak:

http://www.nvda.fr/spip.php?article14

http://mambo.ucsc.edu/psl/mbrola/

http://web.mit.edu/ATIC/src/emacspeak-9.0/mbrola

--
Benjamin Hawkes-Lewis

From: Benjamin Hawkes-Lewis
Date: Fri, Apr 25 2008 2:10AM
Subject: Re: Lang attribute and "old" latin

Jukka K. Korpela wrote:
> Using language markup at other markup levels, e.g. for individual
> paragraphs or even words, is rather pointless, sad to say. There isn't
> much support worth mentioning. (I use it, but mostly as a matter of
> principle, or habit, and not very consistently. Many W3C pages,
> including pages that declare that it should be used, don't use it. Most
> web pages don't even make a try, so what motivation is there for
> software developers to support it?)

Software developers /do/ support it. JAWS, for example, can switch
voices inline based on the LANG attribute.

> For example, what would a screen reader do upon encountering a
> special character like " ¶"?

Depends on its configuration.

> Style sheets, either page or user style sheets, could be used to style
> words in a particular language as different from others, using a
> selector like [lang="la"] or :lang(la). However, this does not work e.g.
> on IE 6, which does not recognize such selectors.

If you're going to the trouble of adding lang attributes, you could add
class attributes for IE6 backwards compatibility at the same time.
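
A sketch of that fallback, with "latin" as a hypothetical class name and placeholder Latin text. The two rules are kept separate on the assumption that a browser which cannot parse one selector in a comma-separated group discards the whole rule, as CSS 2.1 specifies.

```html
<style type="text/css">
  /* For browsers that understand the :lang() pseudo-class */
  :lang(la) { font-style: italic; }

  /* Fallback for browsers such as IE 6 that ignore the rule above */
  .latin { font-style: italic; }
</style>

<p>The dating clause reads
   <span lang="la" class="latin">anno gratiae</span>
   followed by the year.</p>
```

The lang attribute still carries the language information for assistive technology; the class exists only so that older browsers can style the same spans.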

--
Benjamin Hawkes-Lewis

From: Moore, Michael
Date: Fri, Apr 25 2008 6:50AM
Subject: Re: Lang attribute and "old" latin

John,

You can justify using correct markup on three grounds:

1. WCAG requirement.
2. Forward compatibility for screen readers, even if the characters are
not currently supported.
3. There is a sizable cottage industry of JAWS scripters who could add
support for the characters even if it is not available natively. Given
that this is an academic research project it is conceivable that a blind
researcher may wish to tap into this scripting resource, to add the
capability if documents exist that would become more accessible with the
investment.

The university could even complete the scripting, good cs class project,
and the JCF file could be made available with the document.

Mike


>>> "John Foliot - Stanford Online Accessibility Program"
>>> < = EMAIL ADDRESS REMOVED = > 4/24/2008 7:04 PM >>>

All,

As far as I know, current screen reading technology only supports a
limited number of languages.

I am in the process of reviewing a number of web documents that feature,
in part, a fair bit of "old Latin" (circa 13th century - it's a cool
academic project). At any rate, W3C guidance states "Clearly identify
changes in the natural language of a document's text and any text
equivalents (e.g., captions)." *AND* the ISO code for Latin is either
"LA" (ISO 639-1) or "LAT"
(ISO 639-2) so clearly this *CAN* be done.

As well, wikipedia suggests that "Screen readers without Unicode support
will read a character outside Latin-1 as a question mark, and even in
the latest version of JAWS, the most popular screen reader, Unicode
characters are very difficult to read." (Is this true, I was not aware
of this. The document often uses þ throughout this old Latin text
- is this going to be an issue?)

The question is, is there any real advantage gained by adding this
information (lang="lat") to the content? It is/would be a huge
undertaking, and if *not* done is pedantically/dogmatically wrong (fails
WCAG P1 4.1), however I am at a loss to explain any real value in doing
it to the client as at the end of the day I cannot myself find a "real
justification" that would improve the accessibility of the document.

Thoughts, arguments (either side) and other support gratefully accepted.

Cheers!

JF

From: Christophe Strobbe
Date: Fri, Apr 25 2008 7:40AM
Subject: Re: Lang attribute and "old" latin

At 08:11 25/04/2008, Jukka K. Korpela wrote:

>John Foliot wrote:
>
> > As far as I know, current screen reading technology only supports a
> > limited number of languages.
>
>Rather limited, I'm afraid.

It is indeed limited. See also the old thread (April 2005) starting at
<http://lists.w3.org/Archives/Public/w3c-wai-gl/2005AprJun/0097.html>.

However, the number of languages supported by, for example, JAWS, is not
limited to the list at
<http://www.freedomscientific.com/fs_products/software_jawsinfo.asp>.
Local distributors, for example Freedom Scientific Benelux, can deliver a
JAWS version with a speech synthesizer for Dutch.
For a version that supports Latin, I would contact
Freedom Scientific Vatican City ;-)


>Moreover, support to language switching on
>the basis of language markup (lang or xml:lang attributes) is much more
>limited.

In the tests I did with JAWS last year, language switching worked with
lang, but xml:lang was ignored.
Language subcodes may not work as expected in some screen readers
(based on my tests with JAWS; I tried to collect data for other screen
readers, without success; see
<http://lists.w3.org/Archives/Public/w3c-wai-ig/2008JanMar/0041.html>,
test data are still welcome).
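
Given that result, one cautious pattern for XHTML served as text/html is to write both attributes with the same value, which is what the XHTML 1.0 compatibility guidelines (Appendix C.7) recommend. A minimal sketch, with a placeholder title and placeholder Latin text:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
  <title>Placeholder title</title>
</head>
<body>
  <p>English commentary.</p>
  <!-- Both attributes, same value: lang for HTML-oriented software,
       xml:lang for XML-oriented software -->
  <p lang="la" xml:lang="la">In nomine Domini, amen.</p>
</body>
</html>
```

Using a plain "la" without a region subtag also sidesteps the subcode handling mentioned above.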


>In practical terms, using language markup at the top level (<html> or
><body> element) is a good move: it takes a very small effort, and it
>helps some people. (But then it should be _correct_. It often isn't, so
>e.g. Google does not use the information.)

Even when the language markup is correct, Google does not
necessarily use that information. I have found webpages in Dutch with
correct language markup that still show up in the results when I
explicitly ask Google to return only pages in English.


>Using language markup at other markup levels, e.g. for individual
>paragraphs or even words, is rather pointless, sad to say. There isn't
>much support worth mentioning. (I use it, but mostly as a matter of
>principle, or habit, and not very consistently. Many W3C pages,
>including pages that declare that it should be used, don't use it. Most
>web pages don't even make a try, so what motivation is there for
>software developers to support it?)

What is the threshold for "not much support"?
Using the same threshold, one might arrive at the conclusion that
the percentage of screen reader users is so low that there is
"not much need" for markup that benefits screen reader users.
(I'm not accusing anyone on these lists, but see the comments
by some of the anonymous cowards at
<http://www.computerworld.com/comments/node/9077118?page=2>.)


>That's the big picture. In details, there's a lot that could be said,
>especially about the problems, but this doesn't seem to be an
>interesting topic to most people.

Just like global warming. That doesn't mean it's not important.
(Global warming affects more people than web accessibility,
and still most people don't care enough to change their behaviour.)


>However, mostly for "academic"
>interest, I'll comment on your specific issues:
>(...
> > At any rate, W3C guidance states
> > "Clearly identify changes in the natural language of a document's
> > text and any text equivalents (e.g., captions)."
>
>I'm afraid nobody, including the W3C, takes that seriously. It's just
>too much trouble with little if any tangible benefit. It's based on
>theoretical ideas - largely, law, poorly analyzed ideas - on the
>_possible_ usefulness of language markup, rather than actual experience.

I guess Online Video Killed the Accessibility Star.
Most tutorials on captioning are in English and all too many accessibility
tutorials in English (on captioning or any other subject) pretend that
all documents are monolingual. By extension, they assume the same for video.
(Some captioning formats actually have codes for language switching,
but if you don't know where to look, you can waste a lot of time
searching for that information.)


>(...)

Best regards,

Christophe


--
Christophe Strobbe
K.U.Leuven - Dept. of Electrical Engineering - SCD
Research Group on Document Architectures
Kasteelpark Arenberg 10 bus 2442
B-3001 Leuven-Heverlee
BELGIUM
tel: +32 16 32 85 51
http://www.docarch.be/

