E-mail List Archives

Re: Chinese/Japanese/Korean names and their romanizations in a French article


From: Jukka K. Korpela
Date: Mar 25, 2008 11:20AM


Pierre wrote:

> I would
> like to store the "romanized" version of those names in the HTML
> document, so readers who can't understand Chinese/Japanese/Korean
> can still have an idea of the name.

It's surely advisable to include romanized text, since most people who
know French don't understand CJK characters and would find them
particularly difficult, if not alienating, because they bear no
resemblance to any characters they know.

On such grounds, I think you should use romanized forms in the primary
text and include the CJK form inside parentheses, rather than vice
versa. You should indicate the romanization system used, since there are
several systems in use; when using pinyin, linking to
http://www.pinyin.info might be a good idea (so that you won't need to
explain the details yourself).

Consider using tone marks, too, in the romanization. They may help
readers, and they should not be too distracting. The main problem with
them is technical: if you use diacritic marks (on vowels) as tone marks,
then some of the letter+diacritic combinations might not be present in
the font in use, or some assistive software might be unable to process
them properly.
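If the romanization is typed with tone numbers (as in "Wu3 Bai3"), the marks can be generated rather than entered by hand. The Python below is only an illustrative sketch, not part of any standard tool: it applies the usual pinyin placement rule (mark a or e if present, the o of "ou", otherwise the last vowel), normalizes the result to NFC, and reports any letter+diacritic pair that has no precomposed Unicode character. Such pairs, like ê with a macron, are the ones most likely to be missing from fonts. The function names and the acceptance of "v" or "u:" as stand-ins for ü are assumptions made for this example.

```python
import re
import unicodedata

# Combining marks for pinyin tones 1-4; tone 5 (neutral) gets no mark.
# The names and input conventions here are assumptions for this sketch.
TONE_MARKS = {
    "1": "\u0304",  # combining macron
    "2": "\u0301",  # combining acute accent
    "3": "\u030c",  # combining caron
    "4": "\u0300",  # combining grave accent
}
VOWELS = "aeiou\u00fc\u00ea"  # a e i o u ü ê


def mark_syllable(syllable: str) -> str:
    """Turn one numbered syllable such as 'Bai3' into 'Bǎi' (NFC)."""
    letters, tone = syllable[:-1], syllable[-1]
    if not letters or tone not in "12345":
        return syllable
    # Accept the common ASCII stand-ins for ü.
    letters = letters.replace("u:", "\u00fc").replace("v", "\u00fc")
    if tone == "5":
        return letters
    lower = letters.lower()
    if "a" in lower:
        i = lower.index("a")
    elif "e" in lower:
        i = lower.index("e")
    elif "\u00ea" in lower:
        i = lower.index("\u00ea")
    elif "ou" in lower:
        i = lower.index("o")
    else:
        positions = [j for j, c in enumerate(lower) if c in VOWELS]
        if not positions:
            return letters  # e.g. a syllabic 'm' or 'ng': leave unmarked
        i = positions[-1]
    marked = letters[: i + 1] + TONE_MARKS[tone] + letters[i + 1:]
    # NFC merges base + combining mark into one code point when one exists.
    return unicodedata.normalize("NFC", marked)


def to_tone_marks(text: str) -> str:
    """Replace every numbered syllable in text, e.g. 'Wu3 Bai3'."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"[A-Za-z\u00fc\u00dc\u00ea\u00ca:]+[1-5]",
                  lambda m: mark_syllable(m.group(0)), text)


def lacks_precomposed(text: str) -> list:
    """List base+mark pairs that NFC cannot compose into one character."""
    nfc = unicodedata.normalize("NFC", text)
    return [nfc[j - 1:j + 1] for j, c in enumerate(nfc)
            if j > 0 and unicodedata.combining(c)]
```

A non-empty result from lacks_precomposed is a hint to test that text with the fonts and assistive software your readers are likely to use.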

The parentheses aren't really necessary for any formal reason, since the
style of CJK characters distinguishes them from the text in Latin
letters. However, parentheses carry the suggestion that the
parenthesized text is, well, parenthetical, i.e. that it is usually not
essential for understanding the main content. In a sense, they are a
promise: you may ignore whatever is inside parens, and you need not
panic just because there are mysterious characters there.

> I thought about using HTML tags such as abbr, acronym or dfn,

None of them is adequate for a transcribed form or for an original form.
There is no semantic HTML markup for such a purpose. Use <span> if you
need to turn such text into an element for styling or other purposes.
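As a sketch of what that looks like in practice, here is a small Python helper (a hypothetical name, for illustration only) that writes the romanized form as ordinary content and puts the CJK form in parentheses inside a plain <span>. Adding a lang attribute to that span, for example "zh-Hant" for Traditional Chinese, is an assumption on top of the advice above; it may help font selection and speech output, but the page should not rely on it for anything essential.

```python
from html import escape


def name_with_original(romanized: str, original: str, lang: str) -> str:
    """Romanized name as the main text, original script in parentheses.

    The original-script part is wrapped in a plain <span> only so that it
    can carry a lang attribute or be styled; no semantics are implied.
    """
    return (f"{escape(romanized)} "
            f'(<span lang="{escape(lang)}">{escape(original)}</span>)')
```

For example, name_with_original("Wu Bai", "伍佰", "zh-Hant") yields the markup Wu Bai (<span lang="zh-Hant">伍佰</span>), which reads fine even where the CJK characters can't be displayed.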

> and then
> use the title and lang attributes to display the romanization and the
> language it comes from.

Forget it. The romanized form is far too important to be left to depend
on browser features. It should be in the content, not hidden in
attributes. And remember that lang selectors don't work on IE 6.

> I could display the romanized version between
> brackets when the article is printed, and use it as a "tooltip" when
> the article is read online.

That's unnecessarily unsafe and complicated.

> What would be the best method to use in order to display such names in
> a French text and to keep "readability" thanks to the romanized
> versions of the characters?

For _readability_, you would use Latin letters only, but this might not
be feasible. Is there a reason to present the original CJK form? How
will users benefit from it? If there is some real gain, it probably
means that the CJK text should be part of the content, inline. But
sometimes you might "hide" it behind a link, like

<a href="#wubai">Wu Bai</a>

and you would have somewhere an element with id="wubai" that presents
the CJK form and possibly also the complete romanization, with tone
marks. Something like

<p id="wubai">Wu Bai (Chinese: 伍佰; pinyin: Wǔ Bǎi; Taiwanese Minnan:
Gō·-pah; born 14 January 1968) is the stage name of a rock singer from
Taiwan, Wu Chun-lin (Chinese: 吳俊霖; pinyin: Wú Jùnlín; Taiwanese: Ngô·
Chùn-lîm).</p>

I can't vouch for those facts; I just copied them from Wikipedia. You might
even consider just linking to Wikipedia

<a href="http://en.wikipedia.org/wiki/Wu_Bai">Wu Bai</a>

when the Wikipedia article contains the CJK expression, but
1) it might be better to include the CJK form in your own document, and
2) Wikipedia is inherently unstable and unreliable.
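The link-and-note pattern described earlier, a short link such as <a href="#wubai">Wu Bai</a> pointing to a paragraph with id="wubai", is easy to generate when a page mentions many names. The Python below is a hypothetical sketch: its id scheme (tone marks dropped, only lowercase ASCII letters and digits kept) is an assumption that merely reproduces the "wubai" id from the example, and it does not guard against duplicate or empty ids.

```python
import re
import unicodedata
from html import escape


def note_id(romanized: str) -> str:
    """Derive a fragment id like 'wubai' from 'Wu Bai' or 'Wǔ Bǎi'.

    Hypothetical scheme: strip tone marks, then keep only a-z and 0-9.
    """
    decomposed = unicodedata.normalize("NFD", romanized)
    bare = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]", "", bare.lower())


def name_link(romanized: str) -> str:
    """Inline link from the romanized name to its explanatory note."""
    return f'<a href="#{note_id(romanized)}">{escape(romanized)}</a>'


def name_note(romanized: str, body_html: str) -> str:
    """Target paragraph; body_html is assumed to be trusted, ready-made HTML."""
    return f'<p id="{note_id(romanized)}">{body_html}</p>'
```

Deriving both the href and the id from the same function keeps the link and its target from drifting apart when names are edited.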

> I suppose I shouldn't use abbr nor acronym because of their original
> meaning... what about dfn?

Those markup elements are best forgotten. Their definitions are sloppy,
there's little browser support worth mentioning, and some of the
"support" is just confusing (like the dotted underline).

> I heard about a ruby tag <http://www.w3.org/TR/1998/WD-ruby-19981221/>;
> but it seems it's not implemented in any "classical" browsers
> (Firefox, Opera, Internet Explorer) the way I'd like to use it...

IE has working, though limited, support for Ruby, and as others have
remarked, Ruby has been designed to "degrade gracefully" on
non-supporting browsers, provided that an author uses correct markup.
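For reference, "correct markup" here means something like the simple ruby structure of the W3C Ruby Annotation recommendation: <rb> for the base text, <rt> for the annotation, and <rp> parentheses around the annotation. Ruby-aware browsers hide the <rp> content; other browsers show it, so the text degrades to 伍佰(Wǔ Bǎi). The Python helper below is a hypothetical sketch that only builds that markup; later HTML versions also allow omitting <rb>.

```python
from html import escape


def ruby(base: str, annotation: str, lang: str = "") -> str:
    """Build simple ruby markup with <rp> fallback parentheses.

    Structure: <ruby><rb>base</rb><rp>(</rp><rt>annotation</rt><rp>)</rp></ruby>
    Browsers without ruby support render it as 'base(annotation)'.
    """
    lang_attr = f' lang="{escape(lang)}"' if lang else ""
    return (f"<ruby{lang_attr}><rb>{escape(base)}</rb>"
            f"<rp>(</rp><rt>{escape(annotation)}</rt><rp>)</rp></ruby>")
```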

But this isn't really a job for Ruby, for several reasons. To begin
with, Ruby text is (on IE) by default very small, and although you can
usually change this with CSS, what would you actually do? You don't want
gross line spacing, do you?

Ruby might be interesting for _some_ purposes even outside its original
scope, but it's not for normal transcriptions.

Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

_______________________________________________
To manage your subscription, visit http://list.webaim.org/
Address list messages to <EMAIL REMOVED>