On Thu, 2002-04-25 at 03:22, Lou Burnard wrote:
> Were you using the <cjk> element to
> delimit material that might be in any one or more of the languages named?
> Like this: english words <cjk>korean-word japanese-word</cjk> ?
> Or like this: english words <cjk>korean-word</cjk> <cjk>japanese-word</cjk> ?
I am glossing a Chinese, Japanese, or Korean word (term, temple,
personal name, etc.) with its Chinese characters. Thus:
english words <foreign>romanized Japanese, Chinese, or Korean
words</foreign> <cjk>chinese characters</cjk>
english words <term>romanized Japanese, Chinese, or Korean words</term>
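To make the first pattern concrete, here is a made-up instance. The word, its characters, and the lang value are only an illustration, and <cjk> is the element from my existing markup, not a TEI element:

```xml
<!-- Illustrative only: a romanized Japanese term glossed with its characters -->
<p>The monks practice <foreign lang="ja">zazen</foreign> <cjk>坐禪</cjk> every morning.</p>
```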
> The latter would (obviously) be potentially more useful.
> > In trying to replicate this in TEI, I have come to the understanding
> > that I should be using the tag <writingSystemDeclaration>. Before I go
> > ahead and do so, I would like to confirm that this is the correct
> > assumption, since apparently this tag requires further children, and
> > thus it would be somewhat unwieldy to use this much apparatus to mark
> > each occurrence of Chinese (or whatever) in my text.
> If for example the words in CJK are already tagged
> for some other purpose, e.g. because they are technical terms (<term>) or
> names (<name>),
No, they are not tagged, other than for this purpose.
> If the words you wish to mark in this way are not already tagged, then
> you can use the general phrase level element <foreign> to carry the
The problem for me in this case would be that I am already using
<foreign> to tag the romanized version of the terms. If possible, I
would prefer not to apply the same tag to the CJK characters, since my
present style sheets output <foreign> as <i> or <em>. If this is the most
appropriate way to do it, I guess I could apply the style change at an
attribute level, such as your suggestion of
> <p>There is a <term lang="CJK">chinese-japanese-or-korean-word</term>
but I would prefer to use a different tag, if an appropriate one is
available. Also relevant here is the way I have been using the "lang"
attribute in my dictionaries, where the attribute values of "lang" are
always an ISO 639 value, such as lang="ko", lang="ja", lang="en", etc. I
guess I could make an exception and add CJK as a possible attribute
value, but it would then be getting a bit unsystematic.
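A minimal sketch of what the attribute-level style change might look like, assuming XSLT 1.0 style sheets and the non-ISO value lang="CJK" discussed above. The class name "cjk" and the HTML output elements are placeholders for illustration, not what my style sheets actually contain:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Romanized terms: rendered in italics, as at present -->
  <xsl:template match="foreign">
    <i><xsl:apply-templates/></i>
  </xsl:template>

  <!-- CJK glosses (assumed lang="CJK"): no italics.
       The predicate gives this rule a higher default priority
       than the plain "foreign" rule, so it wins when both match. -->
  <xsl:template match="foreign[@lang='CJK']">
    <span class="cjk"><xsl:apply-templates/></span>
  </xsl:template>

</xsl:stylesheet>
```

This would keep the italics for romanized terms, but it depends on exactly the unsystematic lang value I would rather avoid.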
> The LANG attribute is declared as IDREF, so there must also somewhere be a
> (single) <language> element in the header to your file, or your
> collection of files, which defines that the code CJK is associated with
> the language "chinese-japanese-or-korean".
I think that what I am doing is a little bit different (but I could well
be mistaken) in that I am not seeking to define any single "language"
per se, since Chinese characters are in this sense not
language-specific, but are being used to gloss Chinese, Japanese,
Korean, Vietnamese, etc., terms. Therefore I had understood that what I
needed to mark up in the text was a change in _writing system_, rather
than a change in language.
I am aware of issues of encoding and character sets--I have been doing
virtually all of my work in UTF-8 since 1997, so declaring languages in
this connection is also probably not so directly relevant.
After reading my clarification, do you still feel that the use of
<foreign lang="cjk"> would be the best solution?
Thank you for taking the time to address this.
Toyo Gakuen University
Digital Dictionary of Buddhism and CJK-English Dictionary