On Wed, 24 Apr 2002, Charles Muller wrote:
> Before I began to try to apply TEI in a strict sense to the
> composition of articles and so forth, I had been tagging the Chinese,
> Japanese, and Korean characters in my text with <cjk></cjk>, so that I
> might have some control as to whether or not I want to display them in
> a given output, and with which font.
Were you using the <cjk> element to
delimit material that might be in any one or more of the languages named?
Like this: english words <cjk>korean-word japanese-word</cjk> ?
Or like this: english words <cjk>korean-word</cjk> <cjk>japanese-word</cjk> ?
The latter would (obviously) be potentially more useful.
> In trying to replicate this in TEI, I have come to the understanding
> that I should be using the tag <writingSystemDeclaration>. Before I go
> ahead and do so, I would like to confirm that this is the correct
> assumption, since apparently this tag requires further children, and
> thus it would be somewhat unwieldy to use this much apparatus to mark
> each occurrence of Chinese (or whatever) in my text.
No, this is not correct. How you tag this depends rather on what else is
going on in the text. If for example the words in CJK are already tagged
for some other purpose, e.g. because they are technical terms (<term>) or
names (<name>), then all you need to do in the text is use the global LANG
attribute to specify that this <term> or <name> is in whichever of
Chinese, Japanese, Korean, or the combined "CJK" language is appropriate.
If the words you wish to mark in this way are not already tagged, then
you can use the general phrase level element <foreign> to carry the
LANG attribute instead:
<p>There is a <term lang="CJK">chinese-japanese-or-korean-word</term>
<p>There is a <foreign
The LANG attribute is declared as IDREF, so there must also somewhere be a
(single) <language> element in the header to your file, or your
collection of files, which defines that the code CJK is associated with
the language "chinese-japanese-or-korean".
<language id="CJK">Chinese, Japanese, or Korean</language>
<!-- possibly other language codes defined here -->
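If you would rather keep the three languages distinct (the more useful
option I mentioned at the start), a rough sketch of how the header and
the text might fit together is given here. The codes "zh", "ja" and "ko"
are only my assumption for the purposes of illustration; any identifiers
will do, so long as each LANG value in the text matches the ID of some
<language> element in the header. The mandatory <fileDesc> is left out
for brevity.

```xml
<teiHeader>
  <!-- fileDesc omitted from this sketch -->
  <profileDesc>
    <langUsage>
      <!-- illustrative codes: one <language> per language used -->
      <language id="zh">Chinese</language>
      <language id="ja">Japanese</language>
      <language id="ko">Korean</language>
    </langUsage>
  </profileDesc>
</teiHeader>

<!-- ... later, in the body of the text ... -->
<p>There is a <term lang="ja">japanese-word</term> and a
<foreign lang="ko">korean-word</foreign> in this sentence.</p>
```

A stylesheet can then select on the LANG value to decide whether to
display the marked words, and in which font.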
You can also define other things about the language and its
representation in your encoded text -- that is currently done by means of
the writing system declaration (WSD) to which you refer, and which may be
associated with the <language> element in the header. If (as I assume) you
are using Unicode to represent these characters however, there is no need
to define a WSD, and the new P4 DTDs do not require them in any case.
See further the chapter on character sets and encoding at
> Am I correct in assuming that this is the right tag? Or is there a
> simpler way of doing this kind of markup? Any advice on this
> application would be appreciated.
Well, I hope you agree that what I sketched out above is a bit simpler...
> Charles Muller
> Toyo Gakuen University
> Digital Dictionary of Buddhism and CJK-English Dictionary