Marcus Bingenheimer writes:
> Robert Cover has pointed out what he thinks is an ambiguity in the W3C
> definition of xml:lang. He also mentions TEI. Has his criticism been
> answered? Is it valid?
This issue and Robin Cover's argument have indeed been considered and
discussed at length by the WG on Languages and Character issues,
although perhaps not in its latest form. The WG decided to endorse
xml:lang to increase interoperability among differing standards and
XML applications, but on the premise that an overhaul of the TEI CDATA
attributes, which was due for other reasons as well, would basically
do away with attributes whose values could be considered to be in a
specific language.
> Example #1: The TEI (P4) DTD defines a <q> element for quoted
> speech; this element has two CDATA attributes ('who' and 'type') as
> well as an enumerated-type attribute 'direct' with attribute type and
> default value (y | n | unspecified) "unspecified". Using the TEI P4
> 'lang' attribute (a global IDREF attribute indicating the language,
> writing system, and character set associated with a given element), the
> following <q>...</q> encoding would be sensible for an English-speaking
> student wishing to mark up a German quoted phrase: <q lang="de"
> who="Hans" type="spoken" direct="unspecified">bei mir</q>. The
> following would not: <q xml:lang="de" who="Hans" type="spoken"
> direct="unspecified">bei mir</q>. The prescribed meaning of xml:lang
> seems to require a declaration that the terms "spoken" and
> "unspecified" (at least) are in German, as well as "bei mir." This is
> not a boundary case, as the TEI DTD has dozens or maybe hundreds of
> CDATA attributes which invite substrings "in" the native language of
> the encoder, which would conflict with the semantic for xml:lang in a
> bilingual or multilingual encoding environment. It is unclear how the
> TEI editors could entertain a proposal to substitute the xml:lang
> attribute of XML 1.0 for the TEI P4 lang attribute in the P4 XML DTD,
> given the scope specification for xml:lang.
In our discussions, we noted that the above use of "spoken" as a value
of the type attribute should not be considered an English word, but
rather should be taken as a token that represents a certain item in
some typology. The same obviously holds for cases such as the direct
attribute, where a predefined list includes the seemingly English
token "unspecified".
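
To make the token reading concrete, here is a minimal sketch in Python
of how a processor might apply it to the <q> example quoted above. The
set TOKEN_ATTRS and the function classify are my own illustrative
names, not anything defined by TEI or the XML Recommendation, and the
choice of which attributes count as tokens is an assumption.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumption for illustration: these attributes hold typology tokens,
# not words in any natural language.
TOKEN_ATTRS = {"type", "direct"}


def classify(elem):
    """Map content and each attribute to a (value, language) pair.

    Token attributes get language None; everything else takes the
    xml:lang declared on the element itself.
    """
    lang = elem.get(XML_LANG)
    result = {"content": (elem.text, lang)}
    for name, value in elem.attrib.items():
        if name == XML_LANG:
            continue
        result[name] = (value, None if name in TOKEN_ATTRS else lang)
    return result


q = ET.fromstring(
    '<q xml:lang="de" who="Hans" type="spoken" direct="unspecified">bei mir</q>'
)
info = classify(q)
for key, pair in info.items():
    print(key, pair)
```

Under this reading "bei mir" is German, while "spoken" and
"unspecified" carry no language at all. Free-text attributes such as
who still fall under the element's xml:lang, which is exactly the
residue the planned overhaul of the CDATA attributes was meant to
remove.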
More to the point, though, the TEI lang attribute does not really
address the problem either, since it does not allow one to specify the
"language used for markup purposes" (e.g. in attribute values)
separately from the "language of the text encoded". For that reason,
the decision was taken (though I think not yet sufficiently
implemented) to move those attribute values that do require a language
declaration to elements.
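
As a rough illustration of why elements help here: a child element can
carry its own xml:lang, whereas an attribute value cannot. The sketch
below is in Python; the <label> and <seg> children and their content
are hypothetical, chosen only for the example and not taken from any
TEI DTD, and effective_lang is my own helper name.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical markup: the descriptive note that might otherwise sit in
# a CDATA attribute is moved into a child element with its own xml:lang.
SAMPLE = (
    '<q xml:lang="de" type="spoken">'
    '<label xml:lang="en">said under his breath</label>'
    '<seg>bei mir</seg>'
    '</q>'
)


def effective_lang(elem, parents):
    """Return the nearest xml:lang on elem or one of its ancestors."""
    while elem is not None:
        lang = elem.get(XML_LANG)
        if lang is not None:
            return lang
        elem = parents.get(elem)
    return None


root = ET.fromstring(SAMPLE)
# ElementTree keeps no parent pointers, so build a child -> parent map.
parents = {child: parent for parent in root.iter() for child in parent}
langs = {el.tag: effective_lang(el, parents) for el in root.iter()}
print(langs)
```

Here the English note overrides the German declared on <q>, while the
quoted text in <seg> inherits it, so the language of the markup and
the language of the encoded text can be stated separately.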
All the best,
Christian Wittern