Hi Ralph! Will we get to see you in College Park?


> What's wrong with it is that it isn't a language, which is what the
> attribute is intended for. 

Well, yes, it's not a language, but the attribute is intended for
this, too, I think. For better or for worse (and there are those who
feel it is for worse), the TEI uses the xml:lang= attribute, which
takes as its value 'tags' (not to be confused with XML tags) taken
from IETF BCP 47. This system includes provision for indicating the
script in which a language is written.
   Example: "sr-Latn" represents Serbian written using the Latin
   script. [RFC 4646, sect 2.2.1] [1]

The script is indicated with a 4-letter subtag that occurs after the
primary language subtag and the extended language subtag, if any
(there are none yet), and before the region subtag. So to indicate
Serbian as spoken in the United States written using the Latin
script, one would use "sr-Latn-US".

Note that this information is machine processable. The script subtag
is always 4 letters long; the region subtag is always either 2
letters or 3 digits long. This allows the information to be parsed
out of the tag predictably. Note that CSS2 even has a selector
specifically designed to make it easy to match against the primary
language subtag (and maybe primary language + script, or primary
language + region, or primary language + script + region, whatever,
but only from left to right -- dunno, I've never tried :-).
   [att|=val]: Match when the element's "att" attribute value is a
   hyphen-separated list of "words", beginning with "val". The match
   always starts at the beginning of the attribute value. This is
   primarily intended to allow language subcode matches (e.g., the
   "lang" attribute in HTML) as described in RFC 1766 ([RFC1766]).
      [CSS2, sect 5.8.1] [1]
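
To make the "machine processable" point concrete, here is a minimal
Python sketch. It is deliberately simplified: it only handles primary
language, script, and region, and ignores extended language subtags,
variants, extensions, and private-use ("x-") subtags. The names
TAG_RE, split_tag, and lang_prefix_match are mine, not from any
specification. The second function emulates the [att|=val] selector
under the reading that the value must either equal "val" or begin
with "val" followed by a hyphen; it compares case-sensitively, which
is a simplification.

```python
import re

# Simplified pattern: primary language, optional 4-letter script,
# optional region (2 letters or 3 digits). Everything else that
# BCP 47 allows is deliberately left out of this sketch.
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_tag(tag):
    """Return a dict with language, script, and region, or None."""
    match = TAG_RE.match(tag)
    return match.groupdict() if match else None


def lang_prefix_match(value, prefix):
    """Emulate [att|=val]: equal to prefix, or prefix followed by '-'."""
    return value == prefix or value.startswith(prefix + "-")


if __name__ == "__main__":
    print(split_tag("sr-Latn-US"))
    print(lang_prefix_match("sr-Latn-US", "sr-Latn"))
```

Because the subtag lengths differ, the pattern never has to guess
whether the second subtag is a script or a region.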

The big question in my mind is whether IPA can be considered a
script. I don't know enough about it to say either way, so I
am going to follow along with the rest of the posters to this list,
and presume that it is reasonable to consider IPA a script. It makes
me a little uncomfortable, though, because I immediately wonder why
there isn't an ISO 15924 code for IPA.

So there is nothing outright wrong (IMHO) with 
  <q xml:lang="fr-x-ipa">
HOWEVER, that said, if we presume IPA is a script, there is a better
mechanism. The 4-letter script subtag already includes the capability
for the expression of private-use codes internally. Any value in the
range "Qaaa" to "Qabx" is not defined by ISO 15924, and is reserved
by ISO 15924 (and thus RFC 4646 and thus BCP 47 and thus TEI)
for private use. Thus, the "x-" mechanism is not needed. So I think
  <q xml:lang="fr-Qabp">
is better. 
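
If you want to check mechanically whether a script subtag falls in
that private-use range, something like the following Python sketch
should do. The function name is_private_use_script is made up for
illustration; the only thing taken from the standard is the range
"Qaaa" to "Qabx". Since subtags are case-insensitive, the sketch
lowercases before comparing.

```python
import re


def is_private_use_script(subtag):
    """True if subtag is a 4-letter code in the range Qaaa..Qabx."""
    s = subtag.lower()
    # For equal-length lowercase strings, lexicographic order is
    # the same as alphabetical order of the codes.
    return re.fullmatch(r"[a-z]{4}", s) is not None and "qaaa" <= s <= "qabx"


if __name__ == "__main__":
    for code in ("Qabp", "Latn", "Qaby"):
        print(code, is_private_use_script(code))
```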

Note that, as with any other use of a private-use subtag, this *must*
be defined in the TEI header in order for the document to be TEI
conformant. (And currently the schema will not give you an error
message if you forget.) So you would have something like the
following in the <profileDesc>:
  <langUsage>
     <language ident="fr-Qabp">Standard French, written using the 
       International Phonetic Alphabet.</language>
  </langUsage>
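
Since the schema won't catch a missing declaration, a small script
can. The Python sketch below is only a rough check under a few
assumptions of mine: it compares every xml:lang value in the document
against the ident= values of <language> elements, matching elements
by local name so it does not care which namespace the TEI elements
are in. It flags every undeclared value, not only those with
private-use subtags, so it is stricter than what conformance
requires. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def undeclared_langs(tei_source):
    """Return sorted xml:lang values lacking a <language ident=...>."""
    root = ET.fromstring(tei_source)
    declared = {
        el.get("ident")
        for el in root.iter()
        if local_name(el.tag) == "language" and el.get("ident")
    }
    used = {el.get(XML_LANG) for el in root.iter() if el.get(XML_LANG)}
    return sorted(used - declared)
```

Run against a document that uses both "fr-Qabp" and "fr-x-ipa" but
declares only the former, it should report just "fr-x-ipa".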

Note
----
[1] RFC 4646 is the successor to RFC 3066, which itself is the
    successor to RFC 1766. RFC 4646 is currently part of BCP 47.
    "BCP" stands for "best current practice".