You're right about the scope of the xml:lang attribute, and you're also
right to be worried about whether IPA is a script. Having read RFC
4646, which I didn't know about (and which is very informative), I can
assure you that it isn't.
So, the correct way to tag French written in IPA is <q
xml:lang="fr-fonipa"> -- which is what we wanted to know in the first
place. Note that this is part of the standard, not private use. If
it's French as spoken in Canada written in IPA, then it's <q
>>> On 23/10/2007 at 11:23, Syd Bauman wrote:
> Hi Ralph! Will we get to see you in College Park?
>> What's wrong with it is that it isn't a language, which is what the
>> attribute is intended for.
> Well, yes it's not a language, but the attribute is intended for
> this, too, I think. For better or for worse (and there are those who
> feel it is for worse), the TEI uses the xml:lang= attribute, which
> uses as its value 'tags' (not to be confused with XML tags) taken
> from IETF BCP 47. This system includes provision for indicating with
> which script a language is written.
> Example: "sr-Latn" represents Serbian written using the Latin
> script. [RFC 4646, sect 2.2.1] 
> The script is indicated with a 4-letter subtag that occurs after the
> primary language tag and the extended language subtag, if any (there
> are none yet), and before the region subtag. So to indicate Serbian as
> spoken in the United States written using the Latin script, one would use
> Note that this information is machine processable. The script subtag
> is always 4 letters long; the region subtag is always either 2
> letters or 3 digits long. This allows the information to be parsed
> out of the tag predictably. Note that CSS2 even has a selector
> specifically designed to make it easy to match against the primary
> language subtag (and maybe primary language + script, or primary
> language + region, or primary language + script + region, whatever,
> but only from left-to-right -- dunno, I've never tried :-).
> [att|=val]: Match when the element's "att" attribute value is a
> hyphen-separated list of "words", beginning with "val". The match
> always starts at the beginning of the attribute value. This is
> primarily intended to allow language subcode matches (e.g., the
> "lang" attribute in HTML) as described in RFC 1766 ([RFC1766]).
> [CSS2, sect 5.8.1] 
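
A minimal Python sketch of the two points above, for anyone who wants to try them out: pulling the language, script and region out of a tag by subtag shape alone, and the left-to-right prefix match that [att|=val] performs. The names split_tag and lang_prefix_match are mine, not from any library, and the pattern is deliberately simplified (it ignores extended-language, variant, extension and private-use subtags). The comparison is a literal string comparison, which is an assumption on my part; BCP 47 tags themselves are case-insensitive.

```python
import re

# Simplified shape-based pattern: language, optional 4-letter script,
# optional region (2 letters or 3 digits). Anything after that is ignored.
TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?:-.*)?$"
)


def split_tag(tag):
    """Return the language, script and region subtags (None when absent)."""
    m = TAG.match(tag)
    if m is None:
        raise ValueError("not a tag this sketch understands: %r" % tag)
    return m.groupdict()


def lang_prefix_match(value, prefix):
    """Mimic CSS2 [att|=val]: value equals prefix, or starts with prefix + '-'."""
    return value == prefix or value.startswith(prefix + "-")


print(split_tag("sr-Latn-US"))
print(lang_prefix_match("sr-Latn-US", "sr-Latn"))
```

Because the match only ever runs from the left, "sr" and "sr-Latn" both match "sr-Latn-US", but "sr-US" does not.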
> The big question in my mind is whether or not IPA can be considered a
> script. I don't know enough about it to say either way, so I
> am going to follow along with the rest of the posters to this list,
> and presume that it is reasonable to consider IPA a script. It makes
> me a little uncomfortable, though, because I immediately wonder why
> there isn't an ISO 15924 code for IPA.
> So there is nothing outright wrong (IMHO) with
> <q xml:lang="fr-x-ipa">
> HOWEVER, that said, if we presume IPA is a script, there is a better
> mechanism. The 4-letter script subtag already includes a provision
> for the expression of private-use codes internally. Any value in the
> range "Qaaa" to "Qabx" is not defined by ISO 15924, and is reserved
> by ISO 15924 (and thus RFC 4646 and thus BCP 47 and thus TEI)
> for private use. Thus, the "x-" mechanism is not needed, so I think
> <q xml:lang="fr-Qabp">
> is better.
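
For what it's worth, the private-use range quoted above is easy to test for mechanically. A small Python sketch, assuming the range is exactly "Qaaa" to "Qabx" as stated; is_private_use_script is a name I made up for illustration.

```python
def is_private_use_script(subtag):
    """True if subtag falls in the private-use script range Qaaa..Qabx."""
    s = subtag.lower()
    # Four ASCII letters, compared alphabetically against the range ends.
    return len(s) == 4 and s.isascii() and s.isalpha() and "qaaa" <= s <= "qabx"


print(is_private_use_script("Qabp"))
print(is_private_use_script("Latn"))
```

The comparison is done on lower-cased text because tag case is not significant, even though script subtags are conventionally written with an initial capital.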
> Note that, as with any other use of a private-use subtag, this must
> be defined in the TEI header in order for the document to be TEI
> conformant. (And currently the schema will not give you an error
> message if you forget.) So you would have something like the
> following in the <profileDesc>:
> <language ident="fr-Qabp">Standard French, written using the
> International Phonetic Alphabet.</language>
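
Since the schema won't complain about a forgotten declaration, a check along these lines could catch it. This is only a sketch in Python using the standard library's ElementTree; undeclared_languages is a hypothetical helper, and it flags every xml:lang value that has no matching <language ident="...">, which is stricter than the private-use case described above. It assumes the document is in the TEI namespace http://www.tei-c.org/ns/1.0.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def undeclared_languages(tei_xml):
    """List xml:lang values used in the document but not declared in <language>."""
    root = ET.fromstring(tei_xml)
    # ident values of all <language> elements, wherever they occur.
    declared = {el.get("ident") for el in root.iter("{%s}language" % TEI_NS)}
    # Every xml:lang value actually used on any element.
    used = {el.get(XML_LANG) for el in root.iter() if el.get(XML_LANG)}
    return sorted(used - declared)
```

Calling it on the text of a TEI file returns a sorted list of the tags that still need a <language> entry; an empty list means everything in use has been declared.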
>  RFC 4646 is the successor to RFC 3066, which itself is the
> successor to RFC 1766. RFC 4646 is currently part of BCP 47.
> "BCP" stands for "best current practice".