2011/4/8 Piotr Bański
Hi Paul,

Not a vulgar mistake, but a change of perspective of sorts, whereby you
assume that "chapter" and "creed" are unique sequences of characters,
derived from English but used basically in the same way as number-based
values would be, as atomic identifiers of values in a predefined space.

So much for arguing one side, now let me argue the other one: even the
treatment of "chapter" and "creed" as symbolic values doesn't save the
picture, because xml:lang may contain info on the *script* being used.
So if you encode a Cyrillic text and use @xml:lang to indicate that,
"creed" is not a valid value, because it's not in Cyrillic.

This is so very WRONG that "short-sightedness" seems too mild when
thinking of the origin of this rule. OTOH, I wouldn't really ask
questions like "oh gosh, what are we to do now?", because the answer is
that the only way to proceed appears to be to ignore the twisted
attribute-related aspect of xml:lang (much in the spirit of what Lou
wrote). And to possibly mildly watch over the shoulders of XML tool
creators so that they don't get it too deep into their heads to follow
the original Spec literally in this respect. @xml:lang appears to be yet
another thing that is wrong with XML, and we just have to live with it
until something better comes up.
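
For what it's worth, the <valList> route that Lou mentions further down
the thread might look roughly like the ODD fragment below. This is only
a sketch: the two values are taken from Paul's example, while the
descriptions are made up for illustration and the rest of the
customization is assumed to exist around it.

```xml
<!-- Hypothetical ODD fragment; the <desc> wording is illustrative only. -->
<elementSpec ident="div" module="textstructure" mode="change">
  <attList>
    <attDef ident="type" mode="change">
      <!-- A closed list: the values are symbols, not English prose. -->
      <valList type="closed" mode="replace">
        <valItem ident="chapter">
          <desc>Symbol borrowed from English "chapter": a chapter-level
            division.</desc>
        </valItem>
        <valItem ident="creed">
          <desc>Symbol borrowed from English "creed": a statement of
            belief.</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```

The point of such a list is that the meaning of each symbol, and the
language it was borrowed from, is documented in the ODD rather than
inferred from whatever xml:lang happens to be in scope.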

The use case of expressing the language of a given piece of content,
independently of the constraints on xml:lang, is important for many
XML-based vocabularies. The "langRule" mechanism proposed by ITS
implements this use case using XPath.
The Roma service also provides a template, "TEI with W3C ITS", which
includes, among others, "langRule" in the TEI header.
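
As a rough illustration of the langRule idea, an ITS 1.0 rule set could
look like the fragment below. It is a sketch under assumptions: the
attribute name "contentLang" is hypothetical (it is not part of TEI),
and where exactly the rules sit in the TEI header depends on the schema
customization in use.

```xml
<!-- Sketch only: "contentLang" is a hypothetical attribute name. -->
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:tei="http://www.tei-c.org/ns/1.0"
           version="1.0">
  <!-- selector: XPath picking the nodes the rule applies to.
       langPointer: relative XPath to the node holding the language
       value for each selected node. -->
  <its:langRule selector="//tei:div[@contentLang]"
                langPointer="@contentLang"/>
</its:rules>
```

An ITS-aware processor would then read the language of each selected
div from that attribute.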



Note that this doesn't close the original debate, it's just a way to do
away with the side issue of attributes being put by force into some
language and/or script. Let's just not bother about it, please, it's
counterproductive and it leads us nowhere. Let's *explicitly* ignore it,
because it's wrong and not worth our time or energy.

Of course this issue should be borne in mind in any discussions on the
future shape of XML.



On 08.04.2011 21:08, Paul F. Schaffner wrote:
> So the lesson I should take from this is that I may safely
> convert P3/P4
>   <div lang="wel" type="chapter" n="2"><head>Yr Ail Pennod</head>
>     <div type="creed"><head>Pyngciau'r Ffydd.</head>
> to P5
>   <div xml:lang="wel" type="chapter" n="2"><head>Yr Ail Pennod</head>
>     <div type="creed"><head>Pyngciau'r Ffydd.</head>
> not because the @xml:lang applies only to the element content, but
> because thinking of "chapter" and "creed" as English words (though
> probably a valid interpretation in P3/P4) is (in P5) a vulgar mistake?
> Lovely! And they say casuistry is dead.
> pfs
> On Fri, 8 Apr 2011, Lou Burnard wrote:
>> The last message in that thread is also quite useful, as background to
>> my rather flippant comment about the war on attributes.
>> On 08/04/11 17:00, Piotr Bański wrote:
>>> Thanks for throwing some light on this, Lou,
>>> I've just located a fragment of a past discussion on xml:lang:
>>> (Marcus Bingenheimer's message of 10 Feb 2005)
>>> Well, that's one more thing for us LingSIG people to be aware (and maybe
>>> even wary) of, I'm happy this has come up now.
>>> Best,
>>>    Piotr
>>> On 08.04.2011 12:22, Lou Burnard wrote:
>>>> Yes, the value of xml:lang definitionally specifies the natural
>>>> language of all children, including the attributes, of the element
>>>> that carries it.
>>>> Yes, this was an issue which caused some concern in some quarters
>>>> (Espen, are you still there?) when the issue of adopting xml:lang was
>>>> first discussed, during the move to P4.
>>>> In P3 the scope of the @lang attribute is rather ill defined. It
>>>> probably was intended to relate only to the element content, but I
>>>> am not sure that anyone ever thought through the full implications
>>>> of that.
>>>> Certainly it's unclear how exactly you would specify the language
>>>> for one attribute but not another without doubling the number of
>>>> attributes.
>>>> Anyway, one of the consequences of that decision was that the War on
>>>> Attributes promptly broke out, and we moved to the present simpler
>>>> world in which attribute values rarely if ever use natural language,
>>>> so just don't have to worry about hyphenation rules, script rules
>>>> etc. They are (mostly) sequences of specific Unicode characters to be
>>>> interpreted as symbols only, despite their occasional resemblance to
>>>> real language words (the same might, in passing, be said for the
>>>> element or attribute identifiers).
>>>> As Laurent has already pointed out, this really doesn't seem to be a
>>>> major problem. There is full scope for defining and controlling the
>>>> meaning of the symbols used as attribute values in your ODD (using a
>>>> <valList>) and indeed for documenting the language from which you drew
>>>> them.
>>>> It's interesting to note that one of the very first major controversies
>>>> in the TEI concerned whether or not to permit attributes at all. The
>>>> chair of the nascent metalanguage committee in fact resigned over this
>>>> issue in 1989 or thereabouts. I sometimes wonder whether she'd have a
>>>> wry chuckle at the way history has (partially) vindicated her.
>>>> On 08/04/11 07:47, Piotr Bański wrote:
>>>>> Hi Stuart,
>>>>> Half alive after a 15-hour transfer across the Puddle I can't resist
>>>>> mentioning that you've apparently just demonstrated some horrible
>>>>> short-sightedness on the part of the inventor(s) of xml:lang -- how
>>>>> can one force us to at the same time declare the language
>>>>> *unconditionally* for *both* element and attribute content?? Think
>>>>> of dictionaries.
>>>>> Some part of my brain has a memory of something like xml:lang
>>>>> pertaining to element content alone, and of attributes not being
>>>>> addressed by it.
>>>>> This memory is clearly wrong in the light of the recent quote from the
>>>>> XML Spec. But is another memory, of the controversy between switching
>>>>> from using @lang to @xml:lang, not related to that? Was @lang (of P3?)
>>>>> meant for element content alone perhaps? I do hope I am missing
>>>>> something here.
>>>>> Because if what you say is as true as it apparently is, it's not
>>>>> really a matter of Lou being right or wrong, it's a matter of what
>>>>> attribute values you are theoretically allowed to use on any element
>>>>> that contains a string in a language that you want to identify. Your
>>>>> example concerned @n, but isn't the same logic applicable to e.g.
>>>>> @type then? (etc. -- even if one tries to wiggle out of my question
>>>>> by saying that @type is symbolic, it doesn't matter because xml:lang
>>>>> may also be about the script, not just the language).
>>>>> Goodnight,
>>>>>     P.
>>>>>> [Sorry if you have already received an email similar to this, I'm
>>>>>> having email issues at my end.]
>>>>>> I have come to realise that Lou is right about this.
>>>>>> Even in Piotr's minimal case, xml:lang already has a meaning and a
>>>>>> meaning that matters in the real world:
>>>>>> <linkGrp xml:id="...">
>>>>>>     <ptr xml:id="..." target="..." type="..." xml:lang="pl" n="a"/>
>>>>>>     <ptr xml:id="..." target="..." type="..." xml:lang="sw" n="b"/>
>>>>>> </linkGrp>
>>>>>> The languages of the @n attributes 'a' and 'b' are determined by
>>>>>> their respective @xml:lang attributes. If systems potentially use @n
>>>>>> attributes for collation or display (as we do at the NZETC), then
>>>>>> the language of the @n attributes matters.
>>>>>> Thus, this is not a case where unspecified meaning in the standard
>>>>>> can be exploited to stash the language of the referent.
>>>>>> cheers
>>>>>> stuart
> --------------------------------------------------------------------
> Paul Schaffner | [log in to unmask] |
> 316-C Hatcher Library N, Univ. of Michigan, Ann Arbor MI 48109-1190
> --------------------------------------------------------------------