An ultra quick reply (running AFK now):
On 06.04.2011 12:41, Lou Burnard wrote:
> Well, you have correctly anticipated my reaction!
> a. TMX may not care about this ambiguity (though that seems surprising)
> but in any case that is no reason for the TEI to introduce it
> b. TMX may be a significant player, but W3C is a slightly larger one I feel
> c. therefore we should not lightly change the semantics of xml:lang
I am not sure that W3C is a party here. The usage of @xml:lang I presented
is a consequence of the stand-off approach. This approach predates XML
(being publicly born in 1997 at the latest), and anyway XML was created
as a skeletal standard open to refinements. Those attributes in the OCTC
fulfil the role prescribed by the standard: they label/identify the
content of the element. The content is virtual: it is referenced remotely
-- well, such is life, it still counts as element content. By denying
the use of xml:lang in such contexts one would be rejecting stand-off.
And now let me add: those attributes fulfil the role prescribed by the
standard, *with the extra function* of labelling the container as well
(this is what TMX did and I followed). That is an extension of
semantics, rather natural, I would argue. I don't think it's
revolutionary. It's just worth being aware of.
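
To make the two readings concrete, here is a minimal sketch in Python of how a consumer might read the <ptr> quoted further down in this thread. It uses only the standard library; the function name describe_pointer and the keys of the dictionary it returns are my own illustrative assumptions, not part of the OCTC, TEI or TMX.

```python
import xml.etree.ElementTree as ET

# The namespace that the reserved "xml:" prefix is bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# The pointer quoted later in this thread, copied verbatim.
PTR = (
    '<ptr xml:id="pol-swh_aln_2.1.2-ptr" '
    'target="swh/UDHR/text.xml#swh_txt_1-head" type="tuv" xml:lang="sw"/>'
)


def describe_pointer(xml_text):
    """Hypothetical helper: report what xml:lang could be taken to label."""
    elem = ET.fromstring(xml_text)
    lang = elem.get(f"{{{XML_NS}}}lang")
    # Split the stand-off reference into document path and fragment id.
    path, _, fragment = elem.get("target").partition("#")
    return {
        "lang": lang,
        "file": path,
        "fragment": fragment,
        # The element has no text and no children, so under the strict
        # reading there is no local content for xml:lang to describe.
        "empty": elem.text is None and len(elem) == 0,
    }


info = describe_pointer(PTR)
print(info)
```

Under the stand-off reading, info["lang"] describes the remote fragment named by info["file"] and info["fragment"]; under the strict reading it describes the (empty) content of the pointer itself, which is exactly where the ambiguity comes from.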
> Bad practice does not make good standards.
> On 06/04/11 11:35, Piotr Bański wrote:
>> Hi Lou,
>> I completely agree with the sentiment, and mentioned this partially with
>> our TEI-MM-2010 discussion in mind, as an argument in favour of your
>> stance ("use <link>!"), however:
>>> It introduces a serious and unnecessary ambiguity.
>> and let me also add: natural. Ambiguity between the content and the
>> container is one of the most natural metonymic developments. It was an
>> ambiguity waiting to happen. (it started from labelling the content; in
>> my example we saw it labelling the virtual container)
>> It does not *introduce* the ambiguity. That has been done in the TMX
>> (Translation Memory eXchange) standard. What I did in the alignment part
>> of the OCTC was take TMX and standoffize it.
>> Of course, I am not presenting this as an argument that bad things are
>> justified if others do them as well. But this 'bad thing' is a natural
>> strategy that follows from the interaction of legitimate techniques (TMX
>> markup + stand-off markup). And we're talking about the world of
>> standards that crucially depend on the numbers of worshippers (analogy:
>> Terry Pratchett's gods). Here we have it: TMX already uses this nasty
>> ambiguity, and TMX is no small god. The question now becomes: can we
>> afford to be purist here? I'm not sure I want to try.
>> Of course, I can easily avoid the Bad Thing by using attributes other
>> than xml:lang, with the very same content. But I'm sure that then corpus
>> linguists will eventually ask me where my mind was when I devised a new
>> attribute, failing to notice a perfectly good xml:lang attribute that
>> could ('should', they would say) be used instead.
>> On 06.04.2011 10:57, Lou Burnard wrote:
>>> On 05/04/11 23:41, Piotr Bański wrote:
>>>> <ptr xml:id="pol-swh_aln_2.1.2-ptr"
>>>> target="swh/UDHR/text.xml#swh_txt_1-head" type="tuv" xml:lang="sw"/>
>>>> it's worth pointing out that @xml:lang plays an
>>>> unlicensed semantic role here, of which the creator of this markup
>>>> should probably be ashamed, but which may be taken to indicate the
>>>> near-natural-language-like re/ab-use (exaptation?) of markup language
>>>> constructs originally designed to perform a different role.
>>> However you call it, it is WRONG. It introduces a serious and
>>> unnecessary ambiguity. If I had my way I would actually ban the use of
>>> xml:lang on empty elements like <ptr> for precisely this reason (cf my
>>> earlier posting in response to Felix)