Daniel O'Donnell wrote:
> If tei:c is syntactic sugar for tei:seg[@type='character'], why can
> tei:c not appear wherever tei:seg appears?
Several distinct strands here, I think (especially when taken in conjunction
with the use-case re runes in the earlier thread) and at the moment I don't
have time to pick up all of them (no: that's not an approaching tornado
rattling your windows, it's just the cumulative effect of global
sigh-of-relief heaving from TEI-L readers).
So just something more about <c> (not confined to the <choice> question).
Obviously, anything that can be viewed as a text span within a chunk can be
treated as <seg type="whatever">, but that doesn't mean that <c> is simply
syntactic sugar for the seg + type value form.
<c> is an interesting case, since it provides a clear answer to the question
sometimes posed here: "where the Guidelines and the DTD seem to diverge,
which is authoritative"? Despite the polysemous perils of prose compared to
the lean rigour of EBNF grammar, the answer has to be "the Guidelines" and
<c> is a good example of how and why that's so.
The DTD content model for <c> is <!ELEMENT c (#PCDATA) >, whereas the
Guidelines say that a <c> element "should only contain a single character
or an entity that represents a single character". [For those who think such
prose should indeed be as precise as can be managed, that would be better
expressed as "should contain only a single....", but never mind]
Clearly there's a vast difference between the constraint stated in the
Guidelines and what the DTD imposes. All that the DTD can in effect enforce
is that a <c> may not have any element content (or, therefore, any mixed
content) whatever. But #PCDATA could be equally well satisfied either by the
whole of a plain-text copy of War and Peace, or by nothing at all (since the
empty string, an instance of a strongly-typed absence, is also #PCDATA),
neither of which accords with what the Guidelines clearly require. This is
a limitation inherent in all XML DTDs (and which rubs off on to any Schema
that has to be convertible to a DTD without information loss).
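
A minimal sketch of that gap, in Python, for anyone who wants to see it
rather than take it on trust. It checks the Guidelines' prose constraint,
which the DTD cannot express. The function name c_violations is hypothetical,
and "single character" is assumed here to mean a single Unicode code point,
which is a simplification (a base letter plus combining mark would fail).

```python
import xml.etree.ElementTree as ET


def c_violations(xml_text):
    """Return the content of every <c> that is not exactly one character.

    Assumption: "one character" means one Unicode code point.
    """
    root = ET.fromstring(xml_text)
    bad = []
    for c in root.iter("c"):
        # An empty <c></c> has text None; treat that as the empty string.
        text = c.text or ""
        if len(text) != 1:
            bad.append(text)
    return bad


# Every <c> below satisfies (#PCDATA); only the first and last satisfy
# the prose constraint. &#x16A0; is a single runic character.
sample = "<p><c>a</c><c></c><c>War and Peace</c><c>&#x16A0;</c></p>"
print(c_violations(sample))  # ['', 'War and Peace']
```

The empty <c> and the over-long one both pass the DTD content model, and
both are caught only by the extra check.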
Now all those of us who think tagset extension is a fun activity that
everyone ought to enjoy several times a week evolve our sub-chunk level
extension elements via initial deployment of <seg type="mytype">. Those
typed segs which prove to have significant expressive value subsequently
evolve into our extension elements. Whether that process is merely syntax
sugaring is, however, open to doubt in at least some cases. In particular,
let's suppose there were no <c> on offer in the set, but we were feeling our
way towards the need for just such an element. In that case a <c> in spe
marked up as <seg type="character"> (and assuming we meant by that "single
character", in the <c> sense) would be using the type attribute's value, not
so much to assign a taxonomic category as to express a content constraint.
And I would tend to regard that as a precursor of attribute abuse, if not
actually already abusive in itself, and so would want to hasten towards
creating that <c> element where I could express the essential constraint I
had in mind, if not fully in the DTD, at least in a description applied to
that tag alone. But once that is done, the essential characteristics and
roles assigned to <c> mean that it is different from its "cousins" in rather
important ways, and though it could dress up like them if it had to attend a
<seg> party, the regression to its childhood costume would be only
None of this is, in itself, an argument for not allowing <c>, either via a
canonical class or an extension, within the content model of P5 <choice>.
But it does, I hope, indicate why I'd say that, whatever arguments may be
adduced in favour of allowing <c> within <choice>, the line that says "after
all, it's just a sugared seg" is not a particularly good one.