As a matter of fact, in the NLP domain people implicitly (or explicitly) use the
lemma information as a reference to an entry in a dictionary. In a way, this is
the only theoretically sound definition of a lemma.
So I would vote for d).
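
To make d) a bit more concrete, here is a minimal sketch in Python of what "lemma as a pointer to a dictionary entry" could look like. All names in it are assumptions made up for illustration (the lemmaRef attribute, the lexicon/entry/form elements, the lex.* identifiers); none of this is proposed TEI markup. The point is only that the lemma string, spaces included, lives in the dictionary entry and not in an attribute value on the word.

```python
import xml.etree.ElementTree as ET

# Clark-notation name that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical markup: element and attribute names are illustrative only.
SAMPLE = """
<doc>
  <lexicon>
    <entry xml:id="lex.give_up"><form>give up</form></entry>
    <entry xml:id="lex.be"><form>be</form></entry>
  </lexicon>
  <text>
    <w lemmaRef="#lex.be">was</w>
    <w lemmaRef="#lex.give_up">gave</w>
  </text>
</doc>
"""

def resolve_lemmas(xml_source):
    """Return (token, lemma) pairs by following each word's pointer."""
    root = ET.fromstring(xml_source)
    # Index the dictionary entries by their xml:id.
    entries = {e.get(XML_ID): e.findtext("form") for e in root.iter("entry")}
    pairs = []
    for w in root.iter("w"):
        # Drop the leading '#' of the local pointer before the lookup.
        target = w.get("lemmaRef", "").lstrip("#")
        # An unresolved pointer yields None rather than raising.
        pairs.append((w.text, entries.get(target)))
    return pairs

if __name__ == "__main__":
    for token, lemma in resolve_lemmas(SAMPLE):
        print(token, "->", lemma)
```

With this arrangement a multi-word lemma such as "give up" raises no datatype question on the word itself, since the attribute holds only a pointer.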
Quoting Piotr Banski:
> In my previous mail, when I referred to the data.word constraint as
> possibly casual, I did not realise that in fact it comes from something
> specific to the medium, that is the
> no-text-where-you-may-fail-to-properly-handle-it restriction of XML/TEI.
> So FWIW, I understand the need to restrict the 'lemma' attribute, and
> withdraw my implicit vote on option (c) below. Option (d) sounds very
> sensible in this context, because data.word does only half the job --
> after all, you may run into problems encoding even a single-word exotic
> I found a nice attribute datatype list at
> -- is it still current?
> Daniel O'Donnell wrote:
> > On Thu, 2007-05-03 at 17:16 +0100, Lou Burnard wrote:
> >> The choice on which I asked for guidance from the Council (and now ask
> >> the TEI-L readership more generally) is whether we should
> >> (a) continue with the existing system
> >> (b) *remove* the @lemma attribute in favour of a <lemma> child
> >> (c) redefine the @lemma attribute to use a different datatype which does
> >> permit included spaces
> > Or d) lemma information should not necessarily be encoded in the text
> > stream but encoded elsewhere and pointed at using a reference.