Dear all,

I'm working on a project with a strong lexicographical component, so we are lemmatizing all the words. For this purpose we are using:

<w lemma="">word</w>

but we are running into trouble with multiword expressions (e.g. "in primis").
From a lexicographical point of view such an expression is a single entry (splitting it into "in" and "primis" is simply nonsensical). The problem is that

<w lemma="in primis">in primis</w>

is not valid, because the lemma attribute is defined with the datatype data.word, which does not allow whitespace:

     <attDef ident="lemma" mode="change">
        <desc>identifies the word's lemma (dictionary entry form).</desc>
        <datatype minOccurs="1" maxOccurs="1">
           <rng:ref xmlns:rng="" name="data.word"/>

I can modify the definition, but I suspect my problem is rather common (Italian, for instance, contains thousands of multiword expressions), so I would like to put the question to everybody.
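
For reference, the modification I have in mind would look roughly like the following. This is only a sketch of one possible customization, not a recommendation: it assumes that lemma is declared directly on w in the analysis module (it may be declared elsewhere in other versions of the Guidelines), and it swaps data.word for plain RELAX NG text, which is my own choice of replacement datatype.

```xml
<!-- Hypothetical ODD fragment: relax the datatype of w/@lemma -->
<elementSpec ident="w" module="analysis" mode="change">
   <attList>
      <attDef ident="lemma" mode="change">
         <desc>identifies the word's lemma (dictionary entry form);
            may contain whitespace for multiword expressions.</desc>
         <datatype minOccurs="1" maxOccurs="1">
            <!-- assumption: plain text instead of data.word -->
            <rng:text xmlns:rng="http://relaxng.org/ns/structure/1.0"/>
         </datatype>
      </attDef>
   </attList>
</elementSpec>
```

With a schema generated from such a customization, <w lemma="in primis">in primis</w> should validate, at the cost of no longer catching stray whitespace in ordinary single-word lemmas.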



Elena Pierazzo
Associate Researcher
Centre for Computing in the Humanities
King's College London
Kay House 7 Arundel St
London WC2R 3DX

Phone: 0207-848-1949
Fax: 0207-848-2980