Dear all,

I'm working on a project with a strong lexicographical component, so we 
are lemmatizing all the words. For this purpose we are using:

<w lemma="">word</w>

but we are running into trouble with multiword expressions (e.g. "in primis").
From a lexicographical point of view, such an expression is a single entry 
(separating it into "in" and "primis" is simply nonsensical).
The problem is that

<w lemma="in primis">in primis</w>

is not valid, because the definition of the lemma attribute is

     <attDef ident="lemma" mode="change">
        <desc>identifies the word's lemma (dictionary entry form).</desc>
        <datatype minOccurs="1" maxOccurs="1">
           <rng:ref xmlns:rng="" 

I can modify the definition, but I suspect that my problem is 
rather common (for instance, the Italian language contains thousands of 
multiword expressions...) and would like to submit the question to 



Elena Pierazzo
Associate Researcher
Centre for Computing in the Humanities
King's College London
Kay House 7 Arundel St
London WC2R 3DX

Phone: 0207-848-1949
Fax: 0207-848-2980