Dear all,

I'm working on a project with a strong lexicographical component, so we 
are lemmatizing all the words. For this purpose we are using:

<w lemma="">word</w>

but we are having trouble with multiword expressions (e.g. "in primis"). 
From a lexicographical point of view such an expression is a single entry 
(separating it into "in" and "primis" is simply nonsensical). 
The problem is that

<w lemma="in primis">in primis</w>

is not valid, because the lemma attribute is defined as

<attList>
     <attDef ident="lemma" mode="change">
        <desc>identifies the word's lemma (dictionary entry form).</desc>
        <datatype minOccurs="1" maxOccurs="1">
           <rng:ref xmlns:rng="http://relaxng.org/ns/structure/1.0" name="data.word"/>
        </datatype>
     ...
     </attDef>
</attList>


I can modify the definition, but I suspect that my problem may be 
rather common (for instance, Italian contains thousands of 
multiword expressions...), so I would like to put the question to 
everybody.
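
For illustration only, this is a rough sketch of what such a modification 
might look like in an ODD customization. It assumes that the change is 
made on <w> in the analysis module and that data.text (which allows 
whitespace) is an acceptable replacement datatype; both are my 
assumptions and untested:

<elementSpec ident="w" module="analysis" mode="change">
   <attList>
      <attDef ident="lemma" mode="change">
         <!-- assumption: data.text instead of data.word, so that
              values containing spaces such as "in primis" are allowed -->
         <datatype minOccurs="1" maxOccurs="1">
            <rng:ref xmlns:rng="http://relaxng.org/ns/structure/1.0" name="data.text"/>
         </datatype>
      </attDef>
   </attList>
</elementSpec>

With something like this in place, <w lemma="in primis">in primis</w> 
should validate, at the cost of no longer being able to rely on lemma 
values being single tokens.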

Best,

Elena



-- 
Elena Pierazzo
Associate Researcher
Centre for Computing in the Humanities
King's College London
Kay House 7 Arundel St
London WC2R 3DX

Phone: 0207-848-1949
Fax: 0207-848-2980