I am working on a corpus that I have grammatically annotated with a part-of-speech tagger.

For example, from this sentence:
En traçant aujourd'hui sur le papier la première de ces lignes de prose…

I obtained a file that looks like this:


As a first step, I simplified my file by keeping only the grammatical categories as they would be given in a dictionary, e.g. noun, verb.
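To make that step concrete, here is a minimal Python sketch of the simplification. It assumes the tagger output wraps each word in an element named after its tag, as in <NCMS>papier</NCMS>. The prefix table and the function name simplify are hypothetical: only the prefixes NC and V are attested in my examples, so a real table would need one entry per prefix of the tag set.

```python
import re

# Hypothetical prefix table: tagger tag prefix -> coarse dictionary category.
# Only "NC" (as in NCMS) and "V" (as in VPARPRES) come from my examples.
COARSE = [("NC", "n"), ("V", "v")]

# Matches one tagged word such as <NCMS>papier</NCMS>.
TAGGED = re.compile(r"<([A-Z]+)>([^<]*)</\1>")

def simplify(line):
    """Rewrite every tagged word on the line as <w type="...">form</w>."""
    def repl(match):
        tag, form = match.group(1), match.group(2)
        for prefix, coarse in COARSE:
            if tag.startswith(prefix):
                return '<w type="%s">%s</w>' % (coarse, form)
        # Unknown tag: leave the element untouched rather than guess.
        return match.group(0)
    return TAGGED.sub(repl, line)

print(simplify("<NCMS>papier</NCMS>"))
```

Everything after the prefix (gender, number, mood) is simply thrown away here, which is exactly the loss I describe below.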

My TEI file then became:


<w type="prep">En</w>
<w type="v">traçant</w>
<w type="adv">aujourd'hui</w>
<w type="prep">sur</w>
<w type="art">le</w>
<w type="n">papier</w>
<w type="art">la</w>
<w type="n">première</w>
<w type="prep">de</w>
<w type="adjDem">ces</w>
<w type="n">lignes</w>
<w type="prep">de</w>
<w type="n">prose</w>

I can then use Xaira to index and query my file. All is well :-), but in doing so I impoverished my categorisation, since I removed gender and number as well as the verbal paradigm information (tense, mood, person).

Is there a way to keep a rich categorisation in my file?


For:
<NCMS>papier</NCMS> (M = masculine, S = singular)

I would propose something like this:

 <w type="art" genre="M" number="S">le</w>
For:
<VPARPRES>traçant</VPARPRES> (PARPRES = present participle)
I would propose something like this:

<w type="v" mood="PARPRES">traçant</w>
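The conversion I have in mind could be sketched in Python as follows. This is only an illustration under assumptions: the function name rich_w is hypothetical, the attribute names genre, number and mood are my own proposal rather than anything defined by TEI, and the decomposition rules cover only the two tags shown above (NCMS and VPARPRES).

```python
import xml.etree.ElementTree as ET

def rich_w(tag, form):
    """Build a <w> element that keeps the detail carried by the tagger tag.

    Assumed decomposition (only these two patterns come from my examples):
      NC + gender + number  -> type="n", genre=..., number=...
      V  + paradigm code    -> type="v", mood=...
    """
    if tag.startswith("NC") and len(tag) == 4:
        attrs = {"type": "n", "genre": tag[2], "number": tag[3]}
    elif tag.startswith("V") and len(tag) > 1:
        attrs = {"type": "v", "mood": tag[1:]}
    else:
        # Do not guess at tags whose structure I have not described.
        raise ValueError("no decomposition rule for tag %r" % tag)
    w = ET.Element("w", attrs)
    w.text = form
    # encoding="unicode" returns a str and keeps characters like "ç" as-is.
    return ET.tostring(w, encoding="unicode")

print(rich_w("NCMS", "papier"))
print(rich_w("VPARPRES", "traçant"))
```

Raising an error on unknown tags is deliberate: it would let me see which parts of the tag set still need a rule instead of silently losing information again.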


Do these solutions seem correct to you?


Jean-Luc Benoit 
 44, avenue de la Libération , B.P. 30687, F 54063 NANCY CEDEX
 03 83 96 86 99
 ATILF(