Dear list,


I am trying to find a way of encoding different linguistic annotations
from several sources in the same document. Different NLP tools annotate
grammatical information, and the divisions they produce can overlap. For
example, one tool could analyze a three-word sentence so that the first
two words belong to one phrase (<s><phr>token1
token2</phr><phr>token3</phr></s>), while a second tool could conclude
that the last two words belong to the same phrase
(<s><phr>token1</phr><phr>token2 token3</phr></s>). I thought about
using the elements choice, orig and reg for that, although I doubt that
this is what the reg element was intended for. An example:

<s>
     <choice>
         <orig>token1 token2 token3</orig>
         <reg resp="tool1"><phr>token1 token2</phr> <phr>token3</phr></reg>
         <reg resp="tool2"><phr>token1</phr> <phr>token2 token3</phr></reg>
     </choice>
</s>
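
To illustrate why keeping text and annotation together could help evaluation, here is a minimal Python sketch that reads the example above and lists each tool's phrase segmentation. It is only an illustration under assumptions: the function name phrases_by_tool is hypothetical, the resp values are treated as plain tool labels, and the sample has no namespace (a real TEI file would normally declare the TEI namespace, which would have to be added to the element names).

```python
import xml.etree.ElementTree as ET

# The example from this message, without the TEI namespace (an assumption).
SAMPLE = """<s>
  <choice>
    <orig>token1 token2 token3</orig>
    <reg resp="tool1"><phr>token1 token2</phr> <phr>token3</phr></reg>
    <reg resp="tool2"><phr>token1</phr> <phr>token2 token3</phr></reg>
  </choice>
</s>"""


def phrases_by_tool(xml_text):
    """Map each resp value to its list of phrases (each a list of tokens)."""
    root = ET.fromstring(xml_text)
    result = {}
    for reg in root.iter("reg"):
        tool = reg.get("resp")
        # Each <phr> holds whitespace-separated tokens as plain text.
        result[tool] = [phr.text.split() for phr in reg.findall("phr")]
    return result


segmentations = phrases_by_tool(SAMPLE)
for tool, phrases in segmentations.items():
    print(tool, phrases)
```

With the analyses side by side like this, the segmentations of the two tools can be compared phrase by phrase against the same token sequence.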

Is there a better element? Should I use another strategy? I would like
to keep the text and the annotations close together, so that evaluation
is easier.
Best regards from Würzburg,
José Calvo