Dear list,


I am trying to find a way to encode linguistic annotations from several sources in the same document. Different NLP tools annotate grammatical information, and the divisions they produce can overlap. For example, one tool might analyze a three-word sentence so that the first two words form a phrase (<s><phr>token1 token2</phr><phr>token3</phr></s>), while a second tool might conclude that the last two words belong to the same phrase (<s><phr>token1</phr><phr>token2 token3</phr></s>). I thought about using the elements choice, orig and reg for this, although I doubt that this is what the reg element was intended for. An example:

<s>
    <choice>
        <orig>token1 token2 token3</orig>
        <reg resp="tool1"><phr>token1 token2</phr> <phr>token3</phr></reg>
        <reg resp="tool2"><phr>token1</phr> <phr>token2 token3</phr></reg>
    </choice>
</s>

Is there a better element? Should I use another strategy? I would like to keep the text and the annotations close together, so that evaluation is easier.
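
To make clear what I mean by evaluation, here is a minimal Python sketch of how the competing analyses might be read back and compared. It is only an illustration under my own assumptions: it uses the standard library module xml.etree.ElementTree, the sample has no TEI namespace (a real TEI file would need one), and the helper names segmentations and boundaries are hypothetical.

```python
import xml.etree.ElementTree as ET

# The example encoding from this message, without a TEI namespace (assumption).
SAMPLE = """<s>
    <choice>
        <orig>token1 token2 token3</orig>
        <reg resp="tool1"><phr>token1 token2</phr> <phr>token3</phr></reg>
        <reg resp="tool2"><phr>token1</phr> <phr>token2 token3</phr></reg>
    </choice>
</s>"""


def segmentations(xml_text):
    """Map each tool (the value of @resp) to its phrases, each a list of tokens."""
    root = ET.fromstring(xml_text)
    result = {}
    for reg in root.iter("reg"):
        result[reg.get("resp")] = [phr.text.split() for phr in reg.findall("phr")]
    return result


def boundaries(phrases):
    """Return the set of token offsets at which a phrase ends."""
    ends, pos = set(), 0
    for phrase in phrases:
        pos += len(phrase)
        ends.add(pos)
    return ends


segs = segmentations(SAMPLE)
# Phrase boundaries on which both tools agree.
shared = boundaries(segs["tool1"]) & boundaries(segs["tool2"])
print(segs)
print(shared)
```

With the text and all analyses in one place, a comparison like this needs no alignment step between separate files, which is the reason I would like to keep them together.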
Best regards from Würzburg,
José Calvo