We have a very similar problem, which so far we have tried not to think
about too much, but one day we will want or have to deal with it.
We identify the stretches of text that we annotate using XPath expressions.
What we now do in order to guard against changes in the annotated file
is to store a checksum of the annotated file with the annotations. We
also store (part of) the annotated text with the annotations. This gives
us another way of verifying the integrity of the annotated file.
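To make the two checks concrete, here is a minimal sketch in Python. It is an illustration only, not our actual code: the function names, the dictionary layout of an annotation, the choice of SHA-256 and the 40-character excerpt length are all assumptions, and it relies on the limited XPath subset supported by the standard library's ElementTree.

```python
import hashlib
import xml.etree.ElementTree as ET


def checksum(xml_bytes):
    """Checksum of the whole annotated file (SHA-256 is an assumed choice)."""
    return hashlib.sha256(xml_bytes).hexdigest()


def make_annotation(xml_bytes, path, note, excerpt_len=40):
    """Store the pointer together with a file checksum and a text excerpt."""
    node = ET.fromstring(xml_bytes).find(path)
    if node is None:
        raise ValueError("path matches nothing: " + path)
    text = "".join(node.itertext())
    return {
        "path": path,
        "note": note,
        "checksum": checksum(xml_bytes),
        "excerpt": text[:excerpt_len],
    }


def verify_annotation(xml_bytes, ann):
    """Return 'unchanged', 'target intact' or 'stale' for one annotation."""
    # First check: the file is byte-for-byte the one that was annotated.
    if checksum(xml_bytes) == ann["checksum"]:
        return "unchanged"
    # Second check: the file changed, but the pointer may still hit the same text.
    node = ET.fromstring(xml_bytes).find(ann["path"])
    if node is not None and "".join(node.itertext()).startswith(ann["excerpt"]):
        return "target intact"
    return "stale"
```

The stored excerpt is what lets us tell a harmless change elsewhere in the file apart from a change that invalidates the pointer.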
In the future we'll want to be able to deal with changes in the
annotated file. I do not expect this to be a very frequent occurrence;
usually the annotators will be able to work with the older version of
the annotated XML.
I have been thinking of several solutions:
(1) require the presence of id attributes in the annotated XML, and use
these to identify the annotated nodes (will not help in the case of
changes in the text, would severely limit the usefulness of our
(2) write some kind of comparison program for the old and new XML files
and have this program produce some kind of delta file describing the
changes; this delta file should then be the input for another program
that updates the pointers in the annotation file.
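A rough sketch of what (2) might look like, again in Python and again hypothetical: the function names and the form of the delta (a plain mapping from old positional path to new positional path) are assumptions for illustration. It aligns the elements of the two files with difflib from the standard library, so it only follows elements whose tag and text content are unchanged; pointers into elements whose text was edited come back as None and would need manual review.

```python
import difflib
import xml.etree.ElementTree as ET


def indexed_paths(root):
    """List (positional_path, signature) for every element, in document order."""
    out = []

    def walk(elem, path):
        # Signature: tag plus all text below the element.
        out.append((path, (elem.tag, "".join(elem.itertext()).strip())))
        counts = {}
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, "%s/%s[%d]" % (path, child.tag, counts[child.tag]))

    walk(root, "/" + root.tag)
    return out


def build_delta(old_xml, new_xml):
    """Map old positional paths to new ones for elements that survive unchanged."""
    old = indexed_paths(ET.fromstring(old_xml))
    new = indexed_paths(ET.fromstring(new_xml))
    matcher = difflib.SequenceMatcher(
        a=[sig for _, sig in old], b=[sig for _, sig in new], autojunk=False
    )
    delta = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            delta[old[block.a + k][0]] = new[block.b + k][0]
    return delta


def update_pointers(pointers, delta):
    """Rewrite each pointer; None marks one that could not be carried over."""
    return {p: delta.get(p) for p in pointers}
```

If a paragraph is inserted before two existing ones, the delta maps /doc/p[1] to /doc/p[2] and /doc/p[2] to /doc/p[3], which is exactly the pointer update the second program would apply to the annotation file.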
I suppose we'll do (2). I'm not happy with it, but I can't really think
of another solution.