Dear Martin (cc TEI-L),
>I note your concern that the re-integration of page fragments may be
>difficult. That may be where this project collapses. But the eXist
>function mentioned by Jens Petersen in his response looks interesting.
I note that this eXist function returns a string, which means that
internal markup is lost. Whether that's a problem depends on how much
you're extracting, but since individual words may themselves contain internal
markup, the loss can matter even at that rather granular level.
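To illustrate what "returns a string" costs, here is a minimal Python sketch. It is not eXist code: ElementTree's itertext() stands in for taking the string value of a node (roughly XPath's string()), and serializing the element stands in for returning the node itself. The fragment and the helper names as_string and as_markup are hypothetical, chosen only to show a single word that carries internal markup (a TEI-style <lb break="no"/> inside a <w>).

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one word split by a line break within the word.
FRAGMENT = '<w>frag<lb break="no"/>ment</w>'

def as_string(elem: ET.Element) -> str:
    """Flatten an element to its text content (a stand-in for string())."""
    return "".join(elem.itertext())

def as_markup(elem: ET.Element) -> str:
    """Serialize the element itself, keeping its child elements intact."""
    return ET.tostring(elem, encoding="unicode")

word = ET.fromstring(FRAGMENT)
print(as_string(word))  # fragment  (the <lb/> is gone)
print(as_markup(word))  # the <lb/> survives in the serialized element
```

Even for a one-word extraction, the string version cannot be re-integrated with its line-break information, which is the granularity problem described above.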