As I mentioned in one of my previous mails, we are working
on the Integrated Language Database of 8th-21st Century Dutch, in
which dictionaries, lexica and a diachronic corpus will be linked.
The corpus will be TEI-encoded, and we are now developing our
minimal tagging level.
One of the problems we have encountered is the "butter- 23 fly"
case, i.e. a word such as butterfly whose first half is on page 22
and whose second half is on page 23.
We would like to encode instances like these as follows:

<reg orig='butter- 23 fly'>butterfly <pb n="23"></reg>

since we also intend to tag the entire corpus morphologically,
preferably fully automatically, and for that purpose a complete
wordform presents fewer complications.
Are there any objections to this solution?
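To illustrate why the regularized form helps an automatic tagger, here is a minimal sketch that reads the proposed encoding and yields one token per wordform together with its page. It assumes the corpus is serialized as XML (so the empty page-break element is written `<pb n="23"/>` rather than the SGML `<pb n="23">`), that words are separated by whitespace, and that a word split across a page break is attributed to the page on which it begins. The function name `tokens_for_tagging` and the sample sentence are hypothetical and not part of any existing TEI tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment using the proposed encoding (XML syntax assumed).
SAMPLE = '<p>a <reg orig="butter- 23 fly">butterfly <pb n="23"/></reg> flew</p>'


def tokens_for_tagging(xml_text, start_page=None):
    """Return (wordform, page) pairs for a single paragraph element.

    A <reg> element is treated as ONE complete wordform, so the tagger
    sees "butterfly" rather than "butter-" and "fly". Any <pb> inside
    <reg> only updates the page counter for the text that follows.
    """
    root = ET.fromstring(xml_text)
    page = start_page
    tokens = []

    def add_words(text):
        # Whitespace tokenization is a simplifying assumption.
        for word in (text or "").split():
            tokens.append((word, page))

    add_words(root.text)
    for child in root:
        if child.tag == "reg":
            # itertext() collects the regularized content, skipping the empty <pb>.
            word = "".join(child.itertext()).strip()
            tokens.append((word, page))  # page on which the word begins
            for pb in child.iter("pb"):
                page = pb.get("n")
        elif child.tag == "pb":
            page = child.get("n")
        add_words(child.tail)  # text after the element, on the (possibly new) page
    return tokens


print(tokens_for_tagging(SAMPLE, start_page="22"))
```

The original hyphenated form stays recoverable from the `orig` attribute (`child.get("orig")`) if it is ever needed for display or for checking the transcription.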


