As I mentioned in an earlier mail, we are working
on the Integrated Language Database of 8th-21st Century Dutch, in
which dictionaries, lexica and a diachronic corpus will be linked.
The corpus will be TEI-encoded. We are now developing our minimal
One of the problems we have encountered is the "butter- 23 fly"
example, i.e. a word broken across a page boundary, with the first
half of butterfly on p. 22 and the second half on p. 23.
We would like to encode instances like these as follows:
<reg orig='butter- 23 fly'>butterfly <pb n="23"></reg>
since we also intend to morphologically tag the entire corpus,
preferably fully automatically, and for that a complete wordform
presents fewer complications.
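
To make the proposal concrete, here is a minimal Python sketch of how
such joins might be generated automatically. It is only an illustration
under stated assumptions: the function name join_across_page_break is
hypothetical, the input is assumed to be two plain-text pages, and every
hyphen at the very end of a page is assumed to mark a split word (real
texts will also contain genuine hyphenated compounds, which this does
not detect).

```python
import re


def join_across_page_break(prev_page: str, next_page: str, next_page_no: int) -> str:
    """Join a word hyphenated across a page break into a <reg> element.

    Hypothetical helper: assumes a trailing hyphen on prev_page always
    signals a split word, which will not hold for every text.
    """
    # Last word fragment on the old page, followed by a hyphen.
    head = re.search(r"(\w+)-\s*$", prev_page)
    # First word fragment on the new page.
    tail = re.match(r"\s*(\w+)", next_page)
    # SGML-style empty element, as in the proposed encoding.
    pb = f'<pb n="{next_page_no}">'

    if head is None or tail is None:
        # No split word: simply mark the page break between the pages.
        return prev_page + pb + next_page

    first, second = head.group(1), tail.group(1)
    # Record the original reading, including the page number.
    orig = f"{first}- {next_page_no} {second}"
    reg = f"<reg orig='{orig}'>{first}{second} {pb}</reg>"
    return prev_page[:head.start()] + reg + next_page[tail.end():]


if __name__ == "__main__":
    print(join_across_page_break("a yellow butter-", "fly landed", 23))
```

The complete wordform ends up as the element content, so a tagger that
only reads element content sees "butterfly", while the original reading
stays recoverable from the orig attribute.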
Are there any objections to this solution?
Katrien A.C. Depuydt
Instituut voor Nederlandse Lexicologie
(Institute for Dutch Lexicology)
(editor Dutch Language Database)
tel.: +31 71 5272479
NL-2300 RA Leiden