> I've had a bit of a bee in my bonnet (sorry!) about this issue for a
> couple of years. My recommendation would be to do the opposite, that
> is, to encode morphological and linguistic information as attributes, and
> graphic information about layout as tagged material.
> This goes against the TEI guidelines in one respect--words are seen as
> linguistic, not graphic, units in TEI. But it fits Lou's admonitions
> against putting what is essentially tagging inside an attribute (another
> problem with your specific case is that your attribute information
> contains a mix of what is conceptually speaking character data [the
> "butter" and "fly"] and tagging [the "-" and "23"]).
> The basic problem is a conflict between two competing views of textual
> data--as linguistic object or graphic object.
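To make the recommendation in the quoted message concrete, here is a minimal Python sketch. It assumes TEI-like element names chosen purely for illustration. A `w` element carries the linguistic information (`lemma`, `pos`) as attributes, while the printed hyphen (`c type="hyphen"`) and the page break (`pb n="23"`) stay in the content as tagged material. The function `encode_word` and the exact element and attribute names are hypothetical and are not taken from the TEI Guidelines or from either correspondent's proposal.

```python
import xml.etree.ElementTree as ET


def encode_word(first: str, second: str, page: str, lemma: str, pos: str) -> ET.Element:
    """Build a word element split across a page break (hypothetical scheme)."""
    # Linguistic information: attributes on the word element.
    w = ET.Element("w", {"lemma": lemma, "pos": pos})
    # Graphic information: the printed hyphen and the page break are tagged material.
    w.text = first
    hyphen = ET.SubElement(w, "c", {"type": "hyphen"})
    hyphen.text = "-"
    pb = ET.SubElement(w, "pb", {"n": page})
    # The second half of the word follows the page break as character data.
    pb.tail = second
    return w


word = encode_word("butter", "fly", "23", lemma="butterfly", pos="noun")
print(ET.tostring(word, encoding="unicode"))
# <w lemma="butterfly" pos="noun">butter<c type="hyphen">-</c><pb n="23" />fly</w>
```

In this layout, "butter" and "fly" remain character data and "-" and "23" become markup, which is the separation the quoted message argues for.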
For our language database, words are seen primarily as linguistic objects.
To every word form in each of our texts we want to assign a part of speech
(POS) and a headword. Given the large number of texts we intend to
incorporate in our language database, this can only be done with
lemmatizers and POS taggers. The solution I suggested for the
butter-23fly problem is intended, on the one hand, to facilitate
automatic tagging and, on the other, to record how the word
was originally printed, without aiming to make it possible to
reproduce the original printed form.
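Whatever the encoding, a lemmatizer or POS tagger needs the unbroken word form. Here is a hedged sketch. It assumes words encoded with layout markup inside the word element, and it treats the hypothetical elements `c` and `pb` as layout-only. Under those assumptions, both the tagger input and the printed form can be derived from the same element. The names `GRAPHIC_TAGS`, `tagger_form` and `printed_form` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of elements that record layout only, not linguistic content.
GRAPHIC_TAGS = {"c", "pb"}

SAMPLE = '<w lemma="butterfly" pos="noun">butter<c type="hyphen">-</c><pb n="23"/>fly</w>'


def tagger_form(w: ET.Element) -> str:
    """Concatenate the word's character data, skipping layout-only children."""
    parts = [w.text or ""]
    for child in w:
        if child.tag not in GRAPHIC_TAGS:
            parts.append("".join(child.itertext()))
        # Text after a child (its tail) belongs to the word itself, so always keep it.
        parts.append(child.tail or "")
    return "".join(parts)


def printed_form(w: ET.Element) -> str:
    """All character data, including the hyphen; the page number is an attribute and is not included."""
    return "".join(w.itertext())


w = ET.fromstring(SAMPLE)
print(tagger_form(w))   # butterfly
print(printed_form(w))  # butter-fly
print(w.get("lemma"), w.get("pos"))  # butterfly noun
```

The tagger sees only the linguistic character data, while the layout markup stays available to anyone who needs it. Like the approach described above, this sketch records the printed form without attempting to reproduce the page exactly.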
Of course, for an electronic edition of a text, this would not be the
right approach. But to give an idea of what we are up against: we are now
preparing a prototype of our Dutch language database, using
samples of at least 250 different texts and parts of three different
scholarly dictionaries of Dutch. Ultimately, the Integrated Language
Database of Dutch will contain complete texts, far more than 250 of
them, as well as the complete collection of scholarly dictionaries
ever made of the Dutch language.
Katrien A.C. Depuydt
Instituut voor Nederlandse Lexicologie
(Institute for Dutch Lexicology)
(editor Dutch Language Database)
tel.: +31 71 5272479
NL-2300 RA Leiden