Peter Boot's solution using word tagging won't work within the TEI
guidelines, which only allow character and morpheme tags inside <w>
(i.e. the TEI sees words as linguistic objects only).  You can of
course adjust the content model of <w> to allow other material (I did
this originally in my coding, though I later backed off; I now mark
graphic words as <seg>s in my diplomatic transcriptions and reserve <w>
for linguistic objects in my critical and reading texts).  I used to
think the decision to restrict the content of the <w> element was a
foolish mistake; I'm now gradually coming to believe that it is a good
thing, because it forces coders to decide what they are interested in.
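
To illustrate the seg/w distinction described above, here is a minimal
sketch using the butter-fly example.  The type value "graphic" and the
lemma value are assumptions chosen for illustration only, not a
prescribed or attested encoding:

```xml
<!-- Diplomatic transcription: the graphic word as written, hyphen and all.
     type="graphic" is a hypothetical value. -->
<seg type="graphic">butter-fly</seg>

<!-- Critical or reading text: the word as a linguistic object.
     The lemma value is illustrative. -->
<w lemma="butterfly">butterfly</w>
```

The point of keeping the two apart is that each transcription answers
one question only: how the word looks on the page, or which word it is.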

Given the goals and corpus size of Katrien's project, I wonder why she
is interested in the graphic representation of the word butter-fly at
all.  The corpus is not an edition, as I understand it, but a lexical
database.  Including accidentals like hyphenation is only going to
massively increase the complexity of the coding work, invite error (you
will either have to devote resources to proof-reading hyphenation and
other graphic clues, or not trust your representation of these elements
in the corpus), and create problems with subsequent automation.  While
it might be nice to "edit" texts as well as include them in the corpus,
I'd say worrying about accidental details on a lexicographic project is
inviting problems.

An example of a corpus-based, TEI-encoded project that does a really
good job of handling its corpus is the Dictionary of Old English.  Their
Corpus of Old English contains almost all surviving Old English but is
very minimally tagged: texts and sentences are marked and ID'd;
corrections and emendations necessary for sense or taken over from other
print editions are explicitly tagged; but all other non-lexical details
are ignored.  What they lose in detail (and Anglo-Saxonists love to
fight over the most accidental of details) they gain in robustness, ease
of use, and clarity.  How does the Helsinki corpus tag graphic
accidentals?

-dan
--
Daniel Paul O'Donnell
Department of English
University of Lethbridge
Lethbridge AB T1K 3M4
Canada

Tel (403) 329-2377
Fax (403) 382-7191
http://home.uleth.ca/~daniel.odonnell