I've had a bit of a bee in my bonnet (sorry!) about this issue for a
couple of years.  My recommendation would be to do the opposite.  That
is, encode morphological and linguistic information as attributes, and
graphic information about layout as tagged material.  This could be
automated.

This goes against the TEI guidelines in one respect--words are seen as
linguistic, not graphic, unities in TEI.  But it fits Lou's admonitions
against putting what is essentially tagging inside an attribute.
(Another problem with your specific case is that your attribute
information contains a mix of what is, conceptually speaking, character
data [the "butter" and "fly"] and tagging [the "-" and "23"].)
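
To make the recommendation concrete, here is a rough sketch.  It assumes
the word in question is "butterfly", broken across a line end with a
hyphen, and that "23" is the number of the line on which "fly" falls;
that is only my reading of your example.  The lemma and type values are
purely illustrative, and whether the hyphen stays as character data or
moves into the markup is left open:

```xml
<!-- Linguistic information sits in attributes on the word element;
     the graphic fact of the line break is a tag inside the word.
     The attribute values and the line number are assumptions. -->
<w lemma="butterfly" type="noun">butter-<lb n="23"/>fly</w>
```

The attribute values here are plain labels, with no tagging hidden
inside them, and the page layout can be read off the tags alone.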

The basic problem is a conflict between two competing views of textual
data--as linguistic object or graphic object.

The recurrence of problems like this brings me to ask a question that's
been bothering me as I tag quotations in a larger TEI document.  When I
write my own texts, I know the structure.  Calling things "foreign" or
"emphasis" etc. is perfectly unambiguous.  When I'm encoding and
quoting somebody else's document, I'm interpreting their graphic
conventions.  Strictly speaking, shouldn't things like italics be
encoded as "italics" rather than interpreted as "emphasis" or "foreign"?
--
Daniel Paul O'Donnell, PhD
Department of English
University of Lethbridge
Lethbridge Alberta T1K 3M4
Canada

Tel: +1 (403) 329-2377
Fax: +1 (403) 382-7191
e-mail: [log in to unmask]

Web-Page: http://home.uleth.ca/~daniel.odonnell
The Electronic Caedmon's Hymn: http://home.uleth.ca/~caedmon