I am working on the project "A historical corpus of the Welsh language",
funded by the Arts and Humanities Research Board. The project
is based in the Department of Linguistics at the University of Cambridge.
We plan to set up an electronic collection of Modern and Early Modern Welsh
texts. We are using TEI for the corpus.

Since we wanted to find out what kinds of problems might arise in using
TEI, we started with a text that would address several areas of TEI: a
17th-century Welsh verse drama from a manuscript (unedited).

Although by now I do have some idea how to use TEI, there are still some
problems, and I am not always sure that the solutions I have found are
good, clever, or valid. I have consulted the TEI-Guidelines, but am still
in doubt in some cases.

Likewise, where simpler solutions than those suggested seem possible, I
would, of course, prefer them.

I would be very grateful for any comments or advice - or confirmation - on
the following points. Some are rather basic, some perhaps less so.

As the list of problems has become rather long, I am going to post
separate e-mails, usually one for each problem. The first one, however,
is included in this message. The problems will concern:

- sequence of tags (CORR + ORIG); data processing [IN THIS MAIL !]
- tags allowed in the value of an attribute?
- <prologue> and <epilogue> allowed within <body>?; alternatives
- correction extending across line end in verse
- missing verse line leading to ambiguous stanza type(s)
- comment on "parts" in TEI-Guidelines
- verse line in stanza split up between speakers
- "speakers" and stage directions mixed
- discontinuous stage directions
- binding reference systems (lines etc.) for electronic first editions

[Examples are made up, but reflect the respective problems; tagging has
often been simplified in order to highlight the point in question.
Occasional text content has been supplied for illustration only
and does not reflect the quality of the original text.]

--------------------------------------
SEQUENCE OF TAGS?

Let's suppose your text has a reading which is both erroneous and dated in
its spelling. You want to correct it (using the CORR-tag) and give a
regularization (using the ORIG-tag). You'd rather not use an apparatus.

The text has, say, _fhyld_. The scribe or printer has obviously confused
the letters F and long S; the correct reading should be _shyld_, and the
modern spelling of the word is _shield_.

In terms of editing, correction comes first, regularization second. Now,
translating this into TEI, would that mean that CORR is the inner tag and
ORIG the outer tag, i.e.:

<orig reg="shield"><corr sic="fhyld">shyld</corr></orig>

This question also bears on how the data in the TEI document, and the
order in which it is given, will be processed later on, about which
I'd like to know more.
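
To make the processing question concrete, here is a minimal sketch in
Python of how a later processing step might pull the three readings
(manuscript, corrected, regularized) out of such markup. It is an
illustration only, not part of any TEI software: the function name
readings() and the dictionary keys are made up, the fragment is treated
as well-formed XML, and the reverse nesting is included purely for
comparison. The sketch says nothing about which nesting the TEI DTD
actually permits.

```python
import xml.etree.ElementTree as ET

# The nesting proposed in this mail: CORR inside ORIG.
CORR_INSIDE_ORIG = '<orig reg="shield"><corr sic="fhyld">shyld</corr></orig>'
# The reverse nesting, included only for comparison (hypothetical).
ORIG_INSIDE_CORR = '<corr sic="fhyld"><orig reg="shield">shyld</orig></corr>'


def readings(xml_text):
    """Return the three readings encoded in a CORR/ORIG fragment.

    Elements are looked up by name, so the result does not depend on
    which of the two tags is the outer one.
    """
    root = ET.fromstring(xml_text)
    # Map each element name to its element; root.iter() includes the root.
    by_name = {el.tag: el for el in root.iter()}
    corr = by_name["corr"]
    orig = by_name["orig"]
    return {
        "manuscript": corr.get("sic"),            # what the source has
        "corrected": "".join(corr.itertext()),    # the editor's correction
        "regularized": orig.get("reg"),           # the modern spelling
    }


if __name__ == "__main__":
    print(readings(CORR_INSIDE_ORIG))
    print(readings(ORIG_INSIDE_CORR))
```

Under these assumptions, a program that looks elements up by name
recovers the same three readings from either nesting, so the order of
the tags would matter mainly for validity against the DTD and for the
editorial statement the encoding makes.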

As it is, I also feel slightly uneasy about marking my correction as the
original reading, which it, properly speaking, isn't.


Thanks a lot for your help!


Ingo Mittendorf
Department of Linguistics
University of Cambridge
Sidgwick Avenue
Cambridge CB3 9DA
UK