Espen Ore wrote:
> The transcription of primary sources is an area where I think TEI will
> have to be expanded as part of the updating work now beginning. During
> our discussions today we came as far as to suggest a new model that
> allows what we want. In the example expanded as "kirkia" one could do
> something like:
> I would very much appreciate some comments on this, especially if we
> have overlooked something fundamental.
Espen, I think this kind of approach would work reasonably well for Old
Norse manuscripts, where the abbreviations tend, I think, to be of the
kind you discuss in your message: letters or groups of letters to which
one of a smallish set of abbreviation signs is added to indicate that
further letters should be supplied to make the full word.
I have some worries, however, about how easily a system of this
kind could be extended even to other medieval manuscript
domains. For example, in Latin manuscripts there are often signs
(not letters, but further individual graphs) that simply indicate
that something has been omitted (for example an inflexional
ending) which the reader is supposed to supply. So rather than
consisting of one of the letters of the word that is being
abbreviated, the "abbreviation" replaces part or all of the word
with a graph that does not consist of the alteration of a letter.
More generally, I'm wondering whether it's a good idea to combine
in a single file what are really two quite distinct treatments of
the manuscript: a "transcription" (this is what I SEE when I look
at the MS) and a "reading" (these are the words I UNDERSTAND the MS
to be indicating with its signs). (I realize that I may be
misinterpreting your message here, since you may not _mean_ to
imply with your example transcription of kirkja that each
abbreviated word would be transcribed twice in the same file,
once to record the signs and a second time to interpret them.)
I would prefer one file of pure "transcription," using, probably,
entities to represent each distinct grapheme that was not
unproblematically representable with a letter (here
determinations would have to be made about whether, for example,
it was a _crossed k_ or a k plus a _cross_ that was the
grapheme), and one file of pure "reading," where tags would be used
to indicate where a sequence of letters was an expansion. Then
segmentation and alignment methods could be used to hook the two
files together, and both could be hooked to the graphics file.
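
To make the two-file idea a little more concrete, here is a minimal
sketch in Python. Everything in it is an assumption made for
illustration only: the entity name &kcross;, the segment identifier
"w1", the class and function names, and the particular split of the
word into written and supplied letters are all invented, and nothing
here is existing or proposed TEI markup. The link to the graphics file
is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Seen:
    """One segment of the pure "transcription": the graphs on the page."""
    seg_id: str   # hypothetical identifier shared by both files
    graphs: str   # letters plus entities for non-letter graphemes


@dataclass(frozen=True)
class Understood:
    """One segment of a "reading": letters, flagged where they are expansions."""
    seg_id: str
    parts: Tuple[Tuple[str, bool], ...]  # (letters, is_expansion)

    def word(self) -> str:
        # The full word as the reader understands it.
        return "".join(letters for letters, _ in self.parts)

    def supplied(self) -> str:
        # Only the letters the reader has supplied.
        return "".join(letters for letters, is_exp in self.parts if is_exp)


def align(transcription: List[Seen],
          reading: List[Understood]) -> List[Tuple[str, str, str]]:
    """Pair each transcribed segment with its reading via the shared id."""
    by_id: Dict[str, Understood] = {u.seg_id: u for u in reading}
    return [
        (s.seg_id, s.graphs, by_id[s.seg_id].word())
        for s in transcription
        if s.seg_id in by_id
    ]


# One transcription file; &kcross; is an invented entity name.
transcription = [Seen("w1", "&kcross;a")]

# Two independent readings of the same transcription data.
reading_a = [Understood("w1", (("k", False), ("irki", True), ("a", False)))]
reading_b = [Understood("w1", (("k", False), ("irkj", True), ("a", False)))]

for reading in (reading_a, reading_b):
    print(align(transcription, reading))
```

The point of the sketch is only that the transcription is stored once
and is never altered by either reading, so a second or third reading
can be attached to it later through the same identifiers.
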
My chief reservations about the single-file method are the
difficulty it imposes on reuse of the "transcription" data (using
my vocabulary from above) and the implicit canonization of one of
what are, in some cases, many possible "readings" of that data.
I agree in advance that processing
software could diminish the effect of the first of these and
hence the importance of the second.
Hope this helps.
University of Calgary.