Greetings,
I am involved in a discussion about encoding a manuscript of
considerable length at the graphic character level. Essentially, the base
text will consist of a container element that contains only <c></c>
elements with no spaces, line breaks, etc. The plan is to use XLinks for
stand-off markup to impose structures such as word division, clause
analysis, line breaks, chapter divisions, etc., on the text.
The proofing of the text will be a fairly involved process (we are
planning an Open Text process, a variant of the Open Source movement),
but for planning purposes we are assuming that errors will be found in
the text even after all the proofing has been done. My concern is that
sets of XLinks created for the text as first released must not be
broken by the insertion or deletion of material at the character level.
For example,
<text>
<c type="consonant" id="00001">h</c>
<c type="vowel" id="00002">a</c>
<c type="consonant" id="00003">t</c>
</text>
Assuming that the word should be "what" and that this error is detected
after the text has been released, I need to construct an ID for "w" that
will not break the XLinks that span these characters to mark the word
division.
<text>
<c type="consonant" id="?????">w</c>
<c type="consonant" id="00001">h</c>
<c type="vowel" id="00002">a</c>
<c type="consonant" id="00003">t</c>
</text>
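One possible way to get such IDs, offered here only as an assumption and not as anything the project has adopted, is to read each numeric ID as a decimal fraction (00001 as 0.00001) and compare IDs as strings. A new ID can then always be made between two neighbours, or before the first character, by adding digits, so no existing ID ever has to change. The function name `key_between` is hypothetical. Note that if `id` is declared as an XML ID (as it is in TEI), its value must begin with a letter or underscore, so a real scheme would need a letter prefix (e.g. a hypothetical "c"); the sketch works on the digit part only.

```python
from typing import Optional


def key_between(lo: str, hi: Optional[str]) -> str:
    """Return a digit string that sorts strictly between lo and hi.

    Hypothetical ID scheme: each ID is read as the fraction 0.<digits>.
    lo: the preceding ID, or "" for "before the first character".
    hi: the following ID, or None for "after the last character".
    """
    # Pad both neighbours to a common width with one extra digit of room,
    # so the padded values differ by at least 10 whenever lo < hi.
    n = max(len(lo), len(hi or "")) + 1
    a = int(lo.ljust(n, "0"))
    b = int(hi.ljust(n, "0")) if hi is not None else 10 ** n
    if a >= b:
        raise ValueError(f"no key between {lo!r} and {hi!r}")
    # Midpoint, written back at width n; trailing zeros are dropped
    # because they do not change the ordering.
    return str((a + b) // 2).zfill(n).rstrip("0")


# "w" goes before "h" (00001), which is the first character of <text>.
print(key_between("", "00001"))       # 000005
# A correction inside the word, between "h" and "a".
print(key_between("00001", "00002"))  # 000015
```

Because the new ID for an in-word correction falls between its neighbours, a range link from 00001 to 00003 would cover it without being edited.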
The same case could happen at the end of word divisions as well. (I have
deliberately omitted mistakes inside word boundaries, since we intend to
use and recommend the range method of selecting elements for XLinks, and
I can construct IDs that will fall within the necessary range.) Similar
errors may occur at the morpheme level, but I assume the word boundary
solution could be generalized for that case as well.
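Why boundary errors differ from in-word errors can be seen in a small sketch. It assumes, as a simplification and not as an XLink/XPointer implementation, that a range link selects every `<c>` in document order from its start ID to its end ID; `resolve_range` is a hypothetical helper. Because the selection follows document order, a character inserted between the endpoints is included without touching the link, while a character added before the start ID, like "w" here (given the placeholder ID 000005), is not.

```python
import xml.etree.ElementTree as ET


def resolve_range(text: ET.Element, start_id: str, end_id: str) -> str:
    """Return the <c> contents from start_id to end_id, inclusive, in document order."""
    chars = list(text.iter("c"))
    ids = [c.get("id") for c in chars]
    # list.index raises ValueError if an endpoint has been deleted.
    first, last = ids.index(start_id), ids.index(end_id)
    return "".join(c.text or "" for c in chars[first:last + 1])


corrected = ET.fromstring("""<text>
<c type="consonant" id="000005">w</c>
<c type="consonant" id="00001">h</c>
<c type="vowel" id="00002">a</c>
<c type="consonant" id="00003">t</c>
</text>""")

print(resolve_range(corrected, "00001", "00003"))   # hat
print(resolve_range(corrected, "000005", "00003"))  # what
```

The original word link still resolves, so nothing is broken in the technical sense, but it now selects "hat"; only a link whose start ID is updated covers "what".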
The reason for the minimal markup is to allow textual variations to be
created and attached to the base text at the lowest possible level.
A number of traditional as well as modern hierarchies may be imposed
upon the text, and composing a version with minimal markup will, we
hope, force consideration of the hierarchies being imposed.
Has anyone in the TEI community had experience inserting or deleting
elements without breaking links to the base text? (My first-blush
impression is that we will simply need to version the text and indicate
that document instances constructed from it will work with a particular
version, but I would like to find a more elegant solution.)
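The versioning fallback can be sketched as a link that records which release of the base text it was made against, plus a check of whether its endpoint IDs still exist in a later release. `RangeLink`, `base_version`, and `link_status` are hypothetical names, and the sketch assumes an ID is never reused once its character has been deleted.

```python
from dataclasses import dataclass
from typing import AbstractSet


@dataclass(frozen=True)
class RangeLink:
    start_id: str
    end_id: str
    base_version: int  # hypothetical: release of the base text the link was made against


def link_status(link: RangeLink, version: int, ids: AbstractSet[str]) -> str:
    """Classify a link against a release, given that release's set of <c> IDs."""
    if link.base_version == version:
        return "current"     # made against this exact release
    if link.start_id in ids and link.end_id in ids:
        return "resolvable"  # endpoints survive; the span's contents may differ
    return "broken"          # an endpoint was deleted (IDs assumed never reused)


word = RangeLink("00001", "00003", base_version=1)
release_2 = {"000005", "00001", "00002", "00003"}
print(link_status(word, 2, release_2))  # resolvable
```

Under this design a new release only invalidates links whose endpoints were deleted, rather than every link made against an earlier version.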
Our intent is to use the full TEI DTD when creating our representation
of divisions, critical apparatus, and linguistic analysis of the text,
but we will also release the base text with the minimal markup
described above to allow others to make their own determinations
concerning features of the text.
Sorry to be so vague about the actual text, but we are anticipating a
formal announcement at the AAR/SBL Annual Meeting in Boston, November
20-23, 1999. Between now and then we are ironing out some of the
technical details and trying to generate internal institutional support
for our efforts. As I noted above, we intend to proceed on an Open Text
basis for this project and more complete information will be posted to
this and other discussion lists either the week after the AAR/SBL
meeting or the first week of December, 1999.
Many thanks!
Patrick
--
Patrick Durusau
Information Technology Services
Scholars Press
Manager, ITS