Text Encoding Initiative
Submission on the Guidelines (P1) from the CURIA Project
These comments pertain both to the structure and to the content
of TEI P1. Readers should bear in mind that most participants in
the CURIA project will have little or no previous experience of
SGML, and that some of these comments are made with this in mind.
[For asterisked comments, see the end.]
1. It would be helpful to divide the document into four major
sections (apart from the usual prefatory matter):
a. an overview of SGML, specifically its capabilities
b. a technical introduction to SGML and the TEI proposals
c. the TEI guideline proposal text (P2)
d. a TEI tutorial, structured to follow the guidelines, with
Sections (a) and (b) would need to be kept short. This structure
would be of immense help to projects wishing to start using the
proposals, which need both instructional material for hands-on
participants and overview material for managers and funding
bodies, who may need to be persuaded to use SGML.
The content of sections 1-3 serves as a good model for the
overview at (a), but needs extensive and gentler rewording, with
the technical details moved to (b). *
2. An annexe should give details of all known software that
supports or purports to support SGML, both tools and
applications. Clearly this has to be "at the time of writing" for
a paper edition, but an electronically retrievable version could
be updated as a separate document. Pointers to publicly available
files demonstrating some applications of the TEI proposals would
also be useful.
3. The index should be expanded to distinguish (typographically)
between topics, mentions _en passant_, and actual tag names.
4. The document should be available electronically, both in
SGML-encoded form and in a few other popular formats, for the
benefit of interested parties and intending participants who are
not yet equipped with SGML-sensitive software.
(Much of this has of course already come up, doubtless more than
1. The Poughkeepsie Principles (1.2) should be emphasized more,
and it should be made clearer whether the TEI proposals are a
subset, superset or cross-set of SGML.
2. The terminology is naturally complex and requires more
careful explanation, perhaps with the explanatory matter set off
typographically. The _locus classicus_ is #PCDATA, which means
precisely the opposite of what the computer-literate reader would
expect.
Having said that, the majority of section 2 is a good exposition
of the material at the technical level.
3. One view expressed was that CONCUR is likely to be needed
extensively. This feature may therefore need closer study and
explanation.
4. Typo on p. 23 in the machine text </(p.anth)p.anth): the final
parenthesis should be a greater-than sign.
5. Query: on p. 35, should _concrete reference syntax_ read
_reference concrete syntax_?
6. Section 3.1 or 3.2 should explicitly mention the many encoding
systems that overcome the deficiencies of 7-bit networks (and
document the known problems they themselves introduce, such as
the famous "Rutherford Rotation"). *
7. The note in 3.2.4 on the current non-invocation of WSDs does
indeed need some more work! *
8. Try to avoid hyphenation of tag names (example 5.3.6 <prop-
9. It was felt that crystals are likely to assume greater
importance than the document suggests (5.3.12).
10. A similar argument applies to the use of CONCUR (5.6) [v.s.].
11. In view of the recent development of SGML-sensitive browsing
software (e.g. WWW), which can be used to further identify tagged
items (particularly names, places and dates), the section on such
links (5.7) needs substantial expansion. *
12. Annotation, such as the recording of physical marks not part
of the text, needs more explanation.
13. The inclusion of the character set tables is wonderful. What
about also including the new 256-character set for TeX devised at
the Cork conference, as TeX seems to be fairly widely used with
SGML? I have a draft copy of this if needed. *
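The comments on CONCUR, and on the mistyped end tag on p. 23, could be supported by a small illustration of what concurrent markup buys. The Python sketch below is a toy under stated assumptions, not an SGML parser: it recognises only tags of the form <(doctype)gi> and </(doctype)gi>, with no attributes, and the document type names `log` (logical structure) and `phys` (physical pages), the functions `hierarchies` and `crossing`, and the sample text are all hypothetical. It checks that each document type's tags nest properly on their own, then shows that a verse line and a page can cross each other, which no single tree can represent. It also shows that the end tag as printed on p. 23 is not recognised as a tag, leaving its start tag unclosed.

```python
import re

# Simplified pattern for concurrent-markup tags <(doctype)gi> and </(doctype)gi>.
# Assumption: no attributes, no whitespace inside tags; not a full SGML tokenizer.
TAG = re.compile(r"<(/?)\((?P<doctype>[\w.]+)\)(?P<gi>[\w.]+)>")


def hierarchies(text):
    """Check that each document type's tags nest properly on their own.

    Returns a dict mapping doctype -> list of (gi, start, end) spans in the
    order the elements close; raises ValueError on a mismatched or missing end tag.
    """
    stacks = {}  # one open-element stack per document type
    spans = {}
    for m in TAG.finditer(text):
        dt, gi = m.group("doctype"), m.group("gi")
        stack = stacks.setdefault(dt, [])
        if m.group(1):  # "/" present: this is an end tag
            if not stack or stack[-1][0] != gi:
                raise ValueError(f"unexpected end tag </({dt}){gi}> at {m.start()}")
            _, start = stack.pop()
            spans.setdefault(dt, []).append((gi, start, m.end()))
        else:
            stack.append((gi, m.start()))
    for dt, stack in stacks.items():
        if stack:
            raise ValueError(f"unclosed tag <({dt}){stack[-1][0]}>")
    return spans


def crossing(a, b):
    """True if spans a and b overlap without either containing the other."""
    (_, s1, e1), (_, s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


# Hypothetical sample: the page begins in the middle of the first verse line.
sample = ("<(log)l>First line <(phys)pg>of the poem</(log)l>"
          "<(log)l>Second line</(log)l></(phys)pg>")
spans = hierarchies(sample)
print(crossing(spans["log"][0], spans["phys"][0]))  # True: line 1 straddles the page start
print(crossing(spans["log"][1], spans["phys"][0]))  # False: line 2 lies inside the page

# The end tag as printed on p. 23 ends in ")" rather than ">", so it is not a tag at all.
try:
    hierarchies("<(p.anth)p.anth>text</(p.anth)p.anth)")
except ValueError as err:
    print(err)  # unclosed tag <(p.anth)p.anth>
```

Keeping one stack per document type is what lets each hierarchy be checked independently while the two are allowed to overlap.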
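Comment 6 could be illustrated with two transfer encodings later defined for MIME, quoted-printable and base64, both of which carry 8-bit text over a 7-bit channel. The Python sketch below assumes these are representative of the systems the comment has in mind; the Irish sample string and the choice of ISO 8859-1 (Latin-1) as the 8-bit character set are illustrative assumptions. It shows only that the encoded form fits in 7 bits and round-trips; it does not model the problems such schemes themselves introduce.

```python
import base64
import quopri

# Hypothetical sample text containing characters outside 7-bit ASCII.
text = "Corcaigh: Ó Súilleabháin, café"
raw = text.encode("latin-1")  # 8-bit bytes; values above 127 are at risk on a 7-bit link

qp = quopri.encodestring(raw)  # quoted-printable: ASCII stays readable, other bytes become =XX
b64 = base64.b64encode(raw)    # base64: each group of 3 bytes becomes 4 ASCII characters

for name, encoded in (("quoted-printable", qp), ("base64", b64)):
    assert all(byte < 128 for byte in encoded)  # every byte now fits in 7 bits
    print(f"{name}: {encoded.decode('ascii')}")

# Both schemes round-trip back to the original text.
assert quopri.decodestring(qp).decode("latin-1") == text
assert base64.b64decode(b64).decode("latin-1") == text
```

Quoted-printable keeps mostly-ASCII text legible to a human reader, while base64 is more compact for text dominated by 8-bit characters; which suits a given project depends on its material.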
In the cooperative spirit of the Initiative, some resources may
be available at the present author's site (UCC) for work in some
of the above areas. Possible topics are asterisked.
+353 21 276871 x2609
+353 21 277194 (fax)