For a trial of an elementary URC (Uniform Resource Characteristic)
resolution service, I am attempting to construct TEI headers
describing computer software documentation that is marked up in
SGML (according to the Docbook DTD).
First of all, I have high regard for the TEI effort, and found
in the course of working up an example much evidence of
careful consideration and fine judgement in the header DTD.
While I have filched some stuff from TEI in the past, and have at
least cast an eye over all of P2, I confess to not having read all
the P3 documentation, so if necessary please tell me where to rtfm.
I found Chapter 5, along with Chapter 3, sufficient guide in
constructing the example below (or not, you be the judge!). For the
purpose at hand, I took the remark in Chapter 24,
> The structure of an independent header is exactly the same as that of
a <teiHeader> attached to a document, and can therefore be validated
using the same document type definition (DTD). In practice, this means
that a <teiHeader> and its DTD can be extracted from a TEI document and
shipped to a receiving institution with little or no change. However,
some fields that are listed as "optional" in the header are listed as
"recommended" for the independent header.
as sufficient for the time being, and deferred reading the whole thing
until I get myself deeper in the soup. Here are some comments.
A pair of nits. Chapter 5 says at one point
> The <title> element contains the chief name of the file,
but files do not have to equal entities, and what should be said here
is not "file" but something like "SGML entity described by the
header." Cf. the first line of the chapter:
> This chapter addresses the problems of describing an encoded work
And in an example there occurs:
><seriesTitle>Machine-Readable Texts for the Study of
where <title> seems to be meant.
I find no examples in Chapter 5 in which <author> contains the
subelements <forename> and <surname>, and don't see how it is to
be done. But I must be able to manage it. How?
Why should <sourcedesc> be required in the case (not apparently
covered in section 5.2) in which the electronic text (not file,
again, as occurs in 5.2.8, but text) was written as it is being
presented? I suppose it may be better for the usual purposes
of TEI to maintain the requirement, and for the case at hand to
include a <sourcedesc>as found</> or something of the
sort, but no rationale is stated.
The use of IDREF in the scheme att of <keywords> appears to
entail the construction of a whole <encodingdesc> to hold the target ID,
when this may be a well known scheme. I am inclined to alter this
definition, making IDREF into NAME, or whatever would best describe an
FPI or URN naming or pointing to that scheme (see example).
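A rough sketch of the sort of alteration meant, offered as an
assumption rather than as anything the TEI DTD provides: the scheme
attribute is declared to take a literal string instead of an IDREF.
CDATA is shown rather than NAME only because an FPI contains spaces
and slashes, which a NAME token cannot hold. Any other attributes the
published declaration carries are left out of the sketch.

```sgml
<!-- Hypothetical local modification, not the published TEI declaration.
     SCHEME holds an FPI or URN string directly, so no <encodingdesc>
     is needed merely to supply a target ID.  Other attributes of
     <keywords> are omitted from this sketch. -->
<!ATTLIST keywords
          scheme     CDATA     #REQUIRED >
```

Since SGML does not allow a second attribute list for the same element,
this would have to replace the existing declaration, not be added
alongside it.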
Chapter 5 says:
> The <teiHeader> element should be clearly distinguished both from the
SGML prolog, ...
and from the front matter of the text itself
but doesn't say how. In the case of a text written as an etext from
the start, there will inevitably be some overlap; does anyone have
suggestions about what should be given only in the front matter?
>The entities phrase and phrase.seq are the same in all
bases. They may include elements specific to single tag
sets; if the tag set is not selected, these elements
are undefined and have no effect.
I beg to differ. These clutter up the error stream (of sgmls)
considerably. One can filter out these warnings (don't use the
-u flag), but risks filtering out similar warnings of interest
if one extends the DTD. The warnings of duplicate parameter
entity specifications (which you get with the -d flag)
are less worrisome and perhaps unavoidable in an economical DTD
design. Perhaps the "undefined in DTD" warnings for entities
also cannot be avoided in such a design, but either the DTD or
SGML is showing some inelegance here.
7) an example
<!doctype teiheader system "tei2.dtd"[
<!ENTITY % TEI.general 'INCLUDE' >
<!ENTITY % TEI.names.dates 'INCLUDE' >
<!-- as for most elements, the atts of teiheader are not really
needed for an elementary URC -->
<!-- encodingdesc not needed, nor profiledesc, while revisiondesc
would duplicate info in Docbook Revhistory. We might
find it useful later. -->
<title>X Window System User's Guide: electronic edition</>
<!-- TEI recommends that you distinguish the titles of print works
and electronic versions in this fashion, using one of two
set phrases, the other one being "a machine readable
<edition>OSF/Motif 1.2 Edition</>
<publisher>O'Reilly & Associates, Inc.</>
<!-- ISBN of the electronic edition, not of the print book -->
<date>1 April 1994</>
<title>X Window System</>
<!-- sourcedesc provided here only for conformance; not necessarily
relevant for URCs for Davenport. What if the present
document was written as an etext? -->
<catdesc>Library of Congress Subject Headings
<!-- this is as bad as Hytime for verbosity -->
Terry Allen, O'Reilly & Associates, Inc.
Editor, Digital Media Group 103A Morris St.
Sebastopol, Calif., 95472
A Davenport Group sponsor. For information on the Davenport
Group see ftp://ftp.ora.com/pub/davenport/README.html