I'll take advantage of the fact that Michael is still eating his breakfast to
comment directly on Terry Allen's interesting questions from last week:
 
>I am attempting to construct TEI headers
>describing computer software documentation that is marked up in
>SGML according to the Docbook DTD.
 
I'll probably deny it later, but I must admit that it hadn't occurred to me
before that people might want to use the TEI header to provide meta-information
for a non-TEI-conformant document. But the more I think about it, the more it
seems like a pretty neat idea. Though not as neat as redefining the DocBook DTD
in terms of TEI, of course (but probably more achievable).
 
[flattering remarks deleted here]
 
>1) A pair of nits.  Chapter 5 says at one point
>>   The <title> element contains the chief name of the file,
>but files do not have to equal entities, and what should be said here
>is not "file" but something like "SGML entity described by the
>header."  Cf. the first line of the chapter:
>>   This chapter addresses the problems of describing an encoded work
 
The term "file" (as in "computer file") is a hangover from the AACR2 origins of
the header recommendations. The term "chief name" is also, I think,
librarianese. We intended to weed out or flag all such cases, but clearly we
missed some.  Another term we considered using was "text" -- except that some
texts are composite. I rather like the term "resource" myself, but it isn't in
the Guidelines yet. Basically, the <title> element contains the official name
of the thing-that-the-header-is-attached-to, OK?
 
>
>And in an example there occurs:
>><seriesTitle>Machine-Readable Texts for the Study of
>where <title> seems to be meant.
 
Yes, that is an erroneous example. I'm glad to say that it's on my list of
erroneous examples. Less glad that said list is still sitting on my laptop
waiting for me to do something about it.
 
 
>2) I find no examples in Chapter 5 in which <author> contains the
>subelements <forename> and <surname>, and don't see how it is to
>be done.  But I must be able to manage it.  How?
 
<author> is defined as containing %phrase.seq, so you can embed any phrase
level elements within it. This enables you to use <name type=surname>, <name
type=forename>, <name type=spurious> etc. etc. ad lib. If you want a more
structured approach, you need to switch on the additional tagset for names and
dates, whereupon you will be able to use the <persName> element, which does
indeed have all sorts of substructure, including <foreName>, <surName> etc.
See further chapter 20.
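
To make the two options concrete, here is a minimal sketch. The author's name
is invented, and the DOCTYPE line assumes a local copy of the main DTD called
tei2.dtd used with the prose base, so adjust both to your own setup.

```sgml
<!-- Switch on the names-and-dates tagset in the DTD subset
     (only needed for option 2 below) -->
<!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
<!ENTITY % TEI.prose 'INCLUDE'>
<!ENTITY % TEI.names.dates 'INCLUDE'>
]>

<!-- Option 1: phrase-level <name> elements, no extra tagset required -->
<author><name type=forename>Jane</name> <name type=surname>Example</name></author>

<!-- Option 2: the structured <persName> from the additional tagset -->
<author><persName><foreName>Jane</foreName> <surName>Example</surName></persName></author>
```

Option 1 costs nothing to set up, but the type values are free text; option 2
gives you elements a parser can check.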
 
>Why should <sourcedesc> be required in the case (not apparently
>covered in section 5.2) in which the electronic text (not file,
>again, as occurs in 5.2.8, but text) was written as it is being
>presented.  I suppose it may be better for the usual purposes
>of TEI to maintain the requirement, and for the case at hand to
>include a <sourcedesc>as found</> or something of the
>sort, but no rationale is stated.
 
Not quite sure what you mean by "as it is being presented", but if you mean
simply that the thing was created in machine-readable form and has no
pre-existing source, see the second example on page 105.
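
For a born-digital text, the required element can be satisfied with a single
prose statement. The following is only a sketch of the idea; the wording is
illustrative and not a quotation from the Guidelines.

```sgml
<sourceDesc>
<p>No source: created in machine-readable form.</p>
</sourceDesc>
```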
 
>The use of IDREF in  the scheme att of <keywords> appears to
>entail the construction of a whole <encodingdesc> to hold to target ID,
>when this may be a well known scheme.  I am inclined to alter this
>definition, making IDREF into NAME, or whatever would best describe an
>FPI or URN naming or pointing to that scheme (see example).
 
I understand the motivation for this, and it was certainly an option we
considered. On balance, we felt that if people were going to use an attribute
value to *identify* something, we should insist on there being somewhere a
definition of what that might be. The source attribute is not obligatory, after
all.
 
>Chapter 5 says:
>   The <teiHeader> element should be clearly distinguished both from the
>SGML prolog, ...
>and from the front matter of the text itself
>but doesn't say how.  In the case of a text written as an etext from
>the start, there will inevitably be some overlap; does anyone have
>suggestions about what should be given only in the front matter?
 
Yes. That's an interesting question. I'm not entirely sure what would be
the front matter of a resource created in electronic form, but I'd guess that
it would include things like tables-of-contents, fancy title pages,
introductory essays etc, just like those in a non-electronic work. The header,
because of its structured nature, its exclusively meta-data content (and
probable invisibility) is really quite different.
 
Comments from others would be welcome on this point.
 
from teiclas2.ent:
>The entities phrase and phrase.seq are the same in all
>bases. They may include elements specific to single tag
>sets; if the tag set is not selected, these elements
>are undefined and have no effect.
>
>I beg to differ.  These clutter up the error stream (of sgmls)
>considerably.  One can filter out these warnings (don't use the
>-u flag), but risks filtering out similar warnings of interest
>if one extends the DTD.  The warnings of duplicate parameter
>entity specifications (which you get with the -d flag)
>are less worrisome and perhaps unavoidable in an economical DTD
>design.  Perhaps the "undefined in DTD" warnings for entities
>also cannot be avoided in such a design, but either the DTD or
>SGML is showing some inelegance here.
 
There has been a thread on the topic of "stupid error messages resulting from
the way the TEI parameter entities are defined" ever since publication of the
P3 dtd. The TEI position has been, and remains, that what we are doing is
explicitly permitted by the SGML standard. Every single software vendor with a
product which behaves suboptimally when processing the TEI dtd has agreed
(eventually) that the problem is not in the dtd. I will have to get Michael to
dig out chapter and verse if you remain unconvinced of this: however, we like
to live in the real world too, and that is why we are currently working on ways
of producing simpler, predigested (but non-modifiable) versions of the dtd that
any idiot parser can handle.
 
 
>7)  an example
><!-- encodingdesc not needed, nor profiledesc, while revisiondesc
>        would duplicate info in Docbook Revhistory.  We might
>        find it useful later.  -->
 
The profileDesc might be useful for holding indexing information, surely?
(as I see below you have)
 
The encodingDesc would be the obvious place to say that the text
is an SGML document using DocBook DTD version whatever, except that there isn't
an element for this specific purpose. I'd also commend to
your attention the <tagsDecl> element, which allows you to specify how many
times each element in the document appears (see p 115) and what its default
rendition should be.
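
As a rough sketch of what that might look like for a DocBook document: the
counts and descriptions below are invented, para and emphasis are simply two
DocBook element names chosen for illustration, and the rendition text is
free prose.

```sgml
<tagsDecl>
<rendition id=ITAL>italic</rendition>
<tagUsage gi=para occurs=412>Ordinary paragraphs</tagUsage>
<tagUsage gi=emphasis occurs=37 render=ITAL>Emphasized phrases</tagUsage>
</tagsDecl>
```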
 
Your example <taxonomy> looks a bit weird to my eyes. The purpose of the
<taxonomy> element is to contain EITHER a reference to a pre-existing
classification scheme (tagged as a <bibl>) OR an exhaustive definition of such a
scheme (tagged as a structure of nested <category>s). You have used the tagging
for the latter with the content of the former. What I think you want is
something more like the example on the top of p 122:
 
<encodingdesc>
<classdecl>
<taxonomy id=LCSH>
<bibl>Library of Congress Subject Headings
</bibl>
</taxonomy>
</classdecl>
</encodingdesc>
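
With that declaration in place, the scheme attribute of <keywords> has an ID
to point at. A sketch of the matching profiledesc follows; the two headings
are illustrative only and have not been checked against LCSH.

```sgml
<profiledesc>
<textclass>
<keywords scheme=LCSH>
<list>
<item>Computer software -- Documentation</item>
<item>SGML (Document markup language)</item>
</list>
</keywords>
</textclass>
</profiledesc>
```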
 
 
OK, that's all my responding-to-messages-on-TEI-time spent for the week. Thanks
for the interesting questions!
 
lou