On Thu, 7 Jan 1993 17:38:26 CST Dr. habil. Reinhard Wonneberger said:
> ...
> while browsing through
> TEI P2 Chapter 6: Elements available in all TEI DTDs
> I became aware that attributes are used at several places
> to code text data, esp. so in connection with corrections.
>
> Unfortunately the examples I have seen only contain core letters,
> not special letters coded as entities.
>
> This raises the following questions:
> - is it correct to code entities in attributes
> = according to SGML definition
> = according to ordinary parser implementation practice.
Yes to both questions. The text of ISO 8879 is made unfortunately
opaque in this area by a crucial but very quiet distinction between
'attribute value' and 'attribute value literal'; the latter is any
attribute value specified between quotation marks, and is "interpreted
as an attribute value by replacing references within it, ignoring
[entity end] and [record start], and replacing [a record end] or [tab]
with a SPACE." (ISO 8879, clause 7.9.3, text after production 34.) I
have yet to encounter a parser that produces output for further
processing and does not expand entity references within attribute value
literals.
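
To make the quoted rule concrete, here is a rough Python sketch of how an
attribute value literal might be turned into an attribute value. It is
only an illustration, not how any particular parser does it. Assumptions:
the entity table is hypothetical (a real parser takes it from the DTD),
record start and record end are taken to be characters 10 and 13, the
entity-end signal is not modelled at all, and only references closed by
a semicolon are handled.

```python
import re

# Hypothetical entity table; a real parser reads these from the DTD.
ENTITIES = {"uuml": "\u00fc", "auml": "\u00e4", "szlig": "\u00df"}

RECORD_START = "\n"  # assumed: character 10
RECORD_END = "\r"    # assumed: character 13
TAB = "\t"

# Named or numeric reference, closed by a semicolon (a simplification).
_REFERENCE = re.compile(r"&(#[0-9]+|[A-Za-z][A-Za-z0-9.-]*);")


def _expand(match):
    """Return the replacement for one entity or character reference."""
    name = match.group(1)
    if name.startswith("#"):
        return chr(int(name[1:]))
    # Unknown names are left as written rather than raising an error.
    return ENTITIES.get(name, match.group(0))


def interpret_literal(literal):
    """Interpret the text between the quotation marks as an attribute value."""
    text = literal.replace(RECORD_START, "")                  # ignore record start
    text = text.replace(RECORD_END, " ").replace(TAB, " ")    # RE or tab -> SPACE
    return _REFERENCE.sub(_expand, text)                      # replace references


print(interpret_literal("M&uuml;ller"))
```

Separators are handled before references here purely to keep the sketch
short; it should not be read as a statement about the order the standard
requires.
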
> - what is the concept of attributes?
> = specification of a known set of cases in the meta-language
> = free usage also for non-foreseeable things like ordinary text?
I'm not entirely certain I understand the question. Attributes can
certainly be used to classify an element according to some fixed
typology (so that one could imagine a tag set in which all LIST elements
were one of: numbered, lettered, bulleted, unmarked); of course,
elements can also be used to convey the same information. In most
cases, the TEI tag set uses attributes where the possible values for a
feature seem to come from a restricted range of values; however, because
so many apparently restricted ranges can be extended in special
circumstances (as, for example, when one encodes a text and discovers it
has an unforeseen type of list), the TEI almost never actually restricts
the value of such attributes formally; the TYPE attribute on our LIST
element, for example, is declared as CDATA, not as (numbered, lettered,
bulleted, unmarked).
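
The trade-off between a closed value list and an open CDATA declaration
can be sketched in a few lines of Python. This is a toy model only: the
function names are made up for the illustration, and "glossary" stands
in for whatever unforeseen type of list an encoder might meet.

```python
# The closed typology mentioned above, as a fixed set of permitted values.
CLOSED_LIST_TYPES = frozenset({"numbered", "lettered", "bulleted", "unmarked"})


def accepted_by_closed_declaration(value):
    """Model of an enumerated declaration: only the listed values pass."""
    return value in CLOSED_LIST_TYPES


def accepted_by_cdata_declaration(value):
    """Model of a CDATA declaration: any character string passes."""
    return isinstance(value, str)


for list_type in ("bulleted", "glossary"):
    print(list_type,
          accepted_by_closed_declaration(list_type),
          accepted_by_cdata_declaration(list_type))
```

The closed model rejects the unforeseen value and so forces a change to
the declaration; the open model accepts it, at the price of no longer
catching misspelled values.
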
Similarly, attributes can be used to encode simple unstructured text,
and are so used in the editorial-intervention tags you mention. This
usage is not very frequent in the TEI, however, because unstructured
text is so apt to turn out, in some cases, to contain some structure
after all, e.g. in the form of phrase-level elements for emphasis or
technical terms; such text can be transcribed in an attribute value, of
course, but no conforming SGML parser can recognize the tags for the
embedded phrase-level elements. This is why the tags for 'simple
editorial interventions' are labeled as good only for 'simple' cases:
cases in which both of the readings contain nested elements are better
handled with the stronger mechanisms of the chapter on text criticism.
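
The limitation can be seen with almost any parser. The sketch below uses
html.parser from the Python standard library purely as a stand-in: it is
an HTML parser, not a conforming SGML parser, and the sample markup (a
sic element with a corr attribute holding an emph element) is made up
for the illustration rather than taken from the Guidelines.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Record start tags, end tags and non-blank text as simple tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data))


def parse(markup):
    """Feed the markup to the stand-in parser and return what it reported."""
    collector = EventCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


# Illustrative markup only: a correction whose text contains a nested element.
SAMPLE = '<sic corr="the <emph>M&uuml;ller</emph> text">teh Muller text</sic>'

for event in parse(SAMPLE):
    print(event)
```

The entity reference inside the attribute is expanded, but the emph tags
arrive as plain characters in the attribute value; the parser reports no
emph element at all.
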
> Although I must admit that it is often very convenient to use
> attributes for text, I think we need a clear decision in this
> area.
The decision as it now stands is, I think, quite definite that attributes
can be used to encode running text, but that their limitations in such
uses make it advisable to provide stronger mechanisms instead or in
addition. The fact that SIC and CORR are so much simpler than the
equivalent tagging using the full-bore text-critical tag set seems to
make it ad
> I also feel that the examples and comments should state very
> clearly which concept applies; if text should be allowed, then
> examples should always contain some Umlaut or so to make that
> clear.
You are right; there were some examples with entity references in the
attribute values, but they have been moved over into the chapter on
text criticism (or that on manuscripts). You wouldn't happen to
be able to provide any real examples which meet the case, would you?
Thanks.
-C. M. Sperberg-McQueen
ACH / ACL / ALLC Text Encoding Initiative
University of Illinois at Chicago