Let me take a stab at two of Tobias's questions:
 
I. Display Entity Sets
 
> * Does anybody use the very replacement text of entities to provide TeX
>   code or ASCII transliteration or whatever in their local processing?
>   Or do you replace the standard replacement text (as in the above
>   example) with post-processors?
 
I have seen (and done) both, but only the former with a clean
conscience.
 
Goldfarb's book distinguishes ISO Registered Entity Sets from Display
Entity Sets, where the latter involve replacement text that may vary
according to local system requirements. For multilingual Latin-alphabet
work that I need to render in rtf and html, I invoke different Display
Entity Sets that translate directly from the &...; notation to the
literal replacement string I want to send to my rendering process.
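 
As a rough sketch of the difference (the two display declarations below
are my own illustrative assumptions, not copied from any published
entity set), the ISO set declares the entity with a system-neutral
placeholder, while a Display Entity Set redeclares the same name with
the string the rendering process expects:
 
 <!-- ISO Registered Entity Set (ISOlat1) -->
 <!ENTITY aacute SDATA "[aacute]">
 
 <!-- hypothetical Display Entity Set for rtf, assuming the ANSI
      (Windows 1252) code page -->
 <!ENTITY aacute CDATA "\'e1">
 
 <!-- hypothetical Display Entity Set for html, passing the
      reference through for the browser to resolve -->
 <!ENTITY aacute CDATA "&aacute;">
 
Only one of these sets would be invoked for any given processing run,
so the document instance itself never changes.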
 
The technical reason not to post-process the standard replacement text
is that it is not guaranteed to mean the same thing at all times. That
is, if you replace &aacute; with [aacute], the result can no longer be
distinguished from any other occurrence of the string "[aacute]" in
your document. In most cases your document will not have any other
occurrences of "[aacute]" (unless you are writing about SGML), but such
collisions are nonetheless possible.
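 
To make the collision concrete, here is a small constructed example (the
sentence is invented for illustration). Given the ISO declaration
 
 <!ENTITY aacute SDATA "[aacute]">
 
the source text
 
 <P>In ISOlat1, &aacute; is replaced by the string [aacute].</P>
 
reaches a naive post-processor, one that sees only the resulting
character stream, as
 
 <P>In ISOlat1, [aacute] is replaced by the string [aacute].</P>
 
A global replacement of "[aacute]" would then alter both occurrences,
although only the first began life as an entity reference.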
 
II. Element vs Attribute
 
> * I have three distinctly different forms of <app> in my document, some
>   of them could even need subtyping.  Would you, when you get such a
>   document, prefer it to use only <app> tags differentiated by type=
>   attribute, or would you rather see three new tags, like <appSources>,
>   <appPrintVariants> and <appSynopsis>?  (The same question of type=
>   versus <tag> arises rather frequently, I suppose.)  In principle, the
>   solutions are equivalent, of course, but practical life might suggest
>   one over the other.
 
One consideration is that the content model of elements cannot depend on
attribute values. This means that if your three types of "app" are to
have different content models, they should be different elements. If
they are to have the same content model, it may not matter which
approach you take.
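 
A sketch of the two alternatives as DTD declarations may help. The
content models and the subelement names (source, variant) are invented
for illustration, their own declarations are omitted, and names of this
length assume an SGML declaration with a raised NAMELEN. With separate
elements, each can have its own content model:
 
 <!ELEMENT appSources       - - (source+)>
 <!ELEMENT appPrintVariants - - (variant+)>
 <!ELEMENT appSynopsis      - - (#PCDATA)>
 
With a single element, the type attribute distinguishes the three forms,
but they must share one content model, which can only be the union of
what the three need:
 
 <!ELEMENT app - - (#PCDATA | source | variant)*>
 <!ATTLIST app type (sources | printVariants | synopsis) #REQUIRED>
 
In the second version the parser cannot enforce that (say) a synopsis
contains no source elements; that check would fall to your own software.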
 
Also on the "Element vs Attribute" front, the fact that attributes may
not be modified by other attributes may influence how a particular
structural feature is encoded. For example, if you have a reading in a
manuscript witness that is corrupt, you might indicate the correct
reading in SGML as:
 
 <READING CORRECT="TEI" AUTHORITY="Birnbaum">TEJ</READING>
 
But what if multiple authorities are going to submit multiple
corrections in an open collaborative edition? What you really want above
is to associate the value of CORRECT with the value of AUTHORITY and
only then have the set of these two attribute values associated with the
content.  But the syntax has both attribute values pointing to the
content independently, with no particular association between them. The
consequences of this syntax become obvious only when Rischer (for
example) decides to challenge Birnbaum's interpretation, and finds that
there is no way to incorporate a second correction by a second
authority.
 
One might deal with this problem by treating each authority as an
attribute value associated with a specific correction text, as in:
 
 <READING>
   <CORRECT AUTHORITY="Birnbaum">TEI</CORRECT>
   <CORRECT AUTHORITY="Rischer">TEH</CORRECT>
   <CITATION>TEJ</CITATION>
 </READING>
 
(or omit the CITATION tags and treat TEJ as PCDATA content of the element
READING).
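 
For what it is worth, declarations along the following lines would
support the structure above. This is only a sketch: making AUTHORITY
required and allowing any number of corrections are my assumptions
about what such an edition would want.
 
 <!ELEMENT READING  - - (CORRECT*, CITATION)>
 <!ELEMENT CORRECT  - - (#PCDATA)>
 <!ATTLIST CORRECT  AUTHORITY CDATA #REQUIRED>
 <!ELEMENT CITATION - - (#PCDATA)>
 
For the variant without CITATION tags, the first declaration would
become
 
 <!ELEMENT READING  - - (#PCDATA | CORRECT)*>
 
which is mixed content, so the DTD could no longer require the
corrections to precede the manuscript text.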
 
Cheers,
 
David
________________________________________________________________________
 
Professor David J. Birnbaum
The Royal York Apartments, #802  http://clover.slavic.pitt.edu/~djbpitt/
3955 Bigelow Boulevard           voice: 1-412-624-5712
Pittsburgh, PA  15213  USA       fax:   1-412-624-9714