At 09:06 PM 2/26/2004, Syd wrote:
>The point is that in these cases it is the specification that is in
>error, and (in my current humble opinion) one should deliberately
>avoid leaving an escape hatch such as "in cases of conflict the
>Schema is normative" which would permit users to continue to use a
>specification that is broken rather than report the error and insist
>it be corrected.

I can only comment on this from the sidelines, having no more standing to
declare what is normative than any other TEI member (hey I hope I
remembered to pay my dues).

I thank Syd for his clear statement addressing this issue. It is bound to
come up from time to time, it's a central issue, and it's worth contemplating.

I also agree with the above, and feel moreover that it is the only way to go.
As I paraphrase it: "if the formal and informal specifications disagree,
there's a problem that needs to be addressed, not a formula to be followed
as to which is authoritative".

(I note in passing that the actual case, the order of elements in
<publicationStmt>, arguably does *not* fall into this category, since the
stipulation of an any-order content model does not actually say an order is
*not* required or that any order is allowed; it says merely that a
processing application that intends to handle all cases valid to that
particular model might be prudent to accept any order of elements, even if
only one order is "valid" in the abstract. It might even helpfully re-order
things and be "improving" the tagging thereby -- albeit running risks with
really funky data, but reducing the need for extra-validation checks for a
more strictly conforming TEI process downstream. As always, "DTD valid"
cannot mean "correct".)
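
To make the re-ordering idea concrete, here is a minimal Python sketch of a
lenient processor that accepts the children of <publicationStmt> in any order
and returns them in one fixed order. The order in ASSUMED_ORDER and the name
reorder_publication_stmt are assumptions made up for illustration; neither
comes from the Guidelines or the DTD, and un-namespaced element names are
assumed.

```python
import xml.etree.ElementTree as ET

# Assumed canonical order, for illustration only; not taken from the Guidelines.
ASSUMED_ORDER = ["publisher", "pubPlace", "date", "idno", "availability"]


def reorder_publication_stmt(stmt):
    """Return a new <publicationStmt> whose children follow ASSUMED_ORDER."""

    def rank(child):
        # Elements not named in ASSUMED_ORDER sink to the end.
        try:
            return ASSUMED_ORDER.index(child.tag)
        except ValueError:
            return len(ASSUMED_ORDER)

    reordered = ET.Element(stmt.tag, stmt.attrib)
    reordered.text = stmt.text
    # sorted() is stable, so elements of equal rank keep their input order.
    reordered.extend(sorted(stmt, key=rank))
    return reordered


if __name__ == "__main__":
    source = ET.fromstring(
        "<publicationStmt><date>2004</date>"
        "<publisher>TEI</publisher></publicationStmt>"
    )
    print(ET.tostring(reorder_publication_stmt(source), encoding="unicode"))
```

Unknown elements are kept rather than dropped, which is the cautious choice
given the risk with "really funky data" noted above.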

As to the question itself: the idea that a formal spec such as a DTD should
not, and cannot, be considered to be perfectly sufficient for the job of
specifying a tag set and its usage, is long-standing, being articulated as
early as the SGML standard[1]. Along with this comes the corollary that
informal measures such as prose documentation can be very helpful to bridge
the gap. In fact, formally a "Document Type Definition" (as opposed to the
"Document Type Declaration", i.e. the set of declarations of elements,
attributes etc.) in SGML is not even exclusively identified with the formal
portion, but includes constraints expressed only in prose as well. In
October 2002 I claimed at the TEI Members' Meeting[2] that TEI wisely
shares in this heritage, and no one disagreed with me then. So we can be
reassured on this point as far as it's been addressed by the "tradition", I
think. :->

And from a technical point of view, the notion that validation itself is an
application that has "soft edges" and that is often profitably implemented
in layers -- the outermost of which might not be automated or even
perfectly formalized -- has also, I think, been achieving some traction.
(Of course I admit I've also been an advocate of such architectures, both
in practice and in theory ... <plug
href="http://www.piez.org/wendell/papers/signsystems.pdf">see my 2002 paper
"Human and Machine Sign Systems", especially the latter parts, for more on
this</plug>. I even approve of TEI as an example. ;-)
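
A minimal sketch of validation in layers, assuming Python and its standard
XML parser: each layer is a function returning a list of problems, and
checking stops at the first layer that finds any. The layer names and the
"must have a publisher" rule are hypothetical stand-ins for a constraint
stated only in prose; the outermost, non-automated layer (a human reading
the documentation) is deliberately left out of the code.

```python
import xml.etree.ElementTree as ET


def layer_well_formed(text):
    """Innermost layer: can the text be parsed as XML at all?"""
    try:
        ET.fromstring(text)
        return []
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]


def layer_prose_rule(text):
    """Hypothetical rule standing in for a constraint stated only in prose."""
    root = ET.fromstring(text)
    if root.find("publisher") is None:
        return ["publicationStmt has no publisher"]
    return []


# Layers run from the innermost (most formal) outwards.
LAYERS = [("well-formed", layer_well_formed), ("prose rule", layer_prose_rule)]


def validate(text, layers=LAYERS):
    """Return (name of first failing layer, its problems), or (None, [])."""
    for name, check in layers:
        problems = check(text)
        if problems:
            return name, problems
    return None, []
```

A document that passes every automated layer is still only a candidate for
the final, human one.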

And in fact, the reflection that the TEI declarations themselves are subject to
modification, while remaining "TEI compliant" both in spirit and even in
letter (but mind you dot your i's), is rather unsettling if you then take
those same declarations to be the final authority about what tags should
"mean" in actual usage (even in the limited sense of their roles in the
"tag grammar", as stipulated by content models and attribute declarations).
If you start by undoing one or more of TEI's own formal declarations, but
have already relegated the Guidelines' prose (and along with it your own)
to subordinate glosses or documentation, where do we turn for our final
authority on "TEI" then -- your modified declarations? We kind of have a
bootstrapping problem, don't we? Which is why the ultimate test of
conformance is how you *document* things.

Of course, as that Extreme paper (using its own terminology) makes clear, I
hope, there's a difference between prescriptive and descriptive definitions
even when it comes to markup languages and their "emergent semantics" in
use (as I called it, thinking first and foremost about HTML), and so we
have the ancient conundrum of what constitutes authority anyway. But we
have to start somewhere.

Fortunately, we are literate creatures, and can use prose.

Cheers,
Wendell

[1] "Part of a document type definition can be specified by an SGML
document type declaration. Other parts, such as the semantics of elements
and attributes, or any application conventions, cannot be expressed
formally in SGML. Comments can be used, however, to express them
informally" [ISO 8879 4.105]. It's kind of quaint, perhaps, to expect these
expressions to be only in comments; but prose, I think, is the point. -wap

[2] The slides for this presentation are at
http://www.piez.org/wendell/papers/TEIbeyondTags.html, though (I apologize)
you'll need an SVG Viewer to see the graphics (this I must fix). -wap


___&&__&_&___&_&__&&&__&_&__&__&&____&&_&___&__&_&&_____&__&__&&_____&_&&_
     "Thus I make my own use of the telegraph, without consulting
      the directors, like the sparrows, which I perceive use it
      extensively for a perch." -- Thoreau