At 12:02 PM 7/19/2003, Michael Beddow wrote:
>But it has emerged off-list that what Gerard was thinking of (and thought
>Lou might be referring to, though I doubt whether the notion existed back
>then) was in effect what has come to be called "stand-off markup" (in the
>most radical form of a completely "clean" base text with all markup applied
Yet intriguingly, there is a common set of issues raised by stand-off
markup and SGML-style shortrefs, datatag and the rest.
To what extent is the document itself, as an entity, sufficient and
"self-describing", and to what extent is it fair to allow a dependency on
external specifications (perhaps unique to this document or to a class to
which it belongs) in order to parse and process it?
In this light, standoff markup is the application of a second document to
the first, that interpolates "markup" into the first by means of some
referencing structure -- file offsets or whatever. Accordingly, one of the
main challenges of standoff markup is keeping these two files in sync even
when the primary file changes.
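
To make that concrete, here is a minimal sketch in Python. It assumes the "referencing structure" is nothing more than character offsets, and the names (apply_standoff, the (start, end, tag) triples, the sample text) are mine, made up for illustration, not anyone's actual standoff format. It shows both the interpolation and how quickly the two files fall out of sync.

```python
from xml.sax.saxutils import escape


def apply_standoff(text, annotations):
    """Interpolate non-overlapping (start, end, tag) ranges into text as inline tags.

    Offsets are character positions into the base text (an assumption of
    this sketch); overlapping or out-of-range annotations are rejected.
    """
    out = []
    pos = 0
    for start, end, tag in sorted(annotations):
        if start < pos or start > end or end > len(text):
            raise ValueError(f"bad or overlapping range: {(start, end, tag)}")
        out.append(escape(text[pos:start]))
        out.append(f"<{tag}>{escape(text[start:end])}</{tag}>")
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)


# The "clean" base text and, separately, the second document that marks it up.
base = "Call me Ishmael."
standoff = [(8, 15, "name")]

in_sync = apply_standoff(base, standoff)

# Edit the primary file without updating the offsets and the markup drifts.
edited = "Please call me Ishmael."
drifted = apply_standoff(edited, standoff)
```

Here in_sync wraps "Ishmael" as intended, while drifted wraps an arbitrary stretch of the edited sentence, which is the synchronization problem in miniature.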
DATATAG is a technique for establishing how regex-like patterns of
characters in a file can be taken as representations of "markup". What was
"standoff markup" is no longer expressed as a literal (a second document
that describes or "marks up" the first), but rather as a set of processing
rules. (There are tradeoffs here. One gains back some of the robustness of
inline markup, at the price of embedding your "markup" back in the file,
albeit in disguise. Yet one still has dependency on the set of rules that
tells how to get from your actual document, to the parsed data model of
elements and attributes.)
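
A rough analogy, again in Python: this is not SGML's DATATAG syntax or semantics, only a sketch of the idea that a rule set, kept apart from the document, says which character patterns count as markup. The delimiters, element names and function name are assumptions of mine.

```python
import re

# Hypothetical rule set: each rule maps a pattern of ordinary characters
# to the element it is taken to represent.
RULES = [
    (re.compile(r"\*([^*\n]+)\*"), r"<emph>\1</emph>"),
    (re.compile(r"_([^_\n]+)_"), r"<title>\1</title>"),
]


def parse_with_rules(line, rules=RULES):
    """Rewrite a line by applying each (pattern, replacement) rule in order."""
    for pattern, replacement in rules:
        line = pattern.sub(replacement, line)
    return line


source = "I *never* finished _Moby Dick_."
parsed = parse_with_rules(source)

# With no rules in force, the same characters are just punctuation.
unparsed = parse_with_rules(source, rules=[])
```

The "markup" travels inside the file as asterisks and underscores, yet whether it is markup at all depends on which rules the processor is handed.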
Students of punctuation and its history -- punctuation as a form of markup,
after all -- may note that it is punctuation characters (along with their
silent but pesky friend, white space!) that are normally and most naturally
drafted into service for DATATAG.
XML crested a trend that said "we don't need no SGML declaration, we don't
even need a DTD!" asserting that we all benefit if we toss aside the set of
functionalities provided by DATATAG, SHORTREF, tag omissibility, and other
means SGML gave us to tweak the lexical aspects of our markup, instead
settling on a set of Universal Rules that can serve to support a commodity
tool set. This was described as "Monastic SGML" as early as 1994-1995, I
believe, although at that time the overt intent was likely not to deprecate
the DTD as such. :->
And indeed, this has proven to be a wise trade-off to make in order to get
Web-SGML. Among its salutary effects has been that the role of the DTD in
providing a set of validation criteria has been distinguished from its role
in defining or enhancing the information set (providing attribute defaults,
saying where tags can be omitted, etc.). This has done a great deal to
clarify what a DTD is actually good for, and where it's necessary, in an
application workflow. Likewise, it has shed light on how syntactic or
lexical functions -- provided by the SGML declaration or by the hard-wired
rules of the XML Rec -- can be helpfully distinguished from "semantic"
aspects, such as the grammar provided by a set of content models.
Nonetheless, certain other trends in schema languages (such as datatyping)
and in markup-related applications (see WikiML as a modern-day
shortref/datatag) also demonstrate that the set of functionalities provided
by being able to switch in ad-hoc specs for markup languages (viz.: a
"schema") -- loose coupling between the lexical instance and the data
object it represents?! -- may have applications just as formidable as those
provided by loose coupling between markup and applications (viz.:
descriptive encoding + stylesheets).
It all comes back to our vacillation over which should have primacy, the
lexical instance (XML as character string) or the data model we make of it
(that Mystery that the XML string represents ... the Tree of Knowledge?).
Wendell Piez
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
Mulberry Technologies: A Consultancy Specializing in SGML and XML