| The work of Dan Connolly on "A Lexical Analyzer for HTML and
| Basic SGML" may be of interest to those working in this area.
| His proposal is at <url:http://www.w3.org/pub/WWW/TR/WD-sgml-lex>.
This is indeed interesting, but only in the context of defining a
simpler markup language than SGML.
| This is an attempt to document in a formal way the subset of
| SGML used in HTML. He claims HTML, along with TEI, DocBook and
| a couple others are "basic SGML documents" as defined in the
| standard with few exceptions. An implementation of a lexical
| analyzer compatible with this subset is offered and may
| be of interest to those following this thread.
DocBook changes the reference capacity set and sets FORMAL to YES,
and is therefore not "basic" per 15.1.1.
ISO 8879 does indeed define "basic SGML documents" (15.1.1), but
Dan has restricted his "basic SGML" beyond that definition:
The objectives of the document are to:
1. refine the notion of "basic SGML document" to the precise set of
features used in HTML 2.0.
The string <! followed by a name begins a markup declaration.
The name is followed by parameters and a >. A [ in the parameters
opens a declaration subset, which is a construct prohibited by this report.
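To make that rule concrete, here is a rough sketch in Python of how a scanner might apply it. This is my own illustration, not Dan's lexer: the function name, the exception class, and the simplified definition of a "name" are all assumptions, and it ignores `--` comments inside declarations.

```python
import re

# Assumption: a name is a letter followed by letters, digits, '.' or '-'.
DECL_OPEN = re.compile(r"<!([A-Za-z][A-Za-z0-9.-]*)")


class ProhibitedConstruct(ValueError):
    """Raised for a construct the restricted subset does not allow."""


def scan_markup_declaration(text, pos=0):
    """Scan one markup declaration starting at text[pos].

    Returns (name, parameters, end_position). Hypothetical helper;
    comments inside the declaration are not handled.
    """
    m = DECL_OPEN.match(text, pos)
    if m is None:
        raise ValueError("no markup declaration at position %d" % pos)
    i = m.end()
    quote = None  # the open literal delimiter, if we are inside a literal
    while i < len(text):
        ch = text[i]
        if quote is not None:
            if ch == quote:
                quote = None  # literal closed
        elif ch in "\"'":
            quote = ch  # literal opened; '[' and '>' inside it are data
        elif ch == "[":
            # A declaration subset: prohibited in the restricted subset.
            raise ProhibitedConstruct("declaration subset at position %d" % i)
        elif ch == ">":
            return m.group(1), text[m.end():i].strip(), i + 1
        i += 1
    raise ValueError("unterminated markup declaration")
```

A declaration such as `<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">` scans cleanly, while anything of the form `<!DOCTYPE doc [ ... ]>` is rejected at the `[`, which is the case that matters for SGML documents carrying an internal subset.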
Note that the output contains all newlines (record end characters) from
the input verbatim. Implementing the rules for ignoring record end characters
as per section 7.6.1 of SGML is left to the client.
The reference concrete syntax includes certain limitations (capacities
and quantities, in the language of the standard). For most purposes,
these limitations are unnecessary. We remove them.
We require the SGML declaration to be implicit and the DTD to be
included by reference only.
Parameter Entity Reference
The %name; construct is a parameter entity reference -- similar to a
reference to a C macro. There is little use for these given the above
limitations. An occurrence of a parameter entity in a markup declaration
I could go on, but the above suffices to make it clear that Dan's lexer
is useless for SGML in general. It *is* a crack at "minimal SGML" or
"monastic SGML" or whatever you want to call it, and the deployment
experience of HTML suggests that there is a market niche for some such.
Terry Allen, O'Reilly & Associates, Inc.
A Davenport Group sponsor. See http://www.ora.com/davenport/README.html
Fish of the Day: Carpe Diem