(Gesture of ceremoniously removing hat labeled 'editor' and replacing it
with one labeled 'user')
 
Mike Kreyche's inquiry is worth some reply from *all* of us who have
seen any of the software available for SGML.  What follows are purely
personal notes, which represent my views and not those of the TEI, its
sponsors, other participants, or funders, and which only represent *my*
views for today -- no guarantees about tomorrow.
 
I've had access to only two MS-DOS programs for SGML, and only know one
at all well.  That one is Markit, from SEMA Group.  The other is the
XGML Translator (XTRAN) from Software Exoterica, which I will report on
later, after I use it more.
 
Markit is an SGML parser, which comes with (a) a simple editor with some
SGML awareness, (b) some software to implement simple applications using
the SGML linked-process specification, and (c) yacc and lex source for
building your own applications to work on Markit output.
 
The parser does what its name says:  it parses an SGML document and
validates it.  If the document is legal, the parser says in effect 'yes
this is legal SGML'.  If it's not, it produces error messages saying
what went wrong.  As a side effect of the parsing, it also produces a
'canonical output' (SEMA's term) form of the document:  a 'minimal SGML
document' (ISO's term) with no omitted tags, short references, implied
tags, or other complications.  (Minimal SGML documents use *no* markup
minimization -- the markup can grow to astonishing volumes.)
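
To make "no omitted tags" concrete, here is a toy sketch in Python.  It
is not Markit's algorithm and not real SGML tag inference (which is
driven by the DTD and is far more involved); the CHILDREN table, the
token format, and the normalize() function are all invented for this
illustration.  It only shows how end tags left out of the input can be
supplied on output.

```python
# Hypothetical content rules: which elements each element may contain.
# Text is allowed everywhere in this toy model.
CHILDREN = {
    "list": {"item"},
    "item": set(),
}

def normalize(tokens):
    """Return fully tagged markup for a token stream with omitted end tags.

    tokens is a list of ("start", name), ("end", name) or ("text", string).
    """
    out = []
    stack = []  # currently open elements, innermost last
    for kind, value in tokens:
        if kind == "start":
            # Close open elements that cannot contain the new element.
            while stack and value not in CHILDREN[stack[-1]]:
                out.append("</%s>" % stack.pop())
            stack.append(value)
            out.append("<%s>" % value)
        elif kind == "end":
            # Close everything up to and including the named element.
            while stack:
                name = stack.pop()
                out.append("</%s>" % name)
                if name == value:
                    break
        else:
            out.append(value)
    # Close whatever is still open at the end of the input.
    while stack:
        out.append("</%s>" % stack.pop())
    return "".join(out)

# Two items whose end tags were omitted in the input.
SAMPLE = [
    ("start", "list"),
    ("start", "item"), ("text", "one"),
    ("start", "item"), ("text", "two"),
    ("end", "list"),
]

print(normalize(SAMPLE))
```

Even in this tiny case the output carries two end tags the input did
not, which is why fully normalized markup grows so quickly.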
 
Used from the DOS command line, Markit accepts a wide variety of option
flags, some of which still bewilder me.  I don't use it that way much,
but Lou Burnard swears by it.
 
Me, I mostly use the Markit editor (MktEd) and run the parser from
inside it.  The parsing options are specified from menus here, and it's
slightly easier to get them right.  Moreover, the display of errors
found during parsing is nice:  they can be viewed and for any error you
can go, with one keystroke, to the point at which it was detected.  I
find it a reasonably nice way to validate SGML documents.
 
Markit provides a number of switches to turn off conformance checking
for some aspects of SGML -- e.g. the quantity restrictions written into
the standard.  For full conformance, you need to leave the checks on, but
if like many people you think the standard has no business specifying
things like buffer sizes (even if it disguises them by saying they are
not buffer sizes), you may feel a surge of relief at being able to
tell the parser to ignore them.  I'm told this speeds the parser up,
too, since the quantity checking requires storing a lot of otherwise
irrelevant information.
 
For serious work, though, you want not just to know that your document
is correct, but to process it.  That is where the application language
and the yacc/lex source come in.  I've only scratched the surface here,
but both facilities look promising.  The application language allows you
to embed application code (written in a special-purpose language) within
comments in the SGML link type declaration.  Actions can be called at
the beginning or end of an element, and can act upon the tags, the
attributes and their values, and the content of the element; enough I/O
control is provided that you can spool data off into holding tanks and
retrieve it later (e.g. to re-order elements in your output) or suppress
some data entirely.  I've written only a toy application or two, and
haven't pushed the language far, but it should be useful for a lot of
simple transformations (e.g. for transforming the current form of the
TEI guidelines into TEI-conforming tags).  It's clear though that it
isn't quite what you'd call a full programming language, and it can
feel a bit confining (or maybe you get used to it?).
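
The general shape of such processing (actions at element start and end,
holding tanks for re-ordering, suppression of some data) can be sketched
in Python.  This is emphatically not the Markit application language,
whose syntax is not shown here; the event tuples, the element names
(entry, head, def, note), and the transform() function are assumptions
made up for the illustration.

```python
def transform(events):
    """Re-order and filter element content using a holding buffer.

    events is a list of (kind, name, data) tuples, where kind is
    "start", "end" or "text".  Within each <entry>, the <def> content is
    held back and written after the <head>; <note> content is dropped.
    """
    out = []        # main output stream
    tanks = {}      # named holding buffers
    current = out   # where text goes right now; None means discard
    for kind, name, data in events:
        if kind == "start" and name == "def":
            current = tanks.setdefault("def", [])  # spool into the tank
        elif kind == "end" and name == "def":
            current = out
        elif kind == "start" and name == "note":
            current = None                         # suppress entirely
        elif kind == "end" and name == "note":
            current = out
        elif kind == "end" and name == "head":
            out.append(": ")
        elif kind == "end" and name == "entry":
            out.extend(tanks.pop("def", []))       # retrieve the held data
            out.append("\n")
        elif kind == "text" and current is not None:
            current.append(data)
    return "".join(out)

# The definition comes first in the input but should follow the headword.
EVENTS = [
    ("start", "entry", None),
    ("start", "def", None), ("text", None, "a book"), ("end", "def", None),
    ("start", "note", None), ("text", None, "check this"), ("end", "note", None),
    ("start", "head", None), ("text", None, "liber"), ("end", "head", None),
    ("end", "entry", None),
]

print(transform(EVENTS), end="")
```

The point of the sketch is only that start and end actions plus a few
named buffers are enough for simple re-ordering and filtering; anything
needing real data structures is where a confined language starts to
pinch.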
 
(The application language of Markit is notably smaller and
more primitive than the XGML Translator language as described in the
latter's manual.  But as noted I haven't used XTRAN enough yet to
talk about it.)
 
I have not used the yacc/lex source at all, but clearly if you want full
control over the processing, this is a way to get it, using a language
that many of us have access to.  There are public-domain yacc and lex
processors, so no further outlay is required beyond a C compiler.
 
Sum:  I like Markit fairly well, as a tool for constructing applications.
It is not yet a tool for people who want the application pre-constructed;
someone on the project has to be willing to get fingernails dirty
writing code.
 
All this is, of course, just my own opinion, and not even a well-informed
one at that, since I haven't enough points of comparison with other
products.
 
-Michael Sperberg-McQueen, UIC