This discussion gives me an excuse to put in a plug for SP 0.3, which
I've just released. SP can now provide all the information in the DSSSL
model. This is currently available only at the C++ level; nsgmls still
produces the same output as sgmls.
I've also written a normalizer that takes advantage of this extra
information. However, even with a parser that provides complete
information, I still find it very non-obvious what exactly a normalizer
should do. Part of the problem is that I don't have a clear
understanding of what people really want from a normalizer. I'm including
the man page for my normalizer below; I would be interested to get
suggestions from potential users of ways to enhance it to meet their needs.
You can get SP from ftp://ftp.jclark.com/pub/sp. Both source code and
binaries for various systems are available. More information about SP is
James Clark
spam - an SGML normalizer
An SGML System Conforming to
International Standard ISO 8879 --
Standard Generalized Markup Language
spam [ -ehipwx ] [ -ccatalog_file ] [ -mmarkup_option ] [
-oentity_name ] sysid...
Spam (SP Add Markup) is a normalizer implemented using the
SP parser. Spam parses the SGML document contained in
sysid... and copies to the standard output the portion of
the document entity containing the document instance,
adding or changing markup as specified by the -m option.
The -p option can be used to include the SGML declaration
and prolog in the output. The -o option can be used to
output other entities. The -x option can be used to
expand entity references.
For more information about the underlying SGML parser and
entity manager, see nsgmls(1).
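
As a rough illustration of the kind of rewriting a normalizer performs, the sketch below expands one form of minimized markup: a net-enabling start-tag followed by a null end-tag, which is the case the net markup option covers. It is a toy in Python, not how spam works. Spam uses the SP parser, whereas this uses a regular expression that assumes no attributes, no nesting, and the default delimiters. The names NET_ELEMENT and expand_net are made up for this example.

```python
import re

# Toy pattern: a net-enabling start-tag such as <q/ followed by character
# data and a null end-tag (a lone /). Attributes, nesting, and changed
# delimiters are deliberately not handled; a real normalizer needs a parser.
NET_ELEMENT = re.compile(r"<([A-Za-z][A-Za-z0-9.-]*)/([^/<]*)/")


def expand_net(text: str) -> str:
    """Rewrite <name/content/ as <name>content</name> (illustration only)."""
    return NET_ELEMENT.sub(r"<\1>\2</\1>", text)


if __name__ == "__main__":
    print(expand_net("He said <q/hello/ to me."))
    # prints: He said <q>hello</q> to me.
```

Text that already uses unminimized tags passes through unchanged, since the pattern only matches a name followed directly by a slash.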
The following options are available:
-cfile Use the catalog entry file file.
-e Describe open entities in error messages.
-h Hoist omitted tags out from the start of internal
entities. If the text at the beginning of an
internal entity causes a tag to be implied, the tag
will usually be treated as being in that internal
entity; this option will instead cause it to be
treated as being in the entity that referenced the
internal entity. This option makes a difference in
conjunction with -momittag or -x -x.
-iname Pretend that
<!ENTITY % name "INCLUDE">
occurs at the start of the document type declaration
subset in the document entity.
-mmarkup_option
Change the markup in the output according to the
value of markup_option as follows:
omittag Add tags that were omitted using omitted tag
minimization. End tags that were omitted
because the element has a declared content
of EMPTY or an explicit content reference
will not be added.
shortref Replace short references by named entity
references.
net Change null end-tags into unminimized end-
tags, and change net-enabling start-tags
into unminimized start-tags.
emptytag Change empty tags into unminimized tags.
unclosed Change unclosed tags into unminimized tags.
attname Add omitted attribute names and vis.
attvalue Add literal delimiters omitted from
attribute values.
attspec Add omitted attribute specifications.
shorttag Equivalent to combination of net, emptytag,
unclosed, attname, attvalue and attspec
options.
rank Add omitted rank suffixes.
Multiple -m options are allowed.
-oname Output the entity name instead of the document
entity. The output will correspond to the first
time that the entity is referenced in content.
-p Output the SGML declaration and prolog before
anything else. This option will cause the document
entity to be read a second time; it will therefore
not work with pipes.
-w Give warnings.
-x Expand references to entities that are changed. If
this option is specified more than once, then all
references to entities that contain tags will be
expanded.
Omitted tags are added at the point where they are implied
by the SGML parser (except as modified by the -h option);
this is often not quite where they are wanted.
Start-tags and end-tags that contain a name group in their
document type specification are not output.
Entity references that contain a name group are not output.
The case of general delimiters is not preserved.
James Clark
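
For anyone driving spam from a script, the following Python sketch shows one way the command line might be assembled from the options documented above. It only builds the argument list and does not run anything. The function name build_spam_argv, its parameters, and the file name doc.sgml are hypothetical. The flags themselves are taken from the synopsis, with option arguments attached directly to the flag letter as the synopsis shows.

```python
from typing import List, Optional, Sequence


def build_spam_argv(
    sysids: Sequence[str],
    markup_options: Sequence[str] = (),
    catalog: Optional[str] = None,
    entity: Optional[str] = None,
    hoist: bool = False,
    prolog: bool = False,
    warnings: bool = False,
    expand: int = 0,
) -> List[str]:
    """Assemble an argument list for spam (hypothetical helper)."""
    argv = ["spam"]
    if catalog is not None:
        argv.append("-c" + catalog)      # -ccatalog_file
    if hoist:
        argv.append("-h")                # hoist omitted tags
    for option in markup_options:
        argv.append("-m" + option)       # multiple -m options are allowed
    if entity is not None:
        argv.append("-o" + entity)       # -oentity_name
    if prolog:
        argv.append("-p")                # rereads the document entity
    if warnings:
        argv.append("-w")
    argv.extend(["-x"] * expand)         # -x may be given more than once
    argv.extend(sysids)
    return argv


if __name__ == "__main__":
    # doc.sgml is a placeholder system identifier.
    print(build_spam_argv(["doc.sgml"], ["omittag", "net"], hoist=True, expand=2))
    # prints: ['spam', '-h', '-momittag', '-mnet', '-x', '-x', 'doc.sgml']
```

Because -p causes the document entity to be read a second time, a caller that feeds the document through a pipe would presumably want to leave prolog unset.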