If you want to roll your own, a beta release of our Normalised SGML
Library API, an SGML-application construction toolkit for Un*x
environments, is available for research and evaluation purposes.
There are links from my home page: http://www.cogsci.ed.ac.uk/~ht/.
Here are the introductory paragraphs from the documentation:

In pursuit of a development environment for SGML-based corpus and
document processing, with support for multiple versions and multiple
levels of annotation, LTG have developed an integrated set of SGML
tools and a developer's toolkit, including a C-based API.

The software described here contains everything required to process a
very wide range of conformant SGML documents. Its initial parsing
module incorporates v1.0.1 of James Clark's SP software, arguably the
broadest-coverage SGML parser available anywhere, commercial or not.

The basic architecture is one in which an arbitrary SGML document is
processed on the way in, as it were, yielding two results: 1) An
optimised representation of the information contained in the
document's DOCTYPE; 2) A normalised version of the document instance,
which can be piped through any tools built using our API for
augmentation, extraction, etc. The use of the cached DOCTYPE together
with the normalisation of the SGML to nSGML means that
applications processing nSGML streams can be very efficient.
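
To give a feel for why a normalised stream makes downstream tools cheap,
here is a rough sketch in Python. It is not the LTG API and does not use
the real nSGML format. It assumes a simplified stand-in in which every
start tag and end tag is explicit and every attribute value is quoted.
The names tokens, annotate and extract_text are made up for
illustration.

```python
import re
from typing import Iterable, Iterator

# Assumption: in this toy "normalised" form nothing is omitted or
# minimised, so a plain regex is enough to split the stream into tokens.
TOKEN = re.compile(r"<[^>]+>|[^<]+")


def tokens(stream: str) -> Iterator[str]:
    """Yield start tags, end tags and text chunks in document order."""
    return (m.group(0) for m in TOKEN.finditer(stream))


def annotate(toks: Iterable[str], element: str, attr: str, value: str) -> Iterator[str]:
    """Augmentation filter: add attr="value" to each start tag of `element`."""
    start = re.compile(r"<%s(\s|>)" % re.escape(element))
    for tok in toks:
        if start.match(tok):
            # Drop the closing '>' and re-emit the tag with the new attribute.
            tok = tok[:-1] + ' %s="%s">' % (attr, value)
        yield tok


def extract_text(toks: Iterable[str], element: str) -> Iterator[str]:
    """Extraction filter: yield the text content of each `element`."""
    start = re.compile(r"<%s(\s|>)" % re.escape(element))
    end = "</%s>" % element
    depth = 0
    buf = []
    for tok in toks:
        if tok == end:
            depth -= 1
            if depth == 0:
                yield "".join(buf)
                buf = []
        elif start.match(tok):
            depth += 1
        elif depth > 0 and not tok.startswith("<"):
            buf.append(tok)


if __name__ == "__main__":
    doc = "<TEXT><W>hello</W> <W>world</W></TEXT>"
    # Filters compose like stages in a Unix pipe.
    tagged = list(annotate(tokens(doc), "W", "TAG", "x"))
    print("".join(tagged))
    print(list(extract_text(tagged, "W")))
```

Each filter sees only a flat token stream and never consults the DOCTYPE,
which is the property the paragraph above attributes to nSGML processing.
A real tool would of course be built on the library itself rather than on
regular expressions.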