Sebastian Rahtz wrote:
> > But this, it seems to me, is what we have catalogs for. In straight
> > SGML, I'd just do this:
> > <!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main DTD Driver File//EN" [
> > <!ENTITY % Project PUBLIC "-//Stoa//DTD Project TEI mainline//EN">
> > %Project;
> > ]>
> > and everyone would be happy. Nsgmls is OK with this even if it's being
> > told to parse XML. But since this *isn't* proper XML syntax for public
> > identifiers, Xalan correctly complains that the file is not well formed.
> I am puzzled, because SGML now seems like a distant horrible memory, but
> why do you have the entity in that doctype subset?
Isn't this a normal way of using the parameterized TEI files? We've
been doing it here (at Perseus) for years: never occurred to me that it
> anyway, it seems to me that what you want is
> <!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main DTD Driver File//EN"
> should be acceptable to any XML processor; you might need to use
> onsgmls instead of nsgmls, of course.
Unfortunately, I can't control the "whatever" part of that URL. I
realize this would work just fine if I could.
> Why do you want to use nsgmls, however? It is/was a great bit of
> software, but there are better XML parsers these days. I would use
> xmllint if in doubt, or rxp. Both of these are open source C programs
> which should compile anywhere, and don't have that yucky Java
> dependence. I can see where you might want Xalan in a Java servlet
> context, but otherwise I cannot see much advantage to it.
I'm not deeply in love with Xalan either, but the project wants to run
under Cocoon -- precisely "a Java servlet context," and Xalan is the
parser that comes with the package. I realize I could use a different
one instead but I thought, perhaps naively, that it would be easier to
use the pieces that had been tested together.
As for nsgmls, one reason I use it is that it does what I want here
:-)~~~~~~~~ More important, the other application that will be using
these files was originally designed for SGML, for a project that has
several gigabytes of SGML data from before this newfangled XML stuff was
ever invented. That application uses the SP tools (among other things),
so the files have to work with SP.
> xmllint also has schema support, for both W3C schemas and Relax NG,
> something which nsgmls will never have (I assume). At present,
> the schema support is not really useable, but Daniel Veillard
> is working on it on a daily basis (watching his ChangeLog
> is amusing) and I would imagine it's only weeks before a release
> which would work with the TEI Relax NG experimental schemas.
> > OK, I realize I need a system identifier next to that public
> > identifier. But the whole point of the public identifier is so I can
> > specify the location of the files that make up the DTD in *one* place,
> > in my catalog file, rather than in the headers of every single one of
> > the user's XML files. So I want to tell the parser to look at the FPI
> > and *ignore* the system identifier.
> that's exactly what XML parsers are *supposed* to do, surely?
I would have thought so, but it sure looks like this one isn't doing it.
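
For the record, here is a minimal sketch of what the XML-legal form of my
declaration would look like, with a system identifier after each public
one. The two system identifiers ("tei2.dtd" and "project.ent") are
placeholders I made up for illustration; the hope is that a
catalog-aware parser resolves the FPIs and never actually fetches them.

```xml
<!-- Sketch only: "tei2.dtd" and "project.ent" are hypothetical system
     identifiers; XML requires one after every public identifier. -->
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main DTD Driver File//EN" "tei2.dtd" [
<!ENTITY % Project PUBLIC "-//Stoa//DTD Project TEI mainline//EN" "project.ent">
%Project;
]>
```
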
> does Xerces implement XML catalogs?
> if so, they allow you to map a system entity like "foo.dtd" to
> an absolute local path, so you can use it anywhere. xmllint
> supports that, as does xsltproc.
> Luckily Gregory has explained the Java way of doing all this,
> so I don't have to expose my total ignorance of such things.
I have determined that within Cocoon one specifies the catalog in a
"catalog manager" definition file. Once that's done, Cocoon passes the
information to Xalan and everything goes as planned. So all I need to
do for command-line validation is to edit the supplied sample Validation
class to turn on catalog support; this ought to be straightforward,
since I have the Cocoon sources and can simply look at what they did.
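
A rough sketch of the kind of catalog I mean, assuming the OASIS XML
Catalogs format. The local paths and the "foo.dtd" entry are
placeholders, not the project's real layout. As I understand the spec,
prefer="public" tells the resolver to use a matching public entry even
when the document also supplies a system identifier, unless a system
entry matches that identifier first.

```xml
<?xml version="1.0"?>
<!-- Sketch only: every uri below is a hypothetical local path. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
         prefer="public">
  <!-- Map each FPI to one local copy, so documents need not say where it lives. -->
  <public publicId="-//TEI P4//DTD Main DTD Driver File//EN"
          uri="file:///usr/local/share/tei/tei2.dtd"/>
  <public publicId="-//Stoa//DTD Project TEI mainline//EN"
          uri="file:///usr/local/share/stoa/project.ent"/>
  <!-- Map a bare system identifier such as "foo.dtd" to an absolute path. -->
  <system systemId="foo.dtd"
          uri="file:///usr/local/share/dtd/foo.dtd"/>
</catalog>
```
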
I'm still surprised that catalogs aren't automatically supported, and
that system identifiers seem to take precedence over public ones unless
you specifically say they shouldn't. Just one more thing we've lost
from the golden Saturnian age of SGML, I guess.