This is a question about the "system" part of a public identifier in an XML DOCTYPE declaration.
A developer on one of the Stoa's projects has written a bunch of XML
files which we would like to be able to parse with both Xalan and
Nsgmls, for use with a couple of different application platforms. One
possibility is to make a DTD with the TEI Pizza Chef, edit it to include the
project's own modifications (a couple of new attributes for <bibl>),
leave this DTD in the same directory as the XML files, and reference it
with a SYSTEM entity, like so:
<!DOCTYPE TEI.2 SYSTEM "mytei.dtd">
This works (and so I do already know that the XML files in question
*are* valid). But the XML files are in a few different directories. My
user had naively just made copies of the DTD file to all those different
directories, which seems to be asking for trouble. OK, so I could use
relative paths in the DOCTYPE (e.g. "../mytei.dtd"); this assumes that
the structure of that directory tree is not going to change, though, and
this project has not settled down enough yet for that. So I suppose
I could put the DTD in a given directory and refer to it with an absolute URI:
<!DOCTYPE TEI.2 SYSTEM "file:///path/to/file/mytei.dtd">
or, roughly equivalently I guess,
<!DOCTYPE TEI.2 SYSTEM "http://localhost/some/place/mytei.dtd">
but this assumes that the standard place for DTDs is the same on the
several systems where this has to work, and it's not. (And, as I do not
control all those systems myself, I can't fix that.)
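Here is why the relative-path approach is fragile. A relative system identifier is resolved against the URI of the document that contains the DOCTYPE, not against any central location. The sketch below shows this with Python's urllib.parse.urljoin. The /proj/texts/... paths and the dtd_location helper are made up for illustration.

```python
from urllib.parse import urljoin


def dtd_location(document_uri: str, system_id: str) -> str:
    """Resolve a DOCTYPE system identifier the way a parser normally does:
    relative to the URI of the document that contains it (hypothetical helper)."""
    return urljoin(document_uri, system_id)


if __name__ == "__main__":
    # Same DOCTYPE, different directories: each copy points somewhere else.
    print(dtd_location("file:///proj/texts/a/foo.xml", "mytei.dtd"))
    print(dtd_location("file:///proj/texts/a/foo.xml", "../mytei.dtd"))
    # Move the file one level deeper and "../mytei.dtd" silently changes meaning.
    print(dtd_location("file:///proj/texts/a/b/foo.xml", "../mytei.dtd"))
    # An absolute URI ignores the document's location entirely.
    print(dtd_location("file:///proj/texts/a/foo.xml", "file:///path/to/file/mytei.dtd"))
```

The absolute form avoids the directory-tree problem. It only moves the problem to "where is /path/to/file on this machine?".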
But this, it seems to me, is what we have catalogs for. In straight
SGML, I'd just do this:
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main DTD Driver File//EN" [
<!ENTITY % Project PUBLIC "-//Stoa//DTD Project TEI mainline//EN">
]>
and everyone would be happy. Nsgmls is OK with this even if it's being
told to parse XML. But since this *isn't* proper XML syntax for public
identifiers, Xalan correctly complains that the file is not well formed.
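You can see the same well-formedness rule outside Xalan by handing both forms to another XML parser. XML requires a system literal after PUBLIC in both DOCTYPE and ENTITY declarations. The minimal Python sketch below uses the standard library's expat binding, not Xalan, so the error messages differ. It only checks syntax and never tries to fetch a DTD. The is_well_formed helper is hypothetical.

```python
import xml.parsers.expat

FPI = "-//TEI P4//DTD Main DTD Driver File//EN"

# SGML-style: public identifier with no system literal.
SGML_STYLE = f'<!DOCTYPE TEI.2 PUBLIC "{FPI}"><TEI.2/>'
# Same problem in a parameter entity declaration in the internal subset.
SGML_STYLE_ENTITY = (
    f'<!DOCTYPE TEI.2 PUBLIC "{FPI}" "/dev/null" ['
    '<!ENTITY % Project PUBLIC "-//Stoa//DTD Project TEI mainline//EN">'
    ']><TEI.2/>'
)
# XML-style: public identifier followed by a (here meaningless) system literal.
XML_STYLE = f'<!DOCTYPE TEI.2 PUBLIC "{FPI}" "/dev/null"><TEI.2/>'


def is_well_formed(doc: str) -> bool:
    """Return True if expat accepts the document's syntax (no DTD is loaded)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(doc, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


if __name__ == "__main__":
    for name, doc in [("SGML_STYLE", SGML_STYLE),
                      ("SGML_STYLE_ENTITY", SGML_STYLE_ENTITY),
                      ("XML_STYLE", XML_STYLE)]:
        print(name, is_well_formed(doc))
```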
OK, I realize I need a system identifier next to that public
identifier. But the whole point of the public identifier is so I can
specify the location of the files that make up the DTD in *one* place,
in my catalog file, rather than in the headers of every single one of
the user's XML files. So I want to tell the parser to look at the FPI
and *ignore* the system identifier.
For nsgmls, you can do that with
OVERRIDE YES
near the head of the catalog file, accompanied by the following in each XML file:
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main DTD Driver File//EN" "/dev/null" [
<!ENTITY % Project PUBLIC "-//Stoa//DTD Project TEI mainline//EN" "/dev/null">
]>
Catalog says "I will override any system IDs you find in DOCTYPEs 'n'
stuff"; entity definition in catalog says where the file is; nsgmls
doesn't even bother opening /dev/null; file is valid and happy.
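The precedence rule that OVERRIDE YES changes can be sketched in a few lines. This is a toy Python model of a small subset of the SGML Open (TR9401) catalog format, not how nsgmls is implemented. It handles only OVERRIDE, PUBLIC and SYSTEM entries. It assumes OVERRIDE defaults to NO and applies to the PUBLIC entries that follow it. It also assumes SYSTEM entries are checked before PUBLIC ones. The catalog paths and the Catalog class are hypothetical.

```python
import re

# Catalog tokens: "-- comments --", quoted literals, or bare words.
_TOKEN = re.compile(r'--.*?--|"([^"]*)"|\'([^\']*)\'|(\S+)', re.S)


def _tokens(text):
    for m in _TOKEN.finditer(text):
        if m.lastindex is None:  # a comment: no capturing group matched
            continue
        yield m.group(m.lastindex)


def _norm(public_id):
    # Compare public identifiers with runs of whitespace collapsed.
    return " ".join(public_id.split())


class Catalog:
    """Toy TR9401-style catalog: OVERRIDE, PUBLIC and SYSTEM entries only."""

    def __init__(self, text):
        self.public = {}  # normalized FPI -> (path, OVERRIDE flag in effect)
        self.system = {}  # system ID -> path
        override = False  # assumed default: OVERRIDE NO
        toks = list(_tokens(text))
        i = 0
        while i < len(toks):
            keyword = toks[i].upper()
            if keyword == "OVERRIDE":
                override = toks[i + 1].upper() == "YES"
                i += 2
            elif keyword in ("PUBLIC", "SYSTEM"):
                key, path = toks[i + 1], toks[i + 2]
                if keyword == "PUBLIC":
                    self.public.setdefault(_norm(key), (path, override))
                else:
                    self.system.setdefault(key, path)  # this sketch keeps the first entry
                i += 3
            else:
                raise ValueError(f"entry type not handled by this sketch: {toks[i]}")

    def resolve(self, public_id=None, system_id=None):
        if system_id is not None and system_id in self.system:
            return self.system[system_id]
        if public_id is not None:
            entry = self.public.get(_norm(public_id))
            # A PUBLIC entry beats the document's own system ID only under OVERRIDE YES.
            if entry is not None and (system_id is None or entry[1]):
                return entry[0]
        return system_id


EXAMPLE = Catalog("""
-- hypothetical paths, for illustration only --
OVERRIDE YES
PUBLIC "-//TEI P4//DTD Main DTD Driver File//EN" "/usr/local/sgml/tei/tei2.dtd"
PUBLIC "-//Stoa//DTD Project TEI mainline//EN"  "/usr/local/sgml/stoa/mytei.dtd"
""")

if __name__ == "__main__":
    print(EXAMPLE.resolve("-//Stoa//DTD Project TEI mainline//EN", "/dev/null"))
```

Without the OVERRIDE YES line, the same lookup would hand back "/dev/null". That is the behaviour the catalog entry is there to suppress.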
But Xalan doesn't like this. If I use the Validate application that
comes with it in the samples directory, I get the following:
$ java Validate foo.xml
NOT WELL-FORMED foo.xml. The entity "mdash" was referenced, but not declared.
1 file is not well-formed.
The declaration of 'mdash' is in one of the regular ISO entity files,
referenced in that "Project" DTD file.
Just for fun, I took out *all* the character entities in the file
(replacing &mdash; with '-', &aacute; with 'a', etc.) and ran it through
the validator again. Nsgmls of course still reported no errors. Xalan
was even more unhappy:
NOT VALID foo.xml
foo.xml Error: Element type "TEI.2" must be declared.
foo.xml Error: Element type "teiHeader" must be declared.
foo.xml Error: Element type "fileDesc" must be declared.
foo.xml Error: Element type "titleStmt" must be declared.
foo.xml Error: Element type "title" must be declared.
and so on for *every* element used in the document, along with
"Attribute 'id' must be declared for element type 'div1'." and so on for
every attribute used in the document.
In other words, (a) if Xalan finds problems with entities, it doesn't go
any further; and (b) it's clearly not finding my DTD.
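In SAX, the usual hook for this kind of lookup is an entity resolver. In Java it is org.xml.sax.EntityResolver, installed with XMLReader.setEntityResolver. The resolver receives both the public and the system identifier and may return a different source entirely. The Python sketch below uses the analogous xml.sax API rather than Xalan. It assumes the expat-based reader is allowed to load external entities (feature_external_ges). The FPI, the file names and the PublicIdResolver and TextCollector classes are all hypothetical.

```python
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges


class PublicIdResolver(EntityResolver):
    """Hypothetical resolver: if the public ID is in the table, use the table's
    file and ignore the document's system ID; otherwise use the system ID as given."""

    def __init__(self, table):
        self.table = table  # FPI -> local file path

    def resolveEntity(self, publicId, systemId):
        return self.table.get(publicId, systemId)


class TextCollector(ContentHandler):
    """Collect character data so we can see whether &mdash; was expanded."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text(xml_path, table):
    parser = xml.sax.make_parser()
    # Allow the reader to fetch external entities, including the external DTD subset.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(PublicIdResolver(table))
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(xml_path)
    return "".join(collector.chunks)


def demo():
    fpi = "-//Stoa//DTD Example//EN"  # hypothetical public identifier
    with tempfile.TemporaryDirectory() as tmp:
        dtd = os.path.join(tmp, "example.dtd")
        doc = os.path.join(tmp, "foo.xml")
        with open(dtd, "w", encoding="utf-8") as f:
            f.write('<!ELEMENT doc (#PCDATA)>\n<!ENTITY mdash "&#x2014;">\n')
        with open(doc, "w", encoding="utf-8") as f:
            # The system ID "/dev/null" is never opened: the resolver maps the FPI.
            f.write(f'<!DOCTYPE doc PUBLIC "{fpi}" "/dev/null">\n<doc>a&mdash;b</doc>\n')
        return parse_text(doc, {fpi: dtd})


if __name__ == "__main__":
    print(demo())
```

The table here plays the role of the catalog. A real setup would read the mappings from a catalog file instead of hard-coding them.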
Does this mean I'm stuck actually putting a URI for the DTD file into
every one of the user's XML files, and having to maintain it when it
changes? Or am I missing something stupid (e.g., telling Xalan where
to find the catalog)?