Coming to this a little late....
At 08:16 AM 10/30/2008, Hugh wrote:
>DTDs are very powerful things, but I've come to the conclusion that
>features like entities and default attribute values are Bad Things
>whenever you have data that you might use other XML tools on. For
>example, if I want to use an XSLT to add id attributes to certain
>elements in my document, the mere fact of running said document
>through the stylesheet will resolve those entity references and strip
>my DTD declaration from the document. I have to have the XSLT add it
>back, and God help me if I have an internal subset.
>All of this can be handled one way or the other, but the point is that
>with DTDs I have a rule set that changes the shape of my document when
>I parse that document. So the document as it appears to my processing
>tools (like XSLT), which are looking at it post-parsing may be quite
>different from the document as it appears to a person looking at it.
>Errors and misconceptions abound. An identity transform is not an
>So I've come to the conclusion that it's a fundamentally bad idea to
>use schemas that contain instance data (like entities). And if you
>aren't going to use the parts of DTD that make it so powerful, then
>why not use something with namespace support, etc.?
Keep in mind that the XML DTD is essentially a subset of the SGML
DTD, which provides a great deal of other sorts of support to parsing
and markup, including tag omissibility and so forth, which were
disallowed in XML in order to keep parsers smaller and simpler to
implement than SGML parsers (and this in an age when machines were
already considerably larger and faster than the machines for which
SGML was designed).
In the SGML context, the close dependency of an instance on its
declarations makes more sense than it does in XML, where a much
looser binding has become the norm. The exception to this
rule would be heavy data-crunching applications, where compiling
datatypes (and thus a tighter binding between instance and schema) is
so rewarding for performance reasons. For the most part, in document
processing and especially in projects that must focus on flexible and
responsive document modeling (can anyone say "TEI"?), loose binding
pays off in a considerable reduction of complexity, inasmuch as
modeling and markup can usefully be isolated from one another, and
the price of more machine cycles and memory used up shuttling bits
around isn't too high to pay.
The things Hugh complains about here, general entities and implicit
attribute values, are precisely the detritus left from the old days
because they were considered, in 1998, still too valuable to lose.
Now, with Unicode, better user interfaces, XML Include and XSLT, they
are perhaps much less compelling than they were then.
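To make the effect Hugh describes concrete, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree (which sits on the expat parser) as a stand-in for "other XML tools"; the sample document, the entity name "org" and the attribute name "status" are invented for illustration and come from neither message. It is not an XSLT identity transform, only a parse-and-reserialize round trip, but as far as I know it shows the same reshaping: the entity reference is resolved, the defaulted attribute is written out explicitly, and the DOCTYPE (internal subset included) is gone.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: an internal subset declaring one general entity
# and one default attribute value. All names here are illustrative only.
SOURCE = """<!DOCTYPE doc [
  <!ENTITY org "Example Press">
  <!ATTLIST p status CDATA "draft">
]>
<doc><p>Published by &org;.</p></doc>"""


def round_trip(xml_text):
    """Parse the document, then serialize the resulting tree unchanged."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


result = round_trip(SOURCE)
print(result)
```

Comparing SOURCE with the printed result shows the gap between the document as a person sees it and the document as a post-parse tool sees it. Keeping the declarations would mean re-emitting them by hand, which is the chore Hugh mentions.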
I wrote about some of these issues in my way-old paper, Beyond the
'Descriptive vs Procedural' Distinction ... findable on that Internet thing.
DTDs still work well enough to be ubiquitous in certain environments
where they always worked well. There, the best reason to use them is
probably "if it ain't broke", etc.; underlying this are the (not
inconsiderable) migration costs, given that DTDs and document sets
are still evolving together, and projects find it hard to freeze even
one or the other (and certainly not both) long enough to redesign and retool.
Yet it could also be said, as I once read in an advertisement for
machine tools, "If you know you need new tools, and you haven't
already bought them -- you're already paying for them."
Wendell Piez
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
Mulberry Technologies: A Consultancy Specializing in SGML and XML