At 06:10 AM 9/28/2005, you wrote:
> > At 03:56 AM 9/27/2005, you wrote:
> > > > I'm a bit mystified because what you say above suggests you have set
> > > >
> > > > <xsl:strip-space elements="*"/>
> > > >
> > > > which is not necessary.
> > >
> > >*I* don't, but I've seen a lot of users trying it that way.
> > Ah, so this is an argument for catering to ignorance? (It's okay if
> > it is; sometimes this is necessary.)
>Not really so much catering to ignorance, as providing an unexpected and
>unpleasant surprise behaviour which conflicts with expectation, and thus
>causes what the user would regard as unnecessary extra work.
Ah, this is the nub of it. I guess I just don't have the same
experience of what people "naturally expect".
> One of my charges used the phrase "why couldn't they have made it
> do it the right
While that would be understandable, so would the counter: "why is
it messing with my whitespace without my say-so? Why does it make a
difference whether the schema's there or not?" issued in a similarly
plaintive tone. So we simply disagree as to whether they hit the
right line on this one.
Also, in my experience such *attitudinal* adjustments as one would
make when learning something one didn't design oneself can be helped
by a positive spirit on the part of the instructor. "It might be nice
if it would fix up your whitespace, but it doesn't, which can be kind
of a pain, but that turns out to be surprisingly difficult to do in a
way that satisfies everyone" etc.
(Or if it *isn't* difficult, let's see the stylesheet or Python
script or Perl! :-)
> > I think this is a tradeoff. Sometimes you want the tool just to "do
> > the right thing". Sometimes you want more of an informed choice by
> > the user. The latter, especially when "the right thing" has grey
> > boundary lines (or is grey altogether). I'm not convinced I or we
> > would be happier with a new schema dependency in XSLT processing.
> > (I've stopped defaulting attributes entirely as I've shifted to more
> > powerful, flexible and expressive schema regimens, such as
>Definitely, but we have to handle a lot of legacy, and a lot of that
>simply isn't going to change in the foreseeable future.
Absolutely. I am very much in favor of DTDs. I think the XML core
needs to accommodate many ways to validate, and DTDs are still
necessary and useful for many things. (I'm also a proponent of
converting your schemas and using them in different forms, when
possible, though it isn't always.) I just don't want them to be
making complications even when they're not being used (by setting up
a default processing rule around them).
> > Again, depends on where you sit. Most of "my users" (I know lots of
> > users at all levels from all sectors) don't have any basis for
> > comparison to SGML, but I can sure see they're benefiting from, e.g.,
> > the huge range of XML tools that SGML never had. There are many
> > reasons for the existence of this toolkit; but I think XML's cleaner
> > layering between instance and schema is one of them.
>Yes, that and the vast majority of other goodies. This particular
>problem is small in the global scheme of things, but when dealing with
>very large quantities of extremely dense markup in mixed content, it's a
>pain in the ass.
More reason for that Perl in your pipeline?
> > Could you specify your requirement for munging again? Could it be
> > done as a function munging text nodes?
>With strip-space turned on globally (the case in point), the
>white-space-only text nodes between elements don't make it through to
>the XSLT, so you cannot address them :-)
So you're basically asking for a "munge whitespace intelligently"
option that works at the document scope, but modifies whitespace
(intelligently) rather than stripping it arbitrarily.
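To make the "stripping arbitrarily" half of that concrete, here is a rough sketch in Python rather than XSLT. It only imitates what a global strip-space does to whitespace-only text nodes; it is not how any XSLT processor is implemented, it ignores xml:space, and the function names are made up for illustration.

```python
from xml.dom import minidom

XML_WS = " \t\r\n"  # the four characters XML treats as whitespace


def strip_whitespace_only(node):
    """Remove every whitespace-only text node below `node`, in place."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            if child.data.strip(XML_WS) == "":
                node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_whitespace_only(child)


def strip_all(xml_text):
    """Parse, strip whitespace-only text nodes everywhere, and reserialize."""
    doc = minidom.parseString(xml_text)
    strip_whitespace_only(doc.documentElement)
    return doc.documentElement.toxml()


print(strip_all("<p>See <b>bold</b> <i>italic</i> text.</p>"))
# -> <p>See <b>bold</b><i>italic</i> text.</p>
```

The space between the two inline elements is gone, which is the mixed-content damage under discussion; by the time anything downstream looks at the tree, the node is no longer there to address.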
So, a spec:
- text nodes in mixed content: compress and trim whitespace, but do not
  delete whitespace-only nodes
- text nodes not in mixed content: preserve? (So as to preserve line indenting?)
- definition of mixed content -- ref to a DTD?
(Ironically, it's the last req that might make XSLT unsuitable for
this task: it'd be easier if it were an RNG schema. :-)
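For what it's worth, here is one way that spec might be sketched in Python, offered as a starting point and not a finished tool. The assumptions are mine: the set of mixed-content element names is passed in by hand (standing in for whatever a DTD or schema would tell you), "compress" means collapsing each whitespace run to one space, "trim" applies only at the start and end of a mixed-content element, and a whitespace-only node is reduced to a single space wherever it sits. The names normalize_mixed and munge are hypothetical.

```python
import re
from xml.dom import minidom

WS_RUN = re.compile(r"[ \t\r\n]+")  # one or more XML whitespace characters


def munge(element, mixed_names):
    """Compress whitespace in text nodes that are children of mixed-content
    elements; leave all other text nodes exactly as they are."""
    in_mixed = element.tagName in mixed_names
    for child in list(element.childNodes):
        if child.nodeType == child.ELEMENT_NODE:
            munge(child, mixed_names)
        elif child.nodeType == child.TEXT_NODE and in_mixed:
            collapsed = WS_RUN.sub(" ", child.data)
            if collapsed != " ":
                # Not whitespace-only: trim at the edges of the parent element.
                if child.previousSibling is None:
                    collapsed = collapsed.lstrip(" ")
                if child.nextSibling is None:
                    collapsed = collapsed.rstrip(" ")
            # Whitespace-only nodes are kept, reduced to a single space.
            child.data = collapsed


def normalize_mixed(xml_text, mixed_names):
    """Parse, munge whitespace per the spec above, and reserialize."""
    doc = minidom.parseString(xml_text)
    munge(doc.documentElement, mixed_names)
    return doc.documentElement.toxml()


SAMPLE = "<doc>\n  <p>Some   <b>bold</b> <i>italic</i>\n     text.  </p>\n</doc>"
print(normalize_mixed(SAMPLE, {"p"}))
```

With only "p" declared as mixed, the indentation around the p element is left alone, the runs of spaces inside it are compressed, and the single space between the b and i elements survives. Where the list of mixed-content names comes from is exactly the open question in the last line of the spec.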
>Like many such problems, it's not a difficulty if you are in a position
>to specify the list of element types for strip-space. My objection is to
>the *default* behaviour for "*".
There are people who've never heard of mixed content (who as you know
can be surprisingly obtuse about it), who regard strip-space
elements="*" as a feature, and use it on their banking data, numeric
data sets or whatever. They'd be unhappy without it.
> Defaults should cater for the most
>commonly-occurring circumstances, and not break the document model by
>predication on borderline cases and special parameters.
I bet the sum total of XML in use, measured in bytes, is way over on
the "data-centric" side at the moment. (One can hope this will
rebalance somewhat over time.)
> And strip-space
>can only take Names in its elements attribute: it would be nice if you
>could use XPath :-)
Yeah ... actually they're more like patterns ... wonder how XSLT 2.0 does this.
>Web browsers have an implicit version of strip-space in their default
>handling of HTML. Imagine if the space between two adjacent elements in
>mixed content suddenly disappeared.
Oh I see it all the time (aforementioned horrible bug in MSXML, hence
in IE)! Everyone hates it. It's abominable. It's probably the single
worst presentation-level bug in common deployments of XML.
But you see that one is an *MS* bug, and not in XSLT, which (even MS
agrees) specifies this right. You only get strip-space if you
explicitly turn on the switch.
> > It sounds like an Extreme paper, Peter.
>"Why I demand white-space" or "Strip-space considered harmful" :-)
> > And all of this is thankfully beside the original question, the right
> > answer to which depends on much more than whether XSLT happens to be
> > "good enough" in this respect.
>Certainly good enough. Just unfortunate that this circumstance was not
>more obvious at the time: part of the problem seems to be the CS-derived
>insistence on treating the document as a tree, even when it's not.
I have an entire lecture sketched out in my head about the influence
of the James Clark model of SGML/XML. We owe Clark so much that it is
sometimes hard to notice the down side of this particular XML
orthodoxy. But it isn't far to look for -- indeed if we were still
used to parsing strings we'd probably be further ahead on the overlap
problem by now -- but it is indubitably powerful, and its strength is
in its simplicity. And it's not a bad place to rest, and get good
work done, while we contemplate the next thing. So, part way up the mountain.
So I am probably as eager as you to get beyond the limitations of the
tree-view, while perhaps not feeling so sore about this particular
over-application of it as you (granting for a second all your arguments :-).
Wendell Piez
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
Mulberry Technologies: A Consultancy Specializing in SGML and XML