On Mon, 2005-09-26 at 21:20, Wendell Piez wrote:
> At 07:52 AM 9/26/2005, you wrote:
> >So an application like XSLT has a switch to allow the preservation or
> >removal of white-space. Unfortunately the distinction is binary: you
> >can either keep it or lose it, and no distinction is made between
> >white-space found in element content and that found in mixed or PCDATA
> >content because *at the time of parsing it*, the parser may not know
> >what further type of content awaits it within the current element.
> >
> >The result is that text nodes containing only white-space tokens are
> >removed entirely when the strip-space switch is ON. My argument is that
> >if at least one subelement has already been encountered in the current
> >element, then white-space-only nodes should no longer be suppressed in
> >this element, but collapsed to a single space token. This would still
> >permit the suppression of leading white space nodes, which is almost
> >always what you want, but it would defeat the suppression of trailing
> >white-space nodes (because in mixed content a preceding element would
> >have been encountered).
> 
> I'm a bit mystified because what you say above suggests you have set
> 
> <xsl:strip-space elements="*"/>
> 
> which is not necessary.

*I* don't, but I've seen a lot of users trying it that way.

> If you have a schema (or even if you don't, but know what the schema 
> would tell you), it's not hard to say
> 
> <xsl:strip-space elements="TEI.2 body div"/>
> 
> where TEI.2, body and div are those elements in your schema defined 
> as having element-only content. This way you can safely dispose of 
> whitespace-only nodes that are there only for cosmetic reasons in the 
> code, while safely leaving in place any whitespace that might matter.
> 
> What's so hard about that?

Nothing, except that when you are dealing with a very wide range of DTDs
and Schemas, it would be more useful if the work of determining where
element content occurs were done by the parser (which already knows,
dammit! because it has just read the DTD!) than to make the users work it
out for themselves.
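As it happens, expat already exposes exactly this information through its
element-declaration callback, so the strip-space list could in principle be
computed mechanically. A minimal Python sketch (the mini-DTD and element
names below are invented for illustration, not taken from any real schema)
that collects the elements declared with element-only content:

```python
import xml.parsers.expat as expat
from xml.parsers.expat import model

# A made-up document with an internal DTD subset, for illustration only.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE body [
  <!ELEMENT body (div+)>
  <!ELEMENT div  (head, p+)>
  <!ELEMENT head (#PCDATA)>
  <!ELEMENT p    (#PCDATA | emph)*>
  <!ELEMENT emph (#PCDATA)>
]>
<body><div><head>t</head><p>x</p></div></body>
"""

element_only = []

def element_decl(name, content_model):
    # content_model[0] is the model type; SEQ, CHOICE and NAME all mean
    # element-only content, while MIXED means #PCDATA is allowed.
    if content_model[0] in (model.XML_CTYPE_NAME,
                            model.XML_CTYPE_CHOICE,
                            model.XML_CTYPE_SEQ):
        element_only.append(name)

p = expat.ParserCreate()
p.ElementDeclHandler = element_decl
p.Parse(DOC, True)

# The parser has told us where element content occurs:
print('<xsl:strip-space elements="%s"/>' % ' '.join(element_only))
```

For the toy DTD above this reports body and div (but not head, p or emph),
which is precisely the list a stylesheet author is currently expected to
work out by hand.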

It's really down to usability: XML changed the document model from SGML
(in most cases, in the right direction), but this was one area we fouled
up. No biggie, but the concentration on the "data" use of XML has meant
that the needs of "document" usage have largely been set aside.

> Respectfully, while I understand why you want XML tools to do a 
> better job at what-was-once-intended-for-SGML tools, 

My users want XML to at least do *as good a job* as SGML did. In this
respect XML is falling down.

> I don't think 
> any other suggestion gets anywhere close to the right balance. In 
> particular, I do not think it would be a net gain if we had another 
> area where a document processed with a schema gives different results 
> from the same document processed without a schema. If your schema has 
> the effect of modifying your data when it is processed, IMO it should 
> be a transform. ;->

Then the transformation language should provide the facility to do it
right. XSLT does not at the moment provide this, AFAIK, because it
removes those white-space nodes in mixed content which should only be
compressed to a single space instead. It is precisely the transform
which is modifying the data, not the schema or the parse.
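To make the proposed rule concrete, here is a rough sketch of it in Python
over a DOM (not XSLT itself, and since a bare DOM cannot tell mixed from
element-only content, the collapse is applied in every element): leading
whitespace-only text nodes are stripped, but once a subelement has been
seen, later whitespace-only nodes are collapsed to a single space rather
than discarded.

```python
from xml.dom import minidom

def adjust_space(elem):
    """Sketch of the proposed rule: strip leading whitespace-only text
    nodes, collapse later ones to a single space, recurse into children."""
    seen_child = False
    for node in list(elem.childNodes):
        if node.nodeType == node.ELEMENT_NODE:
            seen_child = True
            adjust_space(node)
        elif node.nodeType == node.TEXT_NODE and node.data.strip() == '':
            if seen_child:
                node.data = ' '          # collapse, don't discard
            else:
                elem.removeChild(node)   # leading whitespace: safe to strip

doc = minidom.parseString(
    '<p>\n  <emph>one</emph>\n  <emph>two</emph>\n</p>')
adjust_space(doc.documentElement)
result = doc.documentElement.toxml()
print(result)  # -> <p><emph>one</emph> <emph>two</emph> </p>
```

Note that the space between the two emph elements survives (as a single
space), where strip-space would have deleted it and run the words
together; the trailing whitespace node is likewise kept, as argued above.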

///Peter