Sorry for coming late to this thread ...
At 03:28 AM 3/29/2010, Bertrand Gaiffe wrote:
>>>As others have pointed out, <subst> is a special case, because it has
>>>only element content. Although an XML processor is required to pass
>>>through the white space around the <add> and <del> elements it contains,
>>>they have no significance in the TEI representation, since the meaning
>>>of the <subst> element is that the content of its <del> child is to be
>>>replaced by the content of its <add> child. It would be a different
>>>matter if there were any white space *inside* either the <add> or the
>>><del>, of course.
>I just do not understand the point about the whole discussion.
>As stated in this post, in my experience, the problem is usually
>with elements that are not mixed content ! And this leads to such
>incantations as :
><xsl:strip-space elements="tei:index tei:app tei:w tei:lem tei:term
This is the canonical approach in XSLT 1.0, which follows the
rule "any whitespace in the document may be significant; it is up to
a processor to decide otherwise". Accordingly, XSLT 1.0 tools, which
are not schema-aware and which therefore have no notion of what is
declared as element content or mixed content, take a conservative
approach and keep all of it. The strip-space declaration is there
exactly so that the stylesheet author can correct this when it is
known statically (albeit not to the XSLT processor) that
whitespace-only text nodes should be stripped.
It is also important to keep in mind that strip-space removes only
whitespace-only text nodes, and only those that are direct children
of the elements named. No other whitespace munging is performed; if
any happens, it happens downstream.
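As a rough illustration of that last point (the stylesheet and the
sample input described after it are my own sketch, not taken from
anyone's project), this XSLT 1.0 stylesheet reports how many text-node
children each tei:subst still has after strip-space has been applied,
followed by its string value:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Strip whitespace-only text nodes that are direct children of
       tei:subst. Text inside tei:add and tei:del is not touched. -->
  <xsl:strip-space elements="tei:subst"/>

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//tei:subst">
      <!-- Number of text-node children left on this tei:subst -->
      <xsl:value-of select="count(text())"/>
      <xsl:text>: [</xsl:text>
      <!-- String value: all descendant text, concatenated -->
      <xsl:value-of select="."/>
      <xsl:text>]&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run against a tei:subst that is indented across several lines and
contains <del>the  old</del> and <add> new </add>, this should report
0 text-node children, while the doubled space inside the <del> and the
spaces inside the <add> survive in the string value. Take the
strip-space declaration out and the count becomes 3.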
It also happens that XSLT 2.0 can be schema-aware. Accordingly, XSLT
2.0 processors are allowed to do this kind of cleanup on your behalf.
Saxon 9.2 will do this when parsing with a DTD or XSD, unless you set
your switches to have it otherwise.
Parsing with RNG is another matter, however.
Were I implementing a TEI-based system and I wanted robust handling
of this sort of thing (and I didn't want to have to specify
strip-space every time), I might have a whitespace-stripping
stylesheet in my pipeline. Even when tools always do the right thing,
users do not.
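One possible shape for such a pipeline step, offered only as a sketch:
an identity transform plus a strip-space declaration, run before
anything else sees the document. The element list here is an
assumption for illustration (elements I take to have element-only
content); a real one would be derived from the schema actually in use.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Illustrative list only: containers assumed to hold nothing but
       child elements, so whitespace-only text nodes directly inside
       them carry no meaning. -->
  <xsl:strip-space
      elements="tei:TEI tei:text tei:body tei:subst tei:choice tei:app"/>

  <!-- Identity transform: copy every attribute and node unchanged.
       The stripping has already been applied to the source tree
       before any template runs. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Later stylesheets in the pipeline can then assume the cleanup has been
done and need not repeat the strip-space list. An xml:space="preserve"
attribute in the source still overrides the declaration, which is
usually what one wants.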
The problem with whitespace and preserving whitespace is that we
don't always mean the same thing by "whitespace", and we don't always
mean the same thing by "preserve". The distinctions between our
various actual requirements need to be recognized and respected
before we can expect our machines to do the right thing.
Wendell Piez
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
Mulberry Technologies: A Consultancy Specializing in SGML and XML