OK, let's elaborate a bit more.

a) use this template in your XSL script

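<xsl:template match="text()">
  <xsl:choose>
    <xsl:when test="contains(.,'&lt;')">
      <xsl:result-document omit-xml-declaration="yes"
          href="BLOGXTRACT-{generate-id()}.html">
        <xsl:value-of select="." disable-output-escaping="yes"/>
      </xsl:result-document>
      <ptr target="BLOGXTRACT-{generate-id()}.html" rend="transclude"/>
    </xsl:when>
    <xsl:otherwise><xsl:value-of select="."/></xsl:otherwise>
  </xsl:choose>
</xsl:template>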

This writes any text node that contains a < out to a separate
file, and leaves a pointer to that file in its place.

b) loop over every BLOGXTRACT*.html file and do
a standard HTML cleanup on it using tidy or equivalent;
that should get you well-formed XML. I use a PHP script
on the command line that looks like this:

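// assumption: the file to clean is given as the first argument
libxml_use_internal_errors(true);   // keep quiet about sloppy markup
$dom = new DOMDocument;
$dom->loadHTMLFile($argv[1]);       // forgiving HTML parse
$dom->formatOutput = true;
$dom->encoding = "utf-8";
echo $dom->saveXML();               // serialize as well-formed XML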
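
The loop in step b) could be sketched roughly as below. This is only a sketch under assumptions: the helper name cleanHtmlFile is made up for illustration, the extracted files are assumed to sit in the current directory, and each file is overwritten in place so that the target attributes written in step a) still point at the cleaned version.

```php
<?php
// Hypothetical helper (not from the original post): parse one tag-soup
// file with libxml's forgiving HTML parser and write it back as XML.
function cleanHtmlFile(string $file): bool
{
    libxml_use_internal_errors(true);   // collect parser warnings quietly
    $dom = new DOMDocument;
    $ok = $dom->loadHTMLFile($file);    // false if the file could not be parsed
    libxml_clear_errors();
    if (!$ok) {
        return false;
    }
    $dom->formatOutput = true;
    $dom->encoding = "utf-8";
    // Assumption: overwrite in place so the <ptr target="..."> from
    // step a) still resolves to the cleaned file.
    return $dom->save($file) !== false;
}

// Assumption: the extracted files are in the current directory.
foreach (glob('BLOGXTRACT-*.html') ?: [] as $file) {
    if (!cleanHtmlFile($file)) {
        fwrite(STDERR, "could not clean $file\n");
    }
}
```

If you would rather keep the raw extracts, write the cleaned copy under a different name and adjust the target attribute in step a) to match.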

c) process the file which resulted from stage a) and follow
the <ptr rend="transclude"> elements with

  <xsl:template match="ptr[@rend='transclude']">
    <xsl:copy-of select="document(@target)//body/*"/>
  </xsl:template>

d) now do your proper transformation into the target XML.

Sebastian Rahtz
Information, Oxford University Computing Services
I only ask of God
that I not be indifferent to the future