> > It's almost possible to write a fairly simple Perl script to do this
> > just operating directly on the XML files, but it won't properly handle
> > XML containing tags within comments, CDATA sections, and so on.
> With XML::SAX you have both Perl and proper access to the XML content. By
> "tags within comments", if you mean tags commented out with <!-- -->, I
> believe it can't be handled by any parser, as the content of a comment is
> parsed as raw text.
I too wondered what was meant by "tags within comments, CDATA sections etc."
Is this a requirement to see and process markup that has been hidden from
the parser within a comment, or disguised as text by wrapping it in a CDATA
section? If so, Sebastian's XSL won't do this, and as Sylvain says, no
conformant parser will deliver such things to you as if they truly were
markup. A SAX parser which provides a lexical handler (not all do) will
deliver comments or CDATA sections to you in their entirety via the
associated callback, but then you would need to handle them lexically, i.e.
by standard string-matching methods, to pull out things you've told the
parser are none of its business. Alternatively, you can parse into a DOM and
then visit the CDATA and comment nodes, but not all DOM parsers populate
these nodes fully
and reliably, and even where they do, you would still have to hunt out your
"hidden" markup for yourself.
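
To make the lexical-handler route concrete, here is a minimal sketch. It
uses Python's standard xml.sax module rather than Perl's XML::SAX, purely
for illustration. The names HiddenMarkupFinder, find_hidden_markup and
TAG_RE are my own invention, and the regular expression is a deliberately
crude stand-in for whatever string matching you would really need. It
registers one object as both content handler and lexical handler, collects
the text of each comment and CDATA section, and then searches that text for
things that look like start tags.

```python
import io
import re
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler

# Crude, hypothetical pattern: "<" followed by something shaped like a tag name.
TAG_RE = re.compile(r"<([A-Za-z_][\w.\-]*)")


class HiddenMarkupFinder(ContentHandler):
    """Collects tag-like strings found inside comments and CDATA sections."""

    def __init__(self):
        super().__init__()
        self.in_cdata = False
        self.cdata_parts = []
        self.found = []  # list of (where, tag_name) pairs

    # ContentHandler callback: CDATA content arrives here as ordinary text.
    def characters(self, content):
        if self.in_cdata:
            self.cdata_parts.append(content)

    # Lexical callbacks: invoked only if the parser supports the property.
    def comment(self, content):
        self._scan("comment", content)

    def startCDATA(self):
        self.in_cdata = True
        self.cdata_parts = []

    def endCDATA(self):
        self.in_cdata = False
        self._scan("cdata", "".join(self.cdata_parts))

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass

    def _scan(self, where, text):
        # The "lexical" step: plain string matching, not parsing.
        for match in TAG_RE.finditer(text):
            self.found.append((where, match.group(1)))


def find_hidden_markup(xml_text):
    finder = HiddenMarkupFinder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(finder)
    # Raises SAXNotRecognizedException on a parser without lexical support.
    parser.setProperty(property_lexical_handler, finder)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return finder.found
```

Called on a document such as

    <doc><!-- <old>gone</old> --><live>text</live><![CDATA[<raw a="1"/>]]></doc>

this reports "old" from the comment and "raw" from the CDATA section, and
ignores the real element "live", because only the text handed over by the
lexical callbacks is searched.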