David Sewell wrote:
> Sorry, my wording was unclear. What I meant was that it's easy to write
> an ordinary Perl script to extract and manipulate the content of
> SGML/XML tags, defined as "text inside < > delimiters".
I don't really think it is that easy, actually. Not when you have nested
elements and mixed content. And when it comes to things like default
attributes and their values, which are declared in the DTD and may never
appear in the document instance at all, a merely lexical approach becomes
even tougher, and maybe impossible.
The work done on Perl modules for XML parsing over the past 18 months or so
by Matt Sergeant, Christian Glahn, Kip Hampton and others means that pukka
XML parsing within Perl is now actually a great deal simpler than lexical
operations on a raw text stream.
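
To make the contrast concrete, here is a minimal sketch, assuming a
reasonably current XML::LibXML is installed. The sample document and the
variable names are made up for illustration; they come from nowhere in
particular. It pulls the content of an outer element that contains mixed
content and a nested element of the same name, first with a regex and then
with the parser.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample: mixed content plus a <note> nested inside a <note>.
my $xml = <<'XML';
<doc>
  <note id="outer">See <emph>this</emph> and <note id="inner">a nested note</note> too.</note>
</doc>
XML

# Lexical approach: the non-greedy match stops at the first </note>,
# so the outer element is cut off inside the nested one.
my ($lexical) = $xml =~ m{<note[^>]*>(.*?)</note>}s;

# Parser approach: XML::LibXML builds a tree, so nesting and mixed
# content need no special handling on our side.
my $doc     = XML::LibXML->load_xml(string => $xml);
my ($outer) = $doc->findnodes('//note[@id="outer"]');
my $parsed  = $outer->textContent;

print "regex:  $lexical\n";
print "parser: $parsed\n";
```

The regex hands back a fragment that still contains markup and loses the
trailing " too.", whereas the parser returns the complete text of the outer
element. Making the regex greedy does not help in general: it then
overshoots as soon as a second outer element follows.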
1) David Cross, Data Munging with Perl (Manning, 2001)
2) Erik T. Ray and Jason McIntosh, Perl and XML (O'Reilly, 2002)
1) covers many things besides XML, and predates some of the more interesting
recent developments: nevertheless it still provides probably the best
introduction to XML parsing within Perl (Chapter 10), and it's a goldmine of
information and examples on many other aspects of data handling.
2) is a good survey of the possibilities and techniques available at the
time it went to press, with detailed and useful examples. Nevertheless, some
of those examples no longer work as-is because of the pace at which the
modules they rely on (and the underlying libraries) have developed.
The best way to stay up-to-date is to follow Kip Hampton's articles on the
O'Reilly site, though these too are sometimes overtaken by developments
within a few weeks of first appearing. I would particularly recommend:
High-Performance XML Parsing With SAX
Perl XML Quickstart
XML::LibXML - An XML::Parser Alternative
Transforming XML with SAX filters
(this gives a clear idea of what SAX filters are and what they can do, but
practical details have been superseded by the techniques described in the
following two pieces)
I could go on (I haven't even mentioned the perl-Xerces-Pathan tie-up) but
my main point is that XML processing really requires XML-savvy tools and
that Perl programmers now have a large boxful, with plenty of advice on how
to make effective use of them. As a result, Perl hacking and fully XML-aware
processing are no longer worlds apart, which brings such processing comfortably
into the ambit of any scholars who are not averse to doing a little