On Wed, 16 Jul 2003, Sebastian Rahtz wrote:
> > (P.S. I'm still looking for the perfect "XML grep" program.
> don't you find normal grep actually works 99% of the time?
For most things. But here's a typical case where it doesn't:
<!-- lots of stuff -->
Assuming the above is in "foo.xml",
grep "Joe.*Smith" foo.xml
returns only the first <name> line.
But using the Pathan "xgrep" program:
xgrep '//name[normalize-space(text())="Joe Smith"]' foo.xml
returns both <name> nodes. Or using the search function in XMLStarlet:
xml sel -t -m "//name[normalize-space(text())='Joe Smith']" -c . foo.xml
does the same thing.
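
To make the difference concrete, here is a minimal sketch in Python. The standard xml.etree.ElementTree module stands in for xgrep/XMLStarlet. The sample document is an assumption (a hypothetical file in which the second name is broken across two lines), and the helper names grep_lines, normalize_space and find_names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: the second <name> is split across two lines (assumed content).
SAMPLE = """<people>
  <name>Joe Smith</name>
  <name>Joe
     Smith</name>
</people>"""


def grep_lines(pattern, text):
    """Line-by-line regex search, which is how plain grep sees the file."""
    return [line for line in text.splitlines() if re.search(pattern, line)]


def normalize_space(s):
    """Rough equivalent of XPath normalize-space(): trim and collapse whitespace."""
    return " ".join((s or "").split())


def find_names(xml_text, wanted):
    """Return <name> elements whose whitespace-normalized text equals `wanted`."""
    root = ET.fromstring(xml_text)
    return [el for el in root.iter("name") if normalize_space(el.text) == wanted]


print(len(grep_lines(r"Joe.*Smith", SAMPLE)))  # line-based search
print(len(find_names(SAMPLE, "Joe Smith")))    # node-based search
```

With this assumed input the line-based search finds one hit and the node-based search finds two, because the parser hands back the whole text of each element regardless of where the line breaks fall.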
I gather that XPath 2.0 is going to have a matches() function that will
allow regular-expression searches, so an "xgrep" program incorporating
that functionality will have clear advantages over plain old grep.
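
As a rough idea of what such a regex-aware search might look like, here is a sketch that uses Python's re module in place of the XPath regular-expression function. It is not XPath 2.0 itself; the function name xml_grep and the sample data are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def xml_grep(xml_text, tag, pattern):
    """Return the whitespace-normalized text of every <tag> element matching the regex."""
    root = ET.fromstring(xml_text)
    hits = []
    for el in root.iter(tag):
        # Gather all text inside the element, then collapse runs of whitespace.
        text = " ".join("".join(el.itertext()).split())
        if re.search(pattern, text):
            hits.append(text)
    return hits


# Hypothetical input: line breaks and extra spaces inside the names.
doc = (
    "<people><name>Joe\n  Smith</name>"
    "<name>Joseph  Smythe</name>"
    "<name>Ann Lee</name></people>"
)
print(xml_grep(doc, "name", r"Jo\w* Sm[iy]th"))
```

Normalizing before matching is the design choice that matters here: the pattern is written against the logical text of the element, not against its layout in the file.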
(I should maybe add that my obsession with command-line XML tools derives from
years as an editor of print publications, where I routinely converted authors'
word-processing files to ASCII so that I could use Perl and Unix-ish tools
like grep/sort/cut/uniq to do various kinds of searches and consistency
checks. Coming up with a similar toolkit for XML-based publications,
particularly ones based on complicated tagsets like the TEI ones, is taking
some effort, but it's getting there.)
David Sewell, Managing Editor
Electronic Imprint, The University of Virginia Press
PO Box 400318, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Tel: +1 434 924 9973