On Tuesday, April 15, 2003 8:08 AM
Richard Light wrote
>
> An alternative approach, which we have used successfully for the Perdita
> project (http://human.ntu.ac.uk/perdita/PERDITA.HTM), is to use XSLT to
> generate a complete web site giving indexed access to the material.
[...]
>
> Alternatively, you can look into the use of Open Source XML databases
> such as Xindice or eXist as the foundation for XML-aware search
> functionality - but you would then have to build the rest of the
> application yourselves on top of these engines.
Actually these approaches can be complementary. One nice thing about
"native" XML databases is that they return query results as well-formed XML
documents, which are thus available for any desired XSLT transformations or
DOM-walking procedures before they are finally delivered to the user. For
instance, in a concordancing application, collocation field indices (the
collocation field being made up of n words/tokens to the left and/or right
of the matched term) can be quickly generated on the fly from the result
set, and cached on the server, where the user can access them and re-apply
them to the results document without further reference to the core data
store.
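
To make the collocation idea concrete, here is a minimal Python sketch of building such an index from a result document and caching it. It assumes a result format with `<results>` and `<hit>` elements and naive whitespace tokenisation; those element names, the sample data and the function names are invented for illustration and are not what any particular database actually returns.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical result document; element names and content are invented.
RESULTS = """<results>
  <hit>the cat sat on the mat</hit>
  <hit>and the cat slept</hit>
</results>"""

def collocation_index(results_xml, term, n=2):
    """Count the tokens found within n positions left or right of each match."""
    counts = Counter()
    for hit in ET.fromstring(results_xml).iter("hit"):
        tokens = (hit.text or "").split()  # naive whitespace tokenisation
        for i, tok in enumerate(tokens):
            if tok == term:
                counts.update(tokens[max(0, i - n):i])   # left context
                counts.update(tokens[i + 1:i + 1 + n])   # right context
    return counts

_cache = {}

def cached_index(results_xml, term, n=2):
    """Build the index once per (results, term, n) and reuse it afterwards."""
    key = (results_xml, term, n)
    if key not in _cache:
        _cache[key] = collocation_index(results_xml, term, n)
    return _cache[key]

index = cached_index(RESULTS, "cat", n=1)
```

A second call with the same arguments returns the cached index without touching the result document again, which is the point of keeping it on the server.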
And building the application oneself is not as big a task as it sounds, now
that Cocoon 2 offers a sophisticated and powerful (and free) application
framework, which is by no means as rebarbative as its official documentation
makes out. (A much better intro to doing useful things with Cocoon is on pp.
535-558 of Jeni Tennison's Beginning XSLT.)
In case there are any traumatised "early adopters" out there who (like me)
got badly burnt by Cocoon 1 a few years back: Cocoon 2 is a very different
beast, simple to install, very stable and also reasonably fast. It embodies
an expressive and powerful abstract model, which divides things into
"generators" (which deliver the input data), "transformers" (which do things
to the data) and "serialisers" (which package the transformed data into the
desired output format). Essentially, building an application in Cocoon is
simply a matter of assembling (or writing or adapting) the required
generator, transformer(s) and serialiser(s), and plugging them together into
a so-called "pipeline" (using an XML configuration document or "sitemap" to
specify what plugs into what). Everything else is taken care of.
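
The generator / transformer / serialiser model can be illustrated with a few lines of Python. This is only a conceptual sketch of the idea, not Cocoon's API: Cocoon itself is written in Java and wired up through the sitemap, and every function name here is invented for the example.

```python
import xml.etree.ElementTree as ET

def string_generator(source):
    """Generator: deliver the input data as a parsed XML tree."""
    return ET.fromstring(source)

def uppercase_titles(root):
    """Transformer: do something to the data (here, upper-case every title)."""
    for title in root.iter("title"):
        title.text = (title.text or "").upper()
    return root

def xml_serialiser(root):
    """Serialiser: package the transformed data into the output format."""
    return ET.tostring(root, encoding="unicode")

def run_pipeline(generate, transformers, serialise, source):
    """Plug one generator, any number of transformers and one serialiser together."""
    data = generate(source)
    for transform in transformers:
        data = transform(data)
    return serialise(data)

output = run_pipeline(
    string_generator,
    [uppercase_titles],
    xml_serialiser,
    "<doc><title>perdita</title></doc>",
)
```

Swapping in a different serialiser or adding a second transformer changes the output without touching the other stages, which is what makes the model so convenient.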
In practice, for most people this means writing XSLT sheets to do the
desired transformation, then connecting them to existing "black box"
generators and serialisers. This presupposes an understanding of XSLT, but
there is (or need be) nothing special about the way XSLT is used within
Cocoon. No knowledge of the broader workings of Cocoon or of the underlying
Java engine is required to create applications of arbitrary sophistication.
In the beginning (before this model was fully abstracted from the workings
of Cocoon 1) there was just one sort of "generator", viz a module that could
read and parse XML documents from a filesystem, two sorts of "transformer"
(an XSLT processor and an XSL-FO engine) and a handful of "serialisers" (to
produce XML, various flavours of HTML or PDF output). Now there is a
plethora of generators, transformers and serialisers available.
This is where eXist fits in (or Xindice, though I have nothing useful to say
about that system). eXist can be a Cocoon "generator", feeding its results
to an XSLT transformer and on to the user via an HTML serialiser. But prior
to that, Cocoon can also manage user input to eXist, "generating" suitable
query forms, "transforming" the user's input to those forms into eXist
queries and "serialising" the queries out to eXist, which then "generates"
the results back into Cocoon for XSLT transformation and HTML serialisation
back to the user. So anyone who has a reasonable understanding of Web forms
handling and XSLT can use Cocoon to build a user-friendly application around
eXist, and hence a search engine capable of structure-sensitive querying of
any TEI-XML document, and deploy it on the Web.
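
The round trip from form input to query to HTML can be sketched roughly as follows. This is a hedged illustration only: an in-memory document queried with Python's ElementTree stands in for the database, no real eXist or Cocoon call is shown, and the sample data, form field names and function names are all assumptions made for the example.

```python
import html
import xml.etree.ElementTree as ET

# Stand-in for the data store: a tiny TEI-like document (invented sample data).
TEI = """<text>
  <div type="poem"><l>Shall I compare thee</l><l>to a summer day</l></div>
  <div type="letter"><p>A summer visit is planned</p></div>
</text>"""

def build_query(form):
    """Turn the form input into a path expression limited to one kind of div."""
    div_type = form["div_type"]
    if not div_type.isalnum():  # refuse input that could break out of the predicate
        raise ValueError("unexpected div type")
    return ".//div[@type='%s']/*" % div_type

def run_query(path, word):
    """Stand-in for the database call: matching elements whose text contains word."""
    root = ET.fromstring(TEI)
    return [el for el in root.findall(path) if word in (el.text or "")]

def to_html(hits):
    """Package the hits as an HTML list for delivery to the user."""
    items = "".join("<li>%s</li>" % html.escape(el.text) for el in hits)
    return "<ul>%s</ul>" % items

form = {"div_type": "poem", "word": "summer"}
page = to_html(run_query(build_query(form), form["word"]))
```

The same word searched with `div_type` set to `letter` returns the line from the letter instead, which is the sense in which the query is structure-sensitive.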
Michael Beddow