Eric Lease Morgan wrote:
> On Aug 11, 2004, at 11:02 PM, Automatic digest processor wrote:
>> I'd be interested in any replies to this. We're about to launch a
>> TEI-lite journal and have similar issues.
>>> We plan to make a series of articles searchable on the web. Our plan
>>> is to mark these up using Text Encoding Initiative.
>>> Which software is best for the marking up process? The files will be
>>> in plain text and the operating system will be Windows 2000.
> [snip] I am/will be using swish-e as the underlying indexer for searches
> against TEI documents. Specifically, I have been marking sets of
> literature up in TEI. I then convert the sets into a number of formats
> such as plain text, XHTML, PDF, various Palm flavors, etc. I then use
> swish-e to index the XHTML because swish-e makes it easy to pull
> out the meta tags of HTML head elements and make them field searchable
> as well as the body of the text being free-text searchable.
We contributed an article to _XML in Libraries_ (Roy Tennant, editor)
about our approach to this (which currently lives in a couple of
environments, one of which is Windows 2000 running IIS 5).
Essentially it takes advantage of the fact that a great many indexing
engines from swish-e to Google can index undifferentiated HTML full text
and may take advantage of some explicit HTML meta tags (the XSL
transformations can be used to populate Dublin Core fields for
example). It's what people using the WWW have come to expect, but it's
certainly significantly less than the potential offered by the raw TEI.
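
To make the meta tag idea concrete, here is a minimal sketch in Python
rather than XSL. It is not the code from our article. The sample
document, the DC_PATHS mapping and the tei_to_html function are all
hypothetical, and the sample assumes P4-style TEI Lite with no
namespace. It copies a few header fields into Dublin Core meta tags
and flattens the paragraphs into plain HTML for an indexer to pick up.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical TEI Lite sample (P4-style, no namespace, not validated).
SAMPLE_TEI = """<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A Sample Article</title>
        <author>Jane Doe</author>
      </titleStmt>
      <publicationStmt>
        <publisher>Example Press</publisher>
        <date>2004</date>
      </publicationStmt>
    </fileDesc>
  </teiHeader>
  <text><body><p>First paragraph.</p><p>Second paragraph.</p></body></text>
</TEI.2>"""

# Assumed mapping from Dublin Core meta names to TEI header paths.
DC_PATHS = {
    "DC.title": "teiHeader/fileDesc/titleStmt/title",
    "DC.creator": "teiHeader/fileDesc/titleStmt/author",
    "DC.date": "teiHeader/fileDesc/publicationStmt/date",
}


def tei_to_html(tei_xml):
    """Return a bare HTML page with Dublin Core meta tags in its head."""
    root = ET.fromstring(tei_xml)

    # One <meta> element per header field that is actually present.
    metas = []
    for name, path in DC_PATHS.items():
        value = root.findtext(path)
        if value:
            metas.append('<meta name="%s" content="%s" />'
                         % (name, escape(value.strip(), quote=True)))

    title = root.findtext(DC_PATHS["DC.title"], default="").strip()

    # Flatten each TEI paragraph to text; inline markup is discarded.
    paras = ["<p>%s</p>" % escape("".join(p.itertext()).strip())
             for p in root.iterfind("text/body//p")]

    lines = ["<html>", "<head>", "<title>%s</title>" % escape(title)]
    lines += metas
    lines += ["</head>", "<body>"] + paras + ["</body>", "</html>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(tei_to_html(SAMPLE_TEI))
```

The output is ordinary HTML, so whether the meta tags become
searchable fields depends entirely on how the indexer is configured.
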
We then took the next step and derived an index based on the <index>,
<rs> and <name> tags, served up from our relational database of choice.
This allowed us to normalize (REGularize) the search values as well as
add a browse capability which may not be relevant to your requirements.
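
A rough sketch of that derived index follows, again in Python and
again not our production code. It assumes the regularized form sits in
a reg attribute on <name> and <rs>, and it uses SQLite as a stand-in
for whatever relational database you prefer. The table layout, the
sample text and the function names are hypothetical, and <index>
elements are left out to keep it short.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample; the reg attribute holds the regularized heading.
SAMPLE_BODY = """<body>
  <p><name type="person" reg="Twain, Mark">Mr. Clemens</name> wrote to
     <name type="person" reg="Twain, Mark">Mark Twain</name>'s publisher in
     <rs type="place" reg="Hartford (Conn.)">the city</rs>.</p>
</body>"""


def build_index(conn, doc_id, xml_text):
    """Store one row per <name> or <rs> element found in the document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS name_index ("
        "doc_id TEXT, tag TEXT, type TEXT, heading TEXT, as_written TEXT)"
    )
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag not in ("name", "rs"):
            continue
        # Collapse whitespace in the text exactly as it appears.
        as_written = " ".join("".join(elem.itertext()).split())
        # Fall back to the written form when no reg value is supplied.
        heading = elem.get("reg") or as_written
        conn.execute(
            "INSERT INTO name_index VALUES (?, ?, ?, ?, ?)",
            (doc_id, elem.tag, elem.get("type", ""), heading, as_written),
        )
    conn.commit()


def browse(conn):
    """Return (heading, occurrences) pairs in alphabetical order."""
    return conn.execute(
        "SELECT heading, COUNT(*) FROM name_index "
        "GROUP BY heading ORDER BY heading"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn, "doc1", SAMPLE_BODY)
    for heading, count in browse(conn):
        print(heading, count)
```

Because both spellings share one heading, a search or browse on the
regularized value finds both occurrences.
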
What it doesn't do is allow the arbitrary searching of nested elements
on the fly. The tools that do this allow great precision for a number
of research purposes, and I have seen some with a web form interface.
I would not present any that I have seen to a casual user, the sort
whose expectations have been defined by their experience with Google.
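
For anyone unsure what "arbitrary searching of nested elements" means
in practice, here is a small illustration using the limited XPath
support in Python's standard library. It is only a sketch of the idea,
not one of the tools mentioned above. The sample markup and the
nested_search function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: names inside and outside quotations, in two
# divisions of different types.
SAMPLE = """<body>
  <div type="letter">
    <p>He wrote <quote>ask <name>Livy</name></quote> to <name>Howells</name>.</p>
  </div>
  <div type="essay">
    <p><quote><name>Susy</name> said so</quote></p>
  </div>
</body>"""


def nested_search(xml_text, path):
    """Return the text of every element matched by an ElementTree path."""
    root = ET.fromstring(xml_text)
    return ["".join(elem.itertext()) for elem in root.iterfind(path)]


if __name__ == "__main__":
    # Names that occur inside a quotation inside a letter.
    print(nested_search(SAMPLE, ".//div[@type='letter']//quote//name"))
    # Names that occur inside any quotation.
    print(nested_search(SAMPLE, ".//quote//name"))
```

The precision is real, but the user has to know the markup scheme well
enough to write the path, which is exactly the problem for a casual
audience.
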