On Aug 11, 2004, at 11:02 PM, Automatic digest processor wrote:

> I'd be interested in any replies to this. We're about to launch a
> TEI-lite journal and have similar issues.
>
>> We plan to make a series of articles searchable on the web. Our plan
>> is to mark these up using Text Encoding Initiative.
>>
>> Which software is best for the marking up process? The files will be
>> in plain text and the operating system will be Windows 2000.

I use swish-e for this purpose:

   http://swish-e.org/

Swish-e is an open source indexer/search engine. It excels at indexing
(X)HTML files, but indexes plain text and XML files almost as easily.
It comes with C, PHP, and Perl APIs, and it runs under (over?) Unix as
well as Windows operating systems.

I am/will be using swish-e as the underlying indexer for searches
against TEI documents. Specifically, I have been marking sets of
literature up in TEI. I then convert the sets into a number of formats
such as plain text, XHTML, PDF, various Palm flavors, etc. I then use
swish-e to index the XHTML because swish-e makes it easy to pull out
the meta tags of HTML head elements and make them field searchable,
while the body of the text remains free-text searchable. I could have
almost as easily indexed the raw TEI files, but then I would have to
deal with transforming the XML before it gets to the browser. ("I know.
There are many ways to do that."). See:

   http://infomotions.com/alex2/
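
To make the field/free-text split concrete, here is a rough Python
sketch of the idea. It is not how swish-e does it internally, and it
does not use any swish-e API; the class name, the function name, and
the sample document are all made up for illustration. It simply reads
an XHTML page, collects the meta tags from the head as named fields,
and gathers the body as one run of free text.

```python
from html.parser import HTMLParser


class MetaAndBodyParser(HTMLParser):
    """Collect <meta name=... content=...> pairs and the text inside <body>.

    Hypothetical helper for illustration only; not part of swish-e.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}       # meta name -> content (the "field searchable" part)
        self.body_parts = []   # text chunks found inside <body> (the free text)
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "meta" and name:
            self.fields[name.lower()] = attrs.get("content") or ""
        elif tag == "body":
            self._in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        # Only keep non-blank text that appears inside the body element.
        if self._in_body and data.strip():
            self.body_parts.append(data.strip())


def split_fields_and_text(xhtml):
    """Return (fields, text): meta tags as a dict, body as one string."""
    parser = MetaAndBodyParser()
    parser.feed(xhtml)
    parser.close()
    return parser.fields, " ".join(parser.body_parts)


# Made-up sample page; the meta names here are assumptions, not a standard.
sample = """<html><head><title>Walden</title>
<meta name="author" content="Thoreau, Henry David" />
<meta name="title" content="Walden" />
</head><body><h1>Walden</h1><p>I went to the woods.</p></body></html>"""

fields, text = split_fields_and_text(sample)
print(fields)
print(text)
```

An indexer would then store each entry in fields under its own name, so
a search can be limited to (say) author, and index text as the general
free-text content.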

I have also been fiddling with Plucene, a Perl port of Lucene, a
Java-based indexer/search engine library:

   http://search.cpan.org/dist/Plucene/

Unlike swish-e, Lucene/Plucene are libraries only. Swish-e is an
indexer/search engine binary as well as a library.

--
Eric Lease Morgan
http://infomotions.com/