On Monday, November 26, 2001 5:23 PM
Christiane Meckseper wrote:
> I'm interested in some front end user issues, especially the searching
> of documents marked up with the TEI.
Searching and retrieval, though they are obviously things users do, are
not really "front end user issues", any more than the indexation and
data organisation methods that underlie them are. The actual interface
users see, and the ways they interact with it, might well be such issues,
but I don't think that's actually what you're asking about.
> Once I've marked up my texts in SGML/XML, converted them to HTML,
> have designed a user interface, etc. what would be a good search
> engine to attach to my application?
Is that meant to be a logical, or merely a chronological sequence? OK,
the traditional database application design approach where you drank
endless mugs of coffee with your clients, drawing your "forms" and
"reports" on paper, then worked backwards from there, first into data
structures, finally into retrieval code, might not be straightforwardly
applicable to a text repository (I assume that's what you have in mind,
though you don't really say) but it's not a bad sequence to follow if
interactive retrieval and/or queries of a fairly high granularity or
complexity (rather than what one might call digital conservation or
simply paging through the texts) is your primary aim.
What you call "the TEI" isn't a ready-made system. It's simply(!) a vast
set of possibilities, with documentation that is suggestive rather than
prescriptive. Which of those possibilities you opt for (or add to via
your own extensions) needs to be determined from an early stage by a
clear notion of what you want to achieve (i.e. by some of the things you
seem to be classing as "front end user issues") which consequently need
to be explored, if not finalised, at a rather earlier stage than you
seem to be envisaging.
Also, treating "conversion to HTML" as a monolithic step between
XML/SGML markup and "designing a user interface" is rather problematic.
Depending on your target audience and your timescale for project
delivery, you may not ever need to convert to HTML, and if you do, you
might be better planning to let that conversion occur ad hoc at
retrieval time, either server or client side, than making it a major
step to be applied to all your data in bulk. Thanks to XSLT (and to
Sebastian, who has shown us how to bring TEI and XSLT together), mass
conversion to HTML need no longer be necessary in a well designed
> I have been searching the web for software, however, SGML/XML text
> search engines are few and far between
Well, that's probably because you're looking for separate "engines" that
would just plug in as the front end to a TEI-based repository. No such
beasts, or not yet, at least. The predominant practice at Sheffield so
far seems to have been to develop packages reliant on (maybe
justifiably) pricey authoring-plus-delivery systems using non-TEI SGML
DTDs. These packages encompass editing, version control/document
management and CD-ROM or server delivery. You (or more precisely the UK
taxpayer) pays your money and you gets your searching as part of the
If you want to go open source for your indexation and retrieval system
as well as open standards for your markup, you are into, if not exactly
uncharted territory, then terrain where there are so far only sketchmaps
jotted down by intrepid explorers who are still en route themselves and
not yet ready to take time out to write up their memoirs. AFAIK the best
progress so far has been made by the people in the eXist project
http://exist.sourceforge.net/ but that is certainly not a "front end"
of the sort you seem to be envisaging, and if you wanted to use it
effectively you would need to think hard, sooner rather than later,
about your markup policies and practices.
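To make the point about markup policies concrete, here is a minimal
sketch (plain Python standard library, nothing to do with eXist's own
interface) of the difference between searching inside a particular
element and searching the running text. The sample fragment, its
content and the function names find_in_element and find_anywhere are
all invented for illustration; the element names merely follow common
TEI usage.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment, invented for illustration only.
SAMPLE = """<text>
  <body>
    <div type="letter">
      <head>Letter from Sheffield</head>
      <p>I met <name type="person">Smith</name> at the forge.</p>
      <p>The smith repaired the gate.</p>
    </div>
  </body>
</text>"""


def find_in_element(xml_text, tag, term):
    """Return the text of every <tag> element whose content contains term.

    The match is case-insensitive and looks at the element's full text,
    including text inside any child elements.
    """
    root = ET.fromstring(xml_text)
    needle = term.lower()
    hits = []
    for elem in root.iter(tag):
        content = "".join(elem.itertext())
        if needle in content.lower():
            hits.append(content)
    return hits


def find_anywhere(xml_text, term):
    """Crude full-text search: every paragraph containing term."""
    return find_in_element(xml_text, "p", term)


if __name__ == "__main__":
    # Only the occurrence that was tagged as a name.
    print(find_in_element(SAMPLE, "name", "smith"))
    # Every paragraph mentioning the string, person or blacksmith alike.
    print(find_anywhere(SAMPLE, "smith"))
```

The first query can separate the person from the blacksmith only
because someone decided at markup time to tag personal names. If that
decision was never taken, no retrieval layer added later can recover
the distinction.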
The best overview I know of the problems and possibilities in this area
is Liam Quin's "Open Source XML Database Toolkit" (Wiley), and if you
don't know that text, I recommend that you study it before going any
further, so that you'll get a clearer sense of the terrain ahead of you.
Be warned though, it's not a "cookbook"; or if it is, it's a lot more
like Elizabeth David than Delia Smith (apologies to people who aren't
into UK cookery writers, but it would take too long to explain...)
Michael Beddow http://www.mbeddow.net/
XML and the Humanities: http://xml.lexilog.org.uk/
The Anglo-Norman Dictionary http://anglo-norman.net/