> Have members here had much experience with eXist as a native XML
> database platform for TEI (P5) databases
Not very much, but we started using it for corpus work and it performs
pretty well. For large documents and large collections it is probably a
good idea to give the JVM a little more memory though.
> I have heard that eXist has had issues with performance with large
> amounts of data and numbers of simultaneous users, but that recent
> versions have addressed those concerns. Is that true?
I have not tried with many simultaneous users, but large amounts of data
-- whatever that might be -- should be ok. Just try importing large sets
of data and building a few complex queries; you will then probably get an
idea of whether eXist's performance is sufficient for you. Our data uses a lot
of references that are resolved in the queries and performance there is
> Do you
> think eXist is 'ready' for prime time use?
As Sebastian already pointed out: it really depends on what you
understand 'prime time' to be. The documentation may list some
limitations, but I assume, based on your description, that you are
not too likely to hit any. Apart from that it is worth trying. For our
use, the new indexing implementation with its capability of correctly
tokenising mixed content nodes is crucial. That way markup like '<u>we
are n<seg>ot at all</u>' can be used with the full text index and still
the <u> may be found with //u[.&='not']. I hope it will not be too long until
eXist supports parts of XPath Full-Text 1.0; then many more
interesting queries should be possible.
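
To illustrate why mixed-content tokenising matters here, a small Python sketch (not eXist's implementation, only the effect). The sample utterance is a hypothetical well-formed variant of the one above; where the </seg> goes is my assumption, and the function names are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed variant of the utterance; the position of
# </seg> is an assumption made for this illustration.
SAMPLE = "<u>we are n<seg>ot</seg> at all</u>"


def tokens_per_text_node(elem):
    """Tokenise every text node on its own: words are cut at inline markup."""
    tokens = []
    for chunk in elem.itertext():
        tokens.extend(re.findall(r"\w+", chunk))
    return tokens


def tokens_mixed_content(elem):
    """Join all descendant text first, then tokenise: words spanning tags stay whole."""
    return re.findall(r"\w+", "".join(elem.itertext()))


u = ET.fromstring(SAMPLE)
print(tokens_per_text_node(u))  # ['we', 'are', 'n', 'ot', 'at', 'all']
print(tokens_mixed_content(u))  # ['we', 'are', 'not', 'at', 'all']
```

Only the second tokenisation contains 'not', which is what a query like //u[.&='not'] needs in order to match the <u>.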
> Basically, I'd like to avoid getting too committed to eXist, just to
> find out that I hit a performance wall and need to go with a
> commercial implementation such as Mark Logic.
I think you can test the performance way before you implement your
> (My database will start
> with about fifty 800+page digitized volumes and eventually hold up to
> 400+ volumes, and it should support 15-50 simultaneous users.)
I think that should be ok. We have about 25 MB of transcribed speech
right now. But it really depends on the kinds of queries you are
planning and how cleverly you set up the indexer's configuration.
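
Since so much depends on the kinds of queries, it may help to time a handful of representative ones before committing. A minimal Python sketch of such a harness follows; `time_query` and `fake_run_query` are hypothetical names, and the placeholder would have to be replaced by whatever actually sends the query to your database.

```python
import statistics
import time


def time_query(run_query, query, repeats=5):
    """Run `query` via `run_query` several times; return timings in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(query)
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


def fake_run_query(query):
    """Placeholder only: swap in a real call to the database under test."""
    return sum(len(word) for word in query.split())


# Example query taken from the text above; add your own representative ones.
for q in ["//u[.&='not']"]:
    print(q, time_query(fake_run_query, q))
```

Running the same set once after a small import and again after a large one should give a rough idea of how query times grow with the data.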
| Stefan Majewski | Department of English, University of Vienna |
| VOICE Corpus | Spitalgasse 2-4, Universitätscampus AAKH, Hof 8 |
| | A-1090 Vienna |
| Research Ass.(IT)| Phone: +43 1 4277 424 46 |