being the founder of eXist-db and a frequent TEI user myself, I
usually get a lot of inspiration by looking at other use cases and
learning about the success or problems they may have, so please do not
hesitate to contact me directly or via the eXist list if there's
something I or the community can help with.
> with the list of xpath expressions to optimise for
I see you mainly create range indexes:
<create qname="lemma" type="xs:string"/>
<create path="@xml:id" type="xs:string"/>
The first form (qname="lemma") will result in much better performance
than the seconf form (path="@xml:id"). Indexes of the qname type can
be used by the query optimizer to rewrite queries, which normally has
a huge effect, while the path index does not allow the full range of
possible optimizations. The "profiling" panel of the admin web app can
be used to see which indexes could be optimized, where indexes were
missing and what it cost.
The formulation of the query also plays a major role as described in
my tuning guide: http://exist-db.org/tuning.html. A range index is
good for queries on the start of a string, but bad for containment
queries. To say more, I would need to see how exactly the data set and
the query looks like. Often it is a small detail which breaks
performance. As usual, it is best if I can just run the slow query
myself. If you could tell me where to find the data set, I'll be happy
to give it a try.