I'll throw one more in here. We've been using Solr a lot for the discovery
layer: we split the XML into different types of Solr documents, then use
client-side XML/XSLT libraries to provide more granular search results on a
per-page basis. You can see an example of the technique at
http://raven.scholarslab.org/. If you're really interested, you can check
out some code we have on GitHub: http://github.com/mwmitchell/raven
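
To give a rough idea of the splitting step, here is a minimal sketch in
Python. It is not the raven code. It assumes each page is wrapped in a
<div type="page" n="..."> element, and the field names (doc_type, parent_id,
page, text) are hypothetical; a real Solr schema would define its own. It
only builds the dictionaries you would then post to Solr.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Toy TEI file; the page-level <div> wrapping is an assumption of this sketch.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="letter1">
  <teiHeader><fileDesc><titleStmt><title>A Letter</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <div type="page" n="1"><p>Dear Sir, I write in haste.</p></div>
    <div type="page" n="2"><p>Yours, in greater haste.</p></div>
  </body></text>
</TEI>"""


def split_into_solr_docs(tei_xml):
    """Return one document-level dict plus one dict per page."""
    root = ET.fromstring(tei_xml)
    doc_id = root.get(XML_ID)
    title = root.findtext(".//tei:title", default="", namespaces=TEI_NS)

    # Document-level record, used for whole-work search results.
    docs = [{"id": doc_id, "doc_type": "document", "title": title}]

    # Page-level records, used for the more granular per-page hits.
    for div in root.iterfind(".//tei:div[@type='page']", TEI_NS):
        n = div.get("n")
        # Flatten the page to plain text and normalize whitespace.
        text = " ".join("".join(div.itertext()).split())
        docs.append({
            "id": f"{doc_id}-p{n}",
            "doc_type": "page",
            "parent_id": doc_id,
            "page": int(n),
            "text": text,
        })
    return docs
```

Calling split_into_solr_docs(SAMPLE) gives three dictionaries: one for the
letter and one for each page. Keeping parent_id on the page records is what
lets the client go back to the full XML and run XSLT on just the page that
matched.
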
On 7/1/10 4:11 AM, "Richard Light" <[log in to unmask]> wrote:
> In message <[log in to unmask]>,
> Sebastian Rahtz <[log in to unmask]> writes
>> For a project here, we recently switched from a setup based on eXist
>> and XQuery to one using an SQL database; the speed of operation and the
>> speed of development rocketed overnight :-} This worked because our TEI
>> file consisted of 250,000 "records" (TEI <person>), which we stored
>> untouched in one column of the table, and added as many columns as we
>> needed to index the data. Then we used XSLT to format the <person>
>> records which came back from a query. It's not a new technique.
>> Of course, this is not a traditional use of TEI, but it demonstrates a)
>> that there are applications which are TEI but look more like a
>> database, and b) you can combine XML tools with SQL databases.
> Just to mention another hybrid approach: we also use a relational
> database engine and store XML fragments as BLOBs. We then add an
> indexing plugin to the database. This allows us to specify multiple
> indexes which use XPath expressions to index XML content within the
> BLOBs. However, as far as the database engine is concerned, these are
> "normal" indexes.
> We use parent and child processing instructions to indicate the position
> of each XML fragment within the original document. This allows the
> re-creation of this document as part of a report generation "pipe".
> This approach gives us the benefit of holding our TEI as a shared
> updateable resource (with the usual relational record-locking and
> transaction support, and real-time indexing of content).
> On the retrieval front, this approach doesn't help much with external
> querying, since SQL doesn't deal in indexes, but we have built custom
> search mechanisms which use the XPath indexes directly, and are happy
> enough with that. Nor does it give you the ability to put ad hoc XPath
> queries to the entire document as a native XML database would.
> One advantage of this approach is that it will support any type of TEI
> document, not just "record-like" ones. The one requirement is that you
> have to decide on a "chunking" policy, and assign a unique identifier to
> each chunk/record.
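
For anyone who wants to try the hybrid pattern described above, here is a
minimal sketch using Python and SQLite. It is only an illustration under
stated assumptions, not either poster's actual setup: the table layout, the
column names (surname, birth_year) and the helper functions are all
hypothetical. Each <person> record is stored untouched in one column, and
ordinary indexed columns are filled from values pulled out of the XML.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Two toy records; real data would come from the TEI file.
RECORDS = [
    '<person xmlns="http://www.tei-c.org/ns/1.0" xml:id="p1">'
    '<persName><surname>Austen</surname></persName><birth when="1775"/></person>',
    '<person xmlns="http://www.tei-c.org/ns/1.0" xml:id="p2">'
    '<persName><surname>Burney</surname></persName><birth when="1752"/></person>',
]


def load(conn, records):
    """Store each record verbatim, plus columns extracted for indexing."""
    conn.execute(
        "CREATE TABLE person ("
        "id TEXT PRIMARY KEY, surname TEXT, birth_year INTEGER, xml TEXT)"
    )
    conn.execute("CREATE INDEX person_surname ON person (surname)")
    for raw in records:
        el = ET.fromstring(raw)
        surname = el.findtext(f".//{TEI}surname")
        birth = el.find(f"{TEI}birth")
        when = birth.get("when") if birth is not None else None
        year = int(when[:4]) if when else None
        # The xml column holds the original string, unmodified.
        conn.execute(
            "INSERT INTO person VALUES (?, ?, ?, ?)",
            (el.get(XML_ID), surname, year, raw),
        )


def born_before(conn, year):
    """Query on the indexed column; return the untouched XML records."""
    rows = conn.execute(
        "SELECT xml FROM person WHERE birth_year < ? ORDER BY surname",
        (year,),
    )
    return [row[0] for row in rows]
```

With conn = sqlite3.connect(":memory:") and load(conn, RECORDS), a call to
born_before(conn, 1760) returns the stored XML for the one matching record,
ready to be handed to XSLT for display. The id column doubles as the unique
chunk identifier that the chunking policy calls for.
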