> On Oct 26, 2018, at 5:54 AM, Wolfgang Meier wrote:
> Dear Martin,
>> We have run into problems with very large texts (two million words or more), where the text display is accompanied by an image that comes from an IIIF server at the Internet Archive. Hakluyt’s Voyages are a good example (A02495 in the TCP archive). Turning pages with an average text occurs roughly at the same pace as turning the pages of a book. With a very long text, the turning of pages may take ten seconds or more, an eternity if there are a lot of pages to turn.
> Technically it should not make any difference if you have a large or a small text: eXist can directly jump to the fragment you want to display. It does not need to look at the rest of the document nor even load it, so its size should be irrelevant. Whether you store one large document or split it into smaller fragments should, by design, have no impact on query processing within eXist.
> The only guess I would have is that the implementation for extracting the page boundaries could be inefficient and walks into parts of the document it should not be bothered with. Now I know I have been changing that very implementation a number of times over the past year, so it’s definitely possible there’s a bottleneck in the version you use. But this can surely be fixed if I have a test case.
>> The most frustrating feature of eXist (or our not sufficiently savvy implementation of it) is what it does (or does not) do with <note> elements. If a <note> element is recorded inline in the source text, its display in the margin is instant. If you keep the notes as a separate text stream in a <back> element (for which there are good reasons) performance is so slow as to be useless.
> Likewise, this is likely less an eXist problem than an issue in the implementation of the operation. Normally I would resolve references to notes in <back> using the id() function, which is the fastest index operation eXist has available. It should be instantaneous as it is a direct lookup in the index. May I suggest sending a test case to the tei-publisher tracker, since these are application problems rather than eXist issues?
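To make the id()-style lookup described above concrete, here is a minimal sketch in Python rather than XQuery. It is not eXist's implementation or TEI Publisher code: the sample document, the function names build_id_index and resolve_notes, and the use of a plain dictionary as the "index" are all assumptions made for illustration. The point is only that once the xml:id values are indexed, each <ref> resolves with a single lookup, no matter how far away the <note> sits in <back>.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical miniature document: refs in the body, notes in <back>.
SAMPLE = """<text>
  <body>
    <p>First claim<ref target="#n1"/> and second claim<ref target="#n2"/>.</p>
  </body>
  <back>
    <note xml:id="n1">Note one.</note>
    <note xml:id="n2">Note two.</note>
  </back>
</text>"""


def build_id_index(root):
    """Map each xml:id value to its element in one pass over the tree."""
    return {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID) is not None}


def resolve_notes(root, index):
    """Return the note text for every <ref>, using dictionary lookups only."""
    resolved = []
    for ref in root.iter("ref"):
        target = ref.get("target", "").lstrip("#")
        note = index.get(target)  # direct lookup, no walk through <back>
        if note is not None:
            resolved.append(note.text)
    return resolved


root = ET.fromstring(SAMPLE)
notes = resolve_notes(root, build_id_index(root))
print(notes)
```

In a database the index would already exist when the query runs, so only the lookup cost is paid per reference.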
Wolfgang, we discussed this on gitlab some time back:
At the time you agreed with me that the root of the problem is the implementation of wildcards with the preceding and following axes in eXist, where it iterates over every node in the document rather than incrementing or decrementing from a known starting position. A document with a million or two nodes reveals pathological performance that isn't really noticeable with smaller documents. There are at least two open issues in eXist's tracker that may be relevant:
I have a strong suspicion but no easy way to prove that another piece of the puzzle is an extra level of recursion during ODD application when processing a <w> inside a <note> that is remote from the <ref> pointing to it.
There have been other problems due to the slowness of the preceding and following axes, but the note/ref combination with notes in the back matter was the worst: it was taking 15 minutes to turn the page in a large document.
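The difference between the two strategies for the preceding axis can be shown with a toy model. This is a sketch under stated assumptions, not eXist's code: the document is flattened to a hypothetical list of node names in document order, with a page break every 500 nodes, and both function names are invented for the example. One function visits every node in the document, the other decrements from the known starting position and stops at the first match.

```python
def preceding_pb_full_scan(nodes, start):
    """Visit every node in the document, keeping the last <pb> seen before `start`."""
    found, visited = None, 0
    for i, name in enumerate(nodes):
        visited += 1
        if i < start and name == "pb":
            found = i
    return found, visited


def preceding_pb_step_back(nodes, start):
    """Walk backwards from `start` and stop at the first <pb> encountered."""
    visited = 0
    for i in range(start - 1, -1, -1):
        visited += 1
        if nodes[i] == "pb":
            return i, visited
    return None, visited


# Hypothetical flat document: one million nodes, a page break every 500 nodes.
nodes = ["pb" if i % 500 == 0 else "w" for i in range(1_000_000)]
start = 750_250

scan_hit, scan_cost = preceding_pb_full_scan(nodes, start)
step_hit, step_cost = preceding_pb_step_back(nodes, start)
print(scan_hit, scan_cost)
print(step_hit, step_cost)
```

Both functions find the same page break, but the full scan's cost grows with the size of the whole document, while the step-back cost depends only on the distance to the nearest match. That is consistent with the slowdown being hard to notice in small documents.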
>> I am delighted to learn that Magdalena and Wolfgang were nominated for the Rahtz prize. Their work had its origin in work that Sebastian did in a project for which the Mellon Foundation, the TEI, Oxford, and Nebraska’s Center for Digital Research in the Humanities provided generous support. Magdalena and Wolfgang certainly ran fast and in the right direction with the ball that Sebastian threw them. And so did Joe Wicentowski, who was very quick to spot the practical virtues of Sebastian’s Processing Model.
> Development on TEI Publisher and the TEI Processing Model library has progressed quickly during the past year. There will be a lot of changes coming with 4.0, which I’m currently trying to finalize and document. It may be worthwhile to have a look and join our hip chat room, where we can help with questions.
That's good news. The project Martin M. and I are working on is currently using a fork of TEI Simple from a couple of years ago. I would like to switch to using tei-publisher-lib, but I think that will be pretty major surgery and it hasn't gotten very high on the to-do list yet. I also note you've switched to GPLv3, while we currently have a BSD license. Since parts of the library end up embedded in the app, I think we will have to switch licenses, and that will be a hard sell with at least one of our contributors.
>  http://teipublisher.com
>  https://www.hipchat.com/gROkvVTMA
Craig A. Berry
"... getting out of a sonnet is much more
difficult than getting in."