It seems to me that what you're suggesting is basically a multi-step transactional process. You want to parse out a section of the XML, make it visible to the user (via HTML, I assume?), have the user make the necessary corrections (or at least note what the issues are), and then merge the result back into the XML. From an editorial standpoint, someone needs to sign off on the proposed correction before that merge happens.
I'm wondering whether the solution isn't to have a database of TEI snippets and proposed corrections. When an editor signs off, the proposed correction could then be rendered into well-formed P5 XML and merged back into the parent document.
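A minimal sketch of that snippet-and-approval idea, assuming each correctable word in the TEI file carries an xml:id (as tokenized TCP texts might) and that a correction only changes a word's text. The table layout and the names make_store, propose, approve and merge_approved are hypothetical, not part of any existing tool; SQLite simply stands in for whatever database would actually be used:

```python
import sqlite3
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def make_store() -> sqlite3.Connection:
    """Create an in-memory store of proposed corrections (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    # target = xml:id of the word; status moves proposed -> approved -> applied.
    db.execute(
        """CREATE TABLE corrections (
               id INTEGER PRIMARY KEY,
               target TEXT NOT NULL,
               old_text TEXT NOT NULL,
               new_text TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'proposed'
           )"""
    )
    return db


def propose(db: sqlite3.Connection, target: str, old_text: str, new_text: str) -> int:
    """A reader suggests a fix; nothing touches the XML yet."""
    cur = db.execute(
        "INSERT INTO corrections (target, old_text, new_text) VALUES (?, ?, ?)",
        (target, old_text, new_text),
    )
    return cur.lastrowid


def approve(db: sqlite3.Connection, correction_id: int) -> None:
    """An editor signs off; only approved corrections are ever merged."""
    db.execute("UPDATE corrections SET status = 'approved' WHERE id = ?", (correction_id,))


def merge_approved(db: sqlite3.Connection, root: ET.Element) -> int:
    """Apply approved corrections to the parsed TEI tree; return how many were applied."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    rows = db.execute(
        "SELECT id, target, old_text, new_text FROM corrections WHERE status = 'approved'"
    ).fetchall()
    applied = 0
    for cid, target, old_text, new_text in rows:
        el = by_id.get(target)
        # Skip if the word is gone or its text changed since the proposal was made.
        if el is None or el.text != old_text:
            continue
        el.text = new_text
        db.execute("UPDATE corrections SET status = 'applied' WHERE id = ?", (cid,))
        applied += 1
    return applied
```

A proposal whose target word has changed since it was made stays "approved" but unapplied, so an editor can look at it again instead of having it silently overwrite newer text.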
In my view, splitting the section off for viewing isn't that hard (it'd be a simple XSLT operation); the difficulty lies more in the clean merge back in, which Sebastian is right to be concerned about. Having a middle step in which a person has to approve those changes might make that easier logistically.
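On the merge side, one cheap safeguard (a sketch, not a tested tool) is to compare the document before and after the put-back and refuse the merge if anything other than the approved targets changed. This assumes corrections only touch the text content of elements carrying xml:id values; the function name only_targets_changed is hypothetical:

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def only_targets_changed(before: ET.Element, after: ET.Element, targets: set[str]) -> bool:
    """Return True if `after` differs from `before` only in the text of elements
    whose xml:id is in `targets` (a hypothetical post-merge sanity check)."""
    old_nodes = list(before.iter())  # document (pre-)order
    new_nodes = list(after.iter())
    if len(old_nodes) != len(new_nodes):
        return False  # elements were added or removed
    for old, new in zip(old_nodes, new_nodes):
        # Same tag, attributes, child count and trailing text at every position
        # means the tree shape and all markup are unchanged.
        if old.tag != new.tag or old.attrib != new.attrib:
            return False
        if len(old) != len(new) or old.tail != new.tail:
            return False
        # Text may differ only on the approved targets.
        if old.text != new.text and old.get(XML_ID) not in targets:
            return False
    return True
```

Comparing each element's child count as well as the sequence of tags catches re-nesting, not just added or deleted elements, since document order plus child counts fixes the shape of the tree.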
> On Feb 22, 2015, at 1:48 PM, Martin Mueller wrote:
> I may be over-problematizing this, but the main use case for this approach
> is not corrections in the XML structure but corrections of words. In my
> ideal, and perhaps utopian, environment, curators "see" a page, but behind
> that page is a tokenized and linguistically annotated text. If you click on
> a word or phrase, it opens a pop-up window with metadata and a form that
> lets you enter a textual correction, which goes through review stages
> before being integrated into the text. This is a page-oriented version of
> the much more primitive AnnoLex tool (http://annolex.at.northwestern.edu).
> Some common encoding errors in the TCP texts involve just renaming an
> element. They may be discovered by readers who don't know or care about
> TEI but know that this line is or is not verse. I assume, or hope, perhaps
> vainly, that a lot of textual correction will take the form of "curation en
> passant", where a reader comes across something and suggests a fix right
> then and there. Doing that kind of work should be easier than ordering a
> book from Amazon. I'm not sure we'll ever have the resources to build
> infrastructure sophisticated and robust enough for many hands to make
> light work.
> I note your concern that the re-integration of page fragments may be
> difficult. That may be where this project collapses. But the eXist
> function mentioned by Jens Petersen in his response looks interesting.
> That said, I must admit that while I think I have a pretty good
> understanding of how one might design an environment that lets users spot
> errors in words and suggest corrections, I have a much shakier grasp of
> how to deal with encoding problems. If it's just the name of an element,
> it's simple: it just means changing the spelling of something. But what if
> a reader correctly notes that "this 'Con.' in italics is not part of the
> line. It is a verse-medial change of speaker"? Fixing that is a
> multi-step procedure. Perhaps one can't do better than have a really
> simple "report an error" procedure.
> Errors of that type are quite common in the TCP texts, and their discovery is
> well within the competence of many readers. They will be discovered
> hundreds and thousands of times by readers. But instead of being annoyed
> about the lamentable quality of the corpus, they should feel motivated to
> do something about it, and the doing something about it should be very
> Martin Mueller
> Professor emeritus of English and Classics
> Northwestern University
> On 2/22/15 11:46 AM, "Sebastian Rahtz" wrote:
>> I am a little worried about this. Do you have practical evidence, Martin,
>> that people who are willing to correct the XML files will only do so if
>> they can work on fragments of the form <PAGE>...</PAGE>? I am sure a system
>> _could_ be set up to extract the fragments into well-formed expanded XML
>> and then put back the originals, but checking that the put-back hasn't
>> corrupted the non-page content seems quite problematic.
>> My feeling is that anyone capable of editing a TEI XML file at all is also
>> capable of finding the right <pb/> for the facsimile image they are
>> staring at, and editing the right XML.
>> May I, with respect, suggest that you are over-problematizing this?
>> Sebastian Rahtz
>> Chief Data Architect
>> University of Oxford IT Services
>> 13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431