I’m not sure whether this is a sensible question to ask, but I’ll ask it anyhow.


This summer we’ll have a number of undergraduate curators of TCP texts fixing this and that, mainly incompletely transcribed words, but sometimes longer stretches of text or whole pages.


So we’ll need some transcription platform in addition to an eXist site at http://shc.earlprint.org, where single words can be fixed by changing the content of a <w> element, an element that is mercifully invisible to the user.


If you believe that the best tool is the tool you know best, you’d try to figure out whether undergraduates could do this work using Microsoft Word with a set of styles that subsequently support the automatic transformation of the Word passages into XML fragments that can be fitted into the TCP transcriptions.


Is that a plausible scenario and has something like that been done? TCP encoding is quite sparse. Text is either marked (inside <hi>) or unmarked, and the transcription is silent about what the unmarked state is. My rough guess is that a dozen elements will cover the vast majority of cases.
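
To make the Word scenario concrete, here is a minimal sketch of what the transformation step might look like, using only the Python standard library. It rests on several assumptions that are mine and not established practice: the paragraph style names (TCPHead, TCPLine, TCPPara) are hypothetical, italic runs are treated as the "marked" state and wrapped in <hi>, and anything without a recognised style falls back to <p>. A real workflow would need a longer mapping and some validation before fragments are fitted into a TCP text.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML inside a .docx package.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical mapping from Word paragraph style ids to TCP elements.
STYLE_TO_ELEMENT = {"TCPHead": "head", "TCPLine": "l", "TCPPara": "p"}


def read_document_xml(docx_path):
    """Return the main document part of a .docx file as bytes."""
    with zipfile.ZipFile(docx_path) as package:
        return package.read("word/document.xml")


def run_is_marked(run):
    """Assumption: an italic run counts as 'marked' text."""
    italic = run.find("w:rPr/w:i", NS)
    if italic is None:
        return False
    return italic.get(f"{{{W}}}val") not in ("0", "false")


def paragraph_to_element(para):
    """Turn one Word paragraph into one sparse TCP-style element."""
    style = para.find("w:pPr/w:pStyle", NS)
    style_id = style.get(f"{{{W}}}val") if style is not None else None
    elem = ET.Element(STYLE_TO_ELEMENT.get(style_id, "p"))
    last = None  # most recent <hi> child; unmarked text goes in its tail
    for run in para.findall("w:r", NS):
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        if not text:
            continue
        if run_is_marked(run):
            last = ET.SubElement(elem, "hi")
            last.text = text
        elif last is None:
            elem.text = (elem.text or "") + text
        else:
            last.tail = (last.tail or "") + text
    return elem


def document_xml_to_fragments(document_xml):
    """Return one serialized XML fragment per Word paragraph."""
    root = ET.fromstring(document_xml)
    return [
        ET.tostring(paragraph_to_element(p), encoding="unicode")
        for p in root.iter(f"{{{W}}}p")
    ]
```

A student's file could then be processed with document_xml_to_fragments(read_document_xml("page.docx")), where "page.docx" is a placeholder name. The point of the design is that students only ever choose a paragraph style or apply italics; everything else about the encoding stays out of their sight.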


The Folger Library has an attractive Web-based tool for manuscript transcription that can probably be adjusted with little trouble.


The students will be in residence for six weeks, and it may be that we should teach them encoding with oXygen. Some of them may love it, others may hate it.


I’d be grateful for advice and practical war stories about what does and does not work.