Hello Martin,

I agree with the others that MS Word might not be the best choice. Why not use a custom plugin for oXygen’s Author mode, built via plugin-builder<>? Customising it for the project’s needs (see here<>) was very quick and painless. In combination with GitHub we have thus created a dual-stream proofreading environment. Going from zero to GitHub and XML took students hours instead of days.

You can see my instructions to the students (with just a single proofing stream) here<>.

The good:
- Allow students familiar with XML to work in the environment of their choice (even Word or Excel).
- Use oXygen Author for UI-only editing, while limiting the amount of *damage* that students can do.
- Let GitHub handle the highlighting of conflicts and the progress tracking.
- Consistent markup.
- Very fast results.

The bad:
- Not much.
- The very first round of pull requests is a bit of a mess.
- Installing software can be hard.
- About two full days of prep work: splitting the XML into editable chunks, adding icons to the plugin, and some CSS.
- I don’t like Author mode.
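
For anyone wondering what the chunk-splitting part of the prep work could look like, here is a minimal sketch. It is not the script used in the project. It assumes the texts are in the TEI namespace and that each top-level <div> inside <body> is a sensible unit to hand to a student; the sample document and the function name split_into_chunks are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the source files use the standard TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements without an "ns0:" prefix.
ET.register_namespace("", TEI_NS)

# Hypothetical stand-in for a real transcription file.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="chapter"><p>First chunk.</p></div>
      <div type="chapter"><p>Second chunk.</p></div>
    </body>
  </text>
</TEI>"""


def split_into_chunks(xml_text, tag="div"):
    """Return one serialized XML string per direct <tag> child of <body>."""
    root = ET.fromstring(xml_text)
    body = root.find(f".//{{{TEI_NS}}}body")
    if body is None:
        return []
    chunks = []
    for child in body.findall(f"{{{TEI_NS}}}{tag}"):
        child.tail = None  # drop whitespace that followed the element
        chunks.append(ET.tostring(child, encoding="unicode"))
    return chunks


chunks = split_into_chunks(SAMPLE)
for number, chunk in enumerate(chunks, start=1):
    # In practice each chunk would be written to its own file in the repository.
    print(f"chunk-{number:03d}.xml")
    print(chunk)
```

Each chunk keeps its namespace declaration, so it can be opened and validated on its own and merged back later. A real text would probably need a finer or coarser unit than <div>, which is what the tag parameter is for.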


On 27. Apr 2017, at 01:09, Martin Mueller <[log in to unmask]> wrote:

I’m not sure whether this is a sensible question to ask, but I’ll ask it anyhow.

This summer we’ll have a number of undergraduate curators of TCP texts fixing this and that, mainly incompletely transcribed words, but sometimes longer stretches of text or whole pages.

So we’ll need some transcription platform in addition to an eXist site at<>, where single words can be fixed by changing the content of a <w> element, mercifully invisible to the user.

If you believe that the best tool is the tool you know best, you’d try to figure out whether undergraduates could do this work using Microsoft Word with a set of styles that subsequently support the automatic transformation of the Microsoft Word passages into XML fragments that can be fitted into the TCP transcriptions.

Is that a plausible scenario and has something like that been done? TCP encoding is quite sparse. Text is either marked (inside <hi>) or unmarked, and the transcription is silent about what the unmarked state is. My rough guess is that a dozen elements will cover the vast majority of cases.

The Folger Library has an attractive Web-based tool for manuscript transcription that can probably be adjusted with little trouble.

The students will be in residence for six weeks, and it may be that we should teach them encoding with oXygen. Some of them may love it, others may hate it.

I’d be grateful for advice and practical war stories about what does and does not work.