On Fri, Oct 26, 2001 at 11:27:28AM +0100, Michael Beddow wrote:
> In general, [rekeying] makes a lot of sense, but not so much in the case of
> scholarly lexicography. There, what's encoded in the Word file (or the
> typesetter tape) is often the result of countless editorial iterations
> and repeated expert checking. To rekey (or OCR and postedit) is in
> effect to put all that scholarly work at risk, or at best require it to
> be done all over again.

I have had a lot of luck with the strategy of having three separate
groups rekey the text, and then comparing the three versions and
examining the differences by hand.
This can work out at well under US$1/page, too.
It helps if at least one group does not have English (for an English text)
as a first language, as they make different sorts of errors.
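
To make the comparison step concrete, here is a rough sketch in Python of
one way to line up three keyings and list the places where they disagree.
It is only an illustration, not a description of any particular tool: the
function name compare_keyings is made up, and it assumes the three groups
kept the same line breaks, which real keyed text often will not. Where two
groups agree, their reading is suggested as the majority; where all three
differ, nothing is suggested. A person still examines each difference.

```python
from collections import Counter
from itertools import zip_longest


def compare_keyings(a, b, c):
    """Compare three keyed versions of a text, line by line.

    Each argument is a list of lines (hypothetical input format; assumes
    the three keyings share the same line breaks).
    Returns a list of (line_number, majority_or_None, versions) tuples,
    one for every line on which the three keyings do not all agree.
    """
    report = []
    # zip_longest pads a shorter keying with "" so missing lines show up
    # as differences rather than being silently dropped.
    for n, versions in enumerate(zip_longest(a, b, c, fillvalue=""), start=1):
        counts = Counter(versions)
        if len(counts) == 1:
            continue  # all three groups agree on this line
        text, votes = counts.most_common(1)[0]
        # Suggest a reading only when at least two groups agree.
        majority = text if votes >= 2 else None
        report.append((n, majority, versions))
    return report


if __name__ == "__main__":
    group_a = ["the cat sat", "on the mat"]
    group_b = ["the cat sat", "on teh mat"]
    group_c = ["the cat sat", "on the mat"]
    for line_no, majority, versions in compare_keyings(group_a, group_b, group_c):
        print(line_no, majority, versions)
```

The majority reading is only a hint for the person checking: two groups
can make the same mistake, which is one reason it helps if the groups make
different sorts of errors.
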
Converting from Word automatically can often leave you with missing or
corrupted text, so you still need to do careful error checking.
The best conversion software for MS Word is fairly good these days;
for things like PENTA typesetting tapes it can be harder.
I do agree with Michael Beddow that any additional checking is a great
benefit, and in general every extra thing you think of to test for will
elicit new errors!
Liam Quin - XML Core staff contact, W3C, http://www.w3.org/People/Quin/
Ankh: irc.sorcery.net www.valinor.sorcery.net irc.gnome.org www.advogato.org
Author, Open Source XML Database Toolkit, Wiley August 2000
Co-author: The XML Specification Guide, Wiley 1999; Mastering XML, Sybex 2001