I'm working on a report about retrodigitization of written English before 1800 for my department and need to say something about manuscripts, which I know virtually nothing about. But here are some things I'd like to say, and I'll be grateful if folks can tell me whether I'm on the right track or not.
1. The retrodigitization of early modern print texts in English rests crucially on at least a century of work. It starts with the bibliographical capture of items by Pollard and his successors, continues with the microfilming of those items by libraries over the course of the 20th century and the digitization of the resulting page images, and ends, for the time being, with the transcription of sizable chunks of those page images.
2. There is nothing like that in the world of medieval manuscripts. There is no Zentralkatalog or anything approaching it. It's more like sheet music: catalogs of this and that in this or that institution or location.
3. There never will be OCR for medieval manuscripts. On the other hand, you can probably segment most page images of medieval manuscripts into their distinct lines. Thus the universe of medieval manuscripts could theoretically be imagined as distinct digital "line objects," each of which could have a unique identifier and each of which could be associated with text boxes where one or more transcribers could enter a transcription of a line (or, more likely, of a sequence of lines in a sequence of line objects). Thus transcription could theoretically proceed as a global crowdsourcing project of medievalists wherever they are -- the work of many hands, one line or a few lines at a time.
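To make the "line object" idea a bit more concrete, here is a rough sketch in Python of what I have in mind. Everything in it is my own invention for illustration: the class name `LineObject`, the identifier scheme (manuscript key, page, line number), and the "most common reading" rule for reconciling transcribers are assumptions, not a description of any existing project or standard.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LineObject:
    """One segmented line on one page image (hypothetical model)."""

    manuscript: str  # assumed key for the manuscript, e.g. a shelfmark
    page: str        # assumed page or folio label, e.g. "f12r"
    line_no: int     # position of the line on the page, counted from 1
    # One transcription per transcriber; a later entry by the same
    # transcriber replaces the earlier one.
    transcriptions: Dict[str, str] = field(default_factory=dict)

    @property
    def identifier(self) -> str:
        """Unique identifier built from manuscript, page, and line number."""
        return f"{self.manuscript}/{self.page}/{self.line_no:03d}"

    def transcribe(self, transcriber: str, text: str) -> None:
        """Record (or overwrite) one transcriber's reading of this line."""
        self.transcriptions[transcriber] = text.strip()

    def most_common_reading(self) -> Optional[str]:
        """Return the reading entered most often, or None if there is none."""
        if not self.transcriptions:
            return None
        return Counter(self.transcriptions.values()).most_common(1)[0][0]


# Made-up example: three transcribers enter readings for the same line.
line = LineObject("example-ms", "f12r", 7)
line.transcribe("transcriber-a", "Whan that Aprill")
line.transcribe("transcriber-b", "Whan that Aprill")
line.transcribe("transcriber-c", "Whan that Auerill")
print(line.identifier)             # example-ms/f12r/007
print(line.most_common_reading())  # Whan that Aprill
```

A real system would presumably also need the coordinates of each line on the page image and some editorial review rather than a simple majority vote, but the point is only that each line gets a stable identifier that many hands can attach transcriptions to.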
Are there any projects that work along those lines?