Thomas,

I'm not sure which solution you're referring to, but I want to point out 
that handling OCR text with coordinates is a feature that we were 
interested in incorporating into the Best Practices for TEI in 
Libraries.  Indeed, a colleague made a similar suggestion a few years ago:

https://github.com/kshawkin/Best-Practices-for-TEI-in-Libraries/issues/27

... but as you can see, we labeled this issue as "dormant" because our 
colleague who suggested it never provided more details, and none of us 
working on revising the BPTL felt knowledgeable enough to come up 
with an appropriate solution.

We welcome a proposal from you or others on how to handle this in the BPTL.

Kevin

On 11/24/17 5:11 AM, Staecker wrote:
> Dear Michael,
> not so much as a paragon of wisdom, but as a humble TEI encoder, I feel a 
> little uneasy about such a solution, as you leave out the OCR 
> coordinates that enable you to link sections of the text to the image 
> that was the basis for the OCR process. By the way, the same goes 
> for the solution the TEI library SIG offered quite recently on this 
> list, which I consider rather dissatisfying in that respect. I would 
> rather suggest using <sourceDoc> to accommodate and save OCRed texts 
> together with the coordinates. I'd be really curious to hear the 
> opinions of others about this.
> As a matter of course, the output of the OCR process, ideally in ALTO or 
> some other standardized format, has to be converted from this format to 
> a form compliant with <sourceDoc>. The original format could be added to 
> the TEI <xenoData> section.
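> 
> To make this concrete, here is a minimal sketch of how one OCRed page 
> might look in <sourceDoc>. It is only an illustration under stated 
> assumptions: the image file name, the xml:id values and all coordinate 
> values are invented, and I assume that an ALTO TextBlock maps to <zone>, 
> a TextLine to <line>, and that HPOS/VPOS/WIDTH/HEIGHT are converted to 
> ulx = HPOS, uly = VPOS, lrx = HPOS + WIDTH, lry = VPOS + HEIGHT.
> 
> ```xml
> <!-- Hypothetical sketch: file name, ids and coordinates are invented -->
> <sourceDoc>
>   <!-- one <surface> per scanned page; coordinates in image pixels -->
>   <surface xml:id="page1" ulx="0" uly="0" lrx="2480" lry="3508">
>     <graphic url="page1.jpg"/>
>     <!-- assumed mapping: ALTO TextBlock becomes <zone> -->
>     <zone xml:id="page1_block1" ulx="300" uly="400" lrx="2100" lry="540">
>       <!-- assumed mapping: ALTO TextLine becomes <line> -->
>       <line xml:id="page1_line1" ulx="300" uly="400" lrx="2050" lry="460">Ach wenig sind der Tage</line>
>       <line xml:id="page1_line2" ulx="300" uly="470" lrx="2100" lry="530">Mit Frömmigkeit gekrönt entflohn,</line>
>     </zone>
>   </surface>
> </sourceDoc>
> ```
> 
> A normalized transcription elsewhere in the same file could then point 
> back to these lines or zones, for example through the xml:id values.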
> Best,
> Thomas
> 
> On 20.11.2017 at 14:57, Michael.Dahnke wrote:
>> Dear honourable paragon of wisdom,
>>
>> for »Narragonien« http://kallimachos.de/kallimachos/index.php/Narragonien we
>> have digitized different versions of the so-called »Ship of Fools«.
>> Currently, we have two versions of each text: first the OCR output and second
>> an already normalized version. Is there a common way of encoding them so that
>> the connection between the two is evident? Our suggestion is the following:
>>
>> <div rend="mainText">
>>   <div type="normalized">
>>     <p>Viel, viel sind meiner Tage
>>       Durch Sünd entweiht gesunken hinab.
>>       O großer Richter, frage
>>       Nicht wie, o lasse ihr Grab
>>       Erbarmende Vergessenheit
>>       Laß, Vater der Barmherzigkeit,
>>       Das Blut des Sohns es decken.</p>
>>   </div>
>>   <div type="OCR">
>>     <p>Ach wenig sind der Tage
>>       Mit Frömmigkeit gekrönt entflohn,
>>       Sie sinds, mein Engel, trage
>>       Sie vor des Ewigen Thron,
>>       Laß schimmern die geringe Zahl,
>>       Daß einsten mich des Richters Wahl
>>       Zu seinen Frommen zähle.</p>
>>   </div>
>> </div>
>>
>> We would be grateful for any suggestions.
>>
>> Thanks in advance,
>> Michael
>>
>>
>>
>> --
>> Sent from: http://tei-l.970651.n3.nabble.com/
> 
> -- 
> ***************************************
> Prof. Dr. Thomas Stäcker
> Direktor der
> Universitäts- und Landesbibliothek Darmstadt
> Magdalenenstr. 8
> 64289 Darmstadt
> +49 (0)6151 16-76200
>