I might be missing something, but isn't this what <facsimile> is for? 
Check out the examples in Chapter 11.1:

<http://www.tei-c.org/release/doc/tei-p5-doc/en/html/PH.html#PHFAX>

especially Figure 11.2 and the associated encoding.
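
In case it helps, here is a rough sketch (not a recommendation from the Guidelines) of how OCR bounding boxes might be turned into <zone> elements inside a <facsimile>. The function name build_facsimile, the dictionary keys, the file name page1.png, and the type values "line" and "word" are all made up for illustration. Only the TEI element names and the ulx/uly/lrx/lry coordinate attributes come from the chapter linked above. Confidences and script detection are not covered here.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def build_facsimile(image_url, boxes):
    """Build a TEI <facsimile> with one <surface> and one <zone> per box.

    `boxes` is a list of dicts with the (hypothetical) keys
    id, type, ulx, uly, lrx, lry, given in pixel coordinates.
    """
    facsimile = ET.Element(f"{{{TEI_NS}}}facsimile")
    surface = ET.SubElement(facsimile, f"{{{TEI_NS}}}surface")
    ET.SubElement(surface, f"{{{TEI_NS}}}graphic", {"url": image_url})
    for box in boxes:
        ET.SubElement(surface, f"{{{TEI_NS}}}zone", {
            f"{{{XML_NS}}}id": box["id"],   # lets transcribed text point here via @facs
            "type": box["type"],            # e.g. "line" or "word" (assumed values)
            "ulx": str(box["ulx"]),         # upper-left corner
            "uly": str(box["uly"]),
            "lrx": str(box["lrx"]),         # lower-right corner
            "lry": str(box["lry"]),
        })
    return facsimile


# Made-up sample data: one line zone and one word zone on the same page image.
sample_boxes = [
    {"id": "line1", "type": "line", "ulx": 10, "uly": 20, "lrx": 300, "lry": 60},
    {"id": "word1", "type": "word", "ulx": 10, "uly": 20, "lrx": 80, "lry": 60},
]
facsimile = build_facsimile("page1.png", sample_boxes)
print(ET.tostring(facsimile, encoding="unicode"))
```

Elements in the transcription could then refer to a zone with facs="#line1" and so on. Separate segmentations from different OCR engines could perhaps each get their own set of zones, but that is only a guess at one possible layout.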

Cheers,
Martin

On 15-06-19 01:23 PM, Benjamin Kiessling wrote:
> Hi,
>
> I'm working on OCR of Latin and Greek texts and looking for a more
> flexible alternative to the common hOCR format. As our results are
> eventually converted to TEI/EpiDoc anyway (and OCR itself could be
> described as an epigraphic process), it would be convenient
> if information like bounding boxes for lines, words, and graphemes,
> recognition confidences, and script detection could be adequately
> represented using already defined TEI primitives. In addition,
> representing the output of multiple OCR engines including different
> segmentations (word boundaries, columns, ...) would be desirable.
>
> I've had a look at the P5 Guidelines but couldn't find any
> elements/attributes that could be used for these purposes without
> some extremely creative coercion. So I'm looking for input on how to
> encode these features in a non-contrived way.
>
> All Best,
> Ben
>