I might be missing something, but isn't this what <facsimile> is for?
Check out the examples in Chapter 11.1, especially Figure 11.2
and the associated encoding.
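To make that concrete for OCR output, here is a minimal sketch that uses Python's standard xml.etree.ElementTree. It assumes the P5 facsimile module's <surface>, <graphic> and <zone> elements, with pixel bounding boxes in the ulx/uly/lrx/lry attributes. It also assumes the transcription's <lb/> and <w> elements point back to their zones through @facs. The function names, xml:ids, image filename and coordinates are all hypothetical. The sketch emits only the <facsimile> and an <ab> transcription fragment, not a complete TEI document with a header.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag):
    """Return a TEI-namespaced tag name such as '{...tei...}zone'."""
    return f"{{{TEI_NS}}}{tag}"


def add_zone(parent, zone_id, zone_type, box):
    """Append a <zone> whose corners come from box = (ulx, uly, lrx, lry)."""
    ulx, uly, lrx, lry = box
    return ET.SubElement(parent, tei("zone"), {
        XML_ID: zone_id, "type": zone_type,
        "ulx": str(ulx), "uly": str(uly), "lrx": str(lrx), "lry": str(lry),
    })


def encode_page(image_url, lines):
    """Encode one page of (hypothetical) OCR output.

    `lines` is a list of (line_id, line_box, words), where each word is
    (word_id, word_box, text) and boxes are (ulx, uly, lrx, lry) pixels.
    Returns (facsimile, ab): the image geometry and the transcription,
    linked by @facs pointers to the zones' xml:ids.
    """
    facsimile = ET.Element(tei("facsimile"))
    surface = ET.SubElement(facsimile, tei("surface"))
    ET.SubElement(surface, tei("graphic"), url=image_url)
    ab = ET.Element(tei("ab"))
    for line_id, line_box, words in lines:
        add_zone(surface, line_id, "line", line_box)
        ET.SubElement(ab, tei("lb"), facs="#" + line_id)
        for word_id, word_box, text in words:
            add_zone(surface, word_id, "word", word_box)
            w = ET.SubElement(ab, tei("w"), facs="#" + word_id)
            w.text = text
            w.tail = " "  # keep words whitespace-separated in the transcription
    return facsimile, ab


if __name__ == "__main__":
    # Hypothetical single-line page with two recognized words.
    facs, ab = encode_page("page001.png", [
        ("l1", (120, 80, 1450, 140), [
            ("l1w1", (120, 82, 380, 138), "Arma"),
            ("l1w2", (400, 84, 760, 140), "virumque"),
        ]),
    ])
    print(ET.tostring(facs, encoding="unicode"))
    print(ET.tostring(ab, encoding="unicode"))
```

The zones are kept flat under <surface> and distinguished by @type, so each line and word zone can be addressed directly from the transcription. Grapheme-level boxes would follow the same pattern with a further zone type.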
On 15-06-19 01:23 PM, Benjamin Kiessling wrote:
> I'm working on OCR of Latin and Greek texts and looking for a more
> flexible alternative to the common hOCR format. As our results are
> eventually converted to TEI/EpiDoc anyway (and OCR itself could be
> described as an epigraphic process), it would be convenient
> if information like bounding boxes for lines, words, and graphemes,
> recognition confidences, and script detection could be adequately
> represented using already defined TEI primitives. In addition,
> representing the output of multiple OCR engines, including different
> segmentations (word boundaries, columns, ...), would be desirable.
> I've had a look at the P5 guidelines but couldn't find any
> elements/attributes that could be utilized for these purposes without
> some extremely creative coercion. So I'm looking for input on how to
> achieve a non-contrived encoding of these features.
> All Best,