Hi Ben,

Did you look at <zone>? It’s designed to let you designate regions on a surface.
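
As a rough sketch of how OCR bounding boxes might map onto this, the snippet below builds a <facsimile> containing one <surface> per page and one <zone> per recognized word, using the ulx/uly/lrx/lry coordinate attributes. The function name build_facsimile, the input tuple layout, the zone id scheme, and the type value "word" are all assumptions made for illustration, not part of any existing tool. The transcription could then point at each zone with the facs attribute.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def build_facsimile(page_id, width, height, words):
    """Build a TEI <facsimile> with one <zone> per OCR word.

    `words` is a hypothetical input format: a list of
    (text, (ulx, uly, lrx, lry)) tuples in pixel coordinates.
    The text is not stored in the zone; zones only carry geometry.
    """
    facsimile = ET.Element(f"{{{TEI_NS}}}facsimile")
    surface = ET.SubElement(
        facsimile,
        f"{{{TEI_NS}}}surface",
        {
            f"{{{XML_NS}}}id": page_id,
            "ulx": "0",
            "uly": "0",
            "lrx": str(width),
            "lry": str(height),
        },
    )
    for i, (_text, (ulx, uly, lrx, lry)) in enumerate(words, start=1):
        ET.SubElement(
            surface,
            f"{{{TEI_NS}}}zone",
            {
                # Illustrative id scheme: page id plus a running word number.
                f"{{{XML_NS}}}id": f"{page_id}_w{i}",
                "type": "word",  # assumed label; any project-specific value works
                "ulx": str(ulx),
                "uly": str(uly),
                "lrx": str(lrx),
                "lry": str(lry),
            },
        )
    return facsimile


if __name__ == "__main__":
    sample = [("arma", (100, 200, 260, 250)), ("virumque", (280, 200, 520, 250))]
    print(ET.tostring(build_facsimile("p1", 2000, 3000, sample), encoding="unicode"))
```

Lines, columns, or graphemes could be handled the same way with a different type value, and output from a second OCR engine could go into its own set of zones on the same surface.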

All best,
—Matt



> On Jun 19, 2015, at 3:23 PM, Benjamin Kiessling wrote:
> 
> Hi,
> 
> I'm working on OCR of Latin and Greek texts and looking for a more
> flexible alternative to the common hOCR format. As our results get
> converted to TEI/EpiDoc eventually anyway (and OCR itself could be
> described as an epigraphic process), it would be convenient
> if information like bounding boxes for lines, words, and graphemes,
> recognition confidences, and script detection could be adequately
> represented using already defined TEI primitives. In addition,
> representing the output of multiple OCR engines including different
> segmentations (word boundaries, columns, ...) would be desirable.
> 
> I've had a look at the P5 guidelines but couldn't find any
> elements/attributes that could be utilized for these purposes without
> some extremely creative coercion. So I'm looking for input on how to
> achieve a non-contrived encoding of these features.
> 
> All Best,
> Ben