What about using <unclear>, perhaps with some sensible value for
@reason, e.g. <unclear reason="ocr">?
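
A minimal sketch of how a script might add and later remove such tags in a single pass, line by line. Everything named here is an assumption for illustration: the resp="#auto-check" marker (and whether @resp is acceptable on <unclear> in your schema), the function names, and the toy letter check, which only flags tokens containing letters outside the Maori alphabet. It also assumes the input lines are plain text with no markup inside the tokens being wrapped.

```python
import re

# Hypothetical marker identifying tags inserted by the script (an assumption,
# not an established convention). Manual <unclear> tags would omit the @resp.
AUTO_OPEN = '<unclear reason="ocr" resp="#auto-check">'
AUTO_CLOSE = '</unclear>'

# Letters used in Maori orthography (g and h appear in the digraphs ng and wh),
# plus the macron vowels.
MAORI_LETTERS = set("aehikmnoprtuwgāēīōū")


def looks_non_maori(token):
    """Toy check: True if the token has a letter outside the Maori alphabet."""
    return any(c.isalpha() and c not in MAORI_LETTERS for c in token.lower())


def mark_suspect(line, is_suspect):
    """Wrap each whitespace-separated token flagged by is_suspect()."""
    parts = re.split(r'(\s+)', line)  # keep the whitespace so spacing survives
    return ''.join(
        AUTO_OPEN + p + AUTO_CLOSE if p.strip() and is_suspect(p) else p
        for p in parts
    )


_AUTO_RE = re.compile(
    re.escape(AUTO_OPEN) + r'(.*?)' + re.escape(AUTO_CLOSE), re.DOTALL
)


def strip_auto(line):
    """Unwrap only the script's own tags; manual <unclear> tags are untouched."""
    return _AUTO_RE.sub(r'\1', line)


if __name__ == "__main__":
    marked = mark_suspect("Ko te whave nui", looks_non_maori)
    print(marked)
    print(strip_auto(marked))
```

Because the automatic tags carry a distinct attribute value, a later version of the script can strip and regenerate them without touching hand-inserted <unclear> elements, and neither language is marked as the default.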

Dot

On Mon, Jun 8, 2009 at 1:17 PM, stuart yeates<[log in to unmask]> wrote:
> We've got quite a bit of recently acquired text in Maori and there are a
> number of easily detectable errors (because we know more about the language
> than the encoders). The texts often include both English and Maori, and
> project work has the effect of making Maori text 'valuable' (i.e. it's used
> for linguistic analysis). Currently we mark apparent errors with:
>
> <foreign xml:lang="en">...</foreign>
>
> Which has the effect of removing the apparently erroneous fragment from
> linguistic analysis, because this is only done on xml:lang="mi" fragments.
> However, usually these are OCR errors in Maori rather than words/sentences
> in English.
>
> Is there a better tag for this?
>
> Desirable properties would be (a) the ability to keep track of
> automatically inserted tags (so for example they could be removed prior to
> processing by an updated version of the script without interfering with
> manually inserted tags); (b) not privileging one language over another; (c)
> the ability to be added in a single pass over the text without the need to
> store the entire document in memory (i.e. no requirement for a list of tags
> in the header); and (d) preferably not using the 'n=""' attribute, which
> we're already using for too many different things in too many places.
>
> Once we have time to look at errors, we can either correct the actual text
> (for OCR errors) or use:
>
> <foreign xml:lang="en">...</foreign> (for English words)
>
> or
>
> <choice><sic>...</sic><reg>...</reg></choice> (for apparent mistakes in the
> original)
>
> cheers
> stuart
>
> --
> Stuart Yeates
> http://www.nzetc.org/       New Zealand Electronic Text Centre
> http://researcharchive.vuw.ac.nz/     Institutional Repository
>



-- 
*~*~*~*~*~*~*~*~*~*~*
Dot Porter (MA, MSLS)          Metadata Manager
Digital Humanities Observatory (RIA), Regus House, 28-32 Upper
Pembroke Street, Dublin 2, Ireland
-- A Project of the Royal Irish Academy --
Phone: +353 1 234 2444        Fax: +353 1 234 2400
http://dho.ie          Email: [log in to unmask]
*~*~*~*~*~*~*~*~*~*~*