+1

And you could also use the new <certainty> element to indicate how seriously one
should take the content of these elements....
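
A rough sketch of how the two suggestions in this thread might combine, assuming <unclear> carries @reason and @resp and that <certainty> points back at it via @target. The xml:id, the resp pointer and the degree value are made up for illustration, so please check the attribute names against the current Guidelines before relying on them:

```xml
<!-- Sketch only: "u1", "#ocr-script" and 0.3 are hypothetical values -->
<unclear xml:id="u1" reason="ocr" resp="#ocr-script">...</unclear>
<!-- locus="value" is meant to say the doubt concerns the element's content -->
<certainty target="#u1" locus="value" degree="0.3"/>
```

Something like the @resp value would let a later version of the script find and strip only the tags it inserted itself, leaving manually inserted ones alone, and neither element depends on xml:lang.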




In message <[log in to unmask]> Dot Porter
<[log in to unmask]> writes:
> What about using <unclear>, perhaps with some sensible value for
> @reason? <unclear reason="ocr">?
> 
> Dot
> 
> On Mon, Jun 8, 2009 at 1:17 PM, stuart yeates<[log in to unmask]> wrote:
> > We've got quite a bit of recently acquired text in Maori and there are a
> > number of easily detectable errors (because we know more about the language
> > than the encoders). The texts often include both English and Maori, and
> > project work has the effect of making Maori text 'valuable' (i.e. it's used
> > for linguistic analysis). Currently we mark apparent errors with:
> >
> > <foreign xml:lang="en">...</foreign>
> >
> > Which has the effect of removing the apparently erroneous fragment from
> > linguistic analysis, because this is only done on xml:lang="mi" fragments.
> > However, usually these are OCR errors in Maori rather than words/sentences
> > in English.
> >
> > Is there a better tag for this?
> >
> > Properties that would be great would be (a) the ability to keep track of
> > automatically inserted tags (so for example they could be removed prior to
> > processing by an updated version of the script without interfering with
> > manually inserted tags); (b) not privileging one language over another; (c)
> > the ability to be added in a single pass over the text without the need to
> > store the entire document in memory (i.e. no requirement for a list of tags
> > in the header); and (d) preferably not using the 'n=""' attribute, which
> > we're already using for too many different things in too many places.
> >
> > Once we have time to look at errors, we can either correct the actual text
> > (for OCR errors) or use:
> >
> > <foreign xml:lang="en">...</foreign> (for English words)
> >
> > or
> >
> > <choice><sic>...</sic><reg>...</reg></choice> (for apparent mistakes in the
> > original)
> >
> > cheers
> > stuart
> >
> > --
> > Stuart Yeates
> > http://www.nzetc.org/       New Zealand Electronic Text Centre
> > http://researcharchive.vuw.ac.nz/     Institutional Repository
> >
> 
> 
> 
> -- 
> *~*~*~*~*~*~*~*~*~*~*
> Dot Porter (MA, MSLS)          Metadata Manager
> Digital Humanities Observatory (RIA), Regus House, 28-32 Upper
> Pembroke Street, Dublin 2, Ireland
> -- A Project of the Royal Irish Academy --
> Phone: +353 1 234 2444        Fax: +353 1 234 2400
> http://dho.ie          Email: [log in to unmask]
> *~*~*~*~*~*~*~*~*~*~*