Dear Paolo,

Please have a look at the proposal addressing this at 
https://github.com/TEIC/TEI/issues/1670

It avoids the "POS-in-@ana" issue, and provides arguments for that. You 
will also see there a list of projects that use the proposed format, 
some of them based on MorphAdorner.

The practical question for you now, I guess, is whether to keep the 
existing TEI skeleton and disobey the @ana datatype, or to adopt the 
changes we have suggested in the ticket and put the POS information 
where it belongs, hoping that the Council will address the issue before 
the end of the world. It's a gamble... :-)
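
If you want to see how your current @ana values stand with respect to 
the pointer datatype, something along the following lines may help. It 
is only a rough sketch in Python (standard library only). The function 
name check_ana and the <body> wrapper are my own illustrative 
assumptions, and the two <w> elements are trimmed from your message. It 
only looks at local "#id" pointers and says nothing about pointers to 
other documents.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Two <w> elements taken from the quoted message, wrapped in a
# hypothetical <body> so that the snippet parses on its own.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <w lemma="in" ana="#p-acp" xml:id="A88624-000740">in</w>
  <w lemma="in" ana="4-S--------" xml:id="w315">in</w>
</body>"""


def check_ana(root):
    """Classify every @ana value as resolving, dangling, or not a local pointer."""
    # Collect all xml:id values present in the document.
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    report = []
    for el in root.iter():
        ana = el.get("ana")
        if ana is None:
            continue
        # @ana may hold several whitespace-separated pointers.
        for token in ana.split():
            if not token.startswith("#"):
                status = "not a fragment pointer"
            elif token[1:] in ids:
                status = "resolves"
            else:
                status = "dangling"
            report.append((el.get(XML_ID), token, status))
    return report


report = check_ana(ET.fromstring(SAMPLE))
for row in report:
    print(row)
```

On this sample, "#p-acp" comes out as dangling because nothing in the 
snippet carries xml:id="p-acp", and "4-S--------" is reported as not 
being a fragment pointer at all.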

Best wishes,

   Piotr


On 01/02/18 21:11, Paolo Monella wrote:
> Dear all,
>
> I ran a lemmatizer/PoS tagger (TreeTagger) on a TEI P5-encoded file 
> and want to encode the result in attributes of <w>.
>
> I searched the TEI-L archives and the Internet. I found that 
> MorphAdorner [1] uses @lemma for lemmata and @ana for the PoS output 
> (e.g. "adjective, positive genitive plural masculine"):
>
> <w lemma="in" ana="#p-acp" reg="in" xml:id="A88624-000740">in</w>
>
> I had tried this encoding:
>
> <w ana="4-S--------" lemma="in" n="in" xml:id="w315">in</w>
>
> The main difference is that MorphAdorner prepends a "#" to the value 
> of @ana because this value should be a teidata.pointer [2].
>
> In any case, "#p-acp" is not a valid pointer either (not a valid 
> URI), so do you think I should leave my encoding as it is, or prepend 
> "#" as in @ana="#4-S--------"?
>
> Thank you,
> Paolo
>
> [1] See paragraph "Simplified TEI P5-like output" in 
> http://morphadorner.northwestern.edu/morphadorner/documentation/xmloutput/
> [2] 
> http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.global.analytic.html
>