Perhaps this is of interest: https://bitbucket.org/fryske-akademy/tei-encoding (sometimes I have to refresh the page for the layout to render), an ODD customization that uses Universal Dependencies-based PoS tags and features.

At the Fryske Akademy we are also working on a tagger that uses this customization, though the tagger is not ready yet.

Eduard Drenth, Software Architekt

[log in to unmask]

Doelestrjitte 8
8911 DX  Ljouwert
+31 58 234 30 47
+31 62 094 34 28 (private)

I am off on Fridays
https://www.fryske-akademy.nl/~edrenth/
https://bitbucket.org/fryske-akademy/
https://workflow-fryske-akademy.atlassian.net/


gpg: https://sks-keyservers.net/pks/lookup?op=get&search=0x065EF82A1E02CC43

________________________________________
From: TEI (Text Encoding Initiative) public discussion list <[log in to unmask]> on behalf of Paolo Monella <[log in to unmask]>
Sent: Tuesday, January 2, 2018 9:11 PM
To: [log in to unmask]
Subject: PoS tagging in <w> with @ana: pointer?

Dear all,

I ran a lemmatizer/PoS tagger (TreeTagger) on a TEI P5-encoded file and
want to encode the result in attributes of <w>.

I searched the TEI-L archives and the Internet. I found that
MorphAdorner [1] uses @lemma for lemmata and @ana for the PoS output
(e.g. "adjective, positive genitive plural masculine"):

<w lemma="in" ana="#p-acp" reg="in" xml:id="A88624-000740">in</w>

I had tried this encoding:

<w ana="4-S--------" lemma="in" n="in" xml:id="w315">in</w>

The main difference is that MorphAdorner prepends a "#" to the value of
@ana because this value should be a teidata.pointer [2].

In any case, "#p-acp" is not a valid pointer either (not a valid URI), so do you
think I should leave my encoding as it is, or prepend "#" as in
@ana="#4-S--------"?
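
To make the pointer question concrete, here is a minimal Python sketch that checks whether each @ana value resolves to an element carrying a matching xml:id in the same document. The sample fragment, the <fs> definition of "p-acp" and the function name unresolved_ana are assumptions made up for illustration; they come from neither MorphAdorner nor TreeTagger, and the sample omits the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined, so ElementTree expands xml:id to this name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample: one <fs> that defines a tag, two <w> that use @ana.
# No TEI namespace is declared here, which keeps the element lookup simple.
SAMPLE = """<body>
  <fs xml:id="p-acp"><f name="pos"><symbol value="preposition"/></f></fs>
  <w ana="#p-acp" lemma="in" xml:id="w1">in</w>
  <w ana="4-S--------" lemma="in" xml:id="w315">in</w>
</body>"""


def unresolved_ana(xml_text):
    """Return (word id, ana token) pairs that are not '#' plus an existing xml:id."""
    root = ET.fromstring(xml_text)
    # Collect every xml:id present in the document.
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    problems = []
    for w in root.iter("w"):
        # @ana may hold several whitespace-separated pointers.
        for token in w.get("ana", "").split():
            if not (token.startswith("#") and token[1:] in ids):
                problems.append((w.get(XML_ID), token))
    return problems


print(unresolved_ana(SAMPLE))  # [('w315', '4-S--------')]
```

Under these assumptions, "#p-acp" only works as a pointer when some element in the document carries xml:id="p-acp". Simply prepending "#" to "4-S--------" would not be enough, because an xml:id must be a valid XML name and so cannot begin with a digit; the target would need a different identifier.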

Thank you,
Paolo

[1] See paragraph "Simplified TEI P5-like output" in
http://morphadorner.northwestern.edu/morphadorner/documentation/xmloutput/
[2]
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.global.analytic.html