Perhaps this is of interest (I sometimes have to refresh the page to get the layout right): an ODD customization for using Universal Dependencies-based PoS tags and features.

At the Fryske Akademy we are also working on a tagger that uses this customization, though the tagger is not ready yet.

Eduard Drenth, Software Architect

Doelestrjitte 8
8911 DX  Ljouwert
+31 58 234 30 47
+31 62 094 34 28 (private)

I am off on Fridays


From: TEI (Text Encoding Initiative) public discussion list <[log in to unmask]> on behalf of Paolo Monella <[log in to unmask]>
Sent: Tuesday, January 2, 2018 9:11 PM
To: [log in to unmask]
Subject: PoS tagging in <w> with @ana: pointer?

Dear all,

I ran a lemmatizer/PoS tagger (TreeTagger) on a TEI P5-encoded file and
want to encode the result in attributes of <w>.

I searched the TEI-L archives and the Internet. I found that
MorphAdorner [1] uses @lemma for lemmata and @ana for the PoS output
(e.g. "adjective, positive genitive plural masculine"):

<w lemma="in" ana="#p-acp" reg="in" xml:id="A88624-000740">in</w>

I had tried this encoding:

<w ana="4-S--------" lemma="in" n="in" xml:id="w315">in</w>

The main difference is that MorphAdorner prepends a "#" to the value of
@ana because this value should be a teidata.pointer [2].
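
To make the difference concrete, here is a minimal sketch (not MorphAdorner's
or TreeTagger's own code) of rewriting @ana values into "#"-prefixed
same-document pointers with Python's standard xml.etree.ElementTree. The
function names and the "pos." prefix are hypothetical. The prefix is an
assumption added because an xml:id cannot start with a digit, so a tag such as
"4-S--------" could not name a target element as it stands. The sketch also
assumes a single tag per @ana.

```python
import xml.etree.ElementTree as ET


def tag_to_pointer(tag, prefix="pos."):
    """Turn a raw tagger code into a same-document pointer.

    The prefix is a hypothetical convention, not part of TEI.
    """
    if tag.startswith("#"):
        return tag  # already looks like a pointer; leave it alone
    return "#" + prefix + tag


def rewrite_ana(root, prefix="pos."):
    """Rewrite @ana on every <w> element (namespaced or not) in place."""
    for el in root.iter():
        local_name = el.tag.rsplit("}", 1)[-1]  # drop any {namespace} part
        if local_name == "w" and "ana" in el.attrib:
            el.set("ana", tag_to_pointer(el.get("ana"), prefix))
    return root


sample = '<s><w ana="4-S--------" lemma="in" n="in" xml:id="w315">in</w></s>'
root = rewrite_ana(ET.fromstring(sample))
print(ET.tostring(root, encoding="unicode"))
```

The rewritten values only resolve if the document (or an external file) also
contains elements carrying the matching xml:id values, for example one
feature structure per tag.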

In any case, "#p-acp" is not a valid pointer either (it is not a valid URI), so
do you think I should leave my encoding as it is, or prepend "#" as in

Thank you,

[1] See paragraph "Simplified TEI P5-like output" in