Dear Eduard and Piotr,
Thank you for your insights. I do hope that the proposal of the LingSIG
is accepted. If useful, you might mention my own Ursus project
as a use case, but I am sure that there are plenty of already existing ones.
I am currently encoding as follows:
<w ana="4-S--------" lemma="in" n="in" xml:id="w315">in</w>
so I am not prepending a "#" to "4-S--------". It would take only a
quick vi find/replace to prepend the "#", and minor changes to the JS
and Python scripts so that they strip it again when processing.
But I am reluctant to do so because I agree with the argument in the
ticket that it is a kludge.
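To make the size of that change concrete, here is a minimal Python sketch of both directions: prepending the "#" in the XML source and stripping it again on the processing side. The function names are hypothetical and are not taken from the actual Ursus scripts; the regular expression assumes attributes are written with double quotes, as in the example above.

```python
import re

# Matches ana=" only when the value is non-empty and does not already
# start with "#". Assumes double-quoted attribute values.
ANA_WITHOUT_HASH = re.compile(r'\bana="(?![#"])')


def prepend_hash(xml_text: str) -> str:
    """Prepend '#' to every @ana value that lacks it (hypothetical helper)."""
    return ANA_WITHOUT_HASH.sub('ana="#', xml_text)


def normalize_ana(value: str) -> str:
    """Return the PoS tag with one leading '#' removed, if present."""
    return value[1:] if value.startswith("#") else value


if __name__ == "__main__":
    w = '<w ana="4-S--------" lemma="in" n="in" xml:id="w315">in</w>'
    print(prepend_hash(w))
    print(normalize_ana("#4-S--------"))
```

Running prepend_hash twice leaves the text unchanged after the first pass, so the conversion can be repeated safely.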
No linter or parser has reported a validation failure because of this.
Do you still suggest that I prepend the "#"?
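On the syntactic point Piotr makes in the message quoted below, here is a small sketch using urljoin from Python's standard library. The base URI is made up for illustration. It shows a bare fragment identifier being resolved against the URI of the current document, whereas a value without "#" is resolved as a relative path.

```python
from urllib.parse import urljoin

# Made-up base URI standing in for the current TEI document.
BASE = "http://example.org/ursus.xml"


def resolve(ana_value: str, base: str = BASE) -> str:
    """Resolve an @ana value as a URI reference against the document URI."""
    return urljoin(base, ana_value)


if __name__ == "__main__":
    # Bare fragment: appended to the URI of the current document.
    print(resolve("#p-acp"))
    # No "#": treated as a relative path next to the document.
    print(resolve("4-S--------"))
```

In both cases the value can be read as a URI reference, which may be why no validator complained, but neither result points at anything that actually exists.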
Ticket: https://github.com/TEIC/TEI/issues/1670
On 05/01/2018 17:41, Piotr Bański wrote:
> Dear Paolo,
> One more question/nitpick. You say:
> > "#p-acp" is no valid pointer (no valid URI)
> Well, it is not, but it's a valid fragment identifier (see RFC 3986, Appendix A), and
> somewhere in the maze of W3C specs, there is a statement on interpreting
> bare fragment identifiers as being virtually appended to the URI of the
> current document, yielding a correct (longer) URI. So I think that you
> are fine, syntactically (or have you actually got a failed validation
> result? I'd be very curious to see a test case then), but obviously not
> semantically (we address this "pretend that POS values are fragIDs, just
> for the sake of the tei.pointer datatype" issue in the text of the
> github ticket to which I pointed you, alongside other arguments against
> using @ana for this purpose).
> Best regards,
> RFC 3986, Appendix A: https://tools.ietf.org/html/rfc3986#appendix-A
> On 01/05/18 16:39, Piotr Bański wrote:
>> Dear Paolo,
>> Please have a look at the proposal addressing this at
>> It avoids the "POS-in-@ana" issue, and provides arguments for that.
>> You will also see there a list of projects that use the proposed
>> format, some of them based on MorphAdorner.
>> The practical question for you now, I guess, is either to keep the
>> existing TEI skeleton and disobey the @ana datatype or adopt the
>> changes we have suggested in the ticket and put the POS information
>> where it belongs, hoping that the Council will address the issue
>> before the end of the world. It's a gamble... :-)
>> Best wishes,
>> On 01/02/18 21:11, Paolo Monella wrote:
>>> Dear all,
>>> I ran a lemmatizer/PoS tagger (TreeTagger) on a TEI P5-encoded file
>>> and want to encode the result in attributes of <w>.
>>> I searched the TEI-L archives and the Internet. I found that
>>> MorphAdorner uses @lemma for lemmata and @ana for the PoS output
>>> (e.g. "adjective, positive genitive plural masculine"):
>>> <w lemma="in" ana="#p-acp" reg="in" xml:id="A88624-000740">in</w>
>>> I had tried this encoding:
>>> <w ana="4-S--------" lemma="in" n="in" xml:id="w315">in</w>
>>> The main difference is that MorphAdorner prepends a "#" to the value
>>> of @ana because this value should be a teidata.pointer.
>>> In any case, also "#p-acp" is no valid pointer (no valid URI), so do
>>> you think I should leave my encoding as it is, or prepend "#" as in
>>> Thank you,
>>>  See paragraph "Simplified TEI P5-like output" in