Dear Piotr,

Thank you for your answer. We're thinking of going with your 'kludgy' option; I think it suits our needs.

<w ana="#th螵1_thet_1">tha</w>

referring to an interpretation group:

<interpGrp xml:id="th螵1_thet_1">
 <interp type="lemma">th螵1</interp>
 <interp type="lemma">thet_1</interp>
</interpGrp>

(The "_1" is to distinguish between different lemmata in our online Old Frisian dictionary.)

Best wishes,

From: Piotr Banski
Sent: Wednesday, 23 November 2016 14:48
To: Levi Damsma
Subject: Re: tagging multiple lemmas to ambiguous words

Dear Levi,

I wonder how far in the process you are and how open to making your
markup a bit more complex (the alternative being to keep it simple and
just kludge the more complex cases).

And re(re,re)reading your message, I am actually not sure how detailed
you want to become and what your initial assumptions are, because it
seems that at the beginning you are talking about two lemmas of the noun
and then shift the focus to two lemmas of the article (my conjecture
being that each article lemma defines its own paradigm depending on its

I assume you do something like

<w lemma="thi" type="art" subtype="dat.masc">tha</w>

and would like to be able to signal the possibility of

<w lemma="thet" type="art" subtype="dat.neut">tha</w>

(and at this point, let me state that this alone seems to support the
suggestion of rationalizing simple w-level linguistic markup, voiced
recently on this list)

So you have two distinct ordered sets of values for a single word-sized
piece of text that you would like to express together. The way out that
I see is either to keep to the simple version and get a bit kludgy for
the complex cases by doing:

<w ana="#thi_thet">tha</w>

where "#thi_thet" identifies a place in the document where you list the
relevant feature complexes, and your processor knows that when it sees
the @ana attribute, it should do some special magic. Two remarks now:
1. that "place in the document" can be under <standOff> (a sibling of
<text>, approved by the Council long ago but still absent from the
online documentation)
2. a less kludgy version of the above would involve using @ana across
the board, on all <w> elements.
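
The "special magic" a processor would perform on @ana might look roughly like the minimal Python sketch below. It assumes the feature complexes are listed as an <interpGrp> of <interp type="lemma"> elements (one possible container; the message above does not prescribe one), that plain <w> elements carry @lemma directly, and that the TEI namespace is omitted for brevity. The sample document, the lemma "ban" on the second word, and the function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predeclared, so ElementTree exposes xml:id under this name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Illustrative sample only; a real TEI file would also declare the TEI namespace.
TEI_SAMPLE = """
<text>
  <interpGrp xml:id="thi_thet">
    <interp type="lemma">thi</interp>
    <interp type="lemma">thet</interp>
  </interpGrp>
  <w ana="#thi_thet">tha</w>
  <w lemma="ban">banne</w>
</text>
"""


def lemmas_for(w, groups):
    """Return all lemmas for one <w>: via @ana if present, else via @lemma."""
    ana = w.get("ana")
    if ana is None:
        lemma = w.get("lemma")
        return [lemma] if lemma else []
    lemmas = []
    # @ana may hold several whitespace-separated pointers.
    for pointer in ana.split():
        group = groups.get(pointer.lstrip("#"))
        if group is not None:
            lemmas.extend(i.text for i in group.findall("interp[@type='lemma']"))
    return lemmas


def resolve(xml_text):
    """Map every <w> in the document to (surface form, list of lemmas)."""
    root = ET.fromstring(xml_text)
    groups = {g.get(XML_ID): g for g in root.iter("interpGrp")}
    return [(w.text, lemmas_for(w, groups)) for w in root.iter("w")]


if __name__ == "__main__":
    for form, lemmas in resolve(TEI_SAMPLE):
        print(form, lemmas)
```

With this shape, an ambiguous word simply yields more than one lemma, so an online edition could render one link per lemma without special-casing the markup of unambiguous words.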

Or you get more complex by invoking ISO MAF (Morpho-syntactic annotation
framework)... [1][2]


... and mapping it to TEI in some clever way. A clever way could again
involve the approved but still unofficial <standOff> element in the same
document, or a series of documents the way that e.g. the National Corpus
of Polish [3] did.


Below, I paste a fragment of the file (warning: large!) that you can find at

In the partially indented fragment below (I suggest pasting it into an
XML editor for highlighting), the "interps" fragment lists all possible
interpretations of the string "młodzi", while the "disamb" fragment
presents a result of automatic disambiguation.
"base" stands for lemma, "ctag" for part-of-speech, and "msd" for
morpho-syntactic description (we used the CES names for sentimental reasons).

<seg corresp="ann_segmentation.xml#segm_1.2-seg" xml:id="morph_1.2-seg">
<fs type="morph">
    <f name="orth"><string>młodzi</string></f><!-- młodzi [5,6] -->
<f name="interps">
<fs type="lex" xml:id="morph_1.2.1-lex">
<f name="base"><string>młody</string></f>
<f name="ctag"><symbol value="adj"/></f>
<f name="msd">
<symbol value="pl:nom:m1:pos" xml:id="morph_1.2.1.1-msd"/>
<symbol value="pl:voc:m1:pos" xml:id="morph_1.2.1.2-msd"/>
<fs type="lex" xml:id="morph_1.2.2-lex">
<f name="base"><string>młody</string></f>
<f name="ctag"><symbol value="subst"/></f>
<f name="msd">
<symbol value="pl:nom:m1" xml:id="morph_1.2.2.1-msd"/>
<symbol value="pl:voc:m1" xml:id="morph_1.2.2.2-msd"/>
<fs type="lex" xml:id="morph_1.2.3-lex"><f
name="base"><string>młodzi</string></f><f name="ctag"><symbol
value="subst"/></f><f name="msd"><vAlt><symbol value="pl:nom:m1"
xml:id="morph_1.2.3.1-msd"/><symbol value="pl:voc:m1"
xml:id="morph_1.2.3.2-msd"/></vAlt></f></fs><fs type="lex"
xml:id="morph_1.2.4-lex"><f name="base"><string>młodzie</string></f><f
name="ctag"><symbol value="subst"/></f><f name="msd"><symbol
value="pl:gen:n" xml:id="morph_1.2.4.1-msd"/></f></fs><fs type="lex"
xml:id="morph_1.2.5-lex"><f name="base"><string>młód</string></f><f
name="ctag"><symbol value="subst"/></f><f name="msd"><vAlt><symbol
value="sg:gen:f" xml:id="morph_1.2.5.1-msd"/><symbol value="sg:dat:f"
xml:id="morph_1.2.5.2-msd"/><symbol value="sg:loc:f"
xml:id="morph_1.2.5.3-msd"/><symbol value="sg:voc:f"
xml:id="morph_1.2.5.4-msd"/><symbol value="pl:gen:f"

<f name="disamb">
<fs feats="#an8003" type="tool_report">
<f fVal="#morph_1.2.1.1-msd" name="choice"/>
<f name="interpretation">
<!-- interpretation -->
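
To illustrate how the "disamb" pointer relates to the "interps" list, here is a minimal Python sketch that follows the fVal reference of the "choice" feature back to the <symbol> it names and reports the lemma, part of speech and msd of the enclosing reading. The sample is a cut-down version of the fragment above: only two readings are kept, the closing tags are supplied, and the <vAlt> wrapper in the first reading is an assumption modelled on the later entries. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predeclared, so ElementTree exposes xml:id under this name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Reduced, well-formed sample; closing tags and the first <vAlt> are assumed.
MORPH_SAMPLE = """
<fs type="morph">
  <f name="orth"><string>młodzi</string></f>
  <f name="interps">
    <fs type="lex" xml:id="morph_1.2.1-lex">
      <f name="base"><string>młody</string></f>
      <f name="ctag"><symbol value="adj"/></f>
      <f name="msd">
        <vAlt>
          <symbol value="pl:nom:m1:pos" xml:id="morph_1.2.1.1-msd"/>
          <symbol value="pl:voc:m1:pos" xml:id="morph_1.2.1.2-msd"/>
        </vAlt>
      </f>
    </fs>
    <fs type="lex" xml:id="morph_1.2.4-lex">
      <f name="base"><string>młodzie</string></f>
      <f name="ctag"><symbol value="subst"/></f>
      <f name="msd"><symbol value="pl:gen:n" xml:id="morph_1.2.4.1-msd"/></f>
    </fs>
  </f>
  <f name="disamb">
    <fs type="tool_report">
      <f fVal="#morph_1.2.1.1-msd" name="choice"/>
    </fs>
  </f>
</fs>
"""


def chosen_reading(xml_text):
    """Return base, ctag and msd of the reading selected by the 'choice' feature."""
    root = ET.fromstring(xml_text)
    choice = root.find(".//f[@name='choice']")
    if choice is None:
        return None
    target = choice.get("fVal").lstrip("#")
    for lex in root.iter("fs"):
        if lex.get("type") != "lex":
            continue  # skip the outer "morph" and the "tool_report" structures
        msd = lex.find("f[@name='msd']")
        if msd is None:
            continue
        # The msd values may sit directly under <f> or inside a <vAlt>.
        for symbol in msd.iter("symbol"):
            if symbol.get(XML_ID) == target:
                return {
                    "base": lex.find("f[@name='base']/string").text,
                    "ctag": lex.find("f[@name='ctag']/symbol").get("value"),
                    "msd": symbol.get("value"),
                }
    return None


if __name__ == "__main__":
    print(chosen_reading(MORPH_SAMPLE))
```

Because the pointer targets an individual msd value rather than a whole reading, the same lookup works whether a reading has one msd or several alternatives.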


I am hopeful that some middle-ground examples from others on this list
are forthcoming.

HTH and best regards,


On 23/11/16 09:35, Levi Damsma wrote:
> Dear everyone,
> I am currently lemmatising and POS-tagging an Old Frisian text (the elder 'Skeltanariucht' from MS Junius 49) for the Frisian Academy (Fryske Akademy), and am wondering how to cope with ambiguity: specifically, a word which could be lemmatised with two different lemmas depending on how one interprets it.
> An example: "tha banne" ('the ban/the summons') could be the dative case of a masculine "thi ban" or of a neuter "thet ban". This word, "bon", appears as both masculine and neuter elsewhere in this text, so it is not possible to determine whether I have to tag the article in this example with the lemma "thet" or "thi". Ideally, in our online edition of this text, I want this word to link to both lemmata.
> First I was thinking of something like <choice></choice> (which I now use for corrections with <sic> and <corr>), but maybe this is not what I want, because I want to show both lemmata, not switch views between them. I'd rather just find a way to add two lemmata in one <w>/word.
> I do not have much experience with TEI, so maybe I am overlooking a very simple solution.
> Does anybody have a good solution, or maybe just some thoughts which could point me in the right direction?
> Thanks!
> Levi

Piotr Bański, Ph.D.
Senior Researcher,
Institut für Deutsche Sprache,
R5 6-13
68-161 Mannheim, Germany