Hi all,
In http://jtei.revues.org/540, Werner Wegstein and I suggested something like the following for a similar case:
<form type="lemma">
 <gramGrp>
   <gen norm="feminine">die</gen>
 </gramGrp>
 <orth>Katze</orth>
</form>

As for the comma, you can add a <pc>, </pc> element, which can occur in <gramGrp> or <form>.
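
Applied to the Greek entry from the original question, the same pattern might look like the sketch below. This is only an illustration: the placement of <pc> directly inside <form> and the value norm="masculine" are assumptions modelled on the German example, not a recommendation taken from the article.

```xml
<!-- Sketch only: the Katze pattern transposed to the Ἀαρών entry. -->
<form type="lemma">
  <orth>Ἀαρών</orth>
  <!-- The printed comma, kept as explicit punctuation -->
  <pc>,</pc>
  <gramGrp>
    <!-- The article is treated as a gender marker; @norm holds the normalized value -->
    <gen norm="masculine">ὁ</gen>
  </gramGrp>
</form>
```

Keeping the comma in <pc> preserves the printed layout while leaving <orth> with the lexeme alone.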
@Piotr: these are cases where <gramGrp> inside <form> would make more sense, wouldn't they? Maybe we should record such configurations in TEI-Lex0.
Laurent

On 11 March 2017 at 00:56, Piotr Bański <[log in to unmask]> wrote:

Hello Jonathan,

in the TEI-Lex0 taskforce that is in the process of formulating streamlined baseline recommendations for dictionary encoding, we have so far arrived at something like the following, for your case:

<entry [attributes, among them @xml:id, and xml:lang here or on the <text> element above]>
 <form type="lemma"><orth>Ἀαρών</orth></form>
 <gramGrp><pos>ὁ</pos></gramGrp>
 <sense>...
...

I have treated ὁ above as a symbol, rather than an orthographic form, because this is the role that it plays in the entry. Variations depend on the entire system that you assume; for example, you could do:

<pos ana="#gender_m">ὁ</pos>

if you used a separately described taxonomy of grammatical features.
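
One possible shape for such a taxonomy is sketched below. It assumes the taxonomy lives in the TEI header under <classDecl>; the identifier "grammatical_features" and the wording of <catDesc> are hypothetical, and only "gender_m" comes from the example above.

```xml
<!-- Sketch only: a header fragment that #gender_m could point to. -->
<encodingDesc>
  <classDecl>
    <!-- "grammatical_features" is a hypothetical identifier -->
    <taxonomy xml:id="grammatical_features">
      <category xml:id="gender_m">
        <catDesc>masculine gender</catDesc>
      </category>
    </taxonomy>
  </classDecl>
</encodingDesc>
```

With this in place, the article ὁ stays in the text as printed, while the pointer in @ana carries the normalized grammatical value.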

Another question is how badly you need the comma in the visualisation of your dictionary. It could be added by means of styling, on the way to the display.
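
As a rough illustration of the styling route, the XSLT 1.0 sketch below prints the lemma, then a generated comma, then the content of <pos>. It assumes entries shaped like the one above and elements in the TEI namespace; it is a minimal plain-text rendering, not part of any TEI-Lex0 recommendation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: inserts the comma at display time instead of encoding it. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Suppress all text that is not explicitly selected below -->
  <xsl:template match="text()"/>

  <xsl:template match="tei:entry">
    <!-- The lemma form -->
    <xsl:value-of select="tei:form[@type='lemma']/tei:orth"/>
    <!-- Generate the comma only when a grammatical marker follows -->
    <xsl:if test="tei:gramGrp/tei:pos">
      <xsl:text>, </xsl:text>
      <xsl:value-of select="tei:gramGrp/tei:pos"/>
    </xsl:if>
    <!-- One entry per line -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

For the entry above, this should produce the line "Ἀαρών, ὁ" without the comma ever being stored in the encoding.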

HTH,

 Piotr

On 03/11/17 00:13, Jonathan Robie wrote:
I have several lexicons in which <orth> contains more than the lexeme
itself, e.g.

        <orth>Ἀαρών, ὁ</orth>


The lexeme is Ἀαρών, it is masculine so it takes the article ὁ.  Is that
the right way to use <orth>?  It seems to conflate two concerns.  What
is the best way to encode this information?


Thanks!


Jonathan


--
Piotr Bański, Ph.D.
Senior Researcher,
Institut für Deutsche Sprache,
R5 6-13
68-161 Mannheim, Germany

Laurent Romary
Inria, team Alpage
[log in to unmask]