In a private note, Eric Peterson asked me for some clarification
about the encoding of morphological information in dictionary
entries. I have the feeling that we did not explain this very well in the
Guidelines, and I am posting here (with Eric's authorization) a summary of
our discussion, which might be of interest to other people.
 
Eric was trying to find a legal way to note that an unusual <orth> form is
peculiar to a particular part of speech - but not to all inflections of
that POS, as in the following example (from a French-English dictionary):
 
________________________________________________________________________
bonhomme
  nm: (pl bonshommes) fellow
  a: good-natured;
________________________________________________________________________
 
 
The headword information ("bonhomme") applies to both senses, and the first
sense has an unusual plural form ("bonshommes"). The following encoding,
suggested as a possibility by Eric, correctly reflects the structure of the
entry:
 
  <entry>
 
    <form>
      <orth>bonhomme</orth>
    </form>
 
    <sense>
      <gramGrp>
        <pos>n</pos>
        <gen>m</gen>
      </gramGrp>
      <form type=infl>
         <number>pl</number>
         <orth>bonshommes</orth>
      </form>
      <trans>
        <tr>fellow</tr>
      </trans>
    </sense>
 
    <sense>
      <gramGrp>
        <pos>a</pos>
      </gramGrp>
      <trans>
        <tr>good-natured</tr>
      </trans>
    </sense>
 
  </entry>
 
However, Eric said that this encoding seems to suggest that the whole noun
sense must be plural. This is the part that we did not explain very well in
P3.
 
You probably noticed that both <form> and <gramGrp> can contain the
morphological tags <gen>, <number>, etc. However,
 
- the <gramGrp> tag gives grammatical info about the sense in which it appears
  (or the whole entry if it appears at the top level);
 
- the <form> tag gives information about one or several forms for a given
  sense (or the whole entry), and the morphological tags that appear in it
  give morphological information about that form or those forms only.
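 
To make the scoping rule concrete, here is a minimal Python sketch (not part
of the Guidelines) that reports what each morphological tag in an entry
applies to. It assumes the entry has been transcribed as well-formed XML (so
the attribute value is quoted). The function name morph_scopes, the scope
labels, and the restriction to <gen> and <number> are illustrative choices,
not anything defined by P3.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of the morphological tags; extend as needed.
MORPH_TAGS = {"gen", "number"}

# The first encoding of "bonhomme", rewritten as well-formed XML (assumption).
ENTRY = """
<entry>
  <form><orth>bonhomme</orth></form>
  <sense>
    <gramGrp><pos>n</pos><gen>m</gen></gramGrp>
    <form type="infl"><number>pl</number><orth>bonshommes</orth></form>
    <trans><tr>fellow</tr></trans>
  </sense>
  <sense>
    <gramGrp><pos>a</pos></gramGrp>
    <trans><tr>good-natured</tr></trans>
  </sense>
</entry>
"""


def morph_scopes(entry):
    """Return (tag, value, scope) for each morphological tag in an entry.

    A tag inside <gramGrp> is scoped to the enclosing sense (or the entry);
    a tag inside <form> is scoped only to the <orth> forms given there.
    """
    results = []

    def visit(container, label):
        sense_no = 0
        for child in container:
            if child.tag == "gramGrp":
                scope = label
            elif child.tag == "form":
                orths = "/".join(o.text for o in child.findall("orth"))
                scope = "form " + orths
            elif child.tag == "sense":
                # Senses are numbered within their parent container.
                sense_no += 1
                visit(child, f"sense {sense_no}")
                continue
            else:
                continue
            for m in child:
                if m.tag in MORPH_TAGS:
                    results.append((m.tag, m.text, scope))

    visit(entry, "entry")
    return results


if __name__ == "__main__":
    for tag, value, scope in morph_scopes(ET.fromstring(ENTRY)):
        print(f"<{tag}>{value}</{tag}> applies to: {scope}")
```

For the encoding suggested by Eric, this reports masculine gender for
sense 1 as a whole, and plural number for the form "bonshommes" only.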
 
If I wanted to represent the fact that the first sense is plural only
(which is not the case in the example), I would write:
 
  <entry>
 
    <form>
      <orth>bonhomme</orth>
    </form>
 
    <sense>
      <gramGrp>
        <pos>n</pos>
        <gen>m</gen>
        <number>pl</number>
      </gramGrp>
      <form>
         <orth>bonshommes</orth>
      </form>
      <trans>
        <tr>fellow</tr>
      </trans>
    </sense>
 
    <sense>
      <gramGrp>
        <pos>a</pos>
      </gramGrp>
      <trans>
        <tr>good-natured</tr>
      </trans>
    </sense>
 
  </entry>
 
 
Jean