On Thu, 8 Sep 1994 15:35:57 CDT David Megginson said:
>I am interested in whether I can use existing TEI facilities for
>writing about language.  I am currently using a two-part distinction:
>the <lma> tag marks a lemmatised headword, while the <frm> tag marks a
>specific written instance.  For example, if I were dealing with Early
>Modern English, <frm>shyppe</frm> and <frm>ship</frm> would both be
>forms of <lma>Ship</lma>.
 
I think the answer is yes, you can.
 
Both of these usages look (at first glance, anyway) like specializations
of the TEI element MENTIONED, which is explicitly intended for
metalinguistic discussion.  As defined by TEI P3, however, this element
has no TYPE attribute, so the very simplest way of handling your
distinction, namely to write
 
  In Early Modern English, <mentioned type='form'>shyppe</> and
  <mentioned type='form'>ship</> are both forms of <mentioned
  type='lemma'>Ship</>.
 
won't work, because TYPE is not declared as an attribute of MENTIONED.
 
(Perhaps the TYPE attribute ought to be made universal, so as to make
such specializations / subclass elements easier to handle in all cases?)
 
>The <lem> tag in TEI is available only in apparati critici, and the
>dictionary tags seem to want more structure (where I want to use the
>tags for phrasal elements in running prose).  Any suggestions?
 
Indeed -- apart from their etymology, the 'lemma' of a critical text
and the 'lemma' of lemmatization have very little in common.  (And
mathematicians, used to the term 'lemma' as meaning 'an auxiliary
proposition used in the proof of a theorem', have reported deep confusion
when reading both the text-critical and the dictionary chapters.)  So
don't use LEM.  The dictionary tags are closer to the semantics you
are aiming at, I think, but as you say they aren't well suited for
running text.
 
The simplest way to tag your words, I would suggest, might be one of
these:
 
1 select the additional tag set for analysis and interpretation, and
use the INTERP element to define what you mean by the distinction
--- perhaps something like:
 
  <interp id=lma resp='David Megginson'
          type='word form'
          value='dictionary form' >
  <interp id=frm resp='David Megginson'
          type='word form'
          value='attested (ms) form' >
  <!-- or perhaps value='oblique form' ? -->
 
These can go virtually anywhere (but you need the revised, fixed DTD;
in the first issue these elements, like APP and others, cannot be
reached from any content model, even with analysis selected).
 
Each use of MENTIONED can now be labeled a lemma or an inflected form:
 
  In Early Modern English, <mentioned ana='frm'>shyppe</> and
  <mentioned ana='frm'>ship</> are both forms of <mentioned
  ana='lma'>Ship</>.
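 
For what it's worth, selecting the analysis tag set is done in the DTD
subset of your document.  A minimal sketch, assuming the prose base tag
set (that choice is my assumption; substitute whichever base you
actually use):
 
  <!-- base tag set: prose is assumed here, not required -->
  <!ENTITY % TEI.prose    'INCLUDE' >
  <!-- additional tag set supplying INTERP and the ANA attribute -->
  <!ENTITY % TEI.analysis 'INCLUDE' >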
 
2 define two new elements, FRM and LMA (or INFLECTED and LEMMA if you
prefer clarity to brevity), identifying them as subclasses of
MENTIONED by using the TEIForm attribute.  In one file (call it
mytags.ent), put the declaration
 
  <!ENTITY % x.hqphrase 'inflected | lemma |' >
 
In another (call it mytags.dtd) declare the two elements, copying the
content model of MENTIONED:
 
  <!ELEMENT inflected - - (%phrase.seq;) >
  <!ATTLIST inflected     %a.global;
            TEIform       CDATA         'mentioned'  >
  <!ELEMENT lemma     - - (%phrase.seq;) >
  <!ATTLIST lemma         %a.global;
            TEIform       CDATA         'mentioned'  >
 
In the DTD subset of your document, declare these two files thus:
 
  <!ENTITY % TEI.extensions.ent SYSTEM 'mytags.ent' >
  <!ENTITY % TEI.extensions.dtd SYSTEM 'mytags.dtd' >
 
And you're done.
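 
With those declarations in place, the running text might then be tagged
like this (a sketch only, reusing your own example and assuming the
longer names INFLECTED and LEMMA rather than FRM and LMA):
 
  In Early Modern English, <inflected>shyppe</> and
  <inflected>ship</> are both forms of <lemma>Ship</>.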
 
N.B. The name FORM is already taken by the dictionary tag set (for its
'form group'), so it is not strictly conformant to reuse that name:
doing so would introduce a name collision if your extensions were
ever used together with the dictionary tag set.
 
I hope this helps.
 
-C. M. Sperberg-McQueen
 ACH / ACL / ALLC Text Encoding Initiative
 University of Illinois at Chicago
 [log in to unmask] / u35395@uicvm