<!--This is a revised version of the file I sent out yesterday on the
    TEI-L file server.  Like the other post-Myrdal documents, this was
    to have been completed by November 30.  My guilt feelings led me to
    send it out before it was really ready.  My apologies especially to
    those of you who are not deeply concerned one way or the other about
    the fine details of feature structure definitions.  Gary Simons,
    Steven Zepp and I are still working on the DTDs for feature
    structures and for feature structure declarations.  Your comments
    are more than welcome!                                           -->
<tei.1>
<tei.header>
<file.description>
<title.statement>
<title>The logic of feature structures</title>
<statement.of.responsibility>
<role>author</role><name>D. Terence Langendoen</name>
</statement.of.responsibility>
<publication.statement>
Version for circulation on TEI-L file server on December 11, 1991.
</publication.statement>
</file.description>
<revision.history>
<change.note>
<who>DTL</who>
<date>9-10 December 1991</date>
<what>wrote document</what>
<rev.number>0</rev.number>
</change.note>
<change.note>
<who>DTL</who>
<date>11 December 1991</date>
<what><list>
<item>some typos corrected and other minor editorial cleanup
<item>added descriptive material on <term>cmp</term> attribute
<item>deleted commentary on <tag>all cmp</tag> and <tag>some cmp</tag>
<item>changed analysis of reduction of <tag>fs cmp</tag>
<item>added comments at top of document
</list></what>
<rev.number>1</rev.number>
</change.note>
</revision.history>
</tei.header>
<text>
<body>
<div1 name=section>
<head>
Logic of feature structures
<div2 name=subsection>
<head>
Compound forms of <tag>fs</tag>, <tag>f</tag> and <tag>atm</tag>
<p>
Steven Zepp and I came up with a set of conjunctive (grouping) and
disjunctive (alternation) tags at the feature-structure (<tag>fs</tag>),
feature (<tag>f</tag>), and atom (<tag>atm</tag>) levels, for which we
suggest the following names.
<list>
<item>fs.grp
<item>fs.alt
<item>f.grp
<item>f.alt
<item>atm.grp
<item>atm.alt
</list>
<p>
The <q>grp</q> suffix is by analogy with the suffix in
<tag>corresp.grp</tag>. It is neutral between <q>list</q>, <q>set</q>
and <q>and</q>, and thus seems appropriate if we have only one grouping
set of tags instead of three, as in the original <citn>TEI P1</citn>
recommendations (namely <tag>f.s.and</tag>, <tag>f.list</tag>, and
<tag>f.set</tag>).  The <q>alt</q> suffix is my idea.  Steven and I were
using <q>or</q> in our discussions, but I think <q>alt</q> is better as
it suggests both <q>alternation</q> and <q>alternatives</q>.
<p>
The content model for the two tags in each pair is the same, so that we could,
for economy and elegance, use the parameter entities
<term>%fs.cmpd;</term>, <term>%f.cmpd;</term> and
<term>%atm.cmpd;</term> for these pairs.  To cover the tagset at each
level, we could use the parameter entities <term>%fs.family;</term>
(covering <tag>fs.grp</tag>, <tag>fs.alt</tag> and <tag>fs</tag>),
<term>%f.family;</term> (covering <tag>f.grp</tag>, <tag>f.alt</tag> and
<tag>f</tag>), and <term>%atm.family;</term> (covering
<tag>atm.grp</tag>, <tag>atm.alt</tag> and <tag>atm</tag>).
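<p>
As an illustration only (a sketch, not the DTD still under development),
these parameter entities might be declared as follows.  The replacement
texts are assumptions based on the tag lists above; names longer than
eight characters, such as <term>%atm.family;</term>, also assume a
concrete syntax with a raised NAMELEN.
<xmp><! [ CDATA [
<!-- Compound (grouping and alternation) tags at each level -->
<!ENTITY % fs.cmpd    "fs.grp | fs.alt"   >
<!ENTITY % f.cmpd     "f.grp | f.alt"     >
<!ENTITY % atm.cmpd   "atm.grp | atm.alt" >
<!-- Whole tagset at each level: the compound tags plus the simple tag -->
<!ENTITY % fs.family  "%fs.cmpd; | fs"    >
<!ENTITY % f.family   "%f.cmpd; | f"      >
<!ENTITY % atm.family "%atm.cmpd; | atm"  >
]]></xmp>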
<p>
In the case of <tag>fs.grp</tag> and <tag>fs.alt</tag>, the content is
two or more occurrences of any of the following tags.
<list>
<item>fs.grp
<item>fs.alt
<item>fs
<item>xref
<note id=limval place=inline>
The <tag>xref</tag> should point to an <tag>fs.grp</tag>,
<tag>fs.alt</tag>, or <tag>fs</tag>.  The possibility of <tag>xref</tag>
limits SGML validation; see section <xref target=lib>.
</note>
</list>
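<p>
One way to write this content model as an SGML declaration is sketched
below.  It is a hedged illustration rather than a settled declaration;
the tag minimization (<q>- -</q>) is an assumption.  The first group
supplies the first occurrence and the repeated group the second and
later ones, which yields <q>two or more</q>.
<xmp><! [ CDATA [
<!-- fs.grp and fs.alt share one content model: two or more members  -->
<!-- of the fs family or pointers to them (pointer targets cannot be -->
<!-- checked by the content model itself)                            -->
<!ELEMENT (fs.grp | fs.alt) - -
          ((fs.grp | fs.alt | fs | xref),
           (fs.grp | fs.alt | fs | xref)+) >
]]></xmp>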
<p>
The content of <tag>f.grp</tag> and <tag>f.alt</tag> is analogous to that
of <tag>fs.grp</tag>, consisting of two or more occurrences of any of the
following tags.
<list>
<item>f.grp
<item>f.alt
<item>f
<item>xref
<note place=inline>
The <tag>xref</tag> should point to an <tag>f.grp</tag>,
<tag>f.alt</tag>, or <tag>f</tag>; see note <xref target=limval>.
</note>
</list>
<p>
The content models for the four compound tags all provide for unbounded
nesting.  However, there is no need for subgrouping and subalternation
at the atomic level, nor do we anticipate any need for pointers at this
level.  Hence the content model for <tag>atm.grp</tag> and
<tag>atm.alt</tag> can simply be two or more occurrences of
<tag>atm</tag>.
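<p>
As a sketch under the same reading of <q>two or more occurrences</q>,
the feature-level and atom-level compound tags might be declared as
follows; the tag minimization shown is an assumption.
<xmp><! [ CDATA [
<!-- Feature level: unbounded nesting, pointers allowed -->
<!ELEMENT (f.grp | f.alt) - -
          ((f.grp | f.alt | f | xref),
           (f.grp | f.alt | f | xref)+) >
<!-- Atom level: no nesting and no pointers, only two or more atm -->
<!ELEMENT (atm.grp | atm.alt) - - (atm, atm+) >
]]></xmp>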
<div2>
<head>
Noncompound forms of <tag>fs</tag>, <tag>f</tag> and <tag>atm</tag>
<p>
The content model for <tag>fs</tag> should be one or more occurrences of
the following tags.
<list>
<item>f.grp
<item>f.alt
<item>f
<item>xref
<note place=inline>
The <tag>xref</tag> should point to an <tag>f.grp</tag>,
<tag>f.alt</tag>, or <tag>f</tag>; see also <xref target=limval>.
</note>
</list>
<p>
The content model for <tag>f</tag> should be one or more occurrences of
the following tags.
<list>
<item>fs.grp
<item>fs.alt
<item>fs
<item>xref
<note place=inline>
This tag points to <tag>fs.grp</tag>, <tag>fs.alt</tag>, or
<tag>fs</tag>; see also <xref target=limval>.
</note>
<item>atm.grp
<item>atm.alt
<item>atm
<item>plus
<item>minus
<item>any
<item>none
<note place=inline>
This tag replaces <tag>not.applicable</tag>.</note>
<item>default
<item>no.claim
<item>all
<item>some
</list>
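<p>
The two noncompound content models above could be rendered as SGML
declarations along the following lines.  This is a sketch only.  The
omissible end-tags on <tag>f</tag> and <tag>atm</tag>, the declared type
of the <term>name</term> attribute, and the treatment of the value tags
as empty elements are all assumptions, not settled decisions.
<xmp><! [ CDATA [
<!-- fs: one or more features, feature compounds, or pointers -->
<!ELEMENT fs - - (f.grp | f.alt | f | xref)+ >
<!-- f: one or more structured or nonstructured values -->
<!ELEMENT f  - O (fs.grp | fs.alt | fs | xref
                  | atm.grp | atm.alt | atm
                  | plus | minus | any | none
                  | default | no.claim | all | some)+ >
<!ATTLIST f  name CDATA #REQUIRED >
<!-- atm: value given as parsed character data -->
<!ELEMENT atm - O (#PCDATA) >
<!-- Value tags carry no content of their own (assumed EMPTY) -->
<!ELEMENT (plus | minus | any | none | default | no.claim | all | some)
          - O EMPTY >
]]></xmp>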
<p>
Treating the <q>underspecification</q> values <q>any</q>, <q>none</q>,
<q>default</q>, and <q>no.claim</q> as tags was suggested in my posting
on lexical encoding.  I suggest adding <tag>all</tag> and
<tag>some</tag> for completeness.  <tag>all</tag> means that all legal
values are present; <tag>some</tag> indicates that some legal values are
present.  <tag>some</tag> differs from <tag>no.claim</tag> in that the
latter includes the possibility that no legal values are present, whereas
the former excludes that possibility.
<p>
Following a suggestion of Gary Simons, we may wish to add the following
feature-value tagset, which we can refer to as
<term>%unit.family;</term>.
<list>
<item>unit.grp
<item>unit.alt
<item>unit
</list>
<p>
The distinction between <tag>atm</tag> and <tag>unit</tag> is that the
range of possible values for <tag>atm</tag> is assumed to be fixed,
though perhaps large, whereas the range of possible values for
<tag>unit</tag> is assumed not to be fixed.  If this idea is implemented,
the values for <tag>atm</tag> would be specified as values of an
attribute that we can call <term>value</term>.  The values of
<term>value</term> would be entered as CDATA.  On the other hand, the
values for <tag>unit</tag> would be specified as content and entered as
parsed character data (#PCDATA), just as the values for <tag>atm</tag>
are now specified.  In the following, I assume that this refinement is
not made, and that the <q>unit</q> tagset is not defined.
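<p>
Were the refinement adopted, the two kinds of value might be declared
and used as sketched here.  The declarations are an assumption (in
particular, that <tag>atm</tag> would then be empty), and the feature
name <q>lemma</q> and its value are hypothetical.
<xmp><! [ CDATA [
<!-- atm: value drawn from a fixed range, given as an attribute -->
<!ELEMENT atm  - O EMPTY >
<!ATTLIST atm  value CDATA #REQUIRED >
<!-- unit: value from an open-ended range, given as content -->
<!ELEMENT unit - O (#PCDATA) >

<!-- Usage -->
<fs>
   <f name=word-class><atm value=noun>
   <f name=lemma><unit>table
</fs>
]]></xmp>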
<div2>
<head>
Negation and its elimination
<p>
The recommendations in <citn>TEI P1</citn> included <tag>f.s.not</tag>
for expressing a negative feature value, which could be used recursively,
like <tag>f.s.or</tag> and <tag>f.s.and</tag>.  Steven and I concluded
that for <emph>representational</emph> purposes, negation need not be
recursive and could therefore be expressed as the value of an attribute
at the <tag>atm</tag> and <tag>fs</tag> levels (that is, all tags in the
<q>atm</q> and <q>fs</q> tagsets).  We propose to name the relevant
attribute <term>domain</term> and to permit its values to be
<term>self</term> and <term>cmp</term>, with the default value being
<term>self</term>.  The <term>cmp</term> value of the
<term>domain</term> attribute can be thought of as an instruction to
construct the <q>complement</q> (a kind of negation) of the value of the
<tag>atm</tag> or <tag>fs</tag> with which it is associated.
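<p>
A possible declaration of the attribute is sketched below; it is an
illustration, not a settled part of the DTD.  Because the values form a
declared name-token group, SGML attribute minimization (SHORTTAG YES)
allows the attribute name to be omitted, which is why a start-tag can
be written simply as <tag>atm cmp</tag>.
<xmp><! [ CDATA [
<!-- domain defaults to self; cmp requests the complement -->
<!ATTLIST (atm.grp | atm.alt | atm | fs.grp | fs.alt | fs)
          domain (self | cmp) self >

<!-- With SHORTTAG YES these two start-tags are equivalent -->
<atm domain=cmp>dative
<atm cmp>dative
]]></xmp>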
<div3 name=subsubsection>
<head>
Negation and its elimination from <tag>atm</tag> levels
<p>
Suppose that we wish to represent the lexical structure of a particular
word in a document as a member of the word-class noun and as not having
dative case.  Using the <term>domain</term> attribute on <tag>atm</tag>,
we could represent this structure as follows.
<xmp id=ex1><! [ CDATA [
<fs>
   <f name=word-class><atm>noun
   <f name=case><atm cmp>dative
</fs>
]]></xmp>
<p>
Let us assume that the feature-structure declaration lists the possible
values of an <tag>atm</tag> that is the content of an <tag>f</tag> whose
name is case, where that <tag>f</tag> occurs in an <tag>fs</tag> together
with another <tag>f</tag> whose name is word-class and whose content is
an <tag>atm</tag> with the content noun, as follows.
<list>
<item>nominative
<item>genitive
<item>dative
<item>accusative
<item>instrumental
</list>
Then the structure in example <xref target=ex1> could be represented
without the use of the <term>cmp</term> value of the <term>domain</term>
attribute as in example <xref target=ex2>; we say that the structure in
example <xref target=ex1> has been <emph>reduced</emph> to the structure
in example <xref target=ex2>.
<xmp id=ex2><! [ CDATA [
<fs>
   <f name=word-class><atm>noun
   <f name=case>
      <atm.alt>
         <atm>nominative
         <atm>genitive
         <atm>accusative
         <atm>instrumental
      </atm.alt>
</fs>
]]></xmp>
<p>
Under the same conditions, the structure in example <xref target=ex3>
can be reduced to the structure in example <xref target=ex4>.
<xmp id=ex3><! [ CDATA [
<fs>
   <f name=word-class><atm>noun
   <f name=case>
      <atm.alt cmp>
         <atm>nominative
         <atm>genitive
      </atm.alt>
</fs>
]]></xmp>
<xmp id=ex4><! [ CDATA [
<fs>
   <f name=word-class><atm>noun
   <f name=case>
      <atm.alt>
         <atm>dative
         <atm>accusative
         <atm>instrumental
      </atm.alt>
</fs>
]]></xmp>
<p>
The structures in examples <xref target=ex3> and <xref target=ex4> are
also equivalent to the structure in example <xref target=ex5>, though
the latter is not a <q>reduced</q> structure, since the <term>cmp</term>
value has not been eliminated.
<xmp id=ex5><! [ CDATA [
<fs>
   <f name=word-class><atm>noun
   <f name=case>
      <atm.grp>
         <atm cmp>nominative
         <atm cmp>genitive
      </atm.grp>
</fs>
]]></xmp>
<p>
Structures containing <tag>atm.grp cmp</tag>, such as in example
<xref target=ex6>, cannot in general be reduced.
<xmp id=ex6><! [ CDATA [
<fs>
   <f name=word-class><atm>noun
   <f name=case>
      <atm.grp cmp>
         <atm>nominative
         <atm>genitive
      </atm.grp>
</fs>
]]></xmp>
To say that a particular noun does not have both nominative and genitive
case is not to say that it has any particular case or cases or alternation
or grouping of cases.
<p>
We assume that the <term>domain</term> attribute is also defined for the
other nonstructured feature-value tags, with the reductions given in
examples <xref target=ex7> through <xref target=ex13>.
<xmp id=ex7><! [ CDATA [ <minus cmp>     = <plus> ]]></xmp>
<xmp id=ex8><! [ CDATA [ <plus cmp>      = <minus> ]]></xmp>
<xmp id=ex9><! [ CDATA [ <any cmp>       = <none>  ]]></xmp>
<xmp id=ex10><! [ CDATA [ <none cmp>     = <any>   ]]></xmp>
<xmp id=ex11><! [ CDATA [ <no.claim cmp> = <no.claim> ]]></xmp>
<xmp id=ex12><! [ CDATA [ <default cmp>  = <atm>non-default-value-n
(n = 1) ]]></xmp>
<xmp id=ex13><! [ CDATA [
<default cmp> = <atm.alt>
                   <atm>non-default-value-1
                             ...
                   <atm>non-default-value-n
                </atm.alt>
(n > 1) ]]></xmp>
<p>
On the other hand, the nonstructured feature-value tags in examples
<xref target=ex14> and <xref target=ex15> have no easily computed
reduced versions, except in special cases.
<xmp id=ex14><! [ CDATA [ <all cmp>  ]]></xmp>
<xmp id=ex15><! [ CDATA [ <some cmp> ]]></xmp>
<div3>
<head>
Negation and its elimination from <tag>fs</tag> levels
<p>
Suppose that we wish to represent the agreement structure of a verb as
not having both third person and singular number
specifications.  This could be done by providing an <tag>fs cmp</tag>
as the value of an <tag>f</tag> inside a larger <tag>fs</tag>, as in
example <xref target=ex16>.
<xmp id=ex16><! [ CDATA [
<fs>
   <f name=word-class><atm>verb
   <f name=agreement>
      <fs cmp>
         <f name=person><atm>third
         <f name=number><atm>singular
      </fs>
</fs>
]]></xmp>
<p>
To reduce the structure in <xref target=ex16>, we first replace the
<tag>fs cmp</tag> by an <tag>fs.alt</tag> which encloses two
<tag>fs</tag>s.
<note place=inline>
The number of enclosed <tag>fs</tag>s is equal to the number of
<tag>f</tag>s enclosed by the original <tag>fs cmp</tag>; if that
tag encloses only one <tag>f</tag>, then it is simply replaced by an
<tag>fs</tag>.
</note>
In the first of these <tag>fs</tag>s, we replace the value of the first
<tag>f</tag> by its complement, and the value of the other <tag>f</tag>
by <tag>no.claim</tag>; in the second, we do the opposite.  The result
is shown in <xref target=ex17>.
<xmp id=ex17><! [ CDATA [
<fs>
   <f name=word-class><atm>verb
   <f name=agreement>
      <fs.alt>
         <fs>
            <f name=person><atm cmp>third
            <f name=number><no.claim>
         </fs>
         <fs>
            <f name=person><no.claim>
            <f name=number><atm cmp>singular
         </fs>
      </fs.alt>
</fs>
]]></xmp>
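<p>
The same first step extends to an <tag>fs cmp</tag> that encloses more
than two <tag>f</tag>s.  As a sketch, assume a hypothetical third
feature <q>gender</q> (not part of the examples above): the result has
three <tag>fs</tag>s, each complementing one value and giving
<tag>no.claim</tag> for the others.
<xmp><! [ CDATA [
<fs cmp>
   <f name=person><atm>third
   <f name=number><atm>singular
   <f name=gender><atm>feminine
</fs>

becomes

<fs.alt>
   <fs>
      <f name=person><atm cmp>third
      <f name=number><no.claim>
      <f name=gender><no.claim>
   </fs>
   <fs>
      <f name=person><no.claim>
      <f name=number><atm cmp>singular
      <f name=gender><no.claim>
   </fs>
   <fs>
      <f name=person><no.claim>
      <f name=number><no.claim>
      <f name=gender><atm cmp>feminine
   </fs>
</fs.alt>
]]></xmp>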
<p>
Second, we eliminate the <term>cmp</term> from the <tag>atm</tag>s in
accordance with the feature-structure declaration (FSD), and strengthen,
if possible, the <tag>no.claim</tag> values.  Let us assume that the FSD
specifies <q>first</q>, <q>second</q>, and <q>third</q> as the possible
values of <tag>f name=person</tag>; and <q>singular</q> and
<q>plural</q> as the possible values of <tag>f name=number</tag>.  Let
us also assume that all combinations of these feature-value pairs are
possible.  Then the representation in example <xref target=ex17> can be
reduced to that in example <xref target=ex18>.
<xmp id=ex18><! [ CDATA [
<fs>
   <f name=word-class><atm>verb
   <f name=agreement>
      <fs.alt>
         <fs>
            <f name=person>
               <atm.alt>
                  <atm>first
                  <atm>second
               </atm.alt>
            <f name=number><any>
         </fs>
         <fs>
            <f name=person><any>
            <f name=number><atm>plural
         </fs>
      </fs.alt>
</fs>
]]></xmp>
<p>
On the other hand, suppose that we wish to represent a lexical entry as
not being a third person, singular number verb.  This could be done by
replacing the containing <tag>fs</tag> in example <xref target=ex16> by
<tag>fs cmp</tag> and replacing the contained <tag>fs cmp</tag> by
<tag>fs</tag>.  The resulting structure is shown in example <xref
target=ex19>.
<xmp id=ex19><! [ CDATA [
<fs cmp>
   <f name=word-class><atm>verb
   <f name=agreement>
      <fs>
         <f name=person><atm>third
         <f name=number><atm>singular
      </fs>
</fs>
]]></xmp>
<p>
On the assumption that the FSD specifies that <tag>f name=agreement</tag>
has values only in <tag>fs</tag>s which contain <tag>f
name=word-class</tag> whose value is <term>verb</term>, and assuming the
list of possible word-class (there called <term>category</term>) values
in <citn>TEI AI1 W9</citn>, then this structure could be reduced to the
structure shown in example <xref target=ex20>.
<xmp id=ex20><! [ CDATA [
<fs.alt>
   <fs>
      <f name=word-class>
         <atm.alt>
            <atm>adjective
            <atm>adverb
            <atm>article
            <atm>coordinator
            <atm>interjection
            <atm>noun
            <atm>particle
            <atm>preposition
            <atm>pronoun
            <atm>punctuation
            <atm>subordinator
         </atm.alt>
      <f name=agreement><none>
   </fs>
   <fs>
      <f name=word-class><atm>verb
      <f name=agreement>
         <fs.alt>
            <fs>
               <f name=person>
                  <atm.alt>
                     <atm>first
                     <atm>second
                  </atm.alt>
               <f name=number><any>
            </fs>
            <fs>
               <f name=person><any>
               <f name=number><atm>plural
            </fs>
         </fs.alt>
   </fs>
</fs.alt>
]]></xmp>
<p>
Further, if <tag>fs cmp</tag> contains <tag>f.alt</tag>, the latter must
be converted to <tag>f.grp</tag>; conversely, if <tag>fs cmp</tag>
contains <tag>f.grp</tag>, the latter must be converted to
<tag>f.alt</tag>.
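<p>
A small worked case may make the conversion concrete.  It is a sketch
under one assumption that the text does not spell out: that, as in
De Morgan's laws, the complement is passed down to the value of each
enclosed <tag>f</tag> when the <tag>f.alt</tag> becomes an
<tag>f.grp</tag>.  The remaining <term>cmp</term> values on
<tag>atm</tag> would then be eliminated against the FSD in the usual
way.
<xmp><! [ CDATA [
<fs cmp>
   <f.alt>
      <f name=person><atm>third
      <f name=number><atm>singular
   </f.alt>
</fs>

becomes

<fs>
   <f.grp>
      <f name=person><atm cmp>third
      <f name=number><atm cmp>singular
   </f.grp>
</fs>
]]></xmp>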
<p>
Finally, <tag>fs.grp cmp</tag> and <tag>fs.alt cmp</tag> are governed by
the equivalences in examples <xref target=ex21> and <xref target=ex22>,
which are analogues of De Morgan's laws.
<xmp id=ex21><! [ CDATA [
<fs.alt cmp>                <fs.grp>
   <fs> ... </fs>              <fs cmp> ... </fs>
   <fs> ... </fs>      =       <fs cmp> ... </fs>
        ...                             ...
</fs.alt>                   </fs.grp>
]]></xmp>
<xmp id=ex22><! [ CDATA [
<fs.grp cmp>                <fs.alt>
   <fs> ... </fs>              <fs cmp> ... </fs>
   <fs> ... </fs>      =       <fs cmp> ... </fs>
        ...                             ...
</fs.grp>                   </fs.alt>
]]></xmp>
<div2 id=lib>
<head>
The use of <tag>fs.lib</tag> and <tag>f.lib</tag>
<p>
We assume, following the recommendations of the Baltimore meeting of the
AI1 group, that <tag>fs</tag>s can appear anywhere within a text as an
inclusion exception.  When they do so, it is probably advisable not to
use <tag>xref</tag> to point from within an <tag>fs</tag> to predefined
substructures, as SGML cannot validate the target of such a pointer.
<note place=inline>
That is, SGML cannot enforce the restriction that <tag>xref</tag> must
point to a member of the <tag>fs</tag> or <tag>f</tag> family, since
the <term>target</term> attribute is specified as IDREF, which can
occur on essentially any tag.
</note>
<p>
We suggest wherever possible that members of the <tag>fs</tag> family
be gathered together into a separate file or subdocument, or designated
part of the main document (say, as a daughter of <tag>tei.1</tag>)
identified as <tag>fs.lib</tag>, and that these elements be pointed to
at appropriate places within the text by means of <tag>xref</tag>.
<p>
Similarly, we suggest that individual features be gathered together into
a separate file, subdocument or part of the main document, identified as
<tag>f.lib</tag>, and that these elements be pointed to at appropriate
places within the elements occurring in the <tag>fs.lib</tag>.
Moreover, whenever a feature value is a member of the <tag>fs</tag>
family, it can be replaced by a pointer to the appropriate element in
<tag>fs.lib</tag>.
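<p>
The arrangement might look as follows in practice.  This is a sketch:
the <term>id</term> values are hypothetical, and it assumes that
<tag>f</tag> and <tag>fs</tag> accept an <term>id</term> attribute and
that <tag>xref</tag> is an empty element.
<xmp><! [ CDATA [
<f.lib>
   <f id=per3  name=person><atm>third
   <f id=numsg name=number><atm>singular
</f.lib>

<fs.lib>
   <fs id=agr3sg>
      <xref target=per3>
      <xref target=numsg>
   </fs>
</fs.lib>

<!-- Elsewhere, the feature value points into fs.lib -->
<fs>
   <f name=word-class><atm>verb
   <f name=agreement><xref target=agr3sg>
</fs>
]]></xmp>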
</body>
</text>
</tei.1>