

TEI-L@LISTSERV.BROWN.EDU



TEI-L  November 1991


Subject: %% Undelivered Mail %%
From: [log in to unmask]
Reply-To: [log in to unmask]
Date: Mon, 25 Nov 1991 18:49:00 GMT
Content-Type: text/plain

Your mail was not delivered as follows:
%MAIL-E-SENDERR, error sending to user ELS009
-MAIL-E-OPENOUT, error opening DISK$EL:[ELS009]MAIL.MAI; as output
-SYSTEM-F-IVDEVNAM, invalid device name
 
Your original mail header and message follow.
 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Via: UK.AC.EARN-RELAY; Mon, 25 Nov 91  18:49 GMT
Received: from UKACRL by UK.AC.RL.IB (Mailer R2.07) with BSMTP id 3070; Mon, 25
          Nov 91 17:31:59 GMT
Received: from UICVM by UKACRL.BITNET (Mailer R2.07) with BSMTP id 9230; Mon,
          25 Nov 91 17:31:42 GM
Received: by UICVM (Mailer R2.07) id 7718; Mon, 25 Nov 91 11:01:28 CST
Date:     Mon, 25 Nov 1991 09:40:59 CST
Reply-To: Gary Simons <[log in to unmask]>
Sender:   "TEI-L: Text Encoding Initiative public discussion list" <[log in to unmask]
          UICVM>
From:     Gary Simons <[log in to unmask]>
Subject:  Report of names, dates, measures subcommittee
X-To:     [log in to unmask]
X-cc:     [log in to unmask]
To:       "Thomas N. Corns" <[log in to unmask]>
 
      REPORT OF SUBCOMMITTEE ON NAMES, DATES, AND MEASURES
                17 November 1991, Myrdal, Norway
 
                         by Gary Simons
 
 
 
1. INTRODUCTION
 
     One of the major unresolved issues that was identified
during the initial round of presentations at the meeting was the
mechanism for marking up names, places, dates, and measures.
Daniel Greenstein, representing the historical research
community, reported that the "crystal" approach sketched in P1
was much too limited (and limiting) for the needs of his
colleagues.  He further observed that the feature structure
mechanism developed for linguistic analysis might offer a
solution.
 
     A subcommittee of Daniel Greenstein, Jacqueline Hamesse, and
Gary Simons was appointed to work on this problem.  Specifically,
the subcommittee was asked to do the following:
 
1.   Identify the constituent parts (or, features) which must be
     nameable for names, dates, places, and measures.
 
2.   Develop examples of the encoding of the interpretation of
     names, dates, places, and measures in feature structure
     notation.  Can this formalism handle alternation and
     uncertainty?
 
3.   If there is still time, develop a Feature System Declaration
     for the markup of names, dates, places, and measures.
 
     In a nutshell, we concluded that the feature structure
formalism (question 2) provides a good solution to the markup
needs of historians.  Inasmuch as this solution treats the
names of structures and their parts (or features) as attribute
values, rather than as the names of elements and attributes which
must be declared in the SGML DTD, there is no need for the TEI to
answer question 1; it can be left to the historians themselves.
Greenstein was not only satisfied that this solution would work,
but was insistent that any solution that predefined the
structures and their features would fall short of meeting the
open-ended needs of the historical research community and thus
not be accepted by them.
 
     As for question 3, we did not develop an FSD, but concluded
that there were no problems in principle.  In fact, FSDs for
historical markup would generally be simpler than the sample
FSDs proposed in AI1W3 for linguistic markup, since they would
be less likely to use complex statements of default values or co-
occurrence constraints.
 
     The substance of our report back to the full meeting
consisted of the following more general points which bear on the
work of other subcommittees of the TEI.
 
 
2. GENERALIZING FEATURE STRUCTURES
 
     Our solution for the interpretive markup of names,
places, dates, and measures is based on P1's feature structure
mechanism.  One change that the AI1 working group proposed last
January was to simplify feature structure markup by removing the
<f.struct.name> and <f.name> elements and replacing them by
"name" attributes in the <f.struct> and <feature> tags,
respectively.  (Another change which I think should be made, but
I don't recall whether we proposed it earlier, is that the atomic
value of a feature simply be text rather than being embedded in
an <f.struct> that has no internal structure.)
 
     At the Myrdal meeting, the AI chair (Terry Langendoen)
proposed a further change, namely, renaming <f.struct> to
<record> and <feature> to <field>.  He suggested that such a
change might have the effect of making these elements more
accessible to non-linguists.  While these names probably do have
the widest familiarity, some participants cautioned that they
might have the disadvantage of raising expectations concerning
TEI as a database system and bringing the project into a whole
new realm for consideration of compatibility with database
standards and practices.
 
     Our subcommittee did feel that Langendoen was right about
the idea of renaming <f.struct> to make it more general and more
accessible to a wide range of research communities.  More neutral
than <record> would be simply <structure> (or <struct> for short)
and that is the tag I will use in the remainder of this report.
And what about the internal structure?  A nicely generic term is
<part>, which I will also use throughout.  Another possible pair
of tags is <element> and <attribute>.  (These names follow the
SGML nomenclature and thus might be confusing, but they do serve
the purpose of highlighting the analogy to SGML content elements
and attributes and of providing user-definable ones which allow
embedded elements within attribute values.)  And note, further,
that the simplified definition of feature structures makes the
<unit> and <level> tags of P1 redundant.  This leaves us with the
following candidates for the names of the general-purpose
structure and its constituent parts:
 
     <structure>         <part>
     <record>            <field>
     <element>           <attribute>
     <unit>              <level>
     <f.struct>          <feature>
 
Note, too, that some degree of mixing and matching is possible.
That is, there are some other combinations of names from the
above two lists that might work, like <structure> and <feature>,
or <unit> and <part>.
 
 
3. COLLECTIONS OF STRUCTURES (OR RECORDS OR UNITS OR WHATEVER)
 
     Another of Langendoen's proposals was that we add
<record.collection> to encode collections of records.  We found
that a mechanism like this is definitely needed for the
application of historical research.  As well as marking up text,
historians also want to encode "data dictionaries" which contain
all the names, places, dates, measures, and so on they have found
and which collate all the information known about each.  Some
questions remain to be answered:
 
     (1)  What do we call these?  <record.collection> goes well
          with <record>.  What if we go for <structure> rather
          than <record>?  <structure.collection>  <structure.set>
          <data.dictionary>  <data.set> ... ?
 
     (2)  Should a record collection be any arbitrary collection
          of records, or should it declare a type and be
          constrained to contain only records of that type?
 
     (3)  Where does a record collection that accompanies a
          marked up text go?  Does it go somewhere in <back>, or
          does it require a new tag or a change to the content
          model for <text>?
 
Note that such record collections could come in handy in fields
other than history.  For instance, a text critical application
could append a record collection to describe the features of all
the witnesses referred to.  A dictionary could append a record
collection to give the feature structures for all of the part-of-
speech tags.  Each witness or part of speech would be encoded as
a record (a.k.a. feature structure) and would contain a unique ID
which would be referred to from the text proper by IDREFs.
 
 
4. A GENERIC SOLUTION FOR INTERPRETIVE MARKUP
 
     The approach proposed in P1 is to use tags like <prop.name>
and <date> in text markup, and to provide these with attributes
to store interpretive information added by the analyst.  Indeed,
the brief given this subcommittee was to devise a list of the
needed tags and the attributes needed by each.  We concluded that
this approach would not, in general, be satisfactory for the
following reasons:
 
     (1)  Historians (as demonstrated in Greenstein's recent book
          on information modeling in historical research) have
          already identified dozens of "primitive data types"
          (including many different types of names and dates).
          To provide an inventory that they feel to be adequate
          would require quite a proliferation of special-purpose
          tags.
 
     (2)  Similarly, there are potentially a dozen or more
          attributes for each tag, further fueling the
          proliferation.
 
     (3)  Investigators will always think of new element types or
          new attributes that are essential for their research
          but are not included in the standard set.
 
     (4)  SGML attributes do not allow embedded structure, but
          that is essential for interpretive markup where the
          investigator must be able to record alternative
          hypotheses or attribute values which are themselves
          structures of a different type.
 
     Our proposal, therefore, is to avoid the above problems by
using the feature structure mechanism in whatever generalized
form it ends up as.  Each different type of primitive data
element would be encoded as a different type of feature
structure.  The eventual successor to the Feature System
Declaration (see AI1 W3) would then specify the allowed features
(or fields) for each structure (or record) type, and the allowed
ranges of values.  In this way the details of the special-purpose
record types and associated fields needed for a particular
encoding task are left to the individual researcher.  These
details could be specified in terms of SGML markup in the FSD
rather than requiring the researcher to also understand how to
extend a DTD.  Communities of domain experts could propose record
types and field specifications in the TEI case books, without
these having to be part of the standard.
 
     Note that the adoption of the general structure/record
mechanism suggests a general principle that could be used to
wield Ockham's razor in the process of developing the TEI tag
set, namely, whenever the tag set seems to be proliferating
beyond what seems appropriate, or whenever there is significant
discomfort about the likelihood of acceptance of application-
specific tags by specialists in that domain, the general
structure/record mechanism provides a graceful alternative.  For
instance, at the Myrdal meeting concern was expressed about the
proliferation of tags proposed by the text criticism working
group and disquiet was expressed about the acceptability of the
"situational context" tagging proposed by the spoken text working
group.
 
 
5. EXTENDING THE "FEATURE SYSTEM DECLARATION"
 
     The notion of a Feature System Declaration (proposed in
AI1W3 as an auxiliary file which encodes the semantics behind the
user's use of feature structures) was endorsed in the plenary
session.  However, as we move toward transforming feature
structures into general-purpose record-like data structures it is
necessary to extend the notion of the FSD.
 
     The primary extension is that multiple structure (or record)
types must be declared.  The previously proposed FSD was written
as though all use of feature structures reflected a single
feature system.  With the shift toward record-like structures, we
must introduce the notion of different types of records and each
different type needs a separate declaration.  Thus the notion of
a Feature System Declaration is extended to that of Structure
Type Declarations, or Record Type Declarations.  For each type,
the three kinds of information specified in an FSD are given:
the range of allowed values for each feature (or field), the
default value in the case that a structure (or record) does not
specify one of its features (or fields), and co-occurrence
constraints between the values of multiple features (or fields)
in a single structure (or record).  For instance, if the <record>
and <field> nomenclature is adopted:
 
     ...
     <record-type name=X>
        <field-ranges>
           <range field=Y> ... </>
           ...
        </field-ranges>
        <field-defaults>
           <default field=Y> ... </>
           ...
        </field-defaults>
        <field-constraints>
           <if>  ... </if>
           <iff> ... </iff>
           ...
        </field-constraints>
     </record-type>
     ...
 
If the <structure> and <part> nomenclature is adopted, tags like
the following might be appropriate:
 
     ...
     <structure-type name=X>
        <value-ranges>
           <range part=Y> ... </>
           ...
        </value-ranges>
        <value-defaults>
           <default part=Y> ... </>
           ...
        </value-defaults>
        <value-constraints>
           <if>  ... </if>
           <iff> ... </iff>
           ...
        </value-constraints>
     </structure-type>
     ...
 
     Note that in taking the step of generalizing from linguistic
feature structures to general-purpose record-like structures,
further extensions to the record type declaration are called for.
For instance, we would probably want to be able to identify the
key field (or combination of fields) for a record type in order
to facilitate export/import of record collections to/from
database systems.  We would want to develop a richer set of range
constraints than was proposed for linguistic feature structures,
for instance, including integers and reals with minimum and
maximum value constraints.
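 
     As a rough illustration of what such a declaration has to
carry, the following Python sketch models one structure type with
its value ranges, defaults, co-occurrence constraints, and key
parts.  It is not a proposed TEI syntax.  The class names, the
"calendar" part, its default, and the February rule are
hypothetical and were invented only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class PartSpec:
    """Allowed range and optional default for one part (or field)."""
    allowed: Callable[[Any], bool]
    default: Optional[Any] = None


@dataclass
class StructureType:
    """Hypothetical stand-in for one <structure-type> declaration."""
    name: str
    parts: Dict[str, PartSpec]
    constraints: List[Callable[[Dict[str, Any]], bool]] = field(default_factory=list)
    key: Tuple[str, ...] = ()  # key part(s), e.g. for database export

    def complete(self, values: Dict[str, Any]) -> Dict[str, Any]:
        """Fill in defaults, then check ranges and co-occurrence constraints."""
        result = dict(values)
        for part_name, spec in self.parts.items():
            if part_name not in result and spec.default is not None:
                result[part_name] = spec.default
        for part_name, value in result.items():
            if part_name not in self.parts:
                raise ValueError(f"undeclared part: {part_name}")
            if not self.parts[part_name].allowed(value):
                raise ValueError(f"value out of range for {part_name}: {value!r}")
        for check in self.constraints:
            if not check(result):
                raise ValueError("co-occurrence constraint violated")
        return result


def int_range(low: int, high: int) -> Callable[[Any], bool]:
    """Range constraint: an integer between low and high inclusive."""
    return lambda v: isinstance(v, int) and low <= v <= high


MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# Assumed declaration for the "fixed.date" type used in the examples below.
FIXED_DATE = StructureType(
    name="fixed.date",
    parts={
        "day": PartSpec(int_range(1, 31)),
        "month": PartSpec(lambda v: v in MONTHS),
        "year": PartSpec(int_range(1, 9999)),
        "calendar": PartSpec(lambda v: v in ("gregorian", "julian"),
                             default="gregorian"),
    },
    # Co-occurrence constraint: February never has more than 29 days.
    constraints=[lambda r: not (r.get("month") == "February"
                                and r.get("day", 1) > 29)],
    key=("year", "month", "day"),
)
```

Calling FIXED_DATE.complete({"day": 3, "month": "February", "year":
1835}) would return the same parts with the assumed default
calendar added, while an out-of-range day raises ValueError.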
 
 
6. NAME, DATE, ETC. MARKUP WITH INTERPRETIVE RECORDS IN-LINE
 
     One approach to the interpretive markup of names, dates,
measures, and so on would be to place the record structures
directly in the text.  For instance, consider the text fragment
"On the third of the month we travelled to ..."
 
     ... On <structure type=fixed.date>
                <part name=text>the third of the month</part>
                <part name=day>3</part>
                <part name=month>February</part>
                <part name=year>1835</part>
            </structure> we travelled to ...
 
The <structure> includes the bit of text being interpreted as one
of its parts, and then adds more parts to encode the
investigator's interpretation (based on inference from context)
as to the exact date being referred to.
 
     If we want to allow this style of markup, it may require
that <structure> be afforded the status of a "crystal" in the TEI
DTD.  At one point, <f.struct> was allowed only in the
environment of <ling.analysis>.  I'm not sure whether this is
still the case.
 
 
7. NAME, DATE, ETC. MARKUP WITH POINTERS TO INTERPRETIVE RECORDS
 
     The above approach injects a lot of extraneous material into
the text.  A cleaner approach might be to tag spans of text to be
interpreted and then use a reference to the unique identifier of
an interpretive record in a record collection.  For instance,
 
     ... On <date interp=d123>the third of the month</date> we
     travelled to ...
     -------
     <structure id=d123 type=fixed.date>
        <part name=day>3</part>
        <part name=month>February</part>
        <part name=year>1835</part>
     </structure>
 
This example follows the strategy proposed in P1 of having some
basic tags for name, date, and so on.  As discussed above in
section 4, a solution which relies solely on unique tags
(including interpretive attributes) for different primitive data
types does not appear to be acceptable to the historical research
community.  However, this hybrid solution may prove acceptable.
The <date> tag, for instance, is a generic one.  It is used to
tag all primitive data types pertaining to dates.  The type
attribute of the associated <structure> tells what specific kind
of date reference it is.  The <date> tag uses an "interp"
attribute to point to the interpretation.  (Another possible name
would be "analysis".)
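 
     The pointer-following step can be sketched in a few lines of
Python.  This is only a model of the idea, assuming the record
collection is held as a dictionary keyed by unique identifier; the
class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Structure:
    """One interpretive record in a record collection."""
    id: str
    type: str
    parts: Dict[str, Any]


@dataclass
class TaggedSpan:
    """A generic in-text tag such as <date> with its interp pointer."""
    tag: str
    text: str
    interp: str


def resolve(span: TaggedSpan, collection: Dict[str, Structure]) -> Structure:
    """Follow the interp reference to the record it points at."""
    try:
        return collection[span.interp]
    except KeyError:
        raise LookupError(f"dangling interp reference: {span.interp}") from None


# The d123 example from the text, re-expressed in this model.
collection = {
    "d123": Structure("d123", "fixed.date",
                      {"day": 3, "month": "February", "year": 1835}),
}
span = TaggedSpan("date", "the third of the month", "d123")
record = resolve(span, collection)
print(record.type, record.parts["year"])
```

The text keeps only the generic tag and the pointer; everything
the analyst has inferred lives in the record, so a change of
interpretation never touches the transcription.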
 
     The above example is simple enough that the values of the
parts could have been encoded as attributes of the <date> tag.
However, in general this is not the case.  The values of the
parts could be embedded structures, or they could involve
alternatives.  For instance, if it were not clear whether the
year were 1835 or 1836, the relevant <part> might be encoded as:
 
        <part name=year><or><atom>1835</atom>
                            <atom>1836</atom></or></part>
 
This example illustrates the problem of dealing with atomic
values in lists of values.  In the original <f.struct>
formulation, the atomic values of features were coded as
<f.struct> containing nothing but the string which was the value.
On further reflection this didn't seem right and in the meeting
of the AI1 working group last January we may also have
recommended the creation of the <atom> tag.  That proposal would
have required <atom> everywhere, not just within lists.  Note,
however, that the <unit> and <level> tags did allow bare text
strings as values (which was possible because they did not allow
alternations).  If we want to preserve the ability of part (or
feature) values to be bare strings without the seeming
inconsistency of tagging them as <atom> in some contexts, we
could substitute a tag like <term>, for "term in a list", which
is used only within an <or>, <and>, or <list> to show where a
single value begins and ends.
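 
     One way to see what an <or> value means for software that
reads the markup is the following Python sketch.  It assumes that a
part value is either a bare atomic value or an Or over further
values; the Or class and both functions are hypothetical names
chosen for the illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Or:
    """A set of alternative values, as in <or> ... </or>."""
    alternatives: Tuple[Any, ...]


def possible_values(value: Any) -> List[Any]:
    """Flatten a value into the list of atomic values it may stand for."""
    if isinstance(value, Or):
        flat: List[Any] = []
        for alternative in value.alternatives:
            flat.extend(possible_values(alternative))
        return flat
    return [value]  # a bare atomic value stands only for itself


def readings(parts: Dict[str, Any]) -> List[Dict[str, Any]]:
    """List every fully determined reading of a structure's parts."""
    names = list(parts)
    combos = product(*(possible_values(parts[name]) for name in names))
    return [dict(zip(names, combo)) for combo in combos]


uncertain_date = {"day": 3, "month": "February", "year": Or((1835, 1836))}
for reading in readings(uncertain_date):
    print(reading)
```

Here bare values and alternatives share one representation, so no
separate wrapper is needed for atomic values outside a list, which
is the property the <term> suggestion is meant to preserve.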
 
     Much more complicated examples of alternatives in an
analysis can be devised, and we in fact sketched some in our
subcommittee meeting at Myrdal.  They involved enough specialty-
specific details, however, that I feel uncomfortable trying to
reconstruct them.  Rather, Dan Greenstein has proposed to attempt
the markup of some real examples.
 
     One general principle we discovered in working these
examples is that the markup scheme should allow us to distinguish
between analytical hypotheses and interpretive conclusions.  For
instance, in dealing with a personal name in a text, one can
initially associate it with an analytical structure of type
"personal.name" (which could include a number of alternatives)
without yet knowing who the name refers to.  The ultimate
conclusion would be to associate the name with a particular
person, for whom there would also be a structure in the data
dictionary.  The pointer to the person could either be another
attribute of the top-level <name> tag, or another <part> of the
interpretive structure for the name (with an <xref> value).
 
     Finally, we must note the similarity of this markup problem
to that of marking arbitrary spans of text in literary analysis,
such as to mark metaphors, for instance, and provide some
analysis of them.  Another subcommittee at Myrdal proposed a
<span> tag for just this purpose.  Providing an attribute in
<span> which uses an IDREF to point to a record structure that
has the analysis sounds like a good general solution.  In fact,
tags like <date> and <name> could be replaced by a general <span>
tag, avoiding altogether the problem of what exactly this set of
tags should be.  In this case, the type of thing being marked
(e.g. personal name versus place name versus fixed date versus
metaphor, etc.) would be encoded as the "type" for the structure
referred to by the "interp" of the <span>.
 
     The generalized <span> solution commends itself for another
reason, namely, analytical alternatives can percolate up to the
very top level of markup.  For instance, in "I went to Rochester
to get help," Rochester could be the name of a person or it could
be the name of a place.  If the markup scheme specifies separate
tags for <pers.name> and <place.name>, then how can this be
marked up?  If on the other hand, Rochester is simply enclosed in
a <span>, then its "interp" can include two IDREFs (one pointing
to a <structure type=personal.name> and the other to a <structure
type=place.name>), or it could point to a single <or> which
includes both structures.  Another thing we should probably be
prepared for is the possibility that the content of tags like
<name> and <date> might need to overlap, which properly nested
elements do not allow.  The <span> mechanism, however, can handle
that sort of thing.
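 
     The Rochester case and the overlap case can both be modeled
in a short Python sketch.  Note the assumption: spans are
represented here by character offsets into the text rather than by
in-line tags, which is only one possible realization of the <span>
idea, and all names in the code are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Span:
    """A stretch of text plus pointers to its candidate interpretations."""
    start: int
    end: int
    interps: Tuple[str, ...]


def candidate_types(span: Span, collection: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the structure type of every interpretation the span points to."""
    return [collection[ref]["type"] for ref in span.interps]


def overlaps(a: Span, b: Span) -> bool:
    """True when two spans share at least one character."""
    return a.start < b.end and b.start < a.end


text = "I went to Rochester to get help"
collection = {
    "n1": {"type": "personal.name"},
    "p1": {"type": "place.name"},
}

start = text.index("Rochester")
rochester = Span(start, start + len("Rochester"), ("n1", "p1"))
print(text[rochester.start:rochester.end], candidate_types(rochester, collection))

# A second span that overlaps the first without being nested inside it.
phrase = Span(text.index("to Rochester"), start + len("Roch"), ())
print(overlaps(rochester, phrase))
```

Because the alternatives are carried by the pointers, the choice
between person and place does not have to be made at the level of
the tag name.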
 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
End of returned mail
