TEI-L March 1991

Subject: Response to Final Critique
From: Elli Mylonas <[log in to unmask]>
Reply-To: Text Encoding Initiative public discussion list <[log in to unmask]>
Date: Thu, 14 Mar 91 09:57:07 EST


A Response to the Literature Working Group's "Final Critique"
 
<H1>Introduction
<H1>Tagging and Interpretation
  <H2>All markup is interpretation.
  <H2>Most texts are already interpreted.
  <H2>All text entry is interpretation.
  <H2>Tagging is scholarly work.
  <H2>Presentational markup is also interpretation
<H1>The Importance of Descriptive Markup
  <H2>DTDs
    <H3>A DTD Helps Create Software and Helps Formalize Interpretations
    <H3>The Apparent Loss of Flexibility Created by a DTD is an Illusion
<H1>Different Kinds of Scholarship & Texts
  <H2>Only Texts that are Based on Book Technology are Discussed
  <H2>Reference Systems are more complex than just pages & lines
<H1>Verbosity of Tagging / Localization
  <H2>Verbosity is not What it Seems
  <H2>Examples of Software and Macros for Working with SGML
 
<H1>Introduction
<P>In their "Final Critique", Paul Fortier and the Literature Working
Group present an evaluation of the TEI guidelines (1.1) as they apply
to the tagging of literary texts. Following are comments on several
of the points made in the "Final Critique." However, before
presenting a detailed response, we discuss some facts about the
solutions chosen by the TEI. We will also explicitly present some of
the theoretical assumptions on which our responses are based.
 
<P>SGML serves to facilitate our work as preparers of texts for other
scholars to use and as scholars ourselves.  SGML provides a general
mechanism that may easily be used to encode any structures that may
be abstracted as a hierarchy. It also allows encoding of more complex
non-hierarchical structures, though with a concomitant increase in
the complexity of the markup. The information represented in this way
can cover a large range of different semantic domains -- typography,
cross-reference structures, metrical and verse patterns, imagery and
discourse structures,  even the physical state of a text.  For most
types of text, certain text features provide a basic level that most
readers or users of a text would require in order for the text to be
useful.  Examples of such features are those which are conventionally
expressed by the appearance of a text.
 
<P>It is also important to remember that SGML encoding adds great
value to electronic texts because electronic texts are not only
interchanged by different researchers with differing research agendas
and hardware, but may also be subjected to different kinds of
processing by the same person on the same computer.  SGML makes an
electronic text more versatile and amenable to processing by computer
by providing unambiguous indications of those features that a scholar
deems of importance.  This processing ranges from output using
different devices, to statistical analysis, to standard text
searches. At the same time, moving a text that has been written and
printed in book form into electronic form will always entail some
compromise.  We must try to retain as much information as possible,
and to be as general as we can in not excluding features that other
readers and users of the text may want. However, even if we
photograph each page of a book and present it as page image, there
may still be information on the page that is not represented in the
resulting computer file.
 
<P>In order to maximize our ability to share these texts and to
encourage the development of software for using and analyzing them,
it is important to minimize the needless and unmotivated diversity of
basic tagging that would otherwise result.  Indeed, as text entry
projects have increased in number, the current range of encoding
schemes is already creating problems in sharing texts. Although many
of the respondents to the survey did not appear to be interested in
interchange, it is the experience of the writers that as soon as they
have a text on-line, they begin to receive requests for that text.
Also, a number of the larger text projects have the dissemination of
texts as their primary goal.  This is why the TEI must come up with a
basic tagging vocabulary, and define the basic structures that are
likely to be commonly tagged. Of course the textual features that
scholars may wish to encode are without limit -- this is why the TEI
does not attempt to preempt scholarly creativity, and provides
several means to extend and supplement its recommended encoding
scheme.
 
<P>Finally, texts are not studied only by scholars of literature, nor is
literature the entire scope of the TEI -- the same text may prove
informative for linguists, historians, philosophers and other scholars,
whose disciplines have their own particular interests and methodologies.
It is prudent, when encoding a text, to provide as much generally useful
information as possible, so the text may be of value to many different
fields and disciplines.  When discussing encoding schemes, the range of
media on which a text might have been written must also be considered:
there are texts whose forms vary from collections of papyrus fragments
to the increasing number of machine-readable texts that were created in
machine-readable form and may never have existed as paper publications.
/* ----------------------------------------------------------------- */
<H1>Tagging and Interpretation
 
<H2>All markup is interpretation.
<P>The Literature Working Group makes this statement, and we
certainly would not disagree with it.  Indeed, we would go even
further, and say that interpretation is required for all sorts of
markup, both presentational and descriptive. Word boundaries,
italics, themes and font shifts are all subject to opinion.  Some
perhaps are more controversial than others, but even straight
physical descriptions can cover a wide range of levels of detail and
precision.  For any particular purpose, some levels of markup are
more relevant than others.  For example, Antonio Zampolli tells us
that the lexicographer concerned with building a machine parsable
dictionary may be indifferent to font shifts. (Discussion at TEI
workshop, Chicago, 9/90.)
 
<H2>Most texts are already interpreted.
<P>All texts, except perhaps an author's manuscript, contain
interpretation. Even if the edition is not critical or canonical, it
nevertheless contains the interpretations of the publisher, editor,
and in some cases, even the compositor. The Literature
Working Group points out that scholars tend to work with a
canonical or prestigious version of a text which is recognized by
those engaged in serious professional work (paragraph 14 of the
Critique).  Such texts gain their value from the interpretive work
that has been put into them by the scholar who created the edition.
There are cases where the exact preservation of the physical
presentation of a text is important. Such cases require detailed
presentational markup, and are discussed below.
 
<H2>All text entry is interpretation.
<P>Entering a text that is a primary source is itself a task that
entails interpretation. Whether it is the scanner operator or the
scholar, someone has to make the decision of what specific features
on a specific page are part of the tagging scheme.  In many cases,
scanner operators are working from (manually) marked up copies of
texts, which have been marked by a scholar to disambiguate features
that could otherwise be confused.  Other times, the scanner operator
is someone who can perform the disambiguation and make the decision
her- or himself.  Finally, most texts are proofread by a person who
has the knowledge and the authority to make decisions about the
tagging.
 
<H2>Tagging is scholarly work.
<P>Before a text is entered into electronic form, someone has to make
the decisions about what features are to be marked and how they are
to be marked. The decision must also be made as to where the chosen
features actually appear, so they may be entered correctly (see
above).  Only experts who are intimately acquainted with the texts
and related scholarly problems involved can make these decisions,
develop the tag sets and tagging schemes, and instruct the encoders.
The work of devising markup schemes and Document Type Definitions is
not an easy task, not so much because it is technically difficult,
but because it involves complex decisions about the final encoding of
the electronic document. These decisions must be based on a detailed
and comprehensive knowledge of the form and content of the text.
 
<P>It is also not true that the scholar and the person putting in
markup will never be the same person. ("Final Critique", ad TEI
Guidelines 7.3.1.1, ad TEI Guidelines 5.11.1) Most likely, texts are
being entered under the supervision of a scholar in order that she
can ultimately work on them.  When she does, she will need markup of
the chosen features in the text in order to aid her own work; markup
may also be used to preserve certain aspects of her own textual
interpretations for use by future scholars.  She will, in that case,
have to decide on the relevant tags, and enter them into the text. It
is worth noting that the presence of information in a text does not
require that it be used by another scholar -- a purely statistical
analysis of word frequencies might completely ignore markup, while
an information retrieval application would use it extensively.
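
<P>The contrast between the two kinds of processing can be sketched in a
few lines. This is only an illustration under stated assumptions: the
sample sentence and the element name `name` are invented for the
example, and regular expressions stand in for a real SGML parser, which
any serious application would use instead.

```python
import re
from collections import Counter

# Invented sample text; <name> is a hypothetical element, not a TEI tag.
SAMPLE = "<p>The <name>Muse</name> sings; the <name>Muse</name> remembers.</p>"

def word_frequencies(tagged):
    """Statistical use: count words and ignore every tag."""
    plain = re.sub(r"<[^>]+>", " ", tagged)
    return Counter(re.findall(r"[a-z]+", plain.lower()))

def find_elements(tagged, element):
    """Retrieval use: return the content of each <element>...</element>.

    Assumes the element is never nested inside itself.
    """
    pattern = rf"<{element}>(.*?)</{element}>"
    return re.findall(pattern, tagged, flags=re.S)

freq = word_frequencies(SAMPLE)
names = find_elements(SAMPLE, "name")
```

The same encoded text serves both functions: the first never looks at
the markup, and the second depends on it entirely.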
 
<H2>Presentational markup is also interpretation
<P>There are also texts for which it is extremely important that the
conversion into electronic form retain as much information as
possible about their presentation.  Examples of such texts are early
printed works in which the typography is significant, manuscripts
and fragmentary texts where the position of the letters on the page
is important, and visually creative genres like concrete poetry and
literary collages.  In order to tag such a text, one may need to be
able to describe any nuance of variation in point size, or typeface,
or line spacing.  It is not enough to say that a portion of a text is
indented, or bolded, or in italics.  A vocabulary of such tags that
is sufficiently broad to cover all potentialities seems impossible to
create, and even if it were, would not be suitable for interchange.
 
<P>Therefore, marking the presentation of a text entails creating
abstractions, and then interpreting the page image in terms of these
abstractions.  The person in charge of entering the text, or of
marking up the text to be entered, must decide which presentational
features are to be preserved in the markup, and to what extent
nuances of spacing, printing or layout are to be regarded as
distinctive.
/* --------------------------------------------------------------- */
<H1>The Importance of Descriptive Markup
<P>The importance of descriptive markup is that it makes the structure
of a text explicit and thus allows processing to take place that makes
use of that structure.  It also tends to include more information in
a text than simple presentational markup can.  When text features
like headings, quotations, and direct speech are tagged, it is
possible to use the text as if it were a database, display and output
it on very different media and do various types of context sensitive
analysis.  Furthermore, it is always easier to remove detail and make
the tagging of a text simpler, than to try and insert detail once it
has been removed.
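
<P>A small sketch may make both points concrete: one descriptively
tagged source can be output in different ways, and detail can be
discarded mechanically. The element names `q` and `work`, the sample
sentence, and the two "style sheets" are assumptions made for the
illustration; they are not taken from the Guidelines.

```python
import re

# Invented sample; <q> and <work> are hypothetical element names.
SOURCE = "<p>She said <q>come in</q> and named the <work>Odyssey</work>.</p>"

# Hypothetical style sheets: element name -> (text before, text after).
SCREEN = {"q": ('"', '"'), "work": ("_", "_")}
PLAIN = {}  # no element is rendered specially, so every tag is dropped

def render(tagged, style):
    """Replace each start or end tag by the string the style assigns to it."""
    def substitute(match):
        is_end_tag = match.group(1) == "/"
        before, after = style.get(match.group(2), ("", ""))
        return after if is_end_tag else before
    return re.sub(r"<(/?)(\w+)>", substitute, tagged)

on_screen = render(SOURCE, SCREEN)
stripped = render(SOURCE, PLAIN)
```

Going from SOURCE to the stripped form is a one-line substitution. Going
back would require someone to reread the text and decide again where
each quotation and title begins and ends.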
 
<H2>DTDs
<P>The Literature Working Group also protests the use of DTDs to
encode the structure of literary texts, because they feel that this
forces a particular interpretation on a text. ("Final Critique" ad
TEI Guidelines 2.1.4, ad TEI Guidelines 6.1 para 2)
 
<H3>A DTD Helps Create Software and Helps Formalize Interpretations
<P>The existence of a DTD is helpful for analysis software since it
provides a concise description of which textual features can occur in
particular contexts. The purpose underlying the original
conceptualization of the DTD was to ensure that newly written machine
readable documents corresponded to the "correct" form decreed for
that type of document, thus ensuring that it would be suitable for
automatic processing by a computer. It turns out that a DTD is also
surprisingly useful as a way to rigorously describe the tagging
decisions made in encoding a text.  The verification facilities of
SGML can be used to determine if the actual document, as tagged,
matches the descriptive theory of the document formed by the scholar
as recorded in the DTD. This can reveal shortcomings in the DTD as
well as errors in the encoding of the text. In either case, it
provokes a deeper consideration of the interpretation which is
(inevitably) being done.
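
<P>The idea of checking a tagged document against a recorded theory of
its structure can be sketched as follows. This is not SGML validation
proper: the dictionary below is a much simplified, hypothetical stand-in
for a DTD (it records only which elements may appear directly inside
which), the element names are invented, and Python's HTMLParser is used
merely as a convenient tokenizer for tags.

```python
from html.parser import HTMLParser

# Hypothetical content model: element -> child elements it may contain.
CONTENT_MODEL = {
    "poem": {"stanza"},
    "stanza": {"line"},
    "line": set(),
}

class StructureChecker(HTMLParser):
    """Collects every place where the document departs from the model."""

    def __init__(self, model):
        super().__init__()
        self.model = model
        self.stack = []   # currently open elements, outermost first
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.model:
            self.errors.append(f"undeclared element <{tag}>")
        elif self.stack and tag not in self.model.get(self.stack[-1], set()):
            self.errors.append(f"<{tag}> not allowed inside <{self.stack[-1]}>")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")

def validate(document, model=CONTENT_MODEL):
    checker = StructureChecker(model)
    checker.feed(document)
    checker.close()
    return checker.errors

good = validate("<poem><stanza><line>a</line></stanza></poem>")
bad = validate("<poem><line>a</line></poem>")
```

An error report such as the one produced for the second document can
mean either that the encoder made a mistake or that the model is wrong
(perhaps some poems in the corpus have no stanzas). Deciding which is
exactly the kind of consideration described above.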
 
<H3>The Apparent Loss of Flexibility Created by a DTD is an Illusion
<P>DTDs are not meant to be rigid molds into which we cram texts.  On
the contrary, they grow and change as our understanding of a text
changes.  Creating a DTD and then using it to validate a document
provides interesting and useful information about the structures of
the document and the text type.  SGML, because of the validation
process, can counter many of our assumptions about the structure of a
text, and thus enriches the scholarly process of analyzing and
understanding it.
 
<P>The DTD for textual features of interest to scholars can be
modified by the mechanisms documented in Chapter 8 of the TEI
guidelines. This description is distinctly non-tutorial at the moment,
and modifying a DTD will probably always remain a task for a person who
is not afraid to plunge into SGML.  Nevertheless, DTD extension is an
important
area that is an essential and integral part of the TEI approach to
textual tagging.
 
<H1>Different Kinds of Scholarship & Texts
<H2>Only Texts that are Based on Book Technology are Discussed
<P>When the Working Group discusses markup of texts, they appear to
take only a certain type of text into account: texts written in
Europe in the last 4 or 5 centuries that have been printed as books.
In that light, their comments about reference systems and pagination
have some validity, as do the elements that they single out for
tagging.  However, there are many literary texts that do not fit these
criteria.  Examples of these are ancient texts, where we have many
manuscript copies of lost papyrus originals where the lines and pages
may no longer represent the original lineation (except in poetry),
and texts created on the computer, which do not have pages, lines and
other features derived primarily from books.  In these cases, the
elements that should be tagged differ from those in a book.  The
"Final Critique" does not address these issues.  Finally, the
technology of the printed book is only one phase in the development
of text.  It was preceded by the papyrus scroll, and is being
followed by electronic texts and hypertexts.  To base markup solely on
typography and book pagination is to build into an electronic text
artifacts of one particular display technology.
 
<H2>Reference Systems are more complex than just pages & lines
<P>An examination of reference systems will make these comments
clearer.  It is not possible to unambiguously locate a place in every
text by using pages and lines.  In the case of ancient texts,
physical pages and lines are not significant (except in the case of
poetry, and even there, line breaks in lyric are disputed).
Reference is usually based on the line or page breaks of a particular
edition, the rest of whose particulars are no longer known, as in the
case of Stephanus pages in Plato.  This information, which is no
longer tied to the physical aspect of a text, provides clear location
information for all editions of that text.  In the case of texts
created on the computer, a reference system may have to rely on the
tagging, since those are the features whose function corresponds most
closely to the page.  The Guidelines actually give a fairly detailed
description of tagging alternate reference schemes (TEI Guidelines
5.6, 5.7).
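
<P>As a rough sketch of a reference scheme that does not depend on
physical pages, suppose canonical reference points are recorded as empty
tags in the text stream. The tag name `milestone`, its attribute `n`,
the reference labels, and the placeholder passages below are all
assumptions made for the illustration, not the notation of the
Guidelines.

```python
import re

# Placeholder text; each hypothetical <milestone n="..."> tag marks the
# point where a canonical reference unit begins.
TEXT = '<milestone n="2a">first passage <milestone n="2b">second passage'

def passages(text):
    """Map each canonical reference to the text that follows its marker."""
    # re.split with one capturing group yields
    # [text before first marker, ref1, text1, ref2, text2, ...]
    parts = re.split(r'<milestone n="([^"]+)">', text)
    return {ref: body.strip() for ref, body in zip(parts[1::2], parts[2::2])}

index = passages(TEXT)
```

A citation such as "2b" then resolves to the same passage however the
text is displayed, paginated or re-lineated.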
 
<H1>Verbosity of Tagging / Localization
<P>Throughout the Critique, the Literature Working Group appears to
be overly concerned with problems of data entry and display.  As they
point out themselves, data entry shortcuts and minimization for internal
purposes are possible.  A lot of these arguments are based on the
assumption that the primary function of electronic texts is to be read by
humans in their electronic form. Coding schemes like SGML make an
effort to be human-readable, as an aid in data preparation, and a
surety against electronic obsolescence.  However, the primary
function of these texts is to be processed and displayed by the
computer. If a text is tagged with extreme shortcuts, and control
characters for brevity, it may ultimately be much harder to process
and interchange than one which contains verbose but generic tagging.
It should also be noted that one of the uses that the TEI modification
features provide is the renaming of any tag.
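
<P>The relation between short local tag names and generic names for
interchange can be sketched as a simple table lookup. The mapping below
is hypothetical, and the function illustrates only the idea of renaming;
it is not the modification mechanism that the Guidelines define.

```python
import re

# Hypothetical local abbreviations and the generic names they stand for.
LOCAL_TO_GENERIC = {"hi": "highlighted", "q": "quotation"}

def expand(text, names=LOCAL_TO_GENERIC):
    """Rewrite start and end tags, leaving attributes and content alone."""
    def rename(match):
        slash, name, rest = match.groups()
        return f"<{slash}{names.get(name, name)}{rest}>"
    return re.sub(r"<(/?)(\w+)([^>]*)>", rename, text)

expanded = expand("<hi rendition=ital>Odyssey</hi>")
```

Because the correspondence is one-to-one, brevity at the keyboard and
generic tagging in the interchanged file need not be in conflict.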
 
<H2>Verbosity is not What it Seems
<P>Tags that seem very verbose, like <highlighted rendition=ital>,
actually contribute toward an economy of tagging ("Final Critique" ad
TEI Guidelines 5.3.2).  If, instead of the rendition attribute, the
TEI recommended a separate tag for each type of rendition, then it
would be necessary to have thousands of tags for every presentational
nuance possible in a text.  Instead, the attribute that provides the
rendition may have any value the tagger of the text chooses to give
it.  It is also possible to restrict the values of an attribute, by
specifying a list of allowed values.
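
<P>The difference between an open attribute and one restricted to a list
of values can be sketched as follows. The list of allowed renditions is
invented for the example, and Python's HTMLParser is used only as a
convenient way to read tags and attributes; in SGML itself the
restriction would be declared in the DTD and enforced by the parser.

```python
from html.parser import HTMLParser

# Hypothetical closed list of values for the rendition attribute.
ALLOWED_RENDITIONS = {"ital", "bold", "smallcaps"}

class RenditionChecker(HTMLParser):
    """Records every rendition value and those outside the allowed list."""

    def __init__(self, allowed=None):
        super().__init__()
        self.allowed = allowed   # None means any value is acceptable
        self.seen = []
        self.rejected = []

    def handle_starttag(self, tag, attrs):
        if tag != "highlighted":
            return
        value = dict(attrs).get("rendition")
        self.seen.append(value)
        if self.allowed is not None and value not in self.allowed:
            self.rejected.append(value)

def check(text, allowed=None):
    checker = RenditionChecker(allowed)
    checker.feed(text)
    checker.close()
    return checker.seen, checker.rejected

SAMPLE = ("<highlighted rendition=ital>a</highlighted> "
          "<highlighted rendition=blackletter>b</highlighted>")
open_result = check(SAMPLE)
closed_result = check(SAMPLE, ALLOWED_RENDITIONS)
```

One element with one attribute covers both cases. A project that needs
"blackletter" adds a value, not a new tag.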
 
<H2>Examples of Software and Macros for Working with SGML
<P>In several places in the Critique, the Literature Working Group
requests the inclusion of examples of macros, to show how local encoding
can be changed to SGML encoding.  This is not an appropriate task for
the TEI to undertake in the Guidelines.  The Guidelines are for the most
part a language specification, like the ANSI specification
for the C programming language, or the formal
description of the WordPerfect document format.  Specifying a
complicated language is difficult and important and it is a distinctly
different task than devising ways to _use_ the language efficiently or
easily.  In the ANSI specification for C, for
instance, there are no examples of how to implement syntax aware C
editors.  Since macros and other pieces of code are extremely
dependent on the software and hardware platform being used (just among
the projects represented by the writers, there are 3 operating systems
and 5 or 6 pieces of software being used for preparing texts as SGML
documents) the people who are best suited to creating such macros are
the computer experts who are working on individual projects.  Not only
that, but, as we all know, software and hardware are mercurial and
evanescent.  Including examples of that sort would mean that the
Guidelines would be providing incorrect and outdated information
almost from the outset.
 
<P>Notwithstanding the above, we think that a tutorial introduction to
tagging literary texts that covers specific techniques and methods would
be a valuable companion to the TEI Guidelines, as would a survey of
available software tools. The recent bibliography of SGML compiled by
David Barnard, Robin Cover and Nicholas Duncan is a wonderful resource
(Queen's University TR 90-281).  So is the Markup Manual for the Milton
Textbase, written by Lou Burnard.
 
Elaine Brennan
  Assistant Director, Women Writers Project, Brown University
Steve DeRose
  Senior Software Engineer, EBT, Providence, RI
David Durand
  Computer Science, Boston University
Elli Mylonas
  Managing Editor, Perseus Project.
  Research Associate, Classics, Harvard University
Allen Renear
  Senior Planning Analyst for Humanities Computing, Brown University
 
In composing this response, we benefited from many valuable
discussions with members of the Brown University Computing in
the Humanities Users' Group.
