TEI Norway Meeting: the issues
As recently announced, a number of unresolved issues were
presented for discussion at the TEI's recent Norwegian
meeting. This posting outlines the seven task groups set up at
that meeting, the technical questions they were asked to resolve
and the preliminary answers they presented. More detailed
summaries will follow from the task groups in due course.
1. CHARACTER SET ISSUES
(Harry Gaylord, Syun Tutiya, Steven DeRose)
Q. In general, how should ambiguous glyphs (for example full
stop) and synonymous glyphs (for example tilde and logical
`not' in symbolic logic) be encoded, particularly with
respect to phonetic transcription?
A. In phonetic transcriptions, do not use the alternative
encodings proposed in the new IPA; this avoids the problem
of synonymous glyphs. Entity names should reflect the sound
being encoded, not the shape of the IPA or non-IPA glyph
Q. How are suprasegmental features (e.g. intonation) to be
encoded?
A. Use the IPA suprasegmentals; other notations are not yet
ripe for standardization. In case of need, declare your own
set of entity names.
Q. Three main writing systems are used in Japan: can a TEI
Writing System Declaration handle all of them?
A. A method has been defined which will extend the current
draft to make this possible.
Q. Should the WSD be modified to accommodate `stateful'
charsets (e.g. those with locking shifts like ISO 2022, or
transliteration schemes such as TLG-Beta)?
A. No need is currently seen for this: the existing draft can
already handle TLG-Beta and use of ISO 10646 will obviate
any need for shift-in, shift-out control codes. Work will
continue in this area.
Q. What naming conventions should be followed for locally
defined entities to avoid name-clashes? What sources of
entity names are there?
A. Priority should be given to public lists (existing lists
will be referred to in P2). A new component of the WSD
should be `ucscode' i.e. character position in ISO 10646.
Naming conventions for privately defined entity lists are to
Q. Should the TEI recommend or require a specific order for
diacritics, and if so, what?
A. Follow the rules in ISO 10646 -- beginning to end, top to
bottom.
2. NAMES, DATES AND MEASURES
(Dan Greenstein, Jacqueline Hamesse, Gary Simons)
Q. The group was charged to explore the feasibility of using
the feature structure mechanism as defined in TEI P1 chapter
6 to mark up the internal structure of names, dates and
measurements, with a view to providing a detailed example
for inclusion in the P2 Casebook. The example should
demonstrate how alternate or ambiguous data can be handled,
and if possible include a Feature System Declaration.
A. A written draft will be circulated later. Particular points
noted at the meeting were:
- the notation made the unit/level notation already
provided in P1 redundant
- the names `record', `field', `datadict' were proposed
as alternatives for `f.struct', `feature', `FSD'
respectively but not generally agreed, as their use
implied a greater degree of convergence between the TEI
concepts and those of existing database systems than
warranted by experience to date.
- for historians, the ability to specify particular
analytical features independently of textual features,
with no need to extend the DTD, was regarded as a
breakthrough of major significance.
- it was noted that the same method might be extended to
handle metaphor, text criticism and any other
- it was agreed that the existing tags `f.s.and',
`f.s.or', etc. should be renamed simply `and', `or',
etc. The need for sequencing of features within a
structure was noted, but no decision taken concerning
applicability of the existing tags `f.set' and
- further technical work identified as necessary for
compatibility with database notions included handling
of record keys, access permissions, and
interoperability with X.500.
3. BASIC TEXT STRUCTURES
(Elli Mylonas, David Robey, David Barnard)
Q. Should the current distinction between generic list and
glossary list be retained?
A. No. The list element should be redefined to contain an
optional head element, followed by a series of item
elements, each of which might be preceded by an optional
label element, thus
<!element list - - (head?, (label?, item)+) >
<!attlist list
type (ordered | simple
| gloss | bulleted...) simple >
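The proposed content model can be illustrated with a small check. What follows is only a sketch, not part of the proposal: it assumes the children of a `list' element are available as a sequence of element names, and the function name `is_valid_list' is invented for illustration.

```python
import re

# Content model from the text: (head?, (label?, item)+)
# Each child element name is mapped to one letter so that the model
# can be checked with an ordinary regular expression.
CODES = {"head": "h", "label": "l", "item": "i"}
LIST_MODEL = re.compile(r"h?(l?i)+")


def is_valid_list(children):
    """Return True if the child element names satisfy the list model."""
    try:
        encoded = "".join(CODES[name] for name in children)
    except KeyError:
        # An element other than head, label or item is not allowed.
        return False
    return LIST_MODEL.fullmatch(encoded) is not None
```

Under this model a label may be omitted before any item, but a label can never appear without an item after it, and a head is allowed only at the start.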
Q. What support should be provided for loosely defined lists
and floating enumerators?
A. An empty tag might be used to specify the start of a loosely
structured list, with an attribute to point at its end.
Enumerators should also be allowed to float in free text.
The `enum' element itself might also have an attribute to
point to the end of the thing enumerated.
Q. Should there be some additional categorization or structure
within notes, and if so, what are its components?
A. A general-purpose `span' element was proposed, with one
attribute (`end') pointing to an anchor point, another
(`resp') to indicate who is responsible for it, and another
(`type') used to categorize it. Sample values for the
latter were imagery, character, voice, theme, allusion,
style, register, topic, discourse structure, rhetorical
Q. What mechanism should be used to represent embedded texts
e.g. a poem quoted within prose or the dramatic scene in
A. An element `embedded.text' was proposed as an inclusion on
all `div' type elements. Its content model should be the
same as that for text. In addition to the `nested vanilla
div' already agreed at the meeting, it should be possible to
define specifically named textual divisions for prose, verse
or drama. The following example DTD fragment was proposed:
<!entity % body "(div | ddiv | vdiv)+" >
<!element text - - (front?, body, back?) >
<!element body - - (%body;) >
<!element div - O ((p | list ...)*, (div*)) +(embedded.text)>
<!element ddiv - O (ddiv+ | (sp | stage |...)+) +(embedded.text)>
<!element vdiv - O (vdiv+ | (v | l | ...)+) +(embedded.text)>
<!element embedded.text - - (front?, %body;, back?)>
This comparatively restrictive content model precludes
shifts from verse to prose within a text (even at div
boundaries) except by means of embedded texts.
[Note: An earlier draft of the above model which does allow
shifts from verse to prose to drama at any div boundary
follows for illustrative purposes:
<!entity % divs 'pdiv | ddiv | vdiv' >
<!element pdiv - O (head?, %p.seq;, (%divs;)* ) >
<!element vdiv - O (head?, (stanza* | l*), (%divs;)* ) >
<!element ddiv - O (head?, (sp | stage | ...)*, (%divs;)*) >
]
4. INHERITANCE, GROUPING AND REGULARIZATION
(Nicoletta Calzolari, Bob Ingria, Peter Robinson, David
Q. Which of the tags currently proposed for electronic
dictionaries are also appropriate for electronic lexica?
A. The workgroup (AI6) will propose a unified set of names.
Q. Specify unambiguous rules for the semantics of inheritance
within the `grp' tag: when is it over-riding and when
A. Inheritance is defined within the `grp' element only. [This
is a change from the circulated proposal in which
inheritance applies within both `grp' and `form' elements
(eds).] When a `grp' element is nested within an enclosing
`grp' element, any values not supplied for a given leaf tag
at the lower level are understood to be inherited from any
corresponding tag at some higher level. If a given leaf
element is specified at both levels, the value supplied at
the lower level overrides that at the higher. Values are
overridden only for identical leaf tags; any intervening
non-terminal elements are ignored. (Thus a `blort' tag
within a `gram' element at the lower level overrides a
`blort' tag within a `form' or `pron' element at the higher
level.)
[Note: this amounts to an attempt to distinguish cleanly
between the use of contained elements to record
characteristics of their parent and the use of contained
elements to indicate subparts of a parent which can inherit
its characteristics: the `grp' tag here has only the latter
function, and all other non-terminals have only the former.]
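The inheritance rule described in this answer can be sketched in a few lines. This is an illustration under stated assumptions, not the workgroup's definition: an entry is modelled as a list of (tag, content) pairs, where content is a string for a leaf and a list for a non-terminal; nested `grp' elements are assumed to be direct children of their enclosing `grp'; and where a leaf tag occurs twice at one level the last occurrence is kept. The tag `orth' and the function names are invented for the example.

```python
def own_leaves(children):
    """Collect the leaf values supplied at this grp level.

    Intervening non-terminal elements (form, gram, pron, ...) are
    flattened away; nested grp elements are skipped because they
    form a lower level of their own.
    """
    leaves = {}
    for tag, content in children:
        if tag == "grp":
            continue
        if isinstance(content, str):
            leaves[tag] = content
        else:
            leaves.update(own_leaves(content))
    return leaves


def resolve(children, inherited=None):
    """Return the effective leaf values for a grp and each nested grp."""
    effective = dict(inherited or {})
    # Values supplied at the lower level override inherited ones.
    effective.update(own_leaves(children))
    nested = [resolve(content, effective)
              for tag, content in children if tag == "grp"]
    return {"values": effective, "groups": nested}


# A blort inside gram at the lower level overrides the blort inside
# form at the higher level; orth is inherited unchanged.
entry = [
    ("form", [("orth", "colour"), ("blort", "x")]),
    ("grp", [("gram", [("blort", "y")])]),
]
result = resolve(entry)
```

Because only leaf tags are compared, the `gram' and `form' wrappers play no part in deciding what is overridden, which is the behaviour the answer describes.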
Q. Can the inheritance mechanism be generalized for use in text
A. Yes in principle, but this is not yet recommended.
Q. Can the method proposed in TR2 for handling regularization
be applied to headword abbreviation etc in dictionaries?
A. Yes. For example
Q. Are all the tags proposed for TR2 (regularization,
rdg.group, app. etc) really needed?
A. rdg.group renamed to group; witness.list might be handled as
a feature structure, but no specific recommendations are yet
5. SITUATIONAL PARAMETERS
(Doug Biber, Tom Corns, Stig Johansson, Geoff Sampson)
Q. Propose a single list of parameters that can be placed in
the characteristics.desc section of the TEI header to
document both spoken and written corpus texts.
A. The following list was proposed:
- constitution (single, composite, or fragmentary)
- language (principal, others ..., note)
- mode (spoken, written, written-to-be-spoken, spoken-to-
- channel (type = book, periodical, newspaper,
handwritten, typescript ...)
- interaction (type = none, partial, complete)
- addressor (number = 1, plural, corporate)
- addressee (number = 1, plural, bounded, unbounded)
- actual circulation (number = nnn)
- list of participants and participant.group (structured
as in AI2 W1 with minor changes). Participant
characteristics are included with demographic and
situational information. (N.B. situational information
may be different for different participants.)
- setting (as in AI2) location time duration, ...
- preparedness (as in TR6) edited, revised, from notes,
- relation to other texts (was: originality) orig, rev,
adaptation, plagiarism, ...
- factuality (as in TR6) fiction, non, mixed,
- primary purpose (weight each purpose as strong,
indeterminate, weak, or inapplicable): persuasion=
self-expression= informativeness= entertain/edify=
- primary domain (as in TR6) (domestic ... from tr6)
- topic (open)
- The `perceived value' proposal from TR6 was dropped.
- The tag `texttype.decl' should be used to define genre
(e.g. as synchronic text type, history of text
perception); its contents include any of these
situational parameters as well as prose.
- Most of the above tags are expressed by empty tags,
with value-bearing attributes; any additional
qualification can be supplied by an immediately
following note tag.
Q. At which point or points in the text should the situational
parameters be specified?
A. Values applying to a corpus are in the corpus header within
its `characteristics.desc'; values defining an individual
text type are in `texttype.decl'; and values applying to
an individual text are in `characteristics.desc' of the
individual text header.
6. UNCERTAINTY
(Robin Cover, Allen Renear, Paul Fortier, Claus Huitfeldt)
Q. Propose a general method for dealing with uncertainty in a
principled way. Is it possible exhaustively to specify
kinds of uncertainty? How should uncertainty about what is
seen be distinguished from uncertainty as to its
A. A tentative strategy was proposed. Questions of legibility
or audibility are distinguished from other kinds of
uncertainty; inaudible and illegible passages should bear
the same tag. (Paradoxically, illegibility is almost the
only kind of textual feature about which it is not possible
to be uncertain.) [Note: the word `indistinct' was later
proposed as a suitable name for this tag (eds).]
For other kinds of uncertainty, there seemed to be a need to
specify three things: what is uncertain, what kind of
uncertainty is involved, and degree of uncertainty.
Possible values for `what' might include `GI', indicating
uncertainty as to whether or not the tag in question applies
to this passage, `startloc' or `endloc', indicating
uncertainty as to whether its start or end has been
correctly located, or the name of any attribute, indicating
uncertainty as to the correct value for that attribute.
Values of `degree' might be yes or no, a range from 1 to 9,
or traditional characterizations such as `doubtful',
This strategy might be implemented in a number of different
ways: as a single three-valued global attribute or a set of
three global attributes; as a floating empty tag bearing
three attributes to specify the uncertainty and a fourth to
specify (by IDREF) the element instance in question; as a
pointer to a feature structure or as a group of empty tags.
Additional requirements noted by the meeting were ways of
representing the reason or cause of the uncertainty, and
ways of distinguishing accuracy or precision for numerical
quantities from uncertainty as to their values.
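One of the implementation options mentioned above, a floating record with three descriptive values plus a reference to the element in question, might be modelled as follows. This is a sketch only: the class and field names are invented, the 1 to 9 scale is just one of the `degree' options listed, and no values for `kind' are assumed because the summary gives none.

```python
from dataclasses import dataclass

# Values for `what' named in the text; any attribute name is also allowed.
STRUCTURAL_TARGETS = {"GI", "startloc", "endloc"}


@dataclass(frozen=True)
class Uncertainty:
    target: str  # ID of the element instance in question (the IDREF)
    what: str    # "GI", "startloc", "endloc", or an attribute name
    kind: str    # kind of uncertainty; the text lists no values
    degree: int  # position on a 1..9 scale

    def __post_init__(self):
        if not 1 <= self.degree <= 9:
            raise ValueError("degree must be in the range 1 to 9")

    def concerns_attribute(self):
        """True when the doubt is about the value of an attribute."""
        return self.what not in STRUCTURAL_TARGETS
```

Keeping the record separate from the element it describes corresponds to the floating empty tag option; the single global attribute option would instead pack the same three values onto the element itself.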
7. CHOICE OF ALIGNMENT MECHANISMS
(Terry Langendoen, Steve DeRose, Winfried Lenders)
Q. Could the current inventory of alignment mechanisms (al.map,
xref, timeline, extensions, treepointer, fsptr ...) be
reduced? What guidance should be given as to the choice of
A. The xref tag should be used for all pointer mechanisms;
other tags with pointer functions (e.g. al.ptr, f.ptr, etc.)
should be merged with it. Additional attributes would be
needed for some applications. The timeline and alignment
map mechanisms were mirror images which it should be
possible to unify.
Q. For word-by-word tagging, should the currently proposed
entity references be expanded as xref pointers to feature
structures or as full feature structures?
A. The pointer expansion was preferable.
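The difference between the two expansions can be sketched as follows. Everything named here is hypothetical: the entity names, the structure IDs and the feature names are invented for illustration, and the sketch says nothing about the eventual xref syntax.

```python
# Hypothetical feature structures, stored once and identified by ID.
FEATURE_STRUCTURES = {
    "fs1": {"pos": "noun", "number": "singular"},
    "fs2": {"pos": "verb", "tense": "past"},
}

# Hypothetical entity names for word tags, each mapped to a structure ID.
ENTITY_TARGETS = {"NN1": "fs1", "VVD": "fs2"}


def expand_as_pointer(entity):
    """Expand an entity reference to the ID of a shared structure."""
    return ENTITY_TARGETS[entity]


def dereference(pointer):
    """Follow a pointer to the feature structure it identifies."""
    return FEATURE_STRUCTURES[pointer]
```

With the pointer expansion every word carrying the same tag refers to one shared structure, whereas the full expansion would repeat the whole structure at each word.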
Q. Propose a set of tags (to go in refs.decl in header) which
specify how canonical references are to be specified and
located in text.
A. A detailed syntax, based on tree navigation, was defined.
Apologies are due to any of the participants who feel that our
hastily produced summary misrepresents them. TEI-L is the
appropriate locus for corrections. We look forward to reading
more detailed proposals on each of the above areas. These are
likely to appear on TEI-L over the coming days, as each of the
participants was urged to write up their thoughts, particularly
in the new areas discussed above, and circulate them by that
means as soon as possible.