I think the short answer to your question is that 
you're not alone in finding this difficult, and 
that the TEI does not seek to provide a full 
classification of this kind of textual phenomenon 
that would satisfactorily encode the semantic 
nuances of the features you're describing here.

This is a particularly tricky area, I think, 
because the semantics of these features are both 
hard to characterize (they are often subtle and 
hard to pin down) and subject to contextual 
factors arising from other encoding decisions you 
might have made.

For instance, I could imagine encoding your first 
example (the letters carved in skin) as:

--<mentioned>, since the string is being referred to 
for its stringness rather than for its meaning
--<quote>, since it is in effect a text from another 
source (albeit not a typically bibliographic one 
:-)
--<seg> (perhaps with a type= attribute), if this 
were a common phenomenon in your text or part of 
some larger textual signifying pattern in which 
you had a particular interest
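
As a rough sketch, those three readings of the carved letters might look 
like this. These are fragments rather than a complete document, and the 
type value "inscription" is something I've made up for illustration, not 
a value the TEI defines:

```xml
<!-- The string cited as a string, not used for its meaning -->
carved in her skin: <mentioned>CXJ</mentioned>.

<!-- The string treated as text taken from another source -->
carved in her skin: <quote>CXJ</quote>.

<!-- A neutral segment; type="inscription" is a hypothetical project value -->
carved in her skin: <seg type="inscription">CXJ</seg>.
```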

Your second example is even harder. If the phrase 
denotes something akin to an official title (like 
"Homecoming Queen"), then one might imagine 
making a case for <roleName>, but that element 
isn't allowed except within <persName>, which is 
harder to justify here. If the phrase is taken as 
representing a commonly circulating term (i.e. 
quoting the current gossip) then one could 
imagine making a case for <quote>. If it's just 
taken as a phrase referring to the person in 
question, <rs> would be justifiable though it 
wouldn't say much, it seems to me. <term> is 
another option in this general domain but carries 
the implication of specialized/technical 
terminology. If <socalled> didn't carry an 
implication of ironic usage, it might be the 
closest of all, and this may reveal a gap in the 
TEI's coverage of semantic markup of highlighted 
text--perhaps there needs to be an element that 
means, in effect, "this is a phrase used to 
describe something else".
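
To make the alternatives for this second example concrete, here is a 
hedged sketch of each. Again these are fragments only, and which one is 
right depends on the reading you want to commit to:

```xml
<!-- Read as the current gossip being quoted -->
voted <quote>best-looking senior</quote>

<!-- Read as a phrase referring to the person; valid but says little -->
voted <rs>best-looking senior</rs>

<!-- Read as terminology; implies a specialized or technical term -->
voted <term>best-looking senior</term>

<!-- Close in spirit, but implies the author is distancing herself from the phrase -->
voted <socalled>best-looking senior</socalled>
```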

I think what this reveals is that the TEI suite 
of semantically specific elements for highlighted 
text focuses on areas where one feels a strong 
external motive to encode: in other words, areas 
such as naming or terminological definition where 
the benefits of marking the semantics are clear 
and generalizable. The encoding activity you're 
describing has a different kind of intention (and 
one I find both interesting and sympathetic): the 
goal of accounting for the nuances that the text 
foregrounds. Encoding of this sort is harder 
because the question it asks isn't "which box 
does this go in?" but "what boxes does the text 
ask me to build?" and also partly "what boxes am 
I interested in by being interested in this 
text?" As a result, it's difficult to arrive at a 
classification that will generalize well.

So I think there are two strategies to consider, 
depending on what you're trying to accomplish:

1. Fit the text into the elements the TEI provides
--either by choosing the element that is closest 
to the meaning you intuit (so in the first 
example, perhaps <mentioned>, and in the second 
example, perhaps <rs>?)
--or by choosing the element that misrepresents 
the text least by being as semantically null 
as possible: this is the argument for <hi>, or 
for <seg> if you want to reserve <hi> for 
text whose highlighting specifically carries 
*no* semantics (such as decorative highlighting 
of the first word in a chapter)

As an aside, the Women Writers Project created an 
element for just this latter purpose, called 
<mcr> (meaningful change in rendition), which we 
use to encode highlighted words/phrases whose 
highlighting is semantically mysterious to us but 
clearly isn't simply decorative.
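
The semantically minimal choice in strategy 1 might be sketched as 
follows. The rend value "italic" is just a common convention, assumed 
here for illustration; your project may record rendition differently:

```xml
<!-- <hi>: records only that the phrase is highlighted, and how -->
voted <hi rend="italic">best-looking senior</hi>

<!-- <seg>: same information, used when <hi> is reserved for
     highlighting that is purely decorative -->
voted <seg rend="italic">best-looking senior</seg>
```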

2. Create elements that represent the semantic 
distinctions you detect in the text, if these are 
not already represented in TEI. You could do this 
comparatively painlessly by using <seg> with a 
type= attribute whose values you determine, or 
you could create new named elements that might be 
syntactic sugar for <rs> or <seg> or <hi> but 
would represent your own sense of what's going on 
in the text. I'm tempted to suggest possibilities 
but my imagination isn't quite fertile enough 
today.
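
Purely to show the mechanics of strategy 2, and not as a recommendation 
of particular categories, a sketch might look like this. The type values 
"inscription" and "accolade" and the element name <accolade> are all 
hypothetical:

```xml
<!-- <seg> with project-defined type values (both values are made up) -->
carved in her skin: <seg type="inscription">CXJ</seg>.
voted <seg type="accolade">best-looking senior</seg>

<!-- The same distinction as a hypothetical named element. This would have
     to be declared in your own extension of the TEI DTD or schema before
     a document using it would validate. -->
voted <accolade>best-looking senior</accolade>
```

The <seg> route keeps your documents valid against unmodified TEI; the 
named-element route reads more clearly but commits you to maintaining a 
customization.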

Best wishes, Julia

Julia Flanders
Women Writers Project
Brown University

At 8:14 AM -0700 3/6/07, Jon Noring wrote:
>Everyone,
>
>I've been studying the TEI elements for marking up the semantics of
>highlighted text (avoid using <hi>). Chapter 6 of the TEI P4X spec
>provides the "core" overview.
>
>Yet, I am finding it difficult to come up with a "standardized"
>uniform approach that will work on most if not all highlighted inline
>text I run across in contemporary books. As I go through a catalog of
>recently published books, I'm always finding a few oddities that are
>somewhat difficult to semantically categorize, either because they
>occupy some "gray" area, or are simply difficult to semantically
>characterize to standard TEI elements.
>
>So, has anyone here come up with a workable set of standardized
>guidelines for this purpose? I would think that some projects to
>markup a lot of books in TEI would try to standardize their approach
>to semantically markup highlighted text.
>
>Or is this an exercise in futility? <laugh/>
>
>As an example, here's just a couple snippets where I'd like to
>properly assign TEI semantics to the italicized text:
>
>    It took a few minutes and a couple of us to figure it out, but we
>    determined there are three letters carved in her skin: <i>CXJ</i>.
>
>    It was almost certain she would be voted <i>best-looking senior</i>
>    in the class of 02 at her Cocoa Beach high school next year.
>
>
>Thanks for your insights!
>
>Jon Noring