Peter's response is fair and represents what we might think of as the 
"high-percentage encoding" perspective: encoding which has a clear 
and immediate payoff in generally useful functions such as searching 
and display. For these purposes, I think he's exactly right: the few 
semantically specific elements like <title>, <foreign>, <mentioned>, 
etc. have been designed to catch the high-percentage features, and 
the catch-all <hi> element exists for whatever cannot be categorized.

I'm very interested, though, in what I'll call (I hope without seeming 
intrinsically pejorative) "low-percentage encoding": that is, 
encoding which seeks to express a comprehension of the text as it 
presents itself. This approach assumes precisely that you have a 
research interest of a more nuanced sort in the text. It might not 
require that you are already interested in some specific phenomenon 
(writing on skin, terms for prom queens) but rather that you are 
interested in the specifics of the text in general (so to speak). 
This approach to textual study is typical of literary scholarship, 
and it approaches the text not by searching for the things it already 
knows are there and ignoring the rest, but by reading the text, 
observing it, and trying to account for what it's doing.

Because the text encoding world has focused so exclusively till now 
on high-percentage encoding, the tools (both technological and 
mental) are all clustered in that area. As a result I don't think 
we're yet in a good position to evaluate the usefulness of 
low-percentage encoding as a scholarly practice. However, I'm almost 
certain that once people who are interested in it start actually 
doing it, we'll see some useful outcomes.

It's also interesting that this approach should raise the possibility 
of guilt and obligation. One is surely never *obligated* to encode 
anything--it is only a question of whether one's encoding is 
well-suited to the intended purpose or not. Someone who had been 
hired to encode the maximum quantity of text as cheaply as possible 
should perhaps feel guilty for *not* using <hi> and for indulging in 
more expensive nuances; someone who had been hired to encode shades 
of semantics should perhaps feel guilty for failing to distinguish 
between <mentioned> and <socalled>.

Since I don't know very much about Jon's project, it's hard for me to 
say at this point whether the semantic nuance he asks about is 
pointless, essential, or somewhere in between, but it's certainly an 
interesting area to explore.

best wishes, Julia

At 5:47 PM +0100 3/6/07, Peter Boot wrote:
>I'd say this is pointless, unless you have a research interest in
>precisely this phenomenon (meaning of highlighted text). There are a few
>simple cases, like <title> and <foreign>, and these might be useful for
>searching. Your examples could perhaps be argued to be <mentioned> (as I
>see now Julia does for the first one).
>But in any text, there are a million things that are of semantic
>interest of which we do not try to capture the meaning at the encoding
>stage (such as: choice of words, word order, etc.). Why then should we
>have to capture every shade of significance in the case of highlighted
>text?
>I'd stick to <hi> for these cases and not feel very guilty about it.