Peter's response is fair and represents what we might think of as the
"high-percentage encoding" perspective: encoding which has a clear
and immediate payoff in generally useful functions such as searching
and display. For these purposes, I think he's exactly right: the few
semantically specific elements like <title>, <foreign>, <mentioned>,
etc. have been designed to catch the high-percentage features, and
the catch-all <hi> element exists for whatever cannot be categorized.

I'm very interested, though, in what I hope it won't seem
intrinsically pejorative to term "low-percentage encoding": that is,
encoding which seeks to express a comprehension of the text as it
presents itself. This approach assumes precisely that you have a
research interest in the text of a more nuanced sort. It might not
require that you are already interested in some specific phenomenon
(writing on skin, terms for prom queens) but rather that you are
interested in the specifics of the text in general (so to speak).
This approach to textual study is typical of literary scholarship,
and it approaches the text not by searching for the things it already
knows are there and ignoring the rest, but by reading the text,
observing it, and trying to account for what it's doing.

Because the text encoding world has focused so exclusively till now
on high-percentage encoding, the tools (both technological and
mental) are all clustered in that area. As a result I don't think
we're yet in a good position to evaluate the usefulness of
low-percentage encoding as a scholarly practice. However, I'm almost
certain that once people who are interested in it start actually
doing it, we'll see some useful outcomes.

It's also interesting that this approach should raise the possibility
of guilt and obligation. One is surely never *obligated* to encode
anything--it is only a question of whether one's encoding is
well-suited to the intended purpose or not. Someone who had been
hired to encode the maximum quantity of text as cheaply as possible
should perhaps feel guilty for *not* using <hi> and for indulging in
more expensive nuances; someone who had been hired to encode shades
of semantics should perhaps feel guilty for failing to distinguish
between <mentioned> and <socalled>.

Since I don't know very much about Jon's project, it's hard for me to
say at this point whether the semantic nuance he asks about is
pointless, essential, or somewhere in between, but it's certainly an
interesting area to explore.

best wishes, Julia

At 5:47 PM +0100 3/6/07, Peter Boot wrote:
>I'd say this is pointless, unless you have a research interest in
>precisely this phenomenon (meaning of highlighted text). There are a few
>simple cases, like <title> and <foreign>, and these might be useful for
>searching. Your examples could perhaps be argued to be <mentioned> (as I
>see now Julia does for the first one).
>But in any text, there are a million things that are of semantic
>interest of which we do not try to capture the meaning at the encoding
>stage (such as: choice of words, word order, etc.). Why then should we
>have to capture every shade of significance in the case of highlighted
>text?
>I'd stick to <hi> for these cases and not feel very guilty about it.