Julia and all,

Thanks for the interesting post.  I have been thinking a lot lately 
about what you call "low-percentage encoding."  As you point out, markup 
has for practical reasons been geared for search and display, and this 
of course enables all kinds of research.  But perhaps one weakness of 
markup so far is that it is more procedurally predefined than 
exploratory, and there are certain aspects of literary scholarship that 
don't jibe with this. 

As some literary projects have been around long enough to have begun 
looking beyond "high-level markup," I wonder what is coming 
next.  Are you looking for ways to accommodate more controversial, 
thesis-driven claims about your texts?  Perhaps even a variety of such 
claims? Or are you looking for ways to layer in even more detail of the 
same kinds of observations present in your first round of markup?  What 
do you eventually want your markup to do for scholarship that it 
currently can't?  I'd be interested in hearing from people on different 


Julia Flanders wrote:
> Peter's response is fair and represents what we might think of as the 
> "high-percentage encoding" perspective: encoding which has a clear and 
> immediate payoff in generally useful functions such as searching and 
> display. For these purposes, I think he's exactly right: the few 
> semantically specific elements like <title>, <foreign>, <mentioned>, 
> etc. have been designed to catch the high-percentage features, and the 
> catch-all <hi> element exists for whatever cannot be categorized.
>
> I'm very interested, though, in what I hope it won't seem 
> intrinsically pejorative to term "low-percentage encoding": that is, 
> encoding which seeks to express a comprehension of the text as it 
> presents itself. This approach assumes precisely that you have a 
> research interest in the text of a more nuanced sort. It might not 
> require that you are already interested in some specific phenomenon 
> (writing on skin, terms for prom queens) but rather that you are 
> interested in the specifics of the text in general (so to speak). This 
> approach to textual study is typical of literary scholarship, and it 
> approaches the text not by searching for the things it already knows 
> are there and ignoring the rest, but by reading the text, observing 
> it, and trying to account for what it's doing.
>
> Because the text encoding world has focused so exclusively till now on 
> high-percentage encoding, the tools (both technological and mental) 
> are all clustered in that area. As a result I don't think we're yet in 
> a good position to evaluate the usefulness of low-percentage encoding 
> as a scholarly practice. However, I'm almost certain that once people 
> who are interested in it start actually doing it, we'll see some 
> useful outcomes.
>
> It's also interesting that this approach should raise the possibility 
> of guilt and obligation. One is surely never *obligated* to encode 
> anything--it is only a question of whether one's encoding is 
> well-suited to the intended purpose or not. Someone who had been hired 
> to encode the maximum quantity of text as cheaply as possible should 
> perhaps feel guilty for *not* using <hi> and for indulging in more 
> expensive nuances; someone who had been hired to encode shades of 
> semantics should perhaps feel guilty for failing to distinguish 
> between <mentioned> and <socalled>.
>
> Since I don't know very much about Jon's project, it's hard for me to 
> say at this point whether the semantic nuance he asks about is 
> pointless, essential, or somewhere in between, but it's certainly an 
> interesting area to explore.
>
> best wishes, Julia
> At 5:47 PM +0100 3/6/07, Peter Boot wrote:
>> I'd say this is pointless, unless you have a research interest in
>> precisely this phenomenon (meaning of highlighted text). There are a few
>> simple cases, like <title> and <foreign>, and these might be useful for
>> searching. Your examples could perhaps be argued to be <mentioned> (as I
>> see now Julia does for the first one).
>>
>> But in any text, there are a million things that are of semantic
>> interest of which we do not try to capture the meaning at the encoding
>> stage (such as: choice of words, word order, etc.). Why then should we
>> have to capture every shade of significance in the case of highlighted
>> text?
>>
>> I'd stick to <hi> for these cases and not feel very guilty about it.
>> Best,
>> Peter


Amanda Gailey
Associate Director
Humanities Digital Workshop
Campus Box 1160
Washington University in St. Louis
One Brookings Drive
St. Louis, MO 63130-4899