John Young wrote:
> Would this not run into problems if you were sorting book/article titles
> such as 'The The: the History of a Band' (I believe there was a band
> called The The), or '"The Taming of the Shrew": A Feminist Re-appraisal'?
Though apparently about alphabetical sorting, this point actually engages
with another issue in this thread (and the one Conal has forked), namely the
possibilities and limitations of auto-generated indexes. My gripe about one
of Lou's examples was that it seemed to me to overstate the applicability of
something that was nevertheless both feasible and useful within less
For very good reasons, the TEI approach tries to be thoroughly systematic,
and on the whole is remarkably successful, seeing that (unlike say the
textual domain that DocBook markup primarily addresses) it has to find ways
of applying systematic and consistent treatment to things like novelists who
include spoof indexes as part of their actual creations (though that's not a
request for P5 to substitute Pale Fire for the Anatomy of Melancholy in its
example repertoire) and scribes who decided while chanting Prime that they
would liven up their day in the Scriptorium by seeing how many subtly
different ways their stylus could be persuaded to represent the same
character sequence on a single folium.
Sometimes, however, getting things done effectively (maybe at the price of
procedural inelegance) requires suspension of systematic practice. Faced
with the title-sorting quandary, I would be inclined to implement a
processor which was Mostly Smart Enough, but which Knew When to Take a Hint.
More concretely, such a processor, if presented with (in the original
cases) a <head> of type "title" or a <title> element proper with no
additional internal markup, would apply the relatively simple algorithms
needed in the most common cases to parse out components of the string that
should not influence the sort. But where, as in John's examples, attempts to
make the processor sufficiently clever to do so unaided would require a
large, and probably futile, expenditure of programming effort, I would mark
up those <head>s, and those only, with a "hint" to tell the processor,
"Don't try to parse this one yourself, sort it as if it read like this..."
The tags meant for indexation would seem to me to be usable for this without
any severe abuse. So it isn't just a matter of choosing between two
alternatives, each with its own disadvantages: time-consuming and
human-error-prone sort-component tagging of each and every title on the one
hand; reliance on a possibly under-sophisticated processor on the other.
Discussions between advocates of these two approaches, despite the superior
rigour and generality of either alternative as compared to the pragmatic
eclecticism I am proposing, are bound to be inconclusive. But while the
point was being thus debated, I'd like to think that the approach I suggest
would be getting the job done.
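
To make the "Mostly Smart Enough, but Knows When to Take a Hint" idea a little
more concrete, here is a minimal sketch in Python. It is only an illustration
under stated assumptions: the function name sort_key, the English-only article
list, and the way the hint reaches the processor (as a plain string already
extracted from whatever indexing markup is used) are all hypothetical, and the
hint values shown are just one possible editorial choice.

```python
import re

# Assumption: only English leading articles are ignored in this sketch.
LEADING_ARTICLE = re.compile(r"^(?:the|an|a)\s+", re.IGNORECASE)
# Leading quotation marks and other punctuation should not influence the sort.
LEADING_PUNCT = re.compile(r"^[\W_]+")


def sort_key(title, hint=None):
    """Return the string a title should be sorted under.

    If the encoder supplied a hint, trust it and do no parsing at all.
    Otherwise apply the simple rule that covers the common cases:
    drop leading punctuation, then drop one leading article.
    """
    if hint is not None:
        return hint.casefold()
    stripped = LEADING_PUNCT.sub("", title)
    stripped = LEADING_ARTICLE.sub("", stripped)
    return stripped.casefold()


# (title, hint) pairs; the hint is None wherever the simple rule suffices.
titles = [
    ("The Anatomy of Melancholy", None),
    ("Pale Fire", None),
    # Hinted: the band's name begins with "The", so keep it in the key.
    ("The The: the History of a Band", "The The: the History of a Band"),
    # Hinted: sort under the title of the play being discussed.
    ('"The Taming of the Shrew": A Feminist Re-appraisal',
     "Taming of the Shrew: A Feminist Re-appraisal"),
]

ordered = [title for title, hint in
           sorted(titles, key=lambda pair: sort_key(*pair))]
for title in ordered:
    print(title)
```

Only the two awkward titles carry a hint; everything else is left to the
simple rule, which is the whole point of the compromise.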