I fear my ambitions are much lower than what Lou implies when he says
"use the 'who' attribute." The question who speaks is much harder to
determine than whether a given stretch is narrated or spoken. Before
at least the middle of the nineteenth century that distinction seems
fairly straightforward. Once we get to indirect free speech or 'erlebte
Rede' we're in a different world.
Once the distinction between speech and narrative is made with
tolerable accuracy, it would be relatively simple, if time consuming,
to add "who" attributes and identify speakers. This could be done by
whoever cares enough about a novel to think this is worth doing for
their particular project.
My question is whether enough people will think there is value in
introducing the primary distinction between narrative and speech into
a large set of novels where this can be done through scripts that
take advantage of typographical "markup."
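
To make the idea concrete, here is a minimal sketch of the kind of script
I have in mind. It assumes that speech is consistently set off by double
quotation marks (straight or curly); the function name split_speech and
the sample sentence are invented for illustration and not taken from any
existing tool or novel. Texts that use single quotation marks, dashes, or
no marks at all would need different rules, and nothing here touches
indirect free speech.

```python
import re

# Assumption: speech is enclosed in straight ("...") or curly double quotes.
# Anything outside such a pair is treated as narrative.
QUOTED = re.compile(r'"([^"]*)"|\u201c([^\u201d]*)\u201d')


def split_speech(paragraph):
    """Split a paragraph into ("speech" | "narrative", text) segments."""
    segments = []
    pos = 0
    for match in QUOTED.finditer(paragraph):
        # Text between the previous quotation and this one is narrative.
        narrative = paragraph[pos:match.start()].strip()
        if narrative:
            segments.append(("narrative", narrative))
        # Exactly one of the two groups matched, depending on quote style.
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        spoken = inner.strip()
        if spoken:
            segments.append(("speech", spoken))
        pos = match.end()
    # Whatever follows the last quotation is narrative as well.
    tail = paragraph[pos:].strip()
    if tail:
        segments.append(("narrative", tail))
    return segments


if __name__ == "__main__":
    # Invented sample sentence, not a quotation from any novel.
    sample = '"It is late," said Anne, and she rose. "We must go."'
    for kind, text in split_speech(sample):
        print(kind, "|", text)
```

Each "speech" segment could then be wrapped in whatever element a project
prefers, and a "who" attribute added later by anyone who cares to
identify the speakers.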
On Mar 20, 2006, at 6:05 PM, Lou Burnard wrote:
> Michael Beddow wrote:
>> The classic example of this, long before TEI -- or SGML -- was
>> thought of, was John Burrows' encoding, in a database-like format,
>> of Jane Austen's novels. [see J.B., Computation into Criticism,
>> Oxford 1987].
> This is indeed the locus classicus for the kind of markup Martin
> has in mind. It doesn't actually predate the TEI that much, and
> doesn't predate SGML at all. More interestingly, perhaps, I can now
> reveal that I personally wasted several weeks of my life on the
> thankless task of converting that "database-like format" into
> kosher TEI SGML shortly afterwards. I say "thankless" advisedly,
> because OUP then decided they didn't see the point of publishing
> electronic texts anyway.
> Not a problem for TEI markup of course: just use the "who"
> attribute on <sp>.
>> he got statistically stronger differentiations by considering
>> the frequency and collocations/colligations of such "stopwords" in
>> speech assigned to specific characters than he did from more
>> "characteristic" stylistic or lexical features.
> I'm surprised to see someone generally so careful with language as
> Michael implying that "collocation" and "colligation" are
> interchangeable technical terms. Most people working in corpus
> linguistics these days tend to distinguish them rather sharply. E.g.
> "Colligation is a type of collocation, but where a lexical item is
> linked to a grammatical one. Surprising, amazing and astonishing
> are nearly synonymous. We can say it is astonishing/surprising/
> amazing, but we tend to say it is not surprising and not the
> others- surprising colligates with the negative." (http://