Michael Beddow wrote:
> The classic example of this, long before TEI -- or SGML -- was thought of,
> was John Burrows' encoding, in a database-like format, of Jane Austen's
> novels. [see J.B., Computation into Criticism, Oxford 1987].
This is indeed the locus classicus for the kind of markup Martin has in
mind. It doesn't actually predate the TEI that much, and doesn't predate
SGML at all. More interestingly, perhaps, I can now reveal that I
personally wasted several weeks of my life on the thankless task of
converting that "database-like format" into kosher TEI SGML shortly
afterwards. I say "thankless" advisedly, because OUP then decided they
didn't see the point of publishing electronic texts anyway.
Not a problem for TEI markup of course: just use the "who" attribute on
> he got statistically stronger differentiations by considering
> the frequency and collocations/colligations of such "stopwords" in the
> speech assigned to specific characters than he did from more apparently
> "characteristic" stylistic or lexical features.
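For what it's worth, once the speeches are attributed with "who", the
first step of a Burrows-style comparison (relative frequency of common
function words per speaker) falls out almost for free. What follows is
only a minimal sketch under stated assumptions: the fragment, the speaker
IDs, and the tiny stopword list are all invented for illustration, the TEI
namespace is omitted, and nothing here reproduces Burrows's actual word
list or the statistics he applied afterwards.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical fragment: speeches marked with <sp who="...">
# (TEI namespace omitted for brevity; text invented, not Austen's).
SAMPLE = """<body>
  <sp who="#emma"><p>I am sure I do not know what you mean.</p></sp>
  <sp who="#knightley"><p>You know it very well, and so do I.</p></sp>
  <sp who="#emma"><p>It is not so, and I will not have it.</p></sp>
</body>"""

# Illustrative stopword list only; not the list Burrows used.
STOPWORDS = {"i", "it", "not", "and", "the", "you", "so", "of", "a", "to", "is"}


def stopword_profiles(xml_text, stopwords):
    """Relative frequency of each stopword in each speaker's total speech."""
    root = ET.fromstring(xml_text)
    counts = defaultdict(Counter)  # speaker -> stopword counts
    totals = Counter()             # speaker -> total tokens spoken
    for sp in root.iter("sp"):
        who = sp.get("who")
        tokens = re.findall(r"[a-z']+", " ".join(sp.itertext()).lower())
        totals[who] += len(tokens)
        counts[who].update(t for t in tokens if t in stopwords)
    return {
        who: {word: n / totals[who] for word, n in words.items()}
        for who, words in counts.items()
    }


if __name__ == "__main__":
    for who, profile in stopword_profiles(SAMPLE, STOPWORDS).items():
        print(who, sorted(profile.items()))
```

Comparing those per-speaker profiles is where the real statistical work
begins; the markup's job is simply to make the attribution unambiguous.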
I'm surprised to see someone as careful with language as Michael generally
is implying that "collocation" and "colligation" are interchangeable
technical terms. Most people working in corpus linguistics these days
distinguish them rather sharply. For example:
"Colligation is a type of collocation, but where a lexical item is
linked to a grammatical one. Surprising, amazing and astonishing are
nearly synonymous. We can say it is astonishing/surprising/amazing, but
we tend to say it is not surprising and not the others: surprising
colligates with the negative."
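To make the distinction concrete: a colligation like that shows up when
you count a lexical item against a grammatical slot (here, negation in the
frame "it is (not) ADJ") rather than against neighbouring words. A toy
sketch, assuming a simple regular-expression frame; the sentences are
invented for illustration and are not corpus evidence for the claim above.

```python
import re

# Invented toy sentences (not real corpus data), just to exercise the counting.
CORPUS = """It is not surprising that he left. It is surprising how often this happens.
It is not surprising at all. It is amazing what she did. It is astonishing to see.
It is amazing how quickly it grew. It isn't surprising, really."""

ADJECTIVES = ("surprising", "amazing", "astonishing")


def negation_counts(text, adjectives):
    """For each adjective, (negated hits, all hits) in the frame 'it is (not) ADJ'."""
    text = text.lower().replace("isn't", "is not")  # crude normalisation
    result = {}
    for adj in adjectives:
        # The optional group captures 'not ' when present, '' otherwise.
        hits = re.findall(rf"\bit is (not )?{adj}\b", text)
        negated = sum(1 for h in hits if h)
        result[adj] = (negated, len(hits))
    return result


if __name__ == "__main__":
    for adj, (neg, total) in negation_counts(CORPUS, ADJECTIVES).items():
        print(f"{adj}: {neg}/{total} negated")
```

A real study would of course use a tagged corpus and compare the negated
share against a baseline, rather than a single hand-written frame.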