I have been thinking about retroactively tagging spoken sections in  
various fiction corpora, and I wonder whether anybody has advice on  
the utility or feasibility of such a project.

As for feasibility, it's certainly going to be a tedious business.
You have to look at files one by one and figure out whether, through a
combination of authorial pointers ("she said") and typographical
devices (quotation marks, dashes, etc.), you could get good enough
results (whatever 'good enough' means in that context). And you'd have
to keep your fingers crossed that a script that works for one work or
author will, with little extra labor, handle other texts well enough. Does
anybody have experience with that kind of work?
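
To make the typographical half of this concrete, here is a minimal sketch
in Python of the kind of script I have in mind. It is only an
illustration under strong assumptions: speech is enclosed in double
quotation marks (straight or curly), quotations never nest, and none
runs across a paragraph break. The function name split_speech is made
up for the example, and dashes and authorial pointers are not handled
at all.

```python
import re

# Assumption: speech is marked by straight ("...") or curly (“...”) double
# quotes, with no nesting and no quotation spanning several paragraphs.
QUOTED = re.compile(r'"([^"]*)"|“([^”]*)”')

def split_speech(paragraph):
    """Split one paragraph into (label, text) pairs.

    The label is 'speech' for text inside quotation marks and
    'narrative' for everything else.
    """
    segments = []
    pos = 0
    for m in QUOTED.finditer(paragraph):
        # Text between the previous quotation and this one is narrative.
        before = paragraph[pos:m.start()].strip()
        if before:
            segments.append(("narrative", before))
        # Exactly one of the two groups matched, depending on quote style.
        spoken = m.group(1) if m.group(1) is not None else m.group(2)
        spoken = spoken.strip()
        if spoken:
            segments.append(("speech", spoken))
        pos = m.end()
    # Whatever follows the last quotation is narrative.
    rest = paragraph[pos:].strip()
    if rest:
        segments.append(("narrative", rest))
    return segments

if __name__ == "__main__":
    for label, text in split_speech('"Come in," she said. "It is late."'):
        print(label, "|", text)
```

A text that marks speech with dashes, or that uses single quotation
marks (which collide with apostrophes), would need a different pattern,
which is exactly the per-author labor I am worried about.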

As for utility, it is a reasonable assumption that narrative and  
speech will differ significantly in just about every text. I learned  
this with Homer, where narrative and speech seem on the surface quite  
continuous. There was a study some years ago that claimed to  
distinguish between the authors of the Iliad and Odyssey on the basis  
of the distribution of common words. But what that study measured was  
mainly the fact that characters talk more in the Odyssey.
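
The point can be checked mechanically once a text is tagged. The sketch
below, again only an illustration, computes the relative frequency of
chosen common words separately for each label. It assumes the text has
already been split into (label, text) pairs, and it uses a crude
tokenizer for lower-cased English, so Greek would need its own
tokenization. The function name relative_frequencies is made up for the
example.

```python
import re
from collections import Counter

def relative_frequencies(segments, words):
    """Return {label: {word: relative frequency}} for the given words.

    segments is an iterable of (label, text) pairs, for example
    ('speech', '...') and ('narrative', '...').
    """
    counts = {}
    totals = Counter()
    for label, text in segments:
        # Crude tokenizer: runs of ASCII letters and apostrophes (an assumption).
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.setdefault(label, Counter()).update(tokens)
        totals[label] += len(tokens)
    # Skip labels with no tokens to avoid dividing by zero.
    return {
        label: {w: counts[label][w] / totals[label] for w in words}
        for label in counts
        if totals[label]
    }
```

If a common word turns out to be much more frequent under one label
than the other, then two works by the same hand will differ in their
overall counts of that word simply because one contains more speech.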

Are there stylometric or thematic analyses for which scholars would
like to have fiction corpora in which narrative and speech are
tagged with sufficient accuracy? By sufficient accuracy I mean a
level that would allow a scholar interested in a particular smaller
set of works to bring them up to snuff over the course of a
long weekend.