At 12:38 PM 4/11/2006, you wrote:
>My crude mind was after much simpler things in my original
> There is a large number of pre-twentieth century novels
>where the distinction between speech and narrative is pretty obvious.
>There is an indeterminate subset of these novels--but probably not a
>trivial number--where this distinction can be inferred fairly
>accurately from typographical layout (though the scripts to extract
>the distinctions would have to be adjusted from author to author or
>publisher to publisher).
>You can then retroactively tag speech in those novels, and this would
>give you a 'spoken' corpus as opposed to a 'narrative' corpus. Is
>that of likely interest to anybody, and is there any reason to
>believe that the corpus resulting from these various constraints
>would be 'good enough' for some or many inquiries?
Hm, I guess it would depend on the inquiries.
The first thing I'd find to be of interest would be which works would
fall into this set, and which works would not. And how "fuzzy" the
set would be. How would one detect whether a work was in the set?
What makes the distinction between speech and narrative consistent
and obvious? Presumably the presence of some markers (say, for
dialogue) and the absence of others (say, for indirect discourse).
Even then, one would have to be on the watch for false hits. Some
kinds of narrative might have conventional markers for dialogue, and
yet pose the same problems of narrative subjectivity as those that
lack them. (I'm thinking of Charlotte Bronte's works, or George
Eliot's, as possible boundary cases.)
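To make the detection question concrete, here is a minimal sketch in
Python, offered only as an illustration and not as anyone's actual
script. It assumes speech is set off by double quotation marks (straight
or curly); the names split_speech and unbalanced are hypothetical. A
paragraph whose marks do not pair up is flagged, which is one crude
signal of how "fuzzy" a given work is.

```python
import re

# Assumption: speech is delimited by straight or curly double quotation marks.
SPEECH = re.compile(r'["\u201c]([^"\u201c\u201d]*)["\u201d]')

def split_speech(paragraph):
    """Return (list of speech spans, remaining narrative text) for one paragraph."""
    speech = [m.group(1).strip() for m in SPEECH.finditer(paragraph)]
    # Blank out the quoted spans, then normalize whitespace in what is left.
    narrative = " ".join(SPEECH.sub(" ", paragraph).split())
    return speech, narrative

def unbalanced(paragraph):
    """True when quotation marks do not pair up within the paragraph."""
    opens = paragraph.count("\u201c")
    closes = paragraph.count("\u201d")
    straight = paragraph.count('"')
    return opens != closes or straight % 2 == 1

if __name__ == "__main__":
    speech, narrative = split_speech('"Come in," said she, "and sit down."')
    print(speech)
    print(narrative)
```

Note that a speech running over several paragraphs, where each paragraph
opens with a quotation mark but only the last one closes, would be
flagged by unbalanced; that is the sort of per-author or per-publisher
adjustment mentioned above. Nothing here touches indirect discourse,
which carries no typographical marker at all.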
Ease of auto-tagging (which is to say, level of consistency and
explicitness of "tagging" by layout, typography, and narrative
convention) might be an interesting marker of some sort of genre ...
or it might not. It would be particularly interesting to see where
such works would cluster. But this is meta-analysis, and has nothing
to do with what you could learn, or not, from such tagging once you had it.
As to the latter, I think this would depend on the genre(s) of works,
their commonalities (or not) over and above this property of
consistency, and what kinds of inquiries you might pose relative to
>My hunch is that there isn't much interest in this--which is itself a
>useful thing to know
But there's a difference between the level of interest going in, and
the potential for discoveries that could be interesting. :-)
Wendell Piez
Mulberry Technologies, Inc.        http://www.mulberrytech.com
17 West Jefferson Street           Direct Phone: 301/315-9635
Suite 207                          Phone: 301/315-9631
Rockville, MD 20850                Fax: 301/315-8285
Mulberry Technologies: A Consultancy Specializing in SGML and XML