Phil Burns from Northwestern's IT group and I are working on a project to
provide linguistic annotation for some 40,000 texts published between 1473
and 1700 and transcribed by the EEBO-TCP project. Currently, all these texts
are available only to the members of institutions that have subscribed to
them. But in 2015, some 25,000 texts will pass into the public domain, and
over the following five years another 45,000 texts will follow them. Thus
students of Early Modern English can look forward to an environment that will
soon provide them with access anywhere anytime to a rich set of carefully
encoded data from the first 250 years of English print culture.

A much smaller set of ~2,000 18th-century texts from the ECCO-TCP project
has already been released into the public domain, and we expect to provide
linguistically annotated versions of these texts at some point in the spring
or early summer.

If potential users of these data sets have advice to offer, we would very
much like to hear it, and I would like to seek your advice on a particular
question. First a few remarks about the encoding of these texts. They were
encoded in a modified form of TEI P3 that will be transformed to TEI P5 in
the course of our work. The encoding is light but consistent and allows you
to exclude or focus on words that occur in paragraphs, lines of verse,
epigraphs, notes, lists and tables, speaker labels, opening and closing
phrases of correspondence, and a few others. The linguistic
annotation will be "element-aware" in the sense that different rules,
probability tables, and supporting lexica will be used for stuff that is
likely to be special, such as lines of verse, stage directions, or notes.
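
To make "exclude or focus on" concrete, here is a minimal sketch in Python of
separating main-text words from words inside special elements. It is not
MorphAdorner's method: the function name split_words, the naive whitespace
tokenization, and the abridged sample string are assumptions made for
illustration only.

```python
import xml.etree.ElementTree as ET

# Abridged, hypothetical sample in the style of the TCP transcriptions.
SAMPLE = (
    '<P>one Edward Squyre,<NOTE PLACE="marg">Edvvard Squyre executed.</NOTE>'
    ' hauing bin sometyme prisoner in Spayne</P>'
)


def split_words(elem, special=frozenset({"NOTE"})):
    """Return (main_words, special_words) for the subtree rooted at elem.

    Words are split on whitespace only, which is a deliberate simplification.
    """
    main, aside = [], []

    def walk(node, in_special):
        in_special = in_special or node.tag in special
        bucket = aside if in_special else main
        bucket.extend((node.text or "").split())
        for child in node:
            walk(child, in_special)
            # A child's tail belongs to the parent's text flow, not the child.
            bucket.extend((child.tail or "").split())

    walk(elem, False)
    return main, aside


main_words, note_words = split_words(ET.fromstring(SAMPLE))
```

Passing a different set of tag names as `special` would set aside other
elements in the same way.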

My particular question has to do with the encoding of notes, stuff put
inside <note> elements. Early modern prose is full of notes.  In the print
originals they occur sometimes at the foot of the page, but the great
majority of them are marginal notes (and they often are summaries rather
than notes in a modern sense of the word). In the TCP transcriptions,
footnotes and marginal notes are encoded inline. Footnotes are placed where
their markers occur. Marginal notes are put where they fit best, following
broad rules but leaving discretion to the transcribers. Here is a typical
example from A Defence of the Catholyke Cause (1602):

<P>IT is now more then three yeres, gentle reader, since that one Edward
Squyre,<NOTE PLACE="marg">Edvvard Squyre executed for a fayned conspiracy,
and the author of this treatyse charge therevvith.</NOTE> hauing bin
sometyme prisoner in Spayne, and escaping thence into England, was condemned
and executed for a fayned conspiracy against her Maiestyes person, wherto my
self &amp; some others were charged to be priuy; &amp; for as much as it
seemed to mee that this fraudulent manner of our aduersaries proceeding
against Catholykes, by way of slanders and diffamations, authorised with
shew of publik Iustice,<NOTE PLACE="marg">The reasons that moued the author
to vvryte an Apology in his ovvne defence.</NOTE> and continued now many
yeres, did beginne to redound not only to the vndeserued disgrace, &amp;
discredit of particular men wrongfully accused, but also to the dishonour of
our whole cause, I thought it co~uenie~t to write an Apology in my defe~ce,
&amp; to dedicate the same to the Lords of her Maiesties priuy counsel, as
wel to cleare my self to their honours of the cryme falsly imputed vnto mee,
as also to discouer vnto them the treacherous dealing of such as abuse her
Maiesties autority and theirs in this behalf, to the spilling of much
innocent blood, with no smalle blemish to her Maiesties gouernment, and the
assured exposition of the whole state, to the wrath of God, if it be not
remedied in tyme.</P>

MorphAdorner, Phil Burns' software, treats such <note> elements as "jump
tags", treats their content separately, and "knows" about the reading order
of the main text. We have two choices for dealing with <note> elements.
We could leave them where they are, or we could gather them in separate
<div> elements, leaving some form of marker at the original location of
their encoding. That procedure would be reversible, and it could also be
separately implemented by anybody manipulating the texts. So in some ways
the question does not matter very much.
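
For anyone who wants to try the second option, here is a rough sketch in
Python of gathering notes and putting them back. It is not what MorphAdorner
does. The marker element PTR, its TARGET attribute, the ID attribute on each
note, and the DIV with TYPE="notes" are names I have assumed for
illustration, and the sample is an abridged version of the passage above. The
sketch also assumes that notes are not nested inside other notes.

```python
import xml.etree.ElementTree as ET

# Abridged sample; the TEXT wrapper is a hypothetical container element.
SAMPLE = (
    '<TEXT><P>since that one Edward Squyre,'
    '<NOTE PLACE="marg">Edvvard Squyre executed for a fayned conspiracy.</NOTE>'
    ' hauing bin sometyme prisoner in Spayne, with shew of publik Iustice,'
    '<NOTE PLACE="marg">The reasons that moued the author.</NOTE>'
    ' and continued now many yeres.</P></TEXT>'
)


def gather_notes(root):
    """Move every NOTE into a trailing DIV, leaving a PTR marker behind."""
    notes_div = ET.Element("DIV", {"TYPE": "notes"})
    counter = 0
    # Snapshot the elements first, because the tree is changed during the walk.
    for parent in list(root.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag != "NOTE":
                continue
            counter += 1
            note_id = f"n{counter}"
            marker = ET.Element("PTR", {"TARGET": note_id})
            marker.tail = child.tail  # text after the note stays in the main flow
            child.tail = None
            child.set("ID", note_id)
            parent[i] = marker
            notes_div.append(child)
    root.append(notes_div)
    return root


def restore_notes(root):
    """Undo gather_notes: put each NOTE back where its PTR marker stands."""
    notes_div = root.find("DIV[@TYPE='notes']")
    by_id = {note.get("ID"): note for note in notes_div}
    root.remove(notes_div)
    for parent in list(root.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag == "PTR" and child.get("TARGET") in by_id:
                note = by_id[child.get("TARGET")]
                note.tail = child.tail
                del note.attrib["ID"]
                parent[i] = note
    return root


gathered = gather_notes(ET.fromstring(SAMPLE))
```

The one detail that needs care is the text that follows each note: it has to
stay attached to the marker so that the reading order of the main text is not
disturbed.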

But from the OWL perspective (Piotr Banski's lovely term for "ordinary
working linguist"), which choice would provide the better default setting
and be more in keeping with practices elsewhere and the expectations of
scholars who may work with those texts? Notice that this question has
nothing to do with the way in which notes would be displayed in a
browser-based rendering of the texts. It is a question about which choice
would on balance provide an easier or more profitable working environment.

My own view so far has been that there would be some advantages in grouping
notes separately.  It would make it a little easier to attend to notes as a
genre in their own right, it would make it a little easier to process the
main text because you wouldn't have to worry about stuff that interrupts the
reading order, and from a philological perspective you could argue that
wherever the notes were placed in the original, they certainly were not
placed in the middle of the text.  But I'm not very confident about my
hunches in this regard, and if there is a consensus "out there" about best
practices I would much rather follow that than my own nose.

I would welcome your advice, online or offline, on this topic as well as any
information about the practices of comparable enterprises elsewhere.

With thanks in advance

Martin Mueller
Professor emeritus
Department of English and Classics
Northwestern University