> Thank you for the explanation. Is there an authoritative document on
> the EEBO process which can be linked to from the <encodingDesc>?
Aside from the keying guidelines and their chaotic supplements
here (http://www.textcreationpartnership.org/docs/), probably not.
And even those only cover the transcriptional capture, saying nothing
about the source of the bibliographic information, the storage of
metadata, the various conversions to XML, etc. They do, however,
include a list of SDATA character entities and their recommended
equivalents for purposes of effective display and (alternatively)
lossless round-tripping. Maybe when we finish this project, we
will know enough to be able to say what we did, as well as what
we did wrong. ... The details of Sebastian's conversions to P5
are bound up in his Ant file and stylesheets, which should perhaps
also be linked to, at least indirectly.
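For what it's worth, a pointer to the guidelines could be recorded in
the header along these lines (a sketch only; the placement within
<projectDesc> and the wording are my own guesses, not anything the
TCP has sanctioned):

   <encodingDesc>
     <projectDesc>
       <p>Keyed and coded from page images according to the TCP
          keying guidelines
          (<ref target="http://www.textcreationpartnership.org/docs/"
           >http://www.textcreationpartnership.org/docs/</ref>),
          and subsequently converted to TEI P5 XML.</p>
     </projectDesc>
   </encodingDesc>

Whether <projectDesc> or <editorialDecl> is the better home for such
a note is itself a judgment call.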
> I think I interpreted the placing of the documents on github as an
> invitation to improve them, or at least suggest ways in which they
> could be improved.
Yes indeed. Thank you. Didn't mean to suggest otherwise.
In fact, I have no personal stake in the P5 headers at all,
merely feeling defensive about the shortcomings of the source
on which they draw. Improvements welcome.
* * *
> In most presentational scenarios, xml:ids end up as the anchors to
> which systems and end users can link. Introduction of ids (in both the
> body and the header) would seem to encourage the creation and
> persistence of reliable anchors for fine-grained linking and analysis.
> To this end, I'd have put xml:ids on at least all <div>s and all
> free-standing <p>s (<p>s that are not direct descendants of a <div>).
Right, you want div-level IDs. Not unreasonable, I should think,
though perhaps difficult to coordinate among different users
with very different ideas of granularity. Some, for example,
are already adding IDs at the word-token level. Others are interested
only in drama (IDs on <sp>) or verse (on <l> and <lg>) etc.
Hard to know how to please them all. I think it fair to say that we
didn't put them in to begin with (even in the SGML) for the same
reason that we didn't put them into our (Michigan's) earlier text
projects like American Verse and the Corpus of Middle English, or
even (below the entry level) into the Middle English Dictionary,
namely that we operated within a tradition of light markup,
not all that far removed from the traditional library ideal of bulk
and unbiased presentation, or even the usual ignorant appeal
for 'plain text.' Adding IDs as hooks isn't in the same category as
interpretive markup, but it is ... messy, and the sort of thing we
traditionally left to after-market providers.
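To make the proposal concrete, I take it the idea is something
roughly like this (the ID values are invented for illustration; any
real scheme would have to be agreed upon first):

   <div xml:id="A00001-e100" type="chapter" n="1">
     <p>An ordinary paragraph: the enclosing div carries the hook.</p>
     <note place="margin">
       <p xml:id="A00001-e110">A free-standing paragraph (its parent
          is not a div), so it gets a hook of its own.</p>
     </note>
   </div>

Prefixing the IDs with the document identifier would at least keep
the anchors unique across a merged corpus, which I take to be the
point of the exercise.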
> I'm initially interested primarily [in the] header. A quick survey of a handful
> of documents suggests a relatively small set of names used quite
> frequently that could be targets for intervention.
The only names I can think of that could be described as forming
a small set are those of the parties responsible for the keying,
editing, and publication of the e-texts (on the order of 100 people).
I take it you don't mean to include the authors, publishers, etc.
of the original books? Those would amount to an authority list for all of
early English print.
> Is there a TCP bestiary of unusual documents and corner-cases to test
> one's assumptions against?
A sort of sampler of everything likely to appear anywhere? Including
all the nasty exceptions that are more than likely mistakes?
No. But I could think about how one might create such a thing.
Paul Schaffner | Digital Library Production Service
[log in to unmask] | http://www.umich.edu/~pfs/