The EEBO-1 TCP files are currently available in three forms, including
the two that you don't like. So are the Evans TCP files and most of
the ECCO TCP files.
1 SGML (ASCII + entities; without headers, but TEI P4 headers are
2 TCP XML (UTF8 + text strings; with TEI P4 headers attached.)
3 TEI P5 XML (UTF8 + TEI <g> elements; with P5 headers attached)
Version 3 is probably what you want. These were produced by
Sebastian Rahtz at Oxford with some feedback from yours truly.
The files were created as SGML; the two XML versions are
both programmatic derivatives. The bibliographic metadata is
stored natively as MARC (which is not publicly releasable);
the headers are derived from that.
Version 1 and version 2 live natively in Box.com as gzipped
tarballs, and can be downloaded with just the link.
Version 3 lives natively on GitHub, with each file in a separate
repo. If you wish to download them all you have to either script
the download (with git clone or wget), or download someone
else's 'snapshot' of the GitHub repos (e.g. ours). We store just
such a snapshot of the P5 version in the same Box.com site
as houses versions 1 and 2.
>> EEBO phase 1
Should you prefer to download the P5 directly from GitHub,
the commands to do so are (I believe):
>> git clone -o N00045
>> -- assuming that you're trying to download Evans file "N00045"
and so mutatis mutandis for any TCP ID number.
I can provide a list of EEBO-1, ECCO and Evans IDs, should you wish to
pursue that route.
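
A minimal sketch of scripting that download, assuming a plain-text list of
TCP IDs (one per line). The function names, the "tcp-p5" target directory,
and above all the base URL are placeholders for illustration and are not
taken from this message; substitute the real address of the GitHub account
that hosts the per-file repos.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def read_ids(text: str) -> list[str]:
    """Parse one TCP ID per line, skipping blank lines and '#' comments."""
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids


def clone_command(base_url: str, tcp_id: str, dest_root: str = "tcp-p5") -> list[str]:
    """Build the git argv that clones one per-file repo into dest_root/<ID>.

    base_url is an assumption supplied by the caller: the address under which
    each repo is named after its TCP ID.
    """
    repo_url = f"{base_url.rstrip('/')}/{tcp_id}"
    # --depth 1 fetches only the latest commit, which is enough for a snapshot.
    return ["git", "clone", "--depth", "1", repo_url, str(Path(dest_root) / tcp_id)]


def clone_all(base_url: str, ids: list[str], dest_root: str = "tcp-p5",
              dry_run: bool = True) -> list[list[str]]:
    """Clone every ID in turn; with dry_run=True only print the commands."""
    commands = []
    for tcp_id in ids:
        cmd = clone_command(base_url, tcp_id, dest_root)
        commands.append(cmd)
        if dry_run:
            print(" ".join(cmd))
            continue
        if (Path(dest_root) / tcp_id).exists():
            continue  # already downloaded on an earlier run
        # check=False: one failed repo should not abort the whole batch.
        subprocess.run(cmd, check=False)
    return commands


if __name__ == "__main__":
    # Placeholder base URL: replace with the real hosting address.
    clone_all("https://example.invalid/tcp-account", read_ids("N00045\n"))
```

Run as-is, the sketch only prints the commands it would execute; pass
dry_run=False to actually clone. It gives the target directory as a
positional argument, since git's -o option sets the name of the remote
rather than the output directory.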
The P5 version does not include the EEBO phase 2 files, and does not
include the 'unedited' files from ECCO TCP (those that were keyed but
P.S. Not sure what you mean by 'reverse-engineer' in this case.
Replicate Sebastian's transform?
On Fri, Jun 5, 2015, at 16:06, Martin Mueller wrote:
> From: Eric Lease Morgan <[log in to unmask]>
> Reply-To: Eric Lease Morgan <[log in to unmask]>
> Date: Friday, June 5, 2015 at 2:54 PM
> To: "[log in to unmask]" <[log in to unmask]>
> Subject: eebo
> Can somebody please point me in the direction of acquiring the EEBO files
> encoded in TEI?
> I'm looking to provide interesting text mining services against the
> content found in EEBO, but I'm having a difficult time
> reverse-engineering the data I've been given. After a bit of
> investigation, I think the data I have is dated because I got it more
> than eighteen months ago and the files, while in both SGML and XML, are
> not in TEI. Instead, I have a jumble of lists with identifiers, header
> files with really basic metadata, and two sets of encoded text (one in
> SGML and the other in XML).
> Since my acquisition of the files, I believe the EEBO Phase I files have
> been released as real and true TEI. Do you know anything about this? If
> so, do you know where I get such files? Yes, my institution is a member
> of the TCP.
> Eric Lease Morgan
> Digital Initiatives Librarian
> University of Notre Dame
> Room 131, Hesburgh Libraries
> Notre Dame, IN 46556
> o: 574-631-8604
> e: [log in to unmask]
Paul Schaffner Digital Library Production Service
[log in to unmask] | http://www.umich.edu/~pfs/