On Mon, Jun 8, 2015, at 07:56, Eric Lease Morgan wrote:
> > On Jun 7, 2015, at 2:15 PM, Sebastian Rahtz <[log in to unmask]> wrote:
> > I don’t want to stop you having fun, Eric, but some of this is already done for you. From
> > https://github.com/textcreationpartnership/Texts
> > you can download CSV and JSON versions of the whole metadata catalogue, and http://ota.ox.ac.uk/tcp/ has a browsable sortable HTML table with all that data. Perhaps not all the data categories you want. The ability of jQuery datatables to cope with 60000 rows of data is rather awesome.
> The CSV file available from GitHub is very much like the one I created,
> and I have a question about the identifiers it contains. More
> specifically, is it possible to embed any of those identifiers (TCP,
> EEBO, VID, STC) into an actionable URL and get something meaningful back?
> Do they point to various incarnations of the texts? —Eric M., University
> of Notre Dame
The identifiers that we associate with EEBO TCP texts all mean
something; some are machine-actionable and some are not.
1. TCP ID
The TCP identifier identifies the TCP file in all its incarnations. Given a
TCP ID of A69506, the original SGML version will be called A69506.sgm,
the P4 XML with header will be called A69506.headed.xml, and so forth.
Those versions of the files have not been mounted for individual
download, though if you had downloaded them all, this ID is the way to
retrieve them from your local repository.
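
The naming convention just described can be sketched in Python. This is
only an illustration: the two suffixes are the ones named above, while the
function names, the dictionary keys, and the idea of searching a local
download directory recursively are assumptions made for the example, not
part of any TCP tooling.

```python
from pathlib import Path

# Suffixes taken from the message above. Other incarnations exist
# ("and so forth") but their names are not given, so they are omitted.
TCP_SUFFIXES = {
    "sgml": ".sgm",              # original SGML version
    "p4_headed": ".headed.xml",  # P4 XML with header
}


def tcp_filenames(tcp_id):
    """Return the expected file name for each listed incarnation of a TCP ID."""
    return {kind: tcp_id + suffix for kind, suffix in TCP_SUFFIXES.items()}


def find_local_copies(tcp_id, repo_dir):
    """Look for those files anywhere under repo_dir.

    repo_dir is a hypothetical directory holding files you downloaded
    yourself; its layout is an assumption, so the search is recursive.
    """
    repo = Path(repo_dir)
    found = {}
    for kind, name in tcp_filenames(tcp_id).items():
        matches = sorted(repo.rglob(name))
        if matches:
            found[kind] = matches[0]
    return found


print(tcp_filenames("A69506"))
```

Calling find_local_copies("A69506", "/path/to/your/downloads") would then
return whichever of those files are actually present.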
Sebastian has indicated how to use the TCP ID in a URL to get his files.
(And I previously indicated how to use it in a script to download all
of the TCP P5 files.)
It may also be used to access the online versions of the files on the
TCP sites at Michigan and Oxford. Given a TCP ID A69506, this
URL will access the top-level page at the Michigan site:
and this URL will fetch the corresponding page on the Oxford Digital
Both of these pages are HTML generated in the traditional way from
[One used to be able to pull down the corresponding page at the
PhiloLogic site (Chicago/Northwestern) using just the TCP ID, but I'm
not sure if that is still true. At least, I can't immediately see how
to do it if it is.]
2. The 'VID' is an image-set identifier for ProQuest's EEBO product. If
your institution is an EEBO member, and your VID is 94927, then the URL for image 1 of
or (in an old-fashioned shorthand that still mostly works):
Every EEBO page image may be uniquely identified thus by a combination
of VID and sequence number ("94927/1").
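
As a small illustration of that VID-plus-sequence convention, the sketch
below builds and parses identifiers of the form "94927/1" in Python. The
function names are invented for the example, and it deliberately does not
construct any ProQuest URL, since the URL pattern is not reproduced here.

```python
def page_image_id(vid, seq):
    """Combine a VID and an image sequence number into 'VID/sequence'."""
    return f"{int(vid)}/{int(seq)}"


def parse_page_image_id(page_id):
    """Split 'VID/sequence' back into a (vid, seq) pair of integers."""
    vid, seq = page_id.split("/")
    return int(vid), int(seq)


print(page_image_id(94927, 1))
print(parse_page_image_id("94927/1"))
```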
3. The EEBO ID (which in some of our metadata is called a "BIBNO") is
the ProQuest ID for the bibliographic item (i.e. for the catalog record
of the title, as opposed to the image set for the copy). Given an EEBO ID
of 12880700, this URL:
fetches the bibliographic record for this item from the ProQuest EEBO
site. In many cases (but not all) these numbers are actually OCLC accession
numbers, and therefore may be used in OCLC (e.g. in OCLC Connexion) to
retrieve the OCLC version of the same record.
4. The STC numbers (sub-typed as Wing, Pollard & Redgrave, Evans, ESTC,
etc.) are all forms of short-title catalog or (in the case of ESTC)
full record catalog. Most of these are only human-actionable, unless
you happen to have a