> I was just looking for a way to consistently mark the information about the
> media type of the object that the free-standing header "is about". I found
> the <extent> idea pretty sneaky, and far from ideal. It could be more
> palatable, if I were to write a header for an electronic version of the text
> "as such", and then I would, in the <extent>, say that, for example, the PDF
> file takes up this much space, whereas the TXT version takes up some
> different amount.
> But I miss the way to state, somehow, and consistently, that "this header
> here records the formal metadata of that file over there, and moreover that
> file's media type is "xxx/yyy".
Yes, I see your point. TEI has no semantically clear way to do it. The
Header (even the good old standalone Header) is meant to document a
TEI-encoded text and, possibly, its source; it is not a general-purpose
metadata schema for electronic resources. That is why I would
use MODS or PREMIS for that (and you could mix them with the TEI Header
inside a METS wrapper).
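
A rough sketch of such a wrapper, just to illustrate the idea (the IDs
and the example.org URL are made up, the teiHeader is left as an empty
placeholder, and a PREMIS or MODS section could be added alongside it
in the same way):

```xml
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- Descriptive metadata: the TEI Header, embedded as it is -->
  <mets:dmdSec ID="dmd1">
    <mets:mdWrap MDTYPE="TEIHDR">
      <mets:xmlData>
        <teiHeader xmlns="http://www.tei-c.org/ns/1.0">
          <!-- placeholder: fileDesc etc. go here -->
        </teiHeader>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <!-- The described file; its media type is stated explicitly -->
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="file1" MIMETYPE="application/pdf" DMDID="dmd1">
        <!-- hypothetical location of the PDF -->
        <mets:FLocat LOCTYPE="URL"
                     xlink:href="http://example.org/files/book.pdf"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div>
      <mets:fptr FILEID="file1"/>
    </mets:div>
  </mets:structMap>
</mets:mets>
```

Here the MIMETYPE attribute on mets:file carries the "xxx/yyy"
information, and DMDID ties the file to the header that describes it.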
>> I think you could as well define the pdf to be the source of your TEI
>> document, couldn’t you?
> But could I? The way I understand the architecture in this case is that I'm
> writing about the electronic version, the source of which is a printed book
> that got scanned. So I would expect <sourceDesc> to specify the info on the
> printed book, while everything else to tell the story about what the
> electronic version is, how it came about, who's financed it, etc. (And,
> possibly, about what its media type is, because it will not always be
> readable from the filename extension). I could be getting it wrong though.
This would be rather an abuse of the TEI <sourceDesc>, where you should
put information about the source of the electronic edition contained in
the TEI doc. That source could even be a PDF file, not necessarily a
printed book, but some content from it must end up in the TEI doc.
> Your second point might be helpful to me. I didn't consider using
> <facsimile>, because I have always considered the act of enabling that
> element as a kind of commitment to provide cross-element linking, between
> the transcribed text, and the facsimile.
> But if it could be accepted as a standard practice, for free-standing
> headers of binary objects (errm, but for plain-text object as well, to be
> consistent??) that they use <facsimile> only to point to the described
> object, I'd be happy to consider that.
It is still a trick, but it can work. In a way, it mimics the
structure of METS:
mets:dmdSec -> TEIHeader
mets:fileGrp -> TEI/facsimile
But again, the semantics of <facsimile> carry the notion that the image
is content of the TEI doc, while in your case the PDF is a completely
I think the better solution would be to use <relatedItem target="URI">.
Its description says "contains or references some other bibliographic
item which is related to the present one in some specified manner, for
example as a constituent or alternative version", which IMHO
encompasses the idea of an external digital object whose content is
somehow related to that of the TEI doc and that can share the same
descriptive metadata in the TEIHeader. Currently it must be wrapped
inside <notesStmt>, but I would strongly support a feature request to
allow it directly inside <fileDesc>.
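
Something along these lines, as a minimal sketch (the URL, the
placeholder texts and the value of the type attribute are only
illustrative assumptions, not an established practice):

```xml
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt>
      <title>Placeholder title of the electronic version</title>
    </titleStmt>
    <publicationStmt>
      <p>Placeholder publication statement.</p>
    </publicationStmt>
    <notesStmt>
      <!-- hypothetical URI of the file this header describes;
           the type value is an invented label -->
      <relatedItem type="describedObject"
                   target="http://example.org/files/book.pdf"/>
    </notesStmt>
    <sourceDesc>
      <bibl>Placeholder: the printed book that was scanned.</bibl>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```

Note that this only points to the file: <relatedItem> itself offers no
obvious place for the media type, so that part of the problem remains
open.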