Dear Christopher,
We had the same problem in a digitization project for a critical edition
(the nature of the source is different, but the principle is the same).
On file we have one JPG per page and one XML TEI file for the whole book.
We encode the logical structure in the TEI file and use only the <pb/>
element to mark the beginning of each page. To solve this problem, we use
the METS schema (http://www.loc.gov/standards/mets/). So we have two files:
- The METS file describes the physical structure of the book
- The TEI file describes the logical structure of the book
In the METS file, the structural map section contains one div like this
for each page of the book:
<METS:div LABEL="page 64" TYPE="acte">
<!-- A pointer to the file in JPG -->
<METS:fptr FILEID="larochejpg_0100"/>
<!-- A pointer to the file in TIFF -->
<METS:fptr FILEID="larochetiff_0100"/>
<!-- A pointer to the file in XML TEI -->
<METS:fptr FILEID="larochetxt">
<!-- The COORDS attribute holds the number given in the n
attribute of the corresponding <pb/> element in the TEI XML
file -->
<METS:area FILEID="larochetxt" COORDS="64"/>
</METS:fptr>
</METS:div>
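To illustrate how a program can read such a div, here is a minimal Python sketch using the standard xml.etree.ElementTree module. It is only an illustration, not the XSL used on the site. The sample is a trimmed copy of the div above wrapped in a structMap element; the function name read_pages and the dictionary keys are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the METS schema, bound to the METS: prefix.
METS_URI = "http://www.loc.gov/METS/"
NS = {"METS": METS_URI}

# Trimmed copy of the page div shown above (sample data only).
METS_SAMPLE = """<METS:structMap xmlns:METS="http://www.loc.gov/METS/">
  <METS:div LABEL="page 64" TYPE="acte">
    <METS:fptr FILEID="larochejpg_0100"/>
    <METS:fptr FILEID="larochetiff_0100"/>
    <METS:fptr FILEID="larochetxt">
      <METS:area FILEID="larochetxt" COORDS="64"/>
    </METS:fptr>
  </METS:div>
</METS:structMap>"""


def read_pages(mets_xml):
    """Return one record per METS div: label, file IDs, and <pb/> number."""
    root = ET.fromstring(mets_xml)
    pages = []
    for div in root.iter("{%s}div" % METS_URI):
        # Every fptr child points to one file (JPG, TIFF, TEI).
        file_ids = [f.get("FILEID") for f in div.findall("METS:fptr", NS)]
        # The area inside the TEI pointer carries the <pb n="..."/> value.
        area = div.find("METS:fptr/METS:area", NS)
        pb_n = area.get("COORDS") if area is not None else None
        pages.append({"label": div.get("LABEL"),
                      "files": file_ids,
                      "pb_n": pb_n})
    return pages


pages = read_pages(METS_SAMPLE)
print(pages)
```

The pb_n value is the key that ties a physical page back to a position in the TEI file.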
In the TEI file, we have this encoding:
<div type="acte">
<p>" Odo, Dei gratia Parisiensis episcopus, omnibus ad
quos littere presentes
pervenerint salutem, in Domino. Noverint universi
quod constitutus in nostra
presencia Guido de <hi
rend="italiques">Levies,</hi> miles, laudante et
concedente <pb n="4"/>Guiburge[...]</p>
</div>
Next, we make the correspondence between the METS file and the TEI file
with XSL. You can look at the website that uses this system. For instance,
this page contains a deed in text mode generated from the TEI file:
http://elec.enc.sorbonne.fr/cartulaires/laroche/acte50/ On the left of
the page, you see a box from which you can go to the page in image mode
(that is, the physical structure), and this page
http://elec.enc.sorbonne.fr/cartulaires/laroche/page53/ contains two
deeds, from which you can go to the logical structure.
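The correspondence between pages and deeds can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the XSL used on the site: the TEI fragment is invented, it assumes no XML namespace and an n attribute on each deed div, and the IMAGE_BY_PAGE dictionary (including the ID larochejpg_0101) is a hypothetical stand-in for what would be read from the METS structural map.

```python
import xml.etree.ElementTree as ET

# Invented TEI fragment: three deeds and two page breaks.
TEI_SAMPLE = """<text><body>
  <div type="acte" n="1"><p>first deed <pb n="64"/>continued</p></div>
  <div type="acte" n="2"><p>second deed</p></div>
  <div type="acte" n="3"><p>third <pb n="65"/>deed</p></div>
</body></text>"""

# Hypothetical METS data: COORDS value -> FILEID of the page image.
IMAGE_BY_PAGE = {"64": "larochejpg_0100", "65": "larochejpg_0101"}


def deeds_by_page(tei_root):
    """Walk the TEI tree in document order and list the deeds on each page."""
    pages = {}
    state = {"page": None}  # page currently open, None before the first <pb/>

    def visit(elem, deed):
        if elem.tag == "div" and elem.get("type") == "acte":
            # A deed starts: it belongs to the page currently open.
            deed = elem.get("n")
            if state["page"] is not None:
                pages[state["page"]].append(deed)
        elif elem.tag == "pb":
            # A page starts: the deed in progress continues onto it.
            state["page"] = elem.get("n")
            pages.setdefault(state["page"], [])
            if deed is not None:
                pages[state["page"]].append(deed)
        for child in elem:
            visit(child, deed)

    visit(tei_root, None)
    return pages


def link(tei_xml, image_by_page):
    """Join each page number with its image file ID and its deeds."""
    pages = deeds_by_page(ET.fromstring(tei_xml))
    return {n: {"image": image_by_page.get(n), "deeds": deeds}
            for n, deeds in pages.items()}


links = link(TEI_SAMPLE, IMAGE_BY_PAGE)
print(links)
```

A deed that runs across a page break appears under both pages, which is what allows navigation in both directions between a deed and the pages it covers.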
The XML files are available, for instance:
- METS file:
http://elec.enc.sorbonne.fr/logix/ouvrages/cartulaires/laroche/larochemets.xml
- TEI file:
http://elec.enc.sorbonne.fr/logix/ouvrages/cartulaires/laroche/larochetxt.xml
At the moment, we are testing this system for a transcription project
that includes the digitization of the manuscript (it's a cartulary), and
it works fine. I can't give you the URL, because the project is not
completely finished.
I hope my explanations are understandable. I can try to explain in more
detail if you are interested.
Best wishes,
Gautier Poupeau