I am doing exactly that on a corpus of 16th-century manuscripts, but I use the "n" attribute to provide the folio number. With scanned documents, the file names are usually consistent across all
the image files, built from the document name and the page number concatenated in a more or less ad hoc way.
On the one hand, using an entity would be cleaner because you directly specify the link to the image representation. On the other hand, by using only the folio number, you can change the physical
representation of the image of the page. I am currently using either plain JPG files or images in the "Deja Vu" format (much better for manuscripts) and can generate either HTML version, according to the
client's wishes, without changing the source document.
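As a rough illustration of the folio-number approach, here is a minimal Python sketch. The naming scheme (document name, underscore, folio, extension), the function names, the ".djvu" extension, and the sample markup are all assumptions made for the example, not the conventions actually used in the project described above.

```python
import re

def image_name(doc, folio, ext="jpg"):
    # Hypothetical naming scheme: <document name>_<folio>.<extension>
    return f"{doc}_{folio}.{ext}"

def page_images(tei_text, doc, ext="jpg"):
    # Collect the value of the n attribute of every <pb> tag, in document order
    folios = re.findall(r'<pb\b[^>]*\bn="([^"]*)"', tei_text)
    # Derive one image file name per page break
    return [image_name(doc, folio, ext) for folio in folios]

# Hypothetical transcription fragment with two page breaks
sample = '<div><pb n="12r"/><p>...</p><pb n="12v"/><p>...</p></div>'

print(page_images(sample, "remission"))
print(page_images(sample, "remission", ext="djvu"))
```

Because the transcription carries only the folio number, switching from JPG to another image format means changing one argument in the conversion step, not the encoded text.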
You can look at the "DejaVu" version at http://www.cs.umd.edu/hcil/compus/remission/
Thinking about it, I believe the "right structural representation" of the page is its number, not a specific physical representation... but this is arguable of course.
Jean-Daniel Fekete
Invited Professor, HCIL tel: 301-405-4116
Dept of Computer Science fax: 301-405-6707
University of Maryland, College Park, MD 20742
Perry Roland wrote:
> Hello, everyone,
> I'm new here so I hope I'm not re-opening an old wound. But I have
> questions about how to tag a project with both text transcription, page
> images, and meta-data about those images using teixlite. Searching the
> archives of this list I see that some discussion of this topic took place
> several years ago. But I'm unclear whether the changes proposed then have
> taken place.
> In July of '97 a couple of people suggested modifying the pb element to
> include an entity attribute. However, in August of '97 Lou Burnard
> first held that one should always use figure elements for page images.
> However, doing so inevitably leads to tag abuse since <figure> isn't allowed
> directly within <div>s or between them. When pressed Mr. Burnard suggested
> modifications to x.globIncl. However, Michael Sperberg-McQueen suggested
> modifying x.common to include figure, table, and text elements. It doesn't
> look to me like any modifications were made to the teixlite DTD based on
> these discussions.
> So I'm still left wondering how to include page images and meta-data about
> them without contorting the markup by adding bogus <p> elements. Is there
> any agreement on this yet?
> How about the following suggestion? Add an entity attribute to the pb
> element to encode the URL (indirectly through the entity) of the image and
> an optional pgDesc element to the content of pb, i.e.
> <!ELEMENT pb (pgDesc?)>
> <!ATTLIST pb [other attributes]
> entity ENTITY #IMPLIED>
> Meta-data for the page image could be recorded in the pgDesc element
> similarly to the way it is recorded for a figure in figDesc. When no
> meta-data was needed, the pb tag could still be written as <pb />, keeping
> previously encoded instances compatible with the revised DTD.
> Am I way off base here? I'm open to suggestions, flames, etc. Can other
> folks tell me how they're handling this 'problem'?
> Perry Roland
> Digital Library R & D
> University of Virginia