*************************************************************
Re-posted from the EtextCtr and ImageLib lists
*************************************************************
SGML Text Embedded in Image Files
 
     For close to three years now the Electronic Text Center has been
producing SGML texts (tagged to the TEI Guidelines). The advent of the
World Wide Web has meant that we can provide on-line access to HTML
versions -- in our case generated from the TEI copy "on the fly".
 
     Almost as soon as we had the TEI-to-HTML conversions running, a
problem dawned on us.  A growing number of our electronic texts have
book illustrations and other book-related images along with the tagged
ASCII text, and these images carried no obvious attribution.  The solution
to the problem of unlabelled book illustrations wandering free from their
texts presented itself quite readily: the user who downloads an image file
of a book illustration or manuscript page needs to have delivered along with
it a copy of the bibliographical header -- a catalog record, a finding
aid, and a description of the production of the electronic text -- that
is at the top of every TEI text.
 
     We now achieve this by burying a version of the TEI header
in the image file itself, as a text comment stored with the image
data.  The user who saves an
image from a text on our etext server now gets -- in Trojan Horse
fashion -- a tagged full-text record of the creation of that image as
part of the single image file they save. If a user has an image tool
that permits the viewing of text comments in the image file (I use
XV with XMosaic) then both image and header can be seen simultaneously,
but any program that lets you see the contents of a file is sufficient
to read the text.
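The post does not say which tool or file-format feature the Center used. The following is a minimal sketch, assuming JPEG files and the standard JPEG comment (COM, marker 0xFFFE) segment. It inserts a header as a comment right after the start-of-image marker and reads comments back. The function names are hypothetical. The reader is deliberately simplified: it scans only the marker segments that come before the compressed image data (SOS) and does not cover every edge case of the format.

```python
import struct

SOI = b"\xff\xd8"  # start-of-image marker that begins every JPEG stream
COM = b"\xff\xfe"  # comment (COM) marker

def embed_comment(jpeg: bytes, text: str, encoding: str = "latin-1") -> bytes:
    """Return a copy of `jpeg` with `text` stored in a COM segment right after SOI."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG stream (missing SOI marker)")
    payload = text.encode(encoding)
    # The 2-byte big-endian length field counts itself, so at most 65533 payload bytes fit.
    if len(payload) > 0xFFFF - 2:
        raise ValueError("text too long for a single COM segment")
    segment = COM + struct.pack(">H", len(payload) + 2) + payload
    return SOI + segment + jpeg[len(SOI):]

def read_comments(jpeg: bytes, encoding: str = "latin-1") -> list[str]:
    """Collect the text of every COM segment that precedes the compressed image data.

    Simplified: assumes each marker before SOS carries a length field and
    ignores fill bytes, which is typical but not guaranteed for every file.
    """
    comments = []
    pos = len(SOI)
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # SOS: entropy-coded image data follows, so stop
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker == 0xFE:
            comments.append(jpeg[pos + 4:pos + 2 + length].decode(encoding))
        pos += 2 + length
    return comments
```

With a real file, one might pass `pathlib.Path("page.jpg").read_bytes()` (a hypothetical file name) to `embed_comment` and write the result to a new file. Latin-1 is assumed because the headers are plain ASCII. A single COM segment holds at most 65,533 bytes, so a very long header would have to be split across several segments. Because the comment is stored as plain bytes, any program that shows a file's contents will also reveal it. TIFF and GIF, the other formats mentioned below, have their own comment mechanisms, which this sketch does not handle.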
 
     The text that goes into the image file does not have to be the
TEI header, of course, but a version of the TEI header is the obvious
choice as it already exists for the written text.  There are long-term
advantages to making this "text in the image file" contain clearly
delimited fields: once we have software that can search (rather than
simply view) the text contained in image files, we will suddenly have
collections of images that are searchable by data field and keyword.
Even now, by keeping a separate copy of the image header, one can
have a searchable SGML text database hypertextually linked to the
images it describes.
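Here is a sketch of field-level searching. It assumes the header text has already been pulled out of each image, or kept as a separate copy as described above. The code extracts the contents of named elements such as `<title>`, `<name>`, or `<keywords>` and matches a keyword against them. A regular expression stands in for a real SGML parser because these headers need not be well-formed XML. It does not handle nested elements that share the same name. All function and file names are hypothetical.

```python
import re

def field_values(header: str, tag: str) -> list[str]:
    """Return the whitespace-normalized text of every <tag>...</tag> element.

    A regular expression is used instead of an XML parser because SGML headers
    need not be well-formed XML (e.g. unquoted attribute values).
    """
    name = re.escape(tag)
    pattern = re.compile(rf"<{name}\b[^>]*>(.*?)</{name}>", re.S)
    return [" ".join(value.split()) for value in pattern.findall(header)]

def search(headers: dict[str, str], tag: str, keyword: str) -> list[str]:
    """Return the sorted image names whose <tag> fields contain `keyword`, ignoring case."""
    kw = keyword.lower()
    return sorted(
        image for image, header in headers.items()
        if any(kw in value.lower() for value in field_values(header, tag))
    )
```

For instance, `search(headers, "keywords", "300 dpi")` would return every image whose `<keywords>` field mentions 300 dpi, as the sample header's field does. The `\b` after the tag name keeps a search on `title` from also matching longer element names that begin with it.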
 
     I hope that the practice of burying SGML-tagged ASCII
data in image files will become commonplace in the
electronic data communities, and I would like to see libraries,
museums, and grant-giving agencies lead the way in adopting it.
 
     Examples of web-accessible JPEG files that contain text headers can
be seen in the following:
 
The illustrations in Rita Dove's "Lady Freedom Among Us" (the
University of Virginia's four-millionth volume):
 
        http://www.lib.virginia.edu/etext/fourmill.html
 
The illustrations in the University of Virginia section of Michael
Plunkett's Afro-American Sources in Virginia: A Guide to Manuscripts:
 
        http://www.virginia.edu/~press/
 
The illustrations in the following items in the British Poetry archive --
Carroll, Polwhele, Tennyson -- at
 
        http://www.lib.virginia.edu/etext/britpo/britpo.html
 
     I am appending here a sample TEI header for one of the
pages of the Rita Dove poem that is our four-millionth volume.  In
this case I have included the text that appears on the page as well -- a
stanza from Dove's poem "Lady Freedom Among Us".  One could also
include the <figDesc> field from the <figure> tag that marks the
location of the image in the text.
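Here is a sketch of how such a record could be assembled. It rests on two assumptions: that figures in the TEI text are marked as `<figure entity=NAME>` with a `<figDesc>` child, and that the page's verse lines are already available as strings. Both functions are hypothetical. The output follows the `<uva>`/`<imageHeader>`/`<text>` shape of the sample record. The placement of the `<figDesc>` is a guess, because the sample does not include one.

```python
import re
from typing import Optional

def figdesc_for(tei_text: str, entity: str) -> Optional[str]:
    """Return the <figDesc> text of the <figure> whose entity attribute equals `entity`.

    Assumes (hypothetically) markup of the form
    <figure entity=NAME> ... <figDesc>...</figDesc> ... </figure>.
    """
    for attrs, body in re.findall(r"<figure\b([^>]*)>(.*?)</figure>", tei_text, re.S):
        # Accept both quoted (entity="x") and unquoted SGML (entity=x) values.
        value = re.search(r'\bentity\s*=\s*"?([^"\s>]+)', attrs)
        if value and value.group(1) == entity:
            desc = re.search(r"<figDesc>(.*?)</figDesc>", body, re.S)
            return " ".join(desc.group(1).split()) if desc else None
    return None

def build_record(header: str, text_id: str, lines: list[str],
                 figdesc: Optional[str] = None) -> str:
    """Assemble a <uva> record shaped like the sample: image header, then the page text.

    Input is inserted verbatim; '<' and '&' are not escaped.
    """
    body = ['<lg type="stanza">'] + [f"<l>{line}</l>" for line in lines] + ["</lg>"]
    if figdesc:
        # Where figDesc belongs is an assumption; the sample record has none.
        body += ["<figure>", f"<figDesc>{figdesc}</figDesc>", "</figure>"]
    return "\n".join(["<uva>", header.strip(), f'<text id="{text_id}">', "<body>",
                      *body, "</body>", "</text>", "</uva>"])
```

The `id` value is quoted here so the result parses as XML as well as SGML. The sample record below leaves it unquoted, which SGML permits.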
 
My article "Campus Publishing in Standardized Electronic Formats --
HTML and TEI" in _Filling the Pipeline and Paying the Piper:
Proceedings of the Fourth Symposium_ (ARL Publications, 1995)
contains a longer illustrated account of this topic. E-mail
[log in to unmask] for ordering information.
 
********************************************************************
David Seaman, Coordinator        804-924-3230 (phone)
Electronic Text Center           804-924-1431 (fax)
Alderman Library                 email: [log in to unmask]
University of Virginia           http://www.lib.virginia.edu/etext/ETC.html
Charlottesville, Virginia 22903
*********************************************************************
 
 
<uva>
<imageHeader>
<fileDesc>
<titlStmt>
<title>Lady Freedom among us</title>
<resp><role>Illustrator</role>
<name>Claire van Vliet</name></resp>
<resp><role>Creation of digital image</role>
<name>University of Virginia Library Electronic Text Center</name></resp>
</titlStmt>
<pubStmt><resp><name>University of Virginia Library</name>
<role></role></resp>
<address>Charlottesville, Va.</address>
<idno type="ETC">Modern English, DovLady</idno>
<date>1994</date>
</pubStmt>
 
<srcDesc>
<biblFull>
<titlStmt><title>Lady Freedom among us</title>
<author>Rita Dove</author></titlStmt>
<pubStmt>
<resp><role>publisher</role><name>Janus Press</name></resp>
<address>West Burke, Vermont</address>
<date>[1994], c1993</date>
</pubStmt>
<noteStmt><note></note></noteStmt>
</biblFull>
</srcDesc>
</fileDesc>
<encDesc>
<projDesc><p>Prepared by David Seaman for the University of Virginia Library
Electronic Text Center</p></projDesc>
<editDecl>
<p>This image exists as an archived TIFF image, one or more JPEG
versions for general use, and a thumbnail GIF.</p>
</editDecl></encDesc>
<profDesc>
<txtClass><keywords>24-bit color; 300 dpi</keywords></txtClass>
</profDesc>
<revDesc>
<change><date></date><resp><name></name><role></role></resp></change>
</revDesc>
</imageHeader>
 
<text id=DovLady>
<body>
<lg type="stanza">
<l>don't lower your eyes </l>
<l>or stare straight ahead to where </l>
<l>you think you ought to be going </l>
</lg>
</body>
</text>
</uva>