Dear all,
we're starting work on a digital library where texts are scanned, OCRed and then converted to TEI - for now to level 1 of
http://www.tei-c.org/SIG/Libraries/teiinlibraries/main-driver.html#level-1-content
Quite a few texts are proceedings or similar, where the idea is to have the monograph as one TEI document (plus a PDF of the scan), but each paper should also be documented: in addition to author and title, with a link to the PDF page where the paper starts and non-bibliographic metadata such as availability, keywords, textClass and maybe others. We're not quite sure how to encode this and have so far found three ways (the first two structures are exemplified at the end of this mail):
1. sourceDesc/biblStruct with (analytic+, monogr) seems the most obvious - but it doesn't have elements for e.g. availability etc. Also, I'm not sure it's actually meant to model the complete proceedings.
2. sourceDesc/listBibl/biblFull does have them (well, most), but doesn't have the nice distinction between the monogr and analytic levels;
3. maybe teiCorpus/TEI, so that the proceedings are a corpus, giving us also <profileDesc> for keywords etc., but this seems abusive.
Right now we are leaning towards biblStruct, and would take care of the elements we're missing by using refs pointing to e.g. (one of several) availabilities in the teiHeader. Does anybody have a better suggestion, or maybe a pointer to examples of best practice?
The other question concerns Dublin Core, which is right now used to encode the publication metadata. Has anybody tried to include it in the teiHeader? We are planning to map from DC terms to teiHeader elements and back again with XSLT; this will probably depend heavily on the particular type of publication. Still, has anybody maybe done something like this already?
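To make the Dublin Core question more concrete, here is a minimal sketch of the DC-to-teiHeader direction as we imagine it. It assumes a flat record of simple DC elements under some wrapper element (the wrapper is hypothetical), and the target elements chosen here (e.g. dc:rights to availability, dc:subject to keywords/term) are only a first guess, not a settled mapping:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the input wrapper and the choice of target elements are assumptions. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="dc">

  <xsl:output method="xml" indent="yes"/>

  <!-- Build one teiHeader from whatever element wraps the dc:* children -->
  <xsl:template match="/*">
    <teiHeader>
      <fileDesc>
        <titleStmt>
          <!-- One title/author per repeated DC element -->
          <xsl:for-each select="dc:title">
            <title><xsl:value-of select="."/></title>
          </xsl:for-each>
          <xsl:for-each select="dc:creator">
            <author><xsl:value-of select="."/></author>
          </xsl:for-each>
        </titleStmt>
        <publicationStmt>
          <!-- value-of takes the first dc:publisher / dc:date / dc:rights only -->
          <publisher><xsl:value-of select="dc:publisher"/></publisher>
          <date><xsl:value-of select="dc:date"/></date>
          <availability>
            <p><xsl:value-of select="dc:rights"/></p>
          </availability>
        </publicationStmt>
        <sourceDesc>
          <p>Generated from a Dublin Core record.</p>
        </sourceDesc>
      </fileDesc>
      <profileDesc>
        <textClass>
          <keywords>
            <xsl:for-each select="dc:subject">
              <term><xsl:value-of select="."/></term>
            </xsl:for-each>
          </keywords>
        </textClass>
      </profileDesc>
    </teiHeader>
  </xsl:template>
</xsl:stylesheet>
```

The reverse direction would be a second stylesheet matching the same teiHeader elements; the open point is how much of this survives once per-paper metadata (as in the options above) enters the picture.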
Any help much appreciated!
Tomaž
--
Tomaž Erjavec, http://nl.ijs.si/et/
Dept. of Knowledge Technologies, Jožef Stefan Institute, Ljubljana
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Foo e-Proceedings</title>
</titleStmt>
<publicationStmt>
<!-- Applies to the complete proceedings, because some of the papers are restricted -->
<availability default="true" xml:id="restricted">
<licence>CC-NC</licence>
</availability>
<!-- For some other papers -->
<availability xml:id="free">
<p>Do what you will.</p>
</availability>
</publicationStmt>
<sourceDesc>
<!-- Option 1 -->
<biblStruct>
<analytic>
<title>First paper</title>
<ref type="licence" target="#free"/>
<!-- Points to the <pb> where the paper starts; ids match those in <text> -->
<ref type="firstPage" target="#p.1"/>
</analytic>
<analytic>
<title>Second paper</title>
<ref type="licence" target="#restricted"/>
<ref type="firstPage" target="#p.2"/>
</analytic>
<monogr>
<title>Foo Proceedings</title>
<imprint>
<publisher>Bar</publisher>
</imprint>
</monogr>
</biblStruct>
<!-- Option 2 -->
<listBibl>
<biblFull default="true">
<titleStmt>
<title>Foo Proceedings</title>
</titleStmt>
<publicationStmt>
<availability>
<licence>CC-NC</licence>
</availability>
</publicationStmt>
</biblFull>
<biblFull corresp="#p.1">
<titleStmt>
<title>First paper</title>
</titleStmt>
<publicationStmt>
<availability>
<p>Do what you will.</p>
</availability>
</publicationStmt>
</biblFull>
<biblFull corresp="#p.2">
<titleStmt>
<title>Second paper</title>
</titleStmt>
<publicationStmt>
<availability>
<licence>CC-NC</licence>
</availability>
</publicationStmt>
</biblFull>
</listBibl>
</sourceDesc>
</fileDesc>
</teiHeader>
<text>
<body>
<ab>
<pb xml:id="p.1"/>
Text of first paper
<pb xml:id="p.2"/>
More text of first paper.
Text of second paper</ab>
</body>
</text>
</TEI>