I don't have any criticisms or comments specifically about the markup
you propose; as far as I know (and by no means am I as expert as some
other readers of this list) it's fine. Yet I also suspect it's just
the tip of the iceberg.
So I hope I can be forgiven for making a heretical suggestion, namely
that you not try to produce TEI directly in the first step of your
conversion, but instead introduce a project-specific markup schema to
capture the intellectual content of your data, convert your
bibliography into that (where it can be validated and normalized), and
then convert from there into TEI in order to get the benefits of a
(I suggest this because it is apparent to me that from the starting
point -- your Word documents -- you have valuable intellectual content
already "encoded". Were you starting to write your bibliography from
scratch, you wouldn't have this issue and could go straight to TEI.)
While this may seem like extra work, it will have the virtue of
exposing in an explicit and tractable form the problem of mapping
between the structures implicit in your Word file, and TEI constructs.
It seems to me this will be helpful (and maybe even practically
necessary) in that it offers a place for you to distinguish between
issues that are dealt with best by correcting the data itself, and
issues of how to express in TEI the information you have captured.
Inasmuch as this will reduce thrashing (having to revisit decisions
made with incomplete information), it is quite likely to save work in
the long run.
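
To make the distinction concrete, here is a rough sketch in Python (any XML toolkit would do) of what checking a project-specific record might look like. Every element name in it (entry, titlePage, signatures, pages, format, year, genre, sources) is my own invention for illustration, not a proposal for your actual schema, and the rules are only guesses based on the one sample record you sent. Problems it reports are data problems to be fixed in the source, quite apart from any question of TEI encoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical project-specific record. The element names are invented
# for illustration; the values come from the sample entry in the thread.
RECORD = """
<entry id="A001745a">
  <titlePage>A | – | Selst Alment In̄bunded 2. Fiskum. | – | Þrikt a Hoolum i Hiallta-Dal, af | Halldore Erikssyne, Anno 1745.</titlePage>
  <signatures>A-C2</signatures>
  <pages>52</pages>
  <format>12°</format>
  <year>1745</year>
  <genre>Stafrófskver</genre>
  <sources>BiblNot. IV, 139; HHCat. II, 1.</sources>
</entry>
"""

# Assumed policy: these fields must be present and non-empty in every entry.
REQUIRED = ("titlePage", "pages", "format", "year")


def check_entry(entry):
    """Return a list of data problems found in one <entry> element."""
    problems = []
    for name in REQUIRED:
        child = entry.find(name)
        if child is None or not (child.text or "").strip():
            problems.append(f"missing {name}")

    # The bibliography covers 1534 to 1844, so other years are suspect.
    year = (entry.findtext("year") or "").strip()
    if year:
        is_number = year.isascii() and year.isdigit()
        if not (is_number and 1534 <= int(year) <= 1844):
            problems.append(f"year out of range: {year}")

    pages = (entry.findtext("pages") or "").strip()
    if pages and not (pages.isascii() and pages.isdigit()):
        problems.append(f"pages not a number: {pages}")
    return problems


print(check_entry(ET.fromstring(RECORD)))
```

In practice you would more likely express such rules in a schema (RELAX NG, Schematron or the like) than in a script, but the principle is the same: the checks speak the language of your bibliography, not of TEI.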
I dare say that if you are fortunate -- if the designers of your
bibliography in Word have done a good job -- such a format will be
straightforward and fairly simple (at any rate, simpler than TEI,
which is designed to allow describing many things that you do not
have), and that creating TEI from it (with a transformation) will be a
much more manageable problem than creating good (semantically strong)
TEI from the raw data.
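
As a minimal sketch of such a transformation (in Python here, though XSLT would be the more usual tool), the following maps a hypothetical project-specific entry to a TEI biblStruct. The source element names (entry, titlePage, pages, format, year) are assumptions of mine, and the TEI produced only loosely follows the attempt quoted below; it is not a recommendation on how your records should finally be encoded.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(name):
    """Qualify a local name with the TEI namespace."""
    return f"{{{TEI_NS}}}{name}"


def entry_to_tei(entry):
    """Map one hypothetical <entry> element to a TEI <biblStruct>."""
    bibl = ET.Element(tei("biblStruct"), {XML_ID: entry.get("id")})
    monogr = ET.SubElement(bibl, tei("monogr"))

    # Title-page line breaks are marked "|" in the source; emit <lb/> for each.
    title = ET.SubElement(monogr, tei("title"), {"type": "desc", "level": "m"})
    lines = [part.strip() for part in entry.findtext("titlePage", "").split("|")]
    title.text = lines[0]
    for line in lines[1:]:
        lb = ET.SubElement(title, tei("lb"))
        lb.tail = line

    imprint = ET.SubElement(monogr, tei("imprint"))
    ET.SubElement(imprint, tei("date"), {"when": entry.findtext("year", "")})

    # Assumed mapping: pages and format both become <measure> inside <extent>.
    extent = ET.SubElement(monogr, tei("extent"))
    for source_name, measure_type in (("pages", "pages"), ("format", "size")):
        measure = ET.SubElement(extent, tei("measure"), {"type": measure_type})
        measure.text = entry.findtext(source_name, "")
    return bibl


entry = ET.fromstring(
    '<entry id="A001745a">'
    "<titlePage>A | – | Selst Alment In̄bunded 2. Fiskum. | – | "
    "Þrikt a Hoolum i Hiallta-Dal, af | Halldore Erikssyne, Anno 1745.</titlePage>"
    "<pages>52</pages><format>12°</format><year>1745</year>"
    "</entry>"
)
print(ET.tostring(entry_to_tei(entry), encoding="unicode"))
```

The point of keeping the mapping in one small function is that each TEI modeling decision is made once, in one place, and can be revised there without touching the data.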
Even if it's not, it will show in dramatic form (unobscured by TEI
modeling issues) where the design choices and tradeoffs are in the
organization of your bibliography.
Consequently, the TEI that you ultimately produce will be better
(probably much better), in that its tagging will better reflect
modeling decisions made on a policy level, rather than ad-hoc attempts
to interpret the source using everything the TEI makes available.
And in the worst case, you will discover fairly early in the modeling
process that things are so inconsistent in your source data (there is
no latent model to be found there), you might as well treat it as
source materials for a new bibliography in TEI rather than as a data
It's true that to implement such a migration, you will need some
expertise in designing and building XML schemas and transformations.
But those are very useful skills that will also serve you well with
It will be interesting to see objections and reservations to this
idea, which are bound to be informative.
On Wed, Dec 19, 2012 at 11:18 AM, Örn Hrafnkelsson wrote:
> Dear TEI specialists
> I need some help or advice.
> We at the National and University Library of Iceland are starting a new project in our library in 2013 and I am preparing the work.
> We plan to convert a 400 pages WORD document which contains the manuscript of the Icelandic bibliography from 1534 to 1844 to TEI P5 and making some sense into it.
> And I am having some troubles encoding the text in to something sensible and I am wondering if you could give me some advice.
> Or send me some manuals where people has been doing something similar or sites where bibliographies are made available on the internet.
> Here below is an short example of one record from the bibliography, the first book. It is a spelling-book or a primer from 1745. The first line is the title-page of the book (yellow), then the extent (green). Then in smaller size five points that I have trouble encoding (marked in square brackets, blue) and I have for the moment put in <note>.
> A | – | Selst Alment In̄bunded 2. Fiskum. | – | Þrikt a Hoolum i Hiallta-Dal, af | Halldore Erikssyne, Anno 1745. ~ Ark. A-C2. (52) bls. 12°.
>  Stafrófskver.  – Upphafsstafur á titils. - A - með miklu flúri. –  „S Min̄e CatechisMVS. Med Utleggingu D. Mart. Luth.“ A5a-B7b. –  Af tveimur öftustu blöðum bókarinnar eru til tvö afbrigði frá algengustu gerð með frábrugðnu sátri, en sama texta. –  BiblNot. IV, 139; HHCat. II, 1.
> What I have put in note is this and I’m not sure about what I’m doing there.
>  Stafrófskver = A kind of an added title
>  – Upphafsstafur á titils. - A - með miklu flúri. = Decorative note
>  „S Min̄e CatechisMVS. Med Utleggingu D. Mart. Luth.“ A5a-B7b. = A title or text within the book by Martin Luther
>  Af tveimur öftustu blöðum bókarinnar eru til tvö afbrigði frá algengustu gerð með frábrugðnu sátri, en sama texta. = Note about different versions
>  BiblNot. IV, 139; HHCat. II, 1. = Source of information about this publication.
> Here is my attempt to convert this into TEI XML. Can you please look at it and advise me.
> <!-- biblStruct -->
> <biblStruct xml:id="A001745a">
> <monogr xml:lang="is">
> <title type="desc" level="m">A <lb/> – <lb/> Selst Alment In̄bunded 2.
> Fiskum. <lb/> – <lb/> Þrikt a Hoolum i Hiallta-Dal, af <lb/>
> Halldore Erikssyne, Anno 1745.</title>
> <edition n="1"/>
> <pubPlace key="HólFlj01">Hólar í Hjaltadal</pubPlace>
> <date when="1745" cert="high"/>
> <name type="printer" key="HalEir003">
> <forename sort="1">Halldór</forename>
> <surname sort="2">Eiríksson</surname>
> <measure type="folios">A-C<hi rend="subscript">2</hi>.</measure>
> <measure type="pages">52</measure>
> <measure type="size">12°</measure>
> <p>Upphafsstafur á titils. - A - með miklu flúri.</p>
> <p>Af tveimur öftustu blöðum bókarinnar eru til tvö afbrigði frá
> algengustu gerð með frábrugðnu sátri, en sama texta.</p>
> <!-- <analytic>
> <author>Martin Luther</author>
> <title>S Min̄e CatechisMVS. Med Utleggingu D. Mart. Luth.</title>
> <title>Bibliographical notices</title>
> <biblScope type="vol">IV</biblScope>
> <biblScope type="pp">139</biblScope>
> <name type="person" key="HalHer001">Halldór Hermannsson</name>
> <title>Catalogue of the Icelandic collection bequeathed by Willard
> <biblScope type="vol">II</biblScope>
> <biblScope type="pp">1</biblScope>
> Örn Hrafnkelsson
> Director for National Collections and Digital Conversion
> Tel.: +354-525-5631
> Landsbókasafn Íslands - Háskólabókasafn | Arngrímsgötu 3 - 107 Reykjavík
> Sími/Tel: +354 5255600 | www.landsbokasafn.is
> fyrirvari/disclaimer - http://fyrirvari.landsbokasafn.is
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables