This is absolutely on the right track: make the personal data distinct
from the references to persons in the text.
There is just one problem with what you suggest below, and you will
probably encounter it about halfway into your project, which is why
it's really good that you raise the question now. The problem will occur
when you find that the same person data needs to be associated with more
than one article. You definitely don't want to duplicate the person
data -- that way lies madness -- but it can't be in two places at once.
So my suggestion would be to keep all the personal data together in one
place, rather than spread it out across the corpus. It doesn't need to
be in the same file -- you can link to it, using either the @ref or the
@key attribute as described at the start of the chapter on Names and
Dates (which I see you have very carefully read!). Putting it in one
place will also make it much easier for you to check that things are
done consistently, and will alert you much more quickly to such things
as two different people who happen to have the same name.
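
To illustrate the last point, here is a minimal sketch in Python of the
kind of check a central store makes possible. It assumes the central
<listPerson> is in the TEI namespace, that every <person> carries an
xml:id, and that names are recorded in <persName>; the function name
shared_names is my own invention, not anything from the Guidelines.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def shared_names(list_person):
    """Return {name: [xml:ids]} for every name used by more than one <person>.

    Assumption: each <person> has an xml:id and its names are in <persName>.
    """
    ids_by_name = defaultdict(list)
    for person in list_person.iter(f"{{{TEI_NS}}}person"):
        pid = person.get(XML_ID)
        # Collect each distinct name once per person, with whitespace normalized.
        names = {
            " ".join("".join(name.itertext()).split())
            for name in person.iter(f"{{{TEI_NS}}}persName")
        }
        for name in names - {""}:
            ids_by_name[name].append(pid)
    # Keep only the names that occur under more than one person record.
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}
```

Any name it reports is either a duplicated record or two genuinely
different people, and either way it is something you want to look at by
hand.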
The counterargument will of course be that 90% of the time the people
*are* only referred to in one document, and it's a nuisance having to
maintain two documents all the time. That's a fair point, which I would
answer by suggesting that you should distinguish what's convenient when
doing data capture from what's convenient at integration time.
If I were doing this project, I think I would do just what you propose
below, but I would build into the workflow some way of periodically
harvesting the <listPerson>s being stored in the individual entries,
storing them centrally, and replacing the references to them in the
individual entries by links to the central store. That process would be
very easy to automate, since you are working in TEI XML -- and it is
essential to do it at some point to support the kind of retrieval system
you are envisaging. My suggestion would enable you to get that retrieval
system delivering interesting results before you've finished collecting
all the data, which seems like a distinct advantage on a number of counts.
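
As a rough idea of what that harvesting step might look like, here is a
minimal sketch in Python using only the standard library. Several things
in it are assumptions of mine rather than anything in your data: that
each <person> has an xml:id, that references in the text take the form
ref="#id", that the central file is called people.xml, and that two
records with the same xml:id describe the same person (the first one
harvested wins). The function names are hypothetical too.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def harvest(entry_roots, central_uri="people.xml"):
    """Move <person> records from the entries into one central <listPerson>.

    entry_roots are parsed entry elements and are modified in place.
    Local references (ref="#id") to harvested persons are rewritten to
    point at central_uri. Returns the central <listPerson> element.
    """
    central = ET.Element(tei("listPerson"))
    central_ids = set()
    for root in entry_roots:
        # ElementTree keeps no parent pointers, so build a lookup table.
        parent_of = {child: parent for parent in root.iter() for child in parent}
        moved = set()
        for list_person in list(root.iter(tei("listPerson"))):
            for person in list_person.findall(tei("person")):
                pid = person.get(XML_ID)
                if pid is None:
                    continue  # cannot link to a record without an xml:id
                if pid not in central_ids:
                    central.append(person)
                    central_ids.add(pid)
                list_person.remove(person)
                moved.add(pid)
            # Drop the local list once it is empty.
            if len(list_person) == 0 and list_person in parent_of:
                parent_of[list_person].remove(list_person)
        # Repoint local references at the central store.
        for element in root.iter():
            ref = element.get("ref", "")
            if ref.startswith("#") and ref[1:] in moved:
                element.set("ref", f"{central_uri}{ref}")
    return central
```

You would parse each entry with ET.parse, pass the root elements to
harvest, and write both the entries and the returned <listPerson> back
out. A real version would want to compare records that share an xml:id
rather than silently keeping the first.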
It would be interesting to know how other projects have approached this
-- a very common problem, I think.
Petra Vide Ogrin wrote:
> Dear all,
> encouraged by Lou's nice words about what's "one person's basic is
> another person's ..." let me also ask for advice:
> I've just started a project of digitizing a national biographical
> lexicon (very much like ODNB). I am thinking of annotating the articles
> and at the same time extracting the encoded information and placing it
> in a special block in the <person> element within the <listPerson>
> accompanying each article, so that the structure in the end would look like:
> <person> - encoded data - </person> (biographical entry)
> <person> - encoded data - </person> (other persons, mentioned in
> the article)
> <p> - annotated article - </p>
> (the structure then repeats for each article)
> My idea is that the retrieval system will be able to perform advanced
> searches, such as "male historians who also wrote fiction and worked in
> Maribor between 1870 and 1920".
> Is that the way to do it?
> Below is roughly an excerpt from my annotated lexicon:
> <sex value="1">moški</sex>
> <birth when="1874-03-18">
> <death when="1994-09-19">
> <residence notAfter="0974">
> <residence notBefore="1908">
> <floruit notBefore="1892" notAfter="1944"/>
> Am I on the right track with this?
> Petra Vide Ogrin
> Slovenian Academy of Sciences and Arts, SASA Library
> Novi trg 3, 1000 Ljubljana, Slovenia
> tel (work): (01) 4706-248