Martin Mueller wrote:

[...]
> I thought I was very smart when I decided to give every Shakespearean
> character a corpus-wide unique ID so that a speech by Falstaff would
> always have the same who attribute. But this created problems when I
> turned the plays into a TEI corpus:  validation routines complained
> about the same id being used more than once.
[...]

But if you had indeed given each "character" (in the specific understanding
of "character" you have in mind) a corpus-wide unique ID, then you would not
get these complaints. When your corpus is assembled and validated, if more
than one element has been assigned the same ID (which is what this error
message means) then by definition the ID's are not unique. The thing to bear
in mind is that the only "uniqueness" that counts here is the structural
uniqueness of the element and its identifier. It may well be that two
different elements have a semantic reference to the same "unique" thing, but
that sense of referential uniqueness can be represented in markup only by
indirection.
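
To make the structural point concrete, here is a minimal sketch in Python of
the kind of check that produces this complaint. The sample markup and the ID
values are invented for illustration and are not taken from the actual
corpus, and a real validator does this inside the parser against the DTD, not
with code like this.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily simplified corpus: two plays each declare Falstaff
# with the same id value, which is what triggers the validation complaint.
SAMPLE = """
<teiCorpus>
  <TEI><role id="falstaff">Falstaff</role></TEI>
  <TEI><role id="falstaff">Falstaff</role></TEI>
  <TEI><role id="hal">Prince Henry</role></TEI>
</teiCorpus>
"""

def duplicate_ids(xml_text):
    """Return the sorted id values carried by more than one element."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)

print(duplicate_ids(SAMPLE))
```

The check knows nothing about who Falstaff "is". It only sees that one value
has been attached to two elements, which is all that structural uniqueness
means.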

What you seem to have in mind here is a sort of "abstract persona"
(henceforth "ap"), which can be embodied in more than one play under
different names. If you want to provide access to information about that ap,
the most straightforward strategy I can think of is as follows.

In the dramatis personae listing, after assigning an ID for the character
concerned as embodied in this play (making sure of course that you do not
unintentionally re-use a value that may occur in another work, the best
means of doing so being to use a work-specific component when generating the
ID), go on to add an IDREF attribute pointing to a list of multiply-embodied
ap's (="meap"). This list, which need be no longer than the total number of
such meap's in the corpus and so need not be of "mega" dimensions, might
well be placed in the teiHeader of the Corpus container. It would assign
each meap its unique ID and map it to its embodiments via a list of IDREFs
(where the values would correspond to those assigned to the "embodied"
character in the various works). Your various "who" attributes would point
initially to the dramatis personae ID of the play concerned, but your
processing would then have the option of checking for an IDREF to the meap
list and following it up if appropriate. This approach would allow you to
add the ability to track the existence of such meap's with the minimum
amount of additional markup and processing. There are other strategies, but
these would I think all involve the "mega" list that you are understandably
anxious to avoid.
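
The indirection described above might be sketched as follows. This is only
an illustration under stated assumptions: the element and attribute names
"meapList", "meap", "embodiments" and "meapRef" are made up for the example
and are not TEI names, the ID values are invented, and the markup is far
simpler than a real corpus would be.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. "meapList", "meap", "embodiments" and
# "meapRef" are illustrative names only, not part of any TEI DTD.
SAMPLE = """
<teiCorpus>
  <teiHeader>
    <meapList>
      <meap id="meap.falstaff" embodiments="h4p1.falstaff h4p2.falstaff"/>
    </meapList>
  </teiHeader>
  <TEI>
    <role id="h4p1.falstaff" meapRef="meap.falstaff">Falstaff</role>
    <role id="h4p1.hotspur">Hotspur</role>
    <sp who="h4p1.falstaff"/>
    <sp who="h4p1.hotspur"/>
  </TEI>
  <TEI>
    <role id="h4p2.falstaff" meapRef="meap.falstaff">Falstaff</role>
    <sp who="h4p2.falstaff"/>
  </TEI>
</teiCorpus>
"""

def index_by_id(root):
    """Map every id value to the single element that carries it."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}

def resolve_speaker(root, who):
    """Follow who -> play-specific role -> meap, if the role points to one."""
    role = index_by_id(root)[who]
    # Characters with no multiply-embodied persona keep their per-play ID.
    return role.get("meapRef", who)

def speeches_by(root, meap_id):
    """List the who values of all speeches by any embodiment of a meap."""
    embodiments = index_by_id(root)[meap_id].get("embodiments").split()
    return [sp.get("who") for sp in root.iter("sp") if sp.get("who") in embodiments]

root = ET.fromstring(SAMPLE)
print(resolve_speaker(root, "h4p2.falstaff"))
print(speeches_by(root, "meap.falstaff"))
```

Every ID here occurs exactly once, so the corpus validates, and only the
characters that recur across plays need an entry in the shared list.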

I think this is a logical problem, not a DTD or schema specific one, so I
don't see how waiting for P5 would do much good, though it would offer
different ways of expressing the same sorts of solution. There will indeed
be different ways of pointing from one element to another, but none of them
will alter the intrinsic nature of element uniqueness in a structured
document.

Michael Beddow