I'd be grateful for advice on how to handle role IDs in a set of
plays, such as Shakespeare's.  The "same" character appears in several
of Shakespeare's plays, e.g. Octavius Caesar, Hotspur, Northumberland,
Hal, and Falstaff.  The same character may have different names: Prince
John of Lancaster in 2 Henry IV is the Duke of Bedford in Henry V and
Henry VI.  There are also characters, such as the Duke of Venice in
Othello and The Merchant of Venice, that one might want to think of as
the "same" character.

I thought I was very smart when I decided to give every Shakespearean
character a corpus-wide unique ID so that a speech by Falstaff would
always have the same who attribute.  But this created problems when I
turned the plays into a TEI corpus:  validation routines complained
about the same ID being used more than once.
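
To make the complaint concrete, here is a minimal Python sketch of the
kind of check a validator performs on ID values. The corpus fragment,
the play IDs (H4p1, H4p2) and the role ID (falstaff) are made up for
illustration and are not taken from my actual files; ElementTree does
not validate against a DTD, so the function only counts repeated id
values.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical two-play corpus in which the same role ID is reused.
CORPUS = """<teiCorpus>
  <TEI id="H4p1">
    <castList><role id="falstaff">Falstaff</role></castList>
  </TEI>
  <TEI id="H4p2">
    <castList><role id="falstaff">Falstaff</role></castList>
  </TEI>
</teiCorpus>"""


def duplicate_ids(xml_text):
    """Return, sorted, every id value that occurs more than once."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


print(duplicate_ids(CORPUS))
```

Any value this reports is one that an ID-checking validator would
reject once the plays sit in a single document.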

For my immediate purposes I invented a kludge that wiped out the
corpus-wide IDs, but I do want to preserve in the encoding the
information that character x in play A is the same as character y in
play B.

What are my best options? I could simply ignore the validation problem,
since it involves known 'errors'. Keeping the shared IDs lets users
retrieve all words associated with a given who attribute across the
corpus. Acting students might be usefully surprised to learn that their
character has a life elsewhere. Alternatively, I could keep a unique
role ID in each play and add a key attribute to the role element
pointing to some virtual mega castlist.  That seems like a lot of
overhead.
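
As a rough sketch of the second option, assume each role keeps a
play-local ID and carries a key attribute naming the corpus-wide
character. The ID scheme (H4p2.lancaster, H5.bedford), the key value
(john_lancaster) and the placeholder speech text are all hypothetical;
the point is only that retrieval by character then takes one extra
lookup from key to local IDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus: IDs are unique per play, key links the "same" character.
CORPUS = """<teiCorpus>
  <TEI id="H4p2">
    <castList>
      <role id="H4p2.lancaster" key="john_lancaster">Prince John of Lancaster</role>
    </castList>
    <sp who="H4p2.lancaster">(speech in 2 Henry IV)</sp>
  </TEI>
  <TEI id="H5">
    <castList>
      <role id="H5.bedford" key="john_lancaster">Duke of Bedford</role>
      <role id="H5.pistol" key="pistol">Pistol</role>
    </castList>
    <sp who="H5.bedford">(speech in Henry V)</sp>
    <sp who="H5.pistol">(another speech in Henry V)</sp>
  </TEI>
</teiCorpus>"""


def speeches_by_key(xml_text, key):
    """Collect the text of every speech whose speaker's role has this key."""
    root = ET.fromstring(xml_text)
    # Step 1: find all play-local role IDs that share the corpus-wide key.
    local_ids = {r.get("id") for r in root.iter("role") if r.get("key") == key}
    # Step 2: gather speeches whose who attribute names one of those IDs.
    return [
        (sp.text or "").strip()
        for sp in root.iter("sp")
        if sp.get("who") in local_ids
    ]


print(speeches_by_key(CORPUS, "john_lancaster"))
```

In this arrangement the IDs stay unique, so validation passes, and the
"mega castlist" exists only implicitly as the set of distinct key
values.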

I understand that the ID mechanism will work differently in P5, but I'm
not quite sure how. Will P5 allow for more graceful solutions to this
kind of problem? If so, I might just wait.

Martin Mueller
Professor of English and Classics
Department of English
Northwestern University
Evanston, Illinois 60208