Michael presented to you at least the conceptual outlines of the "real"
solution, which works by distinguishing a corpus-wide ID (useful for
distinguishing who's the "same" character) from a document-wide ID (useful
for ID/IDREF validation for referential integrity among other things).
A poor man's solution would be to keep a single master list of characters
with their IDs, and include that via an entity in each of your instances
for validation purposes.
Then if you include it a single time in the instance representing your
corpus, you won't get the ID clashes. (Some purists might prefer a
hyperlinking solution for this over using document fragments via entity
references, and Syd reminds us that P5 allows a URI, which suggests a
hyperlink. But then you probably have to go outside DTD ID/IDREF for
validation. If you decide to go this way, Many Things Are Possible.)
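
To make the clash concrete, here is a minimal Python sketch of the uniqueness check that ID validation performs. It is an illustration only, not part of any TEI tooling: the element names are simplified, the MASTER fragment stands in for whatever text the entity reference would expand to, and the function name duplicate_ids is made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(xml_text):
    """Return the sorted id values that occur more than once in the document."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(i for i, n in counts.items() if n > 1)


# Hypothetical master list (the text an entity reference would expand to).
MASTER = (
    '<castList>'
    '<role id="falstaff">Falstaff</role>'
    '<role id="hal">Hal</role>'
    '</castList>'
)

# Master list expanded inside every play: the IDs repeat once the plays
# are combined into one corpus document.
per_play = f"<corpus><play>{MASTER}</play><play>{MASTER}</play></corpus>"

# Master list expanded a single time at corpus level; speeches only point at it.
once = (
    f"<corpus>{MASTER}"
    '<play><sp who="falstaff"/></play>'
    '<play><sp who="falstaff"/><sp who="hal"/></play>'
    "</corpus>"
)

print(duplicate_ids(per_play))  # ['falstaff', 'hal']
print(duplicate_ids(once))      # []
```

The who attributes in the second document can repeat as often as you like, since they are references rather than declarations; only the id values have to be unique.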
Such a master list would constitute, of course, a kind of corpus-level
metadata, and could also be used to record which characters appear in which
plays (if querying the data isn't good enough for you).
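
As a rough sketch of what such a master list might hold, the Python below keeps one corpus-wide ID per character and derives a document-wide ID for each play by prefixing a play code. Everything here is an assumption made for illustration: the Character class, the play codes (1H4, 2H4, H5), the ID strings, and the "play.id" naming pattern are not taken from the TEI Guidelines or from your files.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    # Corpus-wide ID: the same value wherever the "same" character appears.
    corpus_id: str
    # Hypothetical play code -> name the character goes by in that play.
    names_by_play: dict = field(default_factory=dict)

    def local_id(self, play):
        """Document-wide ID, distinct for each play the character appears in."""
        return f"{play}.{self.corpus_id}"


# Illustrative entries only; a real master list would cover the whole corpus.
MASTER = [
    Character("johnlancaster", {"2H4": "Prince John of Lancaster",
                                "H5": "Duke of Bedford"}),
    Character("falstaff", {"1H4": "Falstaff", "2H4": "Falstaff"}),
]


def plays_for(corpus_id):
    """Return the play codes in which the given character is recorded."""
    for character in MASTER:
        if character.corpus_id == corpus_id:
            return sorted(character.names_by_play)
    return []


# Every derived document-wide ID is distinct, so ID validation has nothing
# to complain about when the plays are combined.
all_local = [c.local_id(p) for c in MASTER for p in c.names_by_play]
assert len(all_local) == len(set(all_local))

print(plays_for("johnlancaster"))     # ['2H4', 'H5']
print(MASTER[0].local_id("H5"))       # H5.johnlancaster
```

The design choice is simply that the corpus-wide ID answers "who is this?" while the derived ID answers "which element in this document?", so neither has to do the other's job.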
At 11:19 AM 12/16/2004, you wrote:
>I'd be grateful for advice on how to handle role id's in a set of
>plays, such as Shakespeare's. The "same" character appears in several
>of Shakespeare's plays, e.g. Octavius Caesar, Hotspur, Northumberland,
>Hal, and Falstaff. The same character may have different names: Prince
>John of Lancaster in 2Henry IV is the Duke of Bedford in Henry V and
>Henry VI. There are some characters, such as the Duke of Venice in
>Othello and The Merchant of Venice that one might want to think of as
>the "same" character.
>I thought I was very smart when I decided to give every Shakespearean
>character a corpus-wide unique ID so that a speech by Falstaff would
>always have the same who attribute. But this created problems when I
>turned the plays into a TEI corpus: validation routines complained
>about the same id being used more than once.
>For my immediate purposes I invented a kludge that wiped out the
>corpus-wide ID's, but I do want to preserve in the encoding the
>information that character x in play A is the same as character y in
>play B.
>What are my best options? I could simply ignore the validation problem
>since it involves known 'errors'. This lets users retrieve all words
>associated with a given who attribute. Acting students might be
>usefully surprised to learn that their character has a life elsewhere.
>Alternately, I could keep a unique role ID and add a key attribute to
>the role element pointing to some virtual mega castlist. That seems
>like a lot of overhead.
>I understand that the ID mechanism will work differently in P5, but I'm
>not quite sure how. Will P5 allow for more graceful solutions to this
>kind of problem? In which case I might just wait.
Wendell Piez
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
Mulberry Technologies: A Consultancy Specializing in SGML and XML