Hello Antonio,

My impression is that you are trying to squeeze information into structures that weren’t meant to handle it. Why not create an element mentionCount with the attributes @author, @source, @mentionedPerson and @count? Each attribute can then have the proper datatype, and you could, if you wish, write Schematron rules to check that the attributes point to elements of the correct type, and so on. Or, if you want to stick with relation, add a proper integer attribute to hold the count.
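
A minimal sketch of the first option, using the identifiers from your example. Note the assumptions: mentionCount is not a TEI element, so it would have to be declared in a project ODD, and the "my" prefix and its namespace URI are placeholders invented for illustration.

```xml
<!-- Hypothetical custom element: not defined by the TEI, to be declared
     in a project ODD. The namespace URI is a placeholder. -->
<my:mentionCount xmlns:my="http://example.org/ns/mentions"
                 author="#person_8726289"
                 source="#carta_es_0001"
                 mentionedPerson="#person_66806872"
                 count="2"/>
```

With a declaration like this, @count could be typed as an integer and the three pointer attributes as URI references, which is what would make the Schematron checks straightforward.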

Another aspect of your encoding that seems strange to me is that (if I understand correctly) you store information about several mentioned persons in a single relation element, just because the number of mentions happens to be the same. But these are independent pieces of information, and grouping them makes your code much harder to read and to process. I would use a separate element for each person.
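
As a rough illustration of one element per person, here is how two of the mentions from your example might look if you stay with relation. The my:count attribute is an assumption: it is not a TEI attribute and would need to be added through an ODD with an integer datatype, and the namespace URI is a placeholder.

```xml
<!-- One relation per mentioned person. my:count is a hypothetical
     project-specific attribute, not part of the TEI. -->
<listRelation xmlns="http://www.tei-c.org/ns/1.0"
              xmlns:my="http://example.org/ns/mentions">
  <relation active="#person_8726289" name="mentions"
            passive="#person_000001" source="#carta_es_0001"
            my:count="1"/>
  <relation active="#person_8726289" name="mentions"
            passive="#person_66806872" source="#carta_es_0001"
            my:count="2"/>
</listRelation>
```

Each relation then states exactly one fact (who mentions whom, where, and how often), so a count can be changed or queried without touching the other entries.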

Good luck,
Peter

From: TEI (Text Encoding Initiative) public discussion list <[log in to unmask]> On Behalf Of Antonio Rojas Castro
Sent: Tuesday, 29 January 2019 5:35
To: [log in to unmask]
Subject: [TEI-L] How to encode the number of mentions in a document using <relation>?

Hello List,

I am encoding information derived from letters - rather than encoding the texts themselves.

I am interested in representing mentions and how many times the author of the letter mentions each named entity. I am currently using <relation> to encode these pieces of information along with <person>:


<relation active="#person_8726289" name="mentions" passive="#person_000001 #person_000002 #person_000005 #person_66462281 #person_66806872" ana="1" source="#carta_es_0001"/>

<relation active="#person_8726289" name="mentions" passive="#person_66806872" ana="2" source="#carta_es_0001"/>



In this case, both <relation> elements have the same author recorded with @active and several “targets” recorded with @passive. In the first <relation> element those people were mentioned only once, so I am storing that information using @ana. In the second element the author mentions one of these authorities (they are mostly Latin authors) twice, so I used @ana="2". Both elements have the same source (@source="#carta_es_0001") because they are “facts” contained in the same letter. However, I am not very happy with the use of @ana to represent the frequency or the weight of the relation.



Does anyone know an alternative or can share her/his experience/opinion?



(I do not like using @name to store action verbs like mentions or quotes either, but this is the closest way I found in the TEI to emulate RDF or standoff markup).



Thank you for your feedback.



Best,

--
Dr. Antonio Rojas Castro
Post-doctoral Researcher, Cologne Center for eHumanities
Editor, The Programming Historian en español
<http://www.antoniorojascastro.com/>