Hello again,

@Peter, yes, I gave it some thought last night and I think you are right: I should probably create a separate element for each person and for each mention - I think that’s how it works in RDF. My approach here was meant to make it simpler to write by hand, but I guess I could automate some extractions from TEI files. Another person suggested off-list that I should not use <relation> but create an ad hoc element for my project, and only use <relation> for social or contextual relationships.

@Grace, yes, you are totally right. There are practical reasons behind my decision: it is a bit unusual, but in most cases I can’t give access to the full texts for copyright reasons. However, even if I had all the texts in digital format under a Creative Commons licence, I still think it would be enough for my research question to record this kind of information - that is, instead of encoding the text (which is a lot of work), to extract only some “facts” contained in the source and represent them as metadata. I could use RDF and linked data, but I wanted to try TEI as a baseline - encoding in XML/TEI and then transforming to RDF - because I am used to working with Oxygen and validating with Schematron. (Side note: is there any equivalent of Oxygen for linked data?) Ciotti and Tomasi’s article is a good starting point for this kind of project. I also like Øyvind Eide’s article (https://journals.openedition.org/jtei/1191) and Vogeler’s “The Assertive Edition” (https://hcommons.org/deposits/item/hc:21373/).
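
On the Schematron side, here is a rough sketch of the kind of check I have in mind for an ad hoc element like the mentionCount that Peter describes. The my: prefix, its namespace URI and the element itself are assumptions for illustration only; none of them is part of TEI, and the rule assumes that the <person> elements live in the same file.

```xml
<!-- Sketch only: the my: namespace and the mentionCount element are hypothetical -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:ns prefix="my" uri="http://example.org/ns/mentions"/>
  <sch:ns prefix="xs" uri="http://www.w3.org/2001/XMLSchema"/>
  <sch:pattern>
    <sch:rule context="my:mentionCount">
      <!-- The count must be a whole number greater than zero -->
      <sch:assert test="@count castable as xs:positiveInteger">
        @count must be a positive integer.
      </sch:assert>
      <!-- The pointer must resolve to a person declared in the same file -->
      <sch:assert test="substring-after(@mentionedPerson, '#') = //tei:person/@xml:id">
        @mentionedPerson must point to the xml:id of a person element.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The same pattern could be repeated for @author, and for @source if the letters are also described in the file.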

If anyone has more suggestions or feedback, do not hesitate to share. I am just starting to think about how to model this kind of information, and I am happy to learn more and/or change my approach.

Thanks again to both of you.


Dr. Antonio Rojas Castro
Post-doctoral Researcher, Cologne Center for eHumanities
Editor, The Programming Historian en español

From: Peter Boot <[log in to unmask]>
Reply-To: Peter Boot <[log in to unmask]>
Date: 30 January 2019 at 9:16:44
To: Antonio Rojas Castro <[log in to unmask]>, [log in to unmask] <[log in to unmask]>
Subject: RE: [TEI-L] How to encode the number of mentions in a document using <relation>?

Hello Antonio,


My impression is that you are trying to squeeze information into structures that weren’t meant to handle this type of information. Why don’t you just create an element mentionCount with attributes @author, @source, @mentionedPerson and @count? Each attribute can have the proper datatype, and you could, if you wish, write Schematron rules to check whether the attributes point to elements of the correct type. Or, if you want to stick with relation, add a proper integer attribute to hold the count.
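
As a minimal sketch of the first option, using the identifiers from your example: the my: prefix and its namespace URI are placeholders I made up, and the element would of course have to be declared in your own schema customisation before it validates.

```xml
<!-- Hypothetical project element; not part of TEI -->
<my:mentionCount xmlns:my="http://example.org/ns/mentions"
                 author="#person_8726289"
                 source="#carta_es_0001"
                 mentionedPerson="#person_000001"
                 count="1"/>
<my:mentionCount xmlns:my="http://example.org/ns/mentions"
                 author="#person_8726289"
                 source="#carta_es_0001"
                 mentionedPerson="#person_66806872"
                 count="2"/>
```

With @count typed as an integer in the schema, a value such as "twice" would be rejected at validation time, which @ana cannot give you.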


Another aspect of your encoding which seems strange to me is that (if I understand correctly) you store information about multiple persons being mentioned in a single relation element, just because the number of mentions happens to have the same value. But these are independent information items. It makes reading your code, as well as processing it, much harder. I would use a separate element for each person.
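
If you stay with relation, one element per mentioned person might look roughly like this. The my:count attribute and its namespace are hypothetical and would need to be added in your customisation, and the fragment assumes the TEI namespace is already in scope from an ancestor element. Only three of the persons from your example are shown.

```xml
<!-- my:count is a hypothetical project attribute, not part of TEI -->
<listRelation xmlns:my="http://example.org/ns/mentions">
  <relation active="#person_8726289" name="mentions"
            passive="#person_000001" my:count="1"
            source="#carta_es_0001"/>
  <relation active="#person_8726289" name="mentions"
            passive="#person_000002" my:count="1"
            source="#carta_es_0001"/>
  <relation active="#person_8726289" name="mentions"
            passive="#person_66806872" my:count="2"
            source="#carta_es_0001"/>
</listRelation>
```

Each element is then a single statement (who mentions whom, how often, in which letter), which should also make a later transformation to RDF more direct.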


Good luck,



From: TEI (Text Encoding Initiative) public discussion list <[log in to unmask]> On Behalf Of Antonio Rojas Castro
Sent: Tuesday, 29 January 2019 5:35
To: [log in to unmask]
Subject: [TEI-L] How to encode the number of mentions in a document using <relation>?


Hello List,


I am encoding information derived from letters - rather than encoding the texts themselves. 


I am interested in representing mentions and how many times the author of the letter mentions each named entity. I am currently using <relation>, along with <person>, to encode these pieces of information:


<relation active="#person_8726289" name="mentions" passive="#person_000001 #person_000002 #person_000005 #person_66462281 #person_66806872" ana="1" source="#carta_es_0001"/>

<relation active="#person_8726289" name="mentions" passive="#person_66806872" ana="2" source="#carta_es_0001"/>


In this case, both <relation> elements have the same author, recorded with @active, and several “targets” recorded with @passive. In the first <relation> element, those people were mentioned only once, so I am storing that information using @ana. In the second element, the author mentions one of these authorities (they are mostly Latin authors) twice, so I used @ana="2". Both elements have the same source (@source="#carta_es_0001") because they are “facts” contained in the same letter. However, I am not very happy with the use of @ana to represent the frequency or the weight of the relation.


Does anyone know an alternative or can share her/his experience/opinion?


(I do not like using @name to store action verbs like “mentions” or “quotes” either, but this is the closest way I found in the TEI to emulate RDF or standoff markup.)


Thank you for your feedback. 




Dr. Antonio Rojas Castro
Post-doctoral Researcher, Cologne Center for eHumanities

Editor, The Programming Historian en español