On 30.10.2014 at 14:54, Ralph Morton wrote:
> Would this be compatible with the correspondence proposal as it stands, and would it allow for the searches/filtering I have mentioned?
a) "compatible with the correspondence proposal":
The proposal is a moving target and your version is not up to date. That sounds harsh, but in fact I’m pretty sure that the final (hopefully accepted) proposal will not diverge much, conceptually, from your current encoding of senders, addressees etc.
So I’d say don’t invest too much effort in this part right now, but wait until <correspDesc> makes it into the Guidelines.
b) searches/filtering by date (incl. decade), author (name), company, occupation, location, gender, letter format (handwritten, typed, etc.):
Yes, every bit of information you encode (and hence make explicit to a machine) can be retrieved literally with standard XPath methods. E.g. you can easily extract all forenames in your corpus from your encoded <forename> elements. In your example the result would be "Lord Leonard Walter" and "E.B.", which is semantically not quite right. First, "Lord" is not a <forename> but a <roleName>. Second, it might be better to encode every forename separately: <roleName>Lord</roleName> <forename>Leonard</forename> <forename>Walter</forename>
On the other hand, if you’re not interested in this information (or don’t have the budget to encode in this detail), just skip the encoding of forenames. That’s ok!
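To illustrate the retrieval point, here is a minimal Python sketch using the standard library's limited XPath support. The sample markup is a hypothetical fragment built from the names mentioned above, not your actual file, and it assumes the elements sit in the TEI namespace.

```python
import xml.etree.ElementTree as ET

# Namespace prefix mapping for the XPath-like queries below.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample: two names encoded with one element per name part.
sample = """<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <persName>
    <roleName>Lord</roleName>
    <forename>Leonard</forename>
    <forename>Walter</forename>
  </persName>
  <persName>
    <forename>E.B.</forename>
  </persName>
</listPerson>"""

root = ET.fromstring(sample)

# Every <forename> anywhere in the fragment, in document order.
forenames = [el.text for el in root.findall(".//tei:forename", NS)]

# Titles are retrieved separately because they are encoded as <roleName>.
role_names = [el.text for el in root.findall(".//tei:roleName", NS)]

print(forenames)
print(role_names)
```

With the title in <roleName>, a query for forenames no longer picks up "Lord".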
Additionally, there’s information that can be computed, e.g. the decade for a given date. You decided to encode the decade explicitly as @n on <date>, which is ok because you have your reasons in terms of particular software needs. That said, I’d still favour not bending the TEI encoding towards software needs but instead transforming (down-converting) the TEI input into a software-specific format.
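As a rough sketch of computing the decade instead of storing it: the Python below assumes each <date> carries a @when attribute that starts with a four-digit year. The sample date and the function name decade_of are made up for illustration.

```python
import xml.etree.ElementTree as ET


def decade_of(when: str) -> int:
    """Return the decade (e.g. 1870) for a date string starting with a four-digit year."""
    year = int(when[:4])
    return year - year % 10


# Hypothetical <date> element; the value is not taken from the corpus.
date_el = ET.fromstring('<date when="1874-03-12">12 March 1874</date>')
decade = decade_of(date_el.get("when"))
print(decade)
```

A conversion step like this could write the decade into whatever format the target software expects, leaving the TEI source untouched.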
> Can the <context> element be used to indicate both sent and received letter context? For example D.H. Craig who worked for New York Associated Press has authored 6 and received 4 letters in our corpus. Would it be possible to indicate which letter involving him as a participant comes next chronologically whether it was sent or received?
The <context> element is one of the proposed elements, so use with caution ;)
But your intention seems fully in line with the current proposal of typing the references, e.g. <ref type="nextLetterFromAuthor">
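The target of such a reference could also be derived from the corpus rather than entered by hand. The following Python sketch assumes the letters have already been reduced to simple records; the field names, letter ids, dates and the second correspondent are all hypothetical.

```python
def next_letter_for(person, current_id, letters):
    """Return the id of the next letter (sent or received) involving `person`.

    Returns None if `current_id` is that person's last letter. Raises
    ValueError if `current_id` is not one of the person's letters.
    """
    involved = sorted(
        (l for l in letters if person in (l["sender"], l["addressee"])),
        key=lambda l: l["date"],  # ISO dates sort correctly as strings
    )
    ids = [l["id"] for l in involved]
    pos = ids.index(current_id)
    return ids[pos + 1] if pos + 1 < len(ids) else None


# Hypothetical records, not taken from the corpus.
letters = [
    {"id": "L3", "date": "1866-05-01", "sender": "Craig", "addressee": "Someone"},
    {"id": "L1", "date": "1865-01-10", "sender": "Craig", "addressee": "Someone"},
    {"id": "L2", "date": "1865-07-22", "sender": "Someone", "addressee": "Craig"},
]

print(next_letter_for("Craig", "L1", letters))
```

Here the next letter after L1 is L2, which Craig received rather than sent, so the lookup works across both roles.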
> Corpus-wide Information
> Following on from that, the structure I’ve used above applies to letter-by-letter information but is it possible to have a listPerson type index of authors in the corpus as a whole? It would be good to have somewhere to collect together information on how many letters each author sent and received, and possibly biographical information and/or links to outside sources/wikis that is linked to the individual rather than the roles ‘sender’ and ‘addressee’.
Yes, definitely. You will want to have a central personography with biographical information and links to external resources. BUT I wouldn’t put the number of received (or sent) letters there; compute these on the fly when you need them. In my view that’s a kind of index (a value) which is to be generated from your corpus, not something to store away.
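A minimal Python sketch of generating those counts from the corpus follows. It assumes, purely for illustration, that each letter header contains a <correspDesc> with <correspAction type="sent"> and <correspAction type="received"> children pointing to the personography via persName/@ref. The proposal is still in flux, so treat these element and attribute names, the helper make_desc and the ids as placeholders.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def count_letters(descriptions):
    """Count sent and received letters per person id across all descriptions."""
    counts = {"sent": Counter(), "received": Counter()}
    for xml_text in descriptions:
        desc = ET.fromstring(xml_text)
        for action in desc.findall(".//tei:correspAction", NS):
            kind = action.get("type")
            if kind not in counts:
                continue  # ignore any other action types
            for pers in action.findall("tei:persName", NS):
                counts[kind][pers.get("ref")] += 1
    return counts["sent"], counts["received"]


def make_desc(sender, addressee):
    """Build a hypothetical correspondence description for one letter."""
    return (
        '<correspDesc xmlns="http://www.tei-c.org/ns/1.0">'
        f'<correspAction type="sent"><persName ref="{sender}"/></correspAction>'
        f'<correspAction type="received"><persName ref="{addressee}"/></correspAction>'
        "</correspDesc>"
    )


corpus = [
    make_desc("#craig", "#smith"),
    make_desc("#smith", "#craig"),
    make_desc("#craig", "#jones"),
]
sent, received = count_letters(corpus)
print(sent["#craig"], received["#craig"])
```

Because the counts are recomputed from the headers, they stay correct when letters are added to or removed from the corpus.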
> Pragmatic information (<purpose>?)
> Finally, the letters in the BT Archive are very varied in terms of topic and the sort of thing that the author is attempting to say/do, so it can be difficult knowing where to begin making meaningful linguistic comparisons across time. To give us a way into the analysis each letter was categorised as having one of ten overarching functions. This information is to go in the header so that we can extract letters according to function and look at, for example, queries across time, or compare applications with offers.
> Initially we were going to use <interp> for this but the function categorisation applies to the letter as a whole rather than a span within the text. We proposed using <purpose> but Peter had warned against this. I wonder if we could use <keywords> for a second time defining the scheme as “pragmatic function”?
Yes, I warned against <purpose> because it will invalidate your TEI document unless you also add all the other elements that have to precede <purpose> inside <textDesc>; see http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-textDesc.html
But there’s nothing wrong with <keywords>, I’d say!
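To show that a second <keywords> stays easy to query, here is a small Python sketch that selects the terms by their @scheme. The scheme identifiers (#topic, #pragmaticFunction) and the term values are assumptions made for the example, not names from your project.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def pragmatic_functions(header_xml, scheme="#pragmaticFunction"):
    """Return the terms of the <keywords> element that uses the given scheme."""
    root = ET.fromstring(header_xml)
    path = f".//tei:keywords[@scheme='{scheme}']/tei:term"
    return [term.text for term in root.findall(path, NS)]


# Hypothetical header fragment with one <keywords> per classification scheme.
header = """<textClass xmlns="http://www.tei-c.org/ns/1.0">
  <keywords scheme="#topic"><term>telegraphy</term></keywords>
  <keywords scheme="#pragmaticFunction"><term>query</term></keywords>
</textClass>"""

functions = pragmatic_functions(header)
print(functions)
```

Since the two lists are told apart by @scheme, topic keywords and function keywords can live side by side in the header.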