Just a brief reply, since my expertise on SAWS has already been exceeded
:-) but I believe pretty strongly that SAWS's use of `<relation/>` to
encode relationships between passages of text is *not* tag abuse, nor is
it unlicensed by the TEI Guidelines. I do think (and, if I remember
correctly, Council were in agreement) that the Guidelines need to say so
more explicitly. I believe there's even an example in the pipeline
to do just that. I'll see if I can find it.

Best,

Gabby

On 2013-11-08 10:50, Christian Chiarcos wrote:
> Having found a preliminary representation of cross-references between
> segments, I would like to come back to Gabriel's suggestion in greater
> detail, and I have a few questions and observations.
>
> Based on our current representation (or, rather, a close rendering in
> TEI), I would like to create an interface that allows researchers to
> refine both the scope of correspondences and their type. At this point,
> it would be interesting to experiment with the SAWS modelling.
> Unfortunately, parts of their site seem to be down
> (http://www.ancientwisdoms.ac.uk/media/; there is a demo, but links to
> the texts don't work), and in particular, I haven't found any XML data.
>
> Something else that I found problematic was their use of relation. If I
> read the RelaxNG schema correctly, relation is possible only for names
> (listNym), organizations (listOrg), events (listEvent), persons
> (listPerson), places (listPlace), or Named Entities in general
> (listRelation). The only possible way I see to express the linking of
> text segments using relation is to introduce events in a somewhat
> abusive way:
>
> <sp>
>      <listEvent>
>          <event>
>              <p>my line</p>
>          </event>
>          <relation name="requiredName!" ref="saws:someRelation..."
>                    active="participant!"/>
>      </listEvent>
> </sp>
>
> Three issues here:
> - sp is "an individual speech in a performance text" (not really
> applicable to written text, but the only way I see to create an event
> under a div element)
> - relation requires @name, which the example at
> http://www.ancientwisdoms.ac.uk/media/ontology/SAWS_relationship_types.html
> omits
> - @active "identifies the active participants in a non-mutual
> relationship, or all the participants in a mutual one", but SAWS uses it
> to indicate the referenced segment
>
> Maybe I missed something, though.
>
> Best,
> Christian
>
> On Thu, 07 Nov 2013 14:41:21 +0100, Christian Chiarcos
> <[log in to unmask]> wrote:
>
>> Dear Gabriel,
>>
>> thank you for pointing that out. I think their "isRelatedTo" and its
>> subproperties (especially isCloseRenderingOf and isLooseRenderingOf)
>> would be well-suited. I'm still hesitating, though, because if I
>> understand it correctly, it requires one additional element with at
>> least two attributes for every verse I'd like to address, adding 3*r
>> nodes to the document (with r being the number of cross-references).
>>
>> <relation ref="saws:isCloseTranslationOf" active="#div1.i001"/>
>>
>> As soon as multiple types of relation are to be distinguished (not
>> yet possible, as I cannot tell them apart automatically), this seems to
>> be the solution. Until then, something more compact would be preferable.
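The 3*r estimate can be checked with a small sketch. Assuming Python's standard-library `xml.etree.ElementTree` and ignoring the TEI namespace for brevity, the code below builds one `relation` per cross-reference in the shape of the `<relation ref="..." active="#..."/>` example above and counts what each adds: one element node plus one node per attribute. The helper names and target IDs are hypothetical.

```python
import xml.etree.ElementTree as ET


def relation_for(ref_type: str, target_id: str) -> ET.Element:
    """Build one relation element shaped like the example above:
    the relationship type goes in @ref, the referenced segment in @active.
    (Simplified: no TEI namespace.)"""
    return ET.Element("relation", {"ref": ref_type, "active": "#" + target_id})


def added_nodes(relations: list) -> int:
    """Count the nodes the relations add: one element node plus one
    attribute node per attribute, for each relation."""
    return sum(1 + len(rel.attrib) for rel in relations)


# Hypothetical cross-reference targets (xml:id values in the document).
targets = ["div1.i001", "div1.i002", "div1.i003"]
rels = [relation_for("saws:isCloseTranslationOf", t) for t in targets]

print(ET.tostring(rels[0], encoding="unicode"))
print(added_nodes(rels))  # r = 3 cross-references -> 9 nodes, i.e. 3*r
```

With two attributes per element the count grows linearly as 3*r; any further attribute (such as a required @name) raises it to 4*r.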
>>
>> Best,
>> Christian
>>
>> On Thu, 07 Nov 2013 12:01:51 +0100, Gabriel Bodard
>> <[log in to unmask]> wrote:
>>
>>> Dear Christian,
>>>
>>> I can't speak in great detail to the TEI markup and data model you
>>> suggest below, but it occurs to me that there might be parallels
>>> with, and value in exploring compatibility with, the markup devised
>>> for similar purposes by the Sharing Ancient Wisdoms (SAWS) project:
>>> see http://www.ancientwisdoms.ac.uk/
>>>
>>> One of the aims of this project is to encode multiple mediaeval and
>>> ancient texts, some of which are collections of fragments of earlier
>>> texts, and to align them with various translations (close or loose)
>>> and with other texts of which their various segments might be copies,
>>> paraphrases or translations, or by which they might merely be influenced.
>>>
>>> To this end they used (1) CTS URNs (as URIs) for all texts and
>>> segments of texts, enabling pointing in both directions with minimal
>>> overhead in terms of intervention and insertion of ids in the text;
>>> (2) an ontology of text object and relationship types, described at
>>> http://www.ancientwisdoms.ac.uk/media/ontology/SAWS_relationship_types.html
>>> (probably overkill for your purposes, but a minimal subset of it
>>> would be easy to devise); (3) a series of `tei:relation` elements to
>>> define the relationships between texts, places, persons, and other
>>> objects in the corpus.
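Point (1) relies on CTS URNs, which have the general form `urn:cts:NAMESPACE:WORK:PASSAGE`. The WORK component is a dot-separated hierarchy (textgroup, work, version, exemplar), and the passage may be a range `start-end`. As a rough illustration only, the Python sketch below splits a URN into these parts. It is a simplification of the CTS URN specification: it does not handle subreferences (`@...`) and does not validate the components. The class and function names are hypothetical, and the sample URN is only an example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CtsUrn:
    namespace: str
    work: List[str]          # textgroup, work, version, exemplar: as many as given
    passage: Optional[str]   # e.g. "1.1" or "1.1-1.10"; None for a whole text

    def passage_range(self) -> Optional[Tuple[str, str]]:
        """Return (start, end); a single passage gives start == end."""
        if self.passage is None:
            return None
        start, _, end = self.passage.partition("-")
        return (start, end or start)


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split urn:cts:NAMESPACE:WORK[:PASSAGE] into its components.
    Simplified: subreferences (@...) and component syntax are not checked."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS URN: {urn!r}")
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(namespace=parts[2], work=parts[3].split("."), passage=passage)


# Illustrative URN; the version identifier is only an example.
urn = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1-1.10")
print(urn.namespace, urn.work, urn.passage_range())
```

Because both whole texts and passages are addressed by the same URN scheme, pointing "in both directions" needs no extra ids inside the text, which matches the low-overhead argument above.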
>>>
>>> I'm not involved in either project, but at a glance it seems to me
>>> that a model along these lines might well work for the issues you are
>>> describing too. If you're interested in more information, I believe
>>> one or two of the SAWS developers are on this list (and they can
>>> probably correct some of my comments above, too).
>>>
>>> Regards,
>>>
>>> Gabby
>>>
>>> On 2013-11-07 10:02, Christian Chiarcos wrote:
>>>> Dear list members,
>>>>
>>>> I am currently working on a massive corpus of verse-aligned religious
>>>> texts (Bibles, mostly, but also Qur'an editions) for linguistic and NLP
>>>> purposes. Initially, I adapted the CES specifications that
>>>> Philip Resnik developed decades ago for a similar, small-scale project
>>>> (in XML, not his SGML, of course). As we have outgrown the scale of his
>>>> project by far, it is about time to update our format to a more
>>>> recent standard, and TEI might be the format of choice.
>>>>
>>>> Yet there are certain aspects specific to a parallel corpus of Bibles,
>>>> and I was wondering how to represent them in TEI:
>>>>
>>>> - All Bibles share the same set of verse identifiers, but occasionally
>>>> a set of verses is not translated literally but loosely rendered
>>>> within a larger segment. We introduced an additional attribute altid
>>>> (alternate id), a sequence of NMTOKENS, each of which represents a
>>>> regular Bible ID (we did not choose IDREFS because the IDs are not
>>>> defined within the document). What would be the most efficient way to
>>>> represent this properly?
>>>>
>>>> e.g. a multi-verse segment from a Low German (Westphalian) Bible (in
>>>> our CES adaptation):
>>>>
>>>> <seg altid="b.MAT.17.22 b.MAT.17.23">
>>>>      Os soe sik in Galiläa uphoelen, sia Jesus: Doe Minskensuone
>>>>      sall baule den Hännen fan den Minsken iutliewert weren. Soe
>>>>      weret en dautmaken, owwer am drüdden Dage sall hoe wir upston.
>>>>      Do woören soe olle bedroöwet.
>>>> </seg>
>>>>
>>>> vs. a verse segment in another Low German (Plautdietsch) Bible:
>>>>
>>>> <seg id="b.MAT.17.22" type="verse">
>>>>      Aus see enn Galilaea eromm jinje, saed Jesus to an: "De
>>>>      Menschesaen woat boolt enn Mensche aeare Henj jejaeft woare,
>>>> </seg>
>>>> <seg id="b.MAT.17.23" type="verse">
>>>>      en dee woare am doot moake, oba aum drede Dach woat hee fomm
>>>>      Doot oppstone." En siene Jinja weare seeha truarich do aewa.
>>>> </seg>
>>>>
>>>> We query all Bibles with XQuery for a verse ID to compare
>>>> differences across languages and language stages. The altids are
>>>> inspected if a seg with the corresponding ID isn't found.
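The lookup policy just described (exact ID first, altid as fallback) can be sketched in a few lines. This is not the XQuery used in the project; it is a Python approximation with `xml.etree.ElementTree`, assuming the CES-style `id`/`altid` attributes shown above and a hypothetical `<text>` wrapper element. The function name `find_verse` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def find_verse(doc: ET.Element, verse_id: str) -> Optional[ET.Element]:
    """Return the element whose id equals verse_id; failing that, the first
    element whose whitespace-separated altid list contains verse_id."""
    for el in doc.iter():
        if el.get("id") == verse_id:
            return el
    for el in doc.iter():  # fallback: any element (seg or div) with an altid
        if verse_id in el.get("altid", "").split():
            return el
    return None


# Abbreviated versions of the two Low German examples above.
westphalian = ET.fromstring(
    '<text><seg altid="b.MAT.17.22 b.MAT.17.23">Os soe sik in Galiläa uphoelen ...</seg></text>'
)
plautdietsch = ET.fromstring(
    '<text><seg id="b.MAT.17.22" type="verse">Aus see enn Galilaea eromm jinje ...</seg>'
    '<seg id="b.MAT.17.23" type="verse">en dee woare am doot moake ...</seg></text>'
)

for doc in (westphalian, plautdietsch):
    hit = find_verse(doc, "b.MAT.17.23")
    print(hit.get("id"), hit.get("altid"))
```

Splitting @altid on whitespace mirrors its NMTOKENS type. Because the fallback scans every element rather than only seg, a div carrying altid is also found.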
>>>>
>>>> - Not only seg but also div elements may carry the altid attribute,
>>>> e.g., for non-literal poetic Bible adaptations where we have only
>>>> chapter- or book-level alignment, but where smaller structures
>>>> (e.g., l) exist.
>>>>
>>>> - altid also comes in handy if we want to mark cross-references to
>>>> other Bible passages that contain literal repetitions, e.g. (from the
>>>> 1611 King James Version):
>>>>
>>>> <seg id="b.EXO.20.12" altid="b.DEU.5.16" type="verse">
>>>>      Honour thy father and thy mother: that thy dayes may bee long
>>>>      vpon the land, which the Lord thy God giueth thee.
>>>> </seg>
>>>>
>>>> <seg id="b.DEU.5.16" altid="b.EXO.20.12" type="verse">
>>>>      Honour thy father and thy mother, as the Lord thy God hath
>>>>      commanded thee, that thy daies may be prolonged, and that it
>>>>      may goe well with thee, in the land which the Lord thy God
>>>>      giueth thee.
>>>> </seg>
>>>>
>>>> With our querying strategy, these altids become relevant if we want to
>>>> retrieve matches from a Bible where the exact verse is lost but a
>>>> near analogue is found. This specific verse is, for example, also
>>>> quoted several times in the New Testament, and for languages with
>>>> only an NT, we would like to have these matches if we query for
>>>> b.EXO.20.12 or b.DEU.5.16.
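When many Bibles are queried for the same IDs, these lookups can be precomputed in one pass. The Python sketch below is an assumption, not the project's XQuery setup. It builds an inverted index from each verse ID to every element that carries it, either as @id (exact) or as a token in @altid (near match). The document names, the `<text>` wrapper and the "partial" Bible containing only the Deuteronomy verse are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple

Entry = Tuple[str, ET.Element, bool]  # (document name, element, exact @id match?)


def build_index(docs: Dict[str, ET.Element]) -> Dict[str, List[Entry]]:
    """Map every verse ID to the elements that carry it, either as @id
    (exact match) or as one of the tokens in @altid (near match)."""
    index: Dict[str, List[Entry]] = defaultdict(list)
    for name, root in docs.items():
        for el in root.iter():
            if el.get("id"):
                index[el.get("id")].append((name, el, True))
            for alt in el.get("altid", "").split():
                index[alt].append((name, el, False))
    return dict(index)  # plain dict: looking up a missing ID adds nothing


docs = {
    # The two KJV verses above, text abbreviated.
    "kjv": ET.fromstring(
        '<text><seg id="b.EXO.20.12" altid="b.DEU.5.16" type="verse">Honour thy father ...</seg>'
        '<seg id="b.DEU.5.16" altid="b.EXO.20.12" type="verse">Honour thy father ...</seg></text>'
    ),
    # Hypothetical Bible in which only the Deuteronomy verse survives.
    "partial": ET.fromstring(
        '<text><seg id="b.DEU.5.16" altid="b.EXO.20.12" type="verse">...</seg></text>'
    ),
}

index = build_index(docs)
for name, el, exact in index.get("b.EXO.20.12", []):
    print(name, el.get("id"), "exact" if exact else "via altid")
```

Keeping the exact/near flag lets a caller prefer an exact match per document and fall back to altid entries only where none exists, the same policy as the per-document lookup described earlier.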
>>>>
>>>> In TEI, the id would correspond to an xml:id, but what would be a good
>>>> strategy to preserve the altid information without creating a large
>>>> overhead (as using the index element would entail)?
>>>>
>>>> Thanks a lot,
>>>> Christian Chiarcos
>>>
>>
>>
>
>

-- 
Dr Gabriel BODARD
Researcher in Digital Epigraphy

Digital Humanities
King's College London
Boris Karloff Building
26-29 Drury Lane
London WC2B 5RL

T: +44 (0)20 7848 1388
E: [log in to unmask]

http://www.digitalclassicist.org/
http://www.currentepigraphy.org/