Just a brief reply, since my expertise on SAWS has already been exceeded
:-) but I believe pretty strongly that SAWS's use of `<relation/>` to
encode relationships between passages of text is *not* tag abuse, nor is
it unlicensed by the TEI Guidelines. I do think (and if I remember
correctly, Council were in agreement) that the Guidelines need to say so
more explicitly. I believe there's even an example in the pipeline
to do just that. I'll see if I can find it.
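
In the meantime, here is a rough sketch of how relations of this kind
could be read back out of a file. It is not the SAWS implementation: the
listRelation fragment, the ids and the use of @passive for the target
segment are assumptions made purely for illustration, and only the
saws:isCloseTranslationOf value and the #div1.i001 pointer come from the
thread below.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical stand-off fragment; ids and the @passive targets are invented.
SAMPLE = """
<listRelation xmlns="http://www.tei-c.org/ns/1.0">
  <relation name="isCloseTranslationOf" ref="saws:isCloseTranslationOf"
            active="#div1.i001" passive="#div2.i001"/>
  <relation name="isCloseTranslationOf" ref="saws:isCloseTranslationOf"
            active="#div1.i002" passive="#div2.i002 #div2.i003"/>
</listRelation>
"""

def read_relations(xml_text):
    """Map each @active segment id to a list of (relation type, target id)."""
    root = ET.fromstring(xml_text)
    links = defaultdict(list)
    for rel in root.iter(f"{{{TEI_NS}}}relation"):
        # Prefer the typed @ref; fall back to the plain @name.
        kind = rel.get("ref") or rel.get("name")
        # Both attributes may hold several whitespace-separated pointers.
        sources = rel.get("active", "").split()
        targets = rel.get("passive", "").split()
        for source in sources:
            for target in targets:
                links[source.lstrip("#")].append((kind, target.lstrip("#")))
    return dict(links)

if __name__ == "__main__":
    for segment, related in read_relations(SAMPLE).items():
        print(segment, related)
```

Keeping the relations in one stand-off list like this leaves the verse
text itself untouched, at the cost of one extra element per link.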
On 2013-11-08 10:50, Christian Chiarcos wrote:
> Having found a preliminary representation of cross references between
> segments, I would like to come back to Gabriel's suggestion in greater
> detail, and there are a few questions/observations.
> Based on our current representation (resp., a close rendering in TEI), I
> would like to create an interface that allows researchers to refine both
> the scope of correspondences and their type. At this point, it would be
> interesting to experiment with the SAWS modelling. Unfortunately, parts
> of their site seem to be down (http://www.ancientwisdoms.ac.uk/media/,
> there is a demo, but links to the texts won't work), and in particular,
> I haven't found XML data.
> Something else that I found problematic was their use of relation. If I
> read the RelaxNG schema correctly, relation is possible only for names
> (listNym), organizations (listOrg), events (listEvent), persons
> (listPerson), places (listPlace), or Named Entities in general
> (listRelation). The only possible way I see to express the linking of
> text segments using relation is to introduce events in a somewhat
> abusive way:
> <p>my line</p>
> <relation name="requiredName!" ref="saws:someRelation..."
> Three issues here:
> - sp is "an individual speech in a performance text" (not really
> applicable to written text, but the only way I see to create an event
> under a div element)
> - relation requires a @name (that the example under
> - @active "identifies the active participants in a non-mutual
> relationship, or all the participants in a mutual one", but is used in
> SAWS to indicate the referred segment
> Maybe I missed something, though.
> On Thu, 07 Nov 2013 14:41:21 +0100, Christian Chiarcos
> <[log in to unmask]> wrote:
>> Dear Gabriel,
>> thank you for pointing that out. I think their "isRelatedTo" and its
>> subproperties (especially isCloseRenderingOf and isLooseRenderingOf)
>> would be well-suited. I'm still hesitating, though, because if I
>> understand it correctly, it requires one additional element with at
>> least two attributes for every verse I'd like to address, adding 3*r
>> nodes to the document (with r being the number of cross-references).
>> <relation ref="saws:isCloseTranslationOf" active="#div1.i001"/>
>> As soon as multiple types of relations are to be distinguished (not
>> yet, I cannot tell them apart automatically), this seems to be the
>> solution. Until then, something more compact would be preferable.
>> On Thu, 07 Nov 2013 12:01:51 +0100, Gabriel Bodard
>> <[log in to unmask]> wrote:
>>> Dear Christian,
>>> I can't speak in great detail to the TEI markup and data model you
>>> suggest below, but it occurs to me that there might be parallels with,
>>> and value in exploring compatibility with, the markup devised
>>> for similar purposes by the Sharing Ancient Wisdoms (SAWS) project:
>>> see http://www.ancientwisdoms.ac.uk/
>>> One of the aims of this project is to encode multiple mediaeval and
>>> ancient texts, some of which are collections of fragments of earlier
>>> texts, align them to various translations (close or loose) and to
>>> other texts of which their various segments might be copies,
>>> paraphrases, translations, or merely influenced by.
>>> To this end they used (1) CTS URNs (as URIs) for all texts and
>>> segments of texts, enabling pointing in both directions with minimal
>>> overhead in terms of intervention and insertion of ids in the text;
>>> (2) an ontology of text object and relationship types, described at
>>> (probably overkill for your purposes, but a minimal subset of it
>>> would be easy to devise); (3) a series of `tei:relation` elements to
>>> define the relationships between texts, places, persons, and other
>>> objects in the corpus.
>>> I'm not involved in either project, but at a glance it seems to me
>>> that a model along these lines might well work for the issues you are
>>> describing too. If you're interested in more information, I believe
>>> one or two of the SAWS developers are on this list (and they can
>>> probably correct some of my comments above, too).
>>> On 2013-11-07 10:02, Christian Chiarcos wrote:
>>>> Dear list members,
>>>> I am currently working on a massive corpus of verse-aligned religious
>>>> texts (Bibles, mostly, but also Qur'an editions) for linguistic and NLP
>>>> purposes. In the beginning, I've been adapting the CES specifications
>>>> Philip Resnik developed decades ago for a similar, small-scale project
>>>> (in XML, not his SGML, of course). As we have outgrown the scale of his
>>>> project by far, it is about time to update our format to a more
>>>> recent standard, and TEI might be the format of choice.
>>>> Yet, there are certain aspects specific to a parallel corpus of bibles,
>>>> and I was wondering how to represent them with TEI:
>>>> - All bibles share the same set of verse identifiers, but occasionally,
>>>> a set of verses is not translated literally, but loosely translated
>>>> within a larger segment. We introduced an additional attribute altid
>>>> (alternate id), a sequence of NMTOKENS, each of which represents a
>>>> regular bible ID (we did not choose IDREFS because they are not defined
>>>> within the document). What would be the most efficient way to represent
>>>> this properly?
>>>> e.g. a multi-verse segment from a Low German (Westphalian) bible (in
>>>> <seg altid="b.MAT.17.22 b.MAT.17.23">
>>>> Os soe sik in Galiläa uphoelen, sia Jesus: Doe Minskensuone
>>>> sall baule den Hännen fan den Minsken iutliewert weren. Soe
>>>> weret en dautmaken, owwer am drüdden Dage sall hoe wir upston.
>>>> Do woören soe olle bedroöwet.
>>>> </seg>
>>>> vs. a verse segment in another Low German (Plautdietsch) bible
>>>> <seg id="b.MAT.17.22" type="verse">
>>>> Aus see enn Galilaea eromm jinje, saed Jesus to an: "De
>>>> Menschesaen woat boolt enn Mensche aeare Henj jejaeft woare,
>>>> </seg>
>>>> <seg id="b.MAT.17.23" type="verse">
>>>> en dee woare am doot moake, oba aum drede Dach woat hee fomm
>>>> Doot oppstone." En siene Jinja weare seeha truarich do aewa.
>>>> </seg>
>>>> We query with XQuery across all bibles for a verse ID to compare
>>>> differences across languages and language stages. The altids are
>>>> inspected if a seg with the corresponding ID isn't found.
>>>> - Not only seg, but also div elements may carry the altid attribute,
>>>> e.g., for non-literal poetic bible adaptations where we have chapter- or
>>>> book-level alignment only, but where smaller structures (e.g., l)
>>>> - altid also comes in handy if we want to mark cross-references to
>>>> bible passages that contain literal repetitions, e.g. (from the 1611
>>>> King James Version):
>>>> <seg id="b.EXO.20.12" altid="b.DEU.5.16" type="verse">
>>>> Honour thy father and thy mother: that thy dayes may bee long
>>>> vpon the land, which the Lord thy God giueth thee.
>>>> </seg>
>>>> <seg id="b.DEU.5.16" altid="b.EXO.20.12" type="verse">
>>>> Honour thy father and thy mother, as the Lord thy God hath
>>>> commanded thee, that thy daies may be prolonged, and that it
>>>> may goe well with thee, in the land which the Lord thy God
>>>> giueth thee.
>>>> </seg>
>>>> With our querying strategy, these altids will be relevant if we want to
>>>> retrieve matches from a Bible where the exact verse is lost, but a
>>>> near analogue is nevertheless found. This specific verse is, for
>>>> example, also quoted several times in the New Testament, and for
>>>> languages with an NT only, we would like to have these matches if we
>>>> query for b.EXO.20.12 or b.DEU.5.16.
>>>> In TEI, the id would correspond to an xml:id, but what would be a good
>>>> strategy to preserve the altid information without creating a large
>>>> overhead (as using the index element would entail)?
>>>> Thanks a lot,
>>>> Christian Chiarcos
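
For reference, the lookup strategy described in the original message
(find a seg or div by its id, and inspect the altid values only if no
exact match exists) might be sketched as follows. This is an
illustration in Python rather than the XQuery actually used, and the
sample document is a hypothetical miniature with the verse text omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the multi-verse example; the text is omitted.
SAMPLE = """
<div>
  <seg altid="b.MAT.17.22 b.MAT.17.23">(text omitted)</seg>
</div>
"""

def find_verse(root, verse_id):
    """Return elements whose id equals verse_id, else those listing it in altid."""
    exact = [el for el in root.iter() if el.get("id") == verse_id]
    if exact:
        return exact
    # altid is a whitespace-separated sequence of tokens, so compare whole
    # tokens rather than substrings (b.MAT.17.2 must not match b.MAT.17.22).
    return [el for el in root.iter()
            if verse_id in el.get("altid", "").split()]

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for hit in find_verse(root, "b.MAT.17.23"):
        print(hit.tag, hit.attrib)
```

In a TEI version the id test would be made against xml:id instead, but
the fallback logic would stay the same.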
Dr Gabriel BODARD
Researcher in Digital Epigraphy
King's College London
Boris Karloff Building
26-29 Drury Lane
London WC2B 5RL
T: +44 (0)20 7848 1388
E: [log in to unmask]