Reading over this again, I was wondering whether such linking with an
ontology would not be better represented using something like RDFa. (I'm
not arguing to replace one with the other, as they have very different
goals, but complementing TEI with an explicit link towards LOD while
maintaining existing TEI infrastructure and specifications might yield
I know that there are a few papers discussing this possibility, and
the SAWS proposal goes in this direction, too, but what would be the
general attitude of the community towards this interface? And if it is
positive (partially, at least), is there any activity in this direction?
In RDFa, the earlier SAWS statement could be rendered in a more compact
way, but this would require the introduction of new attributes like
@resource, @property and @typeof:
<seg resource="#myId" typeof="saws:Section">
<relation property="saws:someRelation" resource="#targetId"/>
... regular seg content ...
</seg>
I guess link could be extended more naturally in this way, using existing
attributes:
<link type="saws:someRelation" target="#targetId"/>
However, an RDFa processor would need to be told how link/@type and
@target are to be evaluated to make sense of it.
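To make that last point concrete, here is a minimal sketch in Python of the kind of mapping such a processor would have to be given. It rests on assumptions of mine, not on anything RDFa or TEI prescribes: the nearest enclosing element with an xml:id is taken as the subject, link/@type as the predicate, and each pointer in link/@target as an object. The ids, the saws: property and the function name link_triples are purely illustrative, and TEI namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """
<div>
  <seg xml:id="myId" type="saws:Section">
    <link type="saws:someRelation" target="#targetId"/>
    ... regular seg content ...
  </seg>
  <seg xml:id="targetId">...</seg>
</div>
"""

def link_triples(root):
    """Return (subject, predicate, object) for every link inside an identified element."""
    triples = []

    def walk(elem, subject):
        # Assumption: the nearest ancestor-or-self with xml:id is the subject.
        if elem.get(XML_ID) is not None:
            subject = "#" + elem.get(XML_ID)
        if elem.tag == "link" and subject is not None:
            predicate = elem.get("type")
            # @target may hold several whitespace-separated pointers.
            for target in (elem.get("target") or "").split():
                triples.append((subject, predicate, target))
        for child in elem:
            walk(child, subject)

    walk(root, None)
    return triples

triples = link_triples(ET.fromstring(SAMPLE))
for triple in triples:
    print(triple)
```

Whether the subject should really be inferred from the enclosing element is exactly the kind of convention that would have to be agreed on and documented.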
On Fri, 08 Nov 2013 11:50:45 +0100, Christian Chiarcos wrote:
> Having found a preliminary representation of cross references between
> segments, I would like to come back to Gabriel's suggestion in greater
> detail, and there are a few questions/observations.
> Based on our current representation (or rather, a close rendering in TEI), I
> would like to create an interface that allows researchers to refine both
> the scope of correspondences and their type. At this point, it would be
> interesting to experiment with the SAWS modelling. Unfortunately, parts
> of their site seem to be down (http://www.ancientwisdoms.ac.uk/media/,
> there is a demo, but links to the texts won't work), and in particular,
> I haven't found XML data.
> Something else that I found problematic was their use of relation. If I
> read the RelaxNG schema correctly, relation is possible only for names
> (listNym), organizations (listOrg), events (listEvent), persons
> (listPerson), places (listPlace), or Named Entities in general
> (listRelation). The only possible way I see to express the linking of
> text segments using relation is to introduce events in a somewhat
> abusive way:
> <p>my line</p>
> <relation name="requiredName!" ref="saws:someRelation..."
> Three issues here:
> - sp is "an individual speech in a performance text" (not really
> applicable to written text, but the only way I see to create an event
> under a div element)
> - relation requires a @name (that the example under
> - @active "identifies the active participants in a non-mutual
> relationship, or all the participants in a mutual one", but is used in
> SAWS to indicate the referred segment
> Maybe I missed something, though.
> On Thu, 07 Nov 2013 14:41:21 +0100, Christian Chiarcos wrote:
>> Dear Gabriel,
>> thank you for pointing that out. I think their "isRelatedTo" and its
>> subproperties (especially isCloseRenderingOf and isLooseRenderingOf)
>> would be well-suited. I'm still hesitating, though, because if I
>> understand it correctly, it requires one additional element with at
>> least two attributes for every verse I'd like to address, adding 3*r
>> nodes to the document (with r being the number of cross-references).
>> <relation ref="saws:isCloseTranslationOf" active="#div1.i001"/>
>> As soon as multiple types of relations are to be distinguished (not
>> yet, I cannot tell them apart automatically), this seems to be the
>> solution. Until then, something more compact would be preferable.
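For illustration, the size estimate above can be checked with a small Python sketch. It rewrites every altid token as a relation child in the style of the example and counts the added nodes (one element plus two attributes per cross-reference). The generic saws:isRelatedTo property, the placement of relation directly inside the segment, and the helper name add_relations are assumptions made for the sketch, not verified SAWS practice.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<div>
  <seg id="b.EXO.20.12" altid="b.DEU.5.16" type="verse">Honour thy father ...</seg>
  <seg altid="b.MAT.17.22 b.MAT.17.23">...</seg>
</div>
"""

def add_relations(root, prop="saws:isRelatedTo"):
    """Replace every altid token by a relation child; return the number of added nodes."""
    added = 0
    # Copy the element list first, because the tree is modified in the loop.
    for elem in list(root.iter()):
        altid = elem.attrib.pop("altid", None)
        if altid is None:
            continue
        for token in altid.split():
            ET.SubElement(elem, "relation", {"ref": prop, "active": "#" + token})
            added += 3  # one element node plus two attribute nodes
    return added

root = ET.fromstring(SAMPLE)
added = add_relations(root)
refs = len(root.findall(".//relation"))
print(refs, added)
print(ET.tostring(root, encoding="unicode"))
```

With the compact altid notation, the same information costs one attribute per segment, which is why the difference matters at corpus scale.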
>> On Thu, 07 Nov 2013 12:01:51 +0100, Gabriel Bodard wrote:
>>> Dear Christian,
>>> I can't speak in great detail to the TEI markup and data model you
>>> suggest below, but it occurs to me that there might be parallels
>>> between and value in exploring compatibility with the markup devised
>>> for similar purposes by the Sharing Ancient Wisdoms (SAWS) project:
>>> see http://www.ancientwisdoms.ac.uk/
>>> One of the aims of this project is to encode multiple mediaeval and
>>> ancient texts, some of which are collections of fragments of earlier
>>> texts, align them to various translations (close or loose) and to
>>> other texts of which their various segments might be copies,
>>> paraphrases, translations, or merely influenced by.
>>> To this end they used (1) CTS URNs (as URIs) for all texts and
>>> segments of texts, enabling pointing in both directions with minimal
>>> overhead in terms of intervention and insertion of ids in the text;
>>> (2) an ontology of text object and relationship types, described at
>>> (probably overkill for your purposes, but a minimal subset of it would
>>> be easy to devise); (3) a series of `tei:relation` elements to define
>>> the relationships between texts, places, persons, and other objects in
>>> the corpus.
>>> I'm not involved in either project, but at a glance it seems to me
>>> that a model along these lines might well work for the issues you are
>>> describing too. If you're interested in more information, I believe
>>> one or two of the SAWS developers are on this list (and they can
>>> probably correct some of my comments above, too).
>>> On 2013-11-07 10:02, Christian Chiarcos wrote:
>>>> Dear list members,
>>>> I am currently working on a massive corpus of verse-aligned religious
>>>> texts (Bibles, mostly, but also Qur'an editions) for linguistic and
>>>> purposes. In the beginning, I've been adapting the CES specifications
>>>> Philip Resnik developed decades ago for a similar, small-scale
>>>> (in XML, not his SGML, of course). As we have outgrown the scale of
>>>> project by lengths, it is about time to update our format to a more
>>>> recent standard, and TEI might be the format of choice.
>>>> Yet, there are certain aspects specific to a parallel corpus of
>>>> and I was wondering how to represent them with TEI:
>>>> - All bibles share the same set of verse identifiers, but
>>>> a set of verses is not translated literally, but loosely translated
>>>> within a larger segment. We introduced an additional attribute altid
>>>> (alternate id), a sequence of NMTOKENS, each of which represents a
>>>> regular bible ID (we did not choose IDREFS because they are not defined
>>>> within the document). What would be the most efficient way to
>>>> this properly?
>>>> e.g. a multi-verse segment from a Low German (Westphalian) bible (in
>>>> <seg altid="b.MAT.17.22 b.MAT.17.23">
>>>> Os soe sik in Galiläa uphoelen, sia Jesus: Doe Minskensuone
>>>> sall baule den Hännen fan den Minsken iutliewert weren. Soe
>>>> weret en dautmaken, owwer am drüdden Dage sall hoe wir upston.
>>>> Do woören soe olle bedroöwet.
>>>> </seg>
>>>> vs. a verse segment in another Low German (Plautdietsch) bible
>>>> <seg id="b.MAT.17.22" type="verse">
>>>> Aus see enn Galilaea eromm jinje, saed Jesus to an: "De
>>>> Menschesaen woat boolt enn Mensche aeare Henj jejaeft woare,
>>>> </seg>
>>>> <seg id="b.MAT.17.23" type="verse">
>>>> en dee woare am doot moake, oba aum drede Dach woat hee fomm
>>>> Doot oppstone." En siene Jinja weare seeha truarich do aewa.
>>>> </seg>
>>>> We query with XQuery across all bibles for a verse ID to compare
>>>> differences across languages and language stages. The altids are
>>>> inspected if a seg with the corresponding ID isn't found.
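In case it helps to see the lookup spelled out, here is a rough Python equivalent of the strategy just described (the actual XQuery is not reproduced here). The abbreviated sample data, the dictionary keys and the function name find_verse are made up for the illustration.

```python
import xml.etree.ElementTree as ET

BIBLES = {
    "westphalian": """<div>
        <seg altid="b.MAT.17.22 b.MAT.17.23">Os soe sik in ...</seg>
    </div>""",
    "plautdietsch": """<div>
        <seg id="b.MAT.17.22" type="verse">Aus see enn ...</seg>
        <seg id="b.MAT.17.23" type="verse">en dee woare am doot moake ...</seg>
    </div>""",
}

def find_verse(root, verse_id):
    """Return elements whose id equals verse_id; otherwise fall back to altid."""
    exact = [e for e in root.iter() if e.get("id") == verse_id]
    if exact:
        return exact
    # No element with this id: inspect the whitespace-separated altid tokens.
    # root.iter() visits every element, so div-level altids are covered too.
    return [e for e in root.iter() if verse_id in e.get("altid", "").split()]

for name, xml in BIBLES.items():
    hits = find_verse(ET.fromstring(xml), "b.MAT.17.23")
    print(name, [(h.get("id"), h.get("altid")) for h in hits])
```

Any TEI encoding of altid would need to keep this two-step lookup cheap.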
>>>> - Not only seg, but also div elements may carry the altid attribute,
>>>> e.g., for non-literal poetic bible adaptations where we have chapter-
>>>> book-level alignment only, but where smaller structures (e.g., l)
>>>> - altid also comes in handy if we want to mark cross-references to
>>>> bible passages that contain literal repetitions, e.g. (from the 1611
>>>> King James Version):
>>>> <seg id="b.EXO.20.12" altid="b.DEU.5.16" type="verse">
>>>> Honour thy father and thy mother: that thy dayes may bee long
>>>> vpon the land, which the Lord thy God giueth thee.
>>>> </seg>
>>>> <seg id="b.DEU.5.16" altid="b.EXO.20.12" type="verse">
>>>> Honour thy father and thy mother, as the Lord thy God hath
>>>> commanded thee, that thy daies may be prolonged, and that it
>>>> may goe well with thee, in the land which the Lord thy God
>>>> giueth thee.
>>>> </seg>
>>>> With our querying strategy, these altids will be relevant if we want to
>>>> retrieve matches from a Bible where the exact verse is lost, but a
>>>> near-analogue is nevertheless found. This specific verse is, for
>>>> example, also quoted several times in the New Testament, and for
>>>> languages with an NT only, we would like to have these matches if we
>>>> query for b.EXO.20.12 or b.DEU.5.16.
>>>> In TEI, the id would correspond to an xml:id, but what would be a good
>>>> strategy to preserve the altid information without creating a large
>>>> overhead (as using the index element would entail)?
>>>> Thanks a lot,
>>>> Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany
office: Robert-Mayer-Str. 10, #401b