Hi Christian,

Thanks for the careful review. I have always found these intra-biblical
relationships fascinating (though their impact on people much less so).

You might have reacted to my use of modality ("should not"), but that
was a purely technical remark concerning markup mechanisms. I was trying
to say that where one mechanism (say, a multi-valued attribute) has a
possibly ambiguous interpretation, another near-equivalent mechanism
(say, elements, which you can structure to a greater extent than
attributes) may very well eliminate such ambiguity. And if it _may_,
then the annotator _should_ be forced to discriminate.

Now, from what you say below, the ambiguity that your system currently 
enjoys may be (nomen omen) a blessing. That, in turn, can be taken as an 
argument for the minimalistic solution that I have suggested: don't go 
for a different markup mechanism, just use a different (non-TEI) 
namespace for the @altid.
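
To make that concrete, here is a minimal sketch of a namespaced @altid,
reusing your Exodus example. The "project" prefix and its namespace URI
are placeholders I made up, and the TEI wrapper is abbreviated (no
teiHeader), so this is only meant to be well-formed XML, not a complete
TEI document:

```xml
<!-- Hypothetical prefix and namespace URI; substitute your project's own. -->
<body xmlns="http://www.tei-c.org/ns/1.0"
      xmlns:project="http://example.org/ns/project">
  <!-- xml:id takes over from the old id; altid moves into the project namespace -->
  <seg xml:id="b.EXO.20.12" project:altid="b.DEU.5.16" type="verse">
    Honour thy father and thy mother: that thy dayes may bee long
    vpon the land, which the Lord thy God giueth thee.
  </seg>
</body>
```

Your queries would then address the attribute as @project:altid, with
the same prefix declared in the XQuery prolog. Depending on your schema,
you may also need a customization so that validation accepts the foreign
attribute.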

Good luck,


On 11/08/2013 11:02 AM, Christian Chiarcos wrote:
> Hi Piotr,
> as for the ambiguity, there are different types of relations involved
> between verses, indeed, but it is not possible to establish clear
> distinctions without a full-blown hermeneutic analysis of the
> respective part of the bible. The books were originally independent
> creations with complicated dependencies among them. The gospels are a
> perfect example, where it is debatable (a) how they influenced each
> other (was Mark directly inspired by Matthew and may verse
> correspondences be literal quotes?), (b) to what extent they drew from
> the same sources (did the gospels quote directly from the [lost]
> gospel of Thomas?), and (c) what languages were involved? (What was
> directly written in Latin or Greek and is thus a direct quote, and what
> was independently translated from, say, Aramaic?) So the relations
> between different books in the Bible are basically the same as those
> that might occur between works that we consider to be independent. Hence, using
> both id and altid for the same verse within the Bible would be
> justified.
> The problem multiplies if bible-derived texts are taken into
> consideration. The Diatessaron is a gospel harmony that we have in
> (e.g.) Old High German and Latin. Both require their own verse ids
> because they are aligned with each other in the original document. But
> both are also aligned with the Bible. This is quite obvious for the
> Latin part, which closely resembles the Latin Vulgate, and a
> verse (group) alignment is easily achieved. So, we need an id (for the
> Diatessaron verse) and an altid (for the Bible verse).
> It is, however, close to impossible to determine the nature of the
> relation between a (Latin) Tatian verse and a (Latin) Bible verse. It
> may have been a direct quote (from the Vulgate that later copyists
> might assume to be authoritative), but it may also have been an
> independent translation from the original (Vulgate from Greek,
> Diatessaron from Syriac?), or there may have been innovation during
> the translation process (in the Diatessaron, for example, where
> grammatical ambiguities introduced during translation across
> multiple languages were resolved differently than in the
> Vulgate). So, correspondence between verses may be anything from loose
> paraphrase through literal translation to direct quote. We cannot
> (automatically) decide on any one possibility and we cannot even be sure
> that relations between verses from the same language are not actually
> mediated by other languages.
> Best,
> Christian
> 2013/11/7 Piotr Bański:
>> Hi Christian,
>> My first remark is that, while you take care to preserve the semantics of
>> @altid inside an alternative mechanism, you've actually demonstrated two
>> kinds of @altid semantics below, which need not (perhaps sometimes should
>> not) preserve the ambiguity when getting translated into a new mechanism.
>> The second remark is: if the cost of the alternatives, in terms of
>> construction "heaviness", is an issue (and I can see your reasons), why not
>> get minimalistic about it and go for just @project:altid, i.e. define @altid
>> within a non-TEI namespace? This is by all means a kosher approach ;-)
>> Best,
>>    Piotr
>> On 11/07/2013 11:02 AM, Christian Chiarcos wrote:
>>> Dear list members,
>>> I am currently working on a massive corpus of verse-aligned religious
>>> texts (Bibles, mostly, but also Qur'an editions) for linguistic and NLP
>>> purposes. Initially, I adapted the CES specifications that
>>> Philip Resnik developed decades ago for a similar, small-scale project
>>> (in XML, not his SGML, of course). As we have outgrown the scale of his
>>> project by far, it is about time to update our format to a more
>>> recent standard, and TEI might be the format of choice.
>>> Yet, there are certain aspects specific to a parallel corpus of bibles,
>>> and I was wondering how to represent them with TEI:
>>> - All bibles share the same set of verse identifiers, but occasionally,
>>> a set of verses is not translated literally, but loosely translated
>>> within a larger segment. We introduced an additional attribute altid
>>> (alternate id), a sequence of NMTOKENS, each of which represents a
>>> regular bible ID (we did not choose IDREFS because the target IDs are
>>> not defined within the document). What would be the most efficient way
>>> to represent this properly?
>>> e.g. a multi-verse segment from a Low German (Westphalian) bible (in our
>>> CES-adaptation):
>>> <seg altid="b.MAT.17.22 b.MAT.17.23">
>>>       Os soe sik in Galiläa uphoelen, sia Jesus: Doe Minskensuone
>>>       sall baule den Hännen fan den Minsken iutliewert weren. Soe
>>>       weret en dautmaken, owwer am drüdden Dage sall hoe wir upston.
>>>       Do woören soe olle bedroöwet.
>>> </seg>
>>> vs. a verse segment in another Low German (Plautdietsch) bible
>>> <seg id="b.MAT.17.22" type="verse">
>>>       Aus see enn Galilaea eromm jinje, saed Jesus to an: "De
>>>       Menschesaen woat boolt enn Mensche aeare Henj jejaeft woare,
>>> </seg>
>>> <seg id="b.MAT.17.23" type="verse">
>>>       en dee woare am doot moake, oba aum drede Dach woat hee fomm
>>>       Doot oppstone." En siene Jinja weare seeha truarich do aewa.
>>> </seg>
>>> We query with XQuery across all bibles for a verse ID to compare
>>> differences across languages and language stages. The altids are
>>> inspected if a seg with the corresponding ID isn't found.
>>> - Not only seg, but also div elements may carry the altid attribute,
>>> e.g., for non-literal poetic bible adaptations where we have chapter- or
>>> book-level alignment only, but where smaller structures (e.g., l) exist.
>>> - altid also comes in handy if we want to mark cross-references to other
>>> bible passages that contain literal repetitions, e.g. (from the 1611
>>> King James Version):
>>> <seg id="b.EXO.20.12" altid="b.DEU.5.16" type="verse">
>>>       Honour thy father and thy mother: that thy dayes may bee long
>>>       vpon the land, which the Lord thy God giueth thee.
>>> </seg>
>>> <seg id="b.DEU.5.16" altid="b.EXO.20.12" type="verse">
>>>       Honour thy father and thy mother, as the Lord thy God hath
>>>       commanded thee, that thy daies may be prolonged, and that it
>>>       may goe well with thee, in the land which the Lord thy God
>>>       giueth thee.
>>> </seg>
>>> With our querying strategy, these altids will be relevant if we want to
>>> retrieve matches from a Bible where the exact verse is lost, but a
>>> near-analogue is found nevertheless. This specific verse is, for
>>> example, also quoted several times in the New Testament, and for
>>> languages with an NT only, we would like to have these matches if we
>>> query for b.EXO.20.12 or b.DEU.5.16.
>>> In TEI, the id would correspond to an xml:id, but what would be a good
>>> strategy to preserve the altid information without creating a large
>>> overhead (as using the index element would entail)?
>>> Thanks a lot,
>>> Christian Chiarcos