As for the ambiguity: there are indeed different types of relations
between verses, but it is not possible to establish clear
distinctions without a full-blown hermeneutic analysis of the
respective part of the Bible. The books were originally independent
creations with complicated dependencies among them. The gospels are a
perfect example: it is debatable (a) how they influenced each
other (was Mark directly inspired by Matthew, and may verse
correspondences be literal quotes?), (b) to what extent they drew on
the same sources (did the gospels quote directly from the [lost]
gospel of Thomas?), and (c) which languages were involved (what was
directly written in Latin or Greek and is thus a direct quote, and what
was independently translated from, say, Aramaic?). So the relations
between different books in the Bible are basically the same as those that might
occur between works that we consider to be independent. Hence, using
both id and altid for the same verse within the Bible would be
The problem multiplies if Bible-derived texts are taken into
consideration. The Diatessaron is a gospel harmony that we have in,
e.g., Old High German and Latin. Both versions require their own verse ids
because they are aligned with each other in the original document. But
both are also aligned with the Bible. This is quite obvious for the
Latin part, which closely resembles the Latin Vulgate, so that a
verse (or verse-group) alignment is easily achieved. So we need an id (for the
Diatessaron verse) and an altid (for the Bible verse).
It is, however, close to impossible to determine the nature of the
relation between a (Latin) Tatian verse and a (Latin) Bible verse. It
may have been a direct quote (from the Vulgate, which later copyists
might have considered authoritative), but it may also have been an
independent translation from the original (the Vulgate from Greek,
the Diatessaron from Syriac?), or there may have been innovation during
the translation process (in the Diatessaron, for example, where
grammatical ambiguities introduced while translating across
multiple languages were resolved differently than in the
Vulgate). So the correspondence between verses may be anything from loose
paraphrase through literal translation to direct quotation. We cannot
(automatically) decide between these possibilities, and we cannot even be sure
that relations between verses in the same language are not actually
mediated by other languages.
2013/11/7 Piotr Bański <[log in to unmask]>:
> Hi Christian,
> My first remark is that, while you take care to preserve the semantics of
> @altid inside an alternative mechanism, you've actually demonstrated two
> kinds of @altid semantics below, which need not (perhaps sometimes should
> not) preserve the ambiguity when translated into a new mechanism.
> The second remark is: if the cost of the alternatives, in terms of
> construction "heaviness", is an issue (and I can see your reasons), why not
> get minimalistic about it and go for just @project:altid, i.e. define @altid
> within a non-TEI namespace? This is by all means a kosher approach ;-)
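A minimal sketch of the namespaced-attribute route, assuming a TEI-style fragment in which the verse id moves to xml:id and altid lives in a project namespace. The namespace URI http://example.org/ns/bible-corpus and the helper name verse_ids are hypothetical, and the fragment is illustrative rather than a validated TEI document. It only shows that both attributes remain easy to read with Python's standard xml.etree.ElementTree, which exposes namespaced attributes under "{namespace-uri}local-name" keys.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of the xml: prefix
PROJECT_NS = "http://example.org/ns/bible-corpus"  # hypothetical project namespace

# Abridged, TEI-style fragment (not a complete or validated TEI document).
DOC = f"""<text xmlns="{TEI_NS}" xmlns:project="{PROJECT_NS}">
  <body>
    <seg project:altid="b.MAT.17.22 b.MAT.17.23">Os soe sik in Galiläa uphoelen [...]</seg>
    <seg xml:id="b.EXO.20.12" project:altid="b.DEU.5.16" type="verse">Honour thy father [...]</seg>
  </body>
</text>"""


def verse_ids(seg: ET.Element) -> Tuple[Optional[str], List[str]]:
    """Return (xml:id, altid tokens) for one element (hypothetical helper)."""
    # ElementTree keys namespaced attributes as "{namespace-uri}local-name".
    primary = seg.get(f"{{{XML_NS}}}id")
    # altid keeps its NMTOKENS semantics: a whitespace-separated id list.
    alternates = seg.get(f"{{{PROJECT_NS}}}altid", "").split()
    return primary, alternates


root = ET.fromstring(DOC)
for seg in root.iter(f"{{{TEI_NS}}}seg"):
    print(verse_ids(seg))
```

Binding the prefix to a namespace the project controls keeps the attribute clear of current and future TEI attribute names, which is presumably what makes the approach acceptable in TEI terms.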
> On 11/07/2013 11:02 AM, Christian Chiarcos wrote:
>> Dear list members,
>> I am currently working on a massive corpus of verse-aligned religious
>> texts (Bibles, mostly, but also Qur'an editions) for linguistic and NLP
>> purposes. Initially, I adapted the CES specifications
>> Philip Resnik developed more than a decade ago for a similar, small-scale project
>> (in XML rather than his SGML, of course). As we have outgrown the scale of his
>> project by far, it is about time to update our format to a more
>> recent standard, and TEI might be the format of choice.
>> Yet, there are certain aspects specific to a parallel corpus of bibles,
>> and I was wondering how to represent them with TEI:
>> - All bibles share the same set of verse identifiers, but occasionally,
>> a set of verses is not translated literally but rendered loosely
>> within a larger segment. We introduced an additional attribute altid
>> (alternate id), a sequence of NMTOKENS, each of which represents a
>> regular Bible ID (we did not choose IDREFS because the referenced IDs are not
>> defined within the document). What would be the most efficient way to represent
>> this properly?
>> e.g. a multi-verse segment from a Low German (Westphalian) bible (in our
>> <seg altid="b.MAT.17.22 b.MAT.17.23">
>> Os soe sik in Galiläa uphoelen, sia Jesus: Doe Minskensuone
>> sall baule den Hännen fan den Minsken iutliewert weren. Soe
>> weret en dautmaken, owwer am drüdden Dage sall hoe wir upston.
>> Do woören soe olle bedroöwet.
>> </seg>
>> vs. a verse segment in another Low German (Plautdietsch) bible
>> <seg id="b.MAT.17.22" type="verse">
>> Aus see enn Galilaea eromm jinje, saed Jesus to an: "De
>> Menschesaen woat boolt enn Mensche aeare Henj jejaeft woare,
>> </seg>
>> <seg id="b.MAT.17.23" type="verse">
>> en dee woare am doot moake, oba aum drede Dach woat hee fomm
>> Doot oppstone." En siene Jinja weare seeha truarich do aewa.
>> </seg>
>> We query with XQuery across all bibles for a verse ID to compare
>> differences across languages and language stages. The altids are
>> inspected if a seg with the corresponding ID isn't found.
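A minimal sketch of that fallback, written in Python with the standard library's ElementTree instead of XQuery, and assuming the legacy format shown above (plain id attributes, altid as a whitespace-separated NMTOKENS list). The function name find_verse and the abridged sample texts are illustrative only.

```python
import xml.etree.ElementTree as ET

# Abridged samples in the legacy (pre-TEI) format: plain id attributes,
# altid as a whitespace-separated NMTOKENS list.
WESTPHALIAN = """<text>
  <seg altid="b.MAT.17.22 b.MAT.17.23">Os soe sik in Galiläa uphoelen [...]</seg>
</text>"""

PLAUTDIETSCH = """<text>
  <seg id="b.MAT.17.22" type="verse">Aus see enn Galilaea eromm jinje [...]</seg>
  <seg id="b.MAT.17.23" type="verse">en dee woare am doot moake [...]</seg>
</text>"""


def find_verse(root: ET.Element, verse_id: str) -> list:
    """Return elements aligned with verse_id (hypothetical helper).

    Elements whose id equals verse_id are preferred; altid tokens are
    consulted only if no such element exists.
    """
    exact = [el for el in root.iter() if el.get("id") == verse_id]
    if exact:
        return exact
    # Fallback: split the NMTOKENS value and test token membership.
    return [el for el in root.iter() if verse_id in el.get("altid", "").split()]


for name, doc in [("Westphalian", WESTPHALIAN), ("Plautdietsch", PLAUTDIETSCH)]:
    hits = find_verse(ET.fromstring(doc), "b.MAT.17.23")
    print(name, [el.text for el in hits])
```

Because root.iter() visits every element, a div carrying altid would be found by the same call as a seg.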
>> - Not only seg, but also div elements may carry the altid attribute,
>> e.g., for non-literal poetic bible adaptations where we have chapter- or
>> book-level alignment only, although smaller structures (e.g., <l> elements) exist.
>> - altid also comes in handy if we want to mark cross-references to other
>> bible passages that contain literal repetitions, e.g. (from the 1611
>> King James Version):
>> <seg id="b.EXO.20.12" altid="b.DEU.5.16" type="verse">
>> Honour thy father and thy mother: that thy dayes may bee long
>> vpon the land, which the Lord thy God giueth thee.
>> </seg>
>> <seg id="b.DEU.5.16" altid="b.EXO.20.12" type="verse">
>> Honour thy father and thy mother, as the Lord thy God hath
>> commanded thee, that thy daies may be prolonged, and that it
>> may goe well with thee, in the land which the Lord thy God
>> giueth thee.
>> </seg>
>> With our querying strategy, these altids will be relevant if we want to
>> retrieve matches from a Bible where the exact verse is missing but a
>> close analogue is present. This specific verse is, for
>> example, also quoted several times in the New Testament, and for
>> languages with an NT only, we would like to have these matches if we
>> query for b.EXO.20.12 or b.DEU.5.16.
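When many verse ids are queried against many bibles, it may pay off to index each document once. The sketch below (Python, standard library only) keeps exact and altid matches in separate tables, so a query for b.EXO.20.12 against an NT-only bible can still return a New Testament verse that quotes it, while the caller can see that the hit came from an altid. The verse b.EPH.6.2 and its altid value are assumptions made for illustration, not data from the corpus, and the class name VerseIndex is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class VerseIndex:
    """Hypothetical per-document index: verse id -> elements, split by match type."""
    exact: dict = field(default_factory=lambda: defaultdict(list))
    alternate: dict = field(default_factory=lambda: defaultdict(list))

    @classmethod
    def build(cls, root: ET.Element) -> "VerseIndex":
        index = cls()
        for el in root.iter():
            if el.get("id"):
                index.exact[el.get("id")].append(el)
            # altid is NMTOKENS: a whitespace-separated list of verse ids
            for token in el.get("altid", "").split():
                index.alternate[token].append(el)
        return index

    def lookup(self, verse_id: str) -> tuple[str, list]:
        """Return (match_type, elements), preferring exact ids over altids."""
        if verse_id in self.exact:
            return "exact", self.exact[verse_id]
        if verse_id in self.alternate:
            return "altid", self.alternate[verse_id]
        return "none", []


# Abridged sample data; b.EPH.6.2 and its altid are illustrative assumptions.
KJV = """<text>
  <seg id="b.EXO.20.12" altid="b.DEU.5.16" type="verse">Honour thy father and thy mother: [...]</seg>
  <seg id="b.DEU.5.16" altid="b.EXO.20.12" type="verse">Honour thy father and thy mother, [...]</seg>
</text>"""

NT_ONLY = """<text>
  <seg id="b.EPH.6.2" altid="b.EXO.20.12 b.DEU.5.16" type="verse">Honour thy father and mother [...]</seg>
</text>"""

for name, doc in [("KJV", KJV), ("NT only", NT_ONLY)]:
    kind, hits = VerseIndex.build(ET.fromstring(doc)).lookup("b.EXO.20.12")
    print(name, kind, [el.get("id") for el in hits])
```

Keeping the match type explicit is one way to avoid collapsing the different kinds of altid relation into one: the index records where a hit came from, and the caller decides whether an altid match is an acceptable substitute.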
>> In TEI, the id would correspond to an xml:id, but what would be a good
>> strategy to preserve the altid information without creating a large
>> overhead (as using the index element would entail)?
>> Thanks a lot,
>> Christian Chiarcos