My first remark is that, while you take care to preserve the semantics
of @altid inside an alternative mechanism, you've actually demonstrated
two kinds of @altid semantics below, which need not (perhaps sometimes
should not) preserve the ambiguity when getting translated into a new
The second remark is: if the cost of the alternatives, in terms of
construction "heaviness", is an issue (and I can see your reasons), why
not be minimalistic about it and use just @project:altid, i.e.
define @altid within a non-TEI namespace? This is by all means a kosher
On 11/07/2013 11:02 AM, Christian Chiarcos wrote:
> Dear list members,
> I am currently working on a massive corpus of verse-aligned religious
> texts (Bibles, mostly, but also Qur'an editions) for linguistic and NLP
> purposes. Initially, I adapted the CES specifications
> Philipp Resnik developed decades ago for a similar, small-scale project
> (in XML, not his SGML, of course). As we have outgrown the scale of his
> project by far, it is about time to update our format to a more
> recent standard, and TEI might be the format of choice.
> Yet, there are certain aspects specific to a parallel corpus of bibles,
> and I was wondering how to represent them with TEI:
> - All bibles share the same set of verse identifiers, but occasionally,
> a set of verses is not translated literally, but loosely translated
> within a larger segment. We introduced an additional attribute altid
> (alternate id), a sequence of NMTOKENS, each of which represents a
> regular bible ID (we did not choose IDREFS because they are not defined
> within the document). What would be the most efficient way to represent
> this properly?
> e.g. a multi-verse segment from a Low German (Westphalian) bible (in our
> <seg altid="b.MAT.17.22 b.MAT.17.23">
> Os soe sik in Galiläa uphoelen, sia Jesus: Doe Minskensuone
> sall baule den Hännen fan den Minsken iutliewert weren. Soe
> weret en dautmaken, owwer am drüdden Dage sall hoe wir upston.
> Do woören soe olle bedroöwet.
> </seg>
> vs. a verse segment in another Low German (Plautdietsch) bible
> <seg id="b.MAT.17.22" type="verse">
> Aus see enn Galilaea eromm jinje, saed Jesus to an: "De
> Menschesaen woat boolt enn Mensche aeare Henj jejaeft woare,
> </seg>
> <seg id="b.MAT.17.23" type="verse">
> en dee woare am doot moake, oba aum drede Dach woat hee fomm
> Doot oppstone." En siene Jinja weare seeha truarich do aewa.
> </seg>
> We query with XQuery across all bibles for a verse ID to compare
> differences across languages and language stages. The altids are
> inspected if a seg with the corresponding ID isn't found.
> - Not only seg, but also div elements may carry the altid attribute,
> e.g., for non-literal poetic bible adaptations where we have chapter- or
> book-level alignment only, but where smaller structures (e.g., l) exist.
> - altid also comes in handy if we want to mark cross-references to other
> bible passages that contain literal repetitions, e.g. (from the 1611
> King James Version):
> <seg id="b.EXO.20.12" altid="b.DEU.5.16" type="verse">
> Honour thy father and thy mother: that thy dayes may bee long
> vpon the land, which the Lord thy God giueth thee.
> </seg>
> <seg id="b.DEU.5.16" altid="b.EXO.20.12" type="verse">
> Honour thy father and thy mother, as the Lord thy God hath
> commanded thee, that thy daies may be prolonged, and that it
> may goe well with thee, in the land which the Lord thy God
> giueth thee.
> </seg>
> With our querying strategy, these altids will be relevant if we want to
> retrieve matches from a Bible where the exact verse is lost, but a
> near-analogon is found, nevertheless. This specific verse is, for
> example, also quoted several times in the New Testament, and for
> languages with an NT only, we would like to have these matches if we
> query for b.EXO.20.12 or b.DEU.5.16.
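To make the lookup described above concrete, here is a rough sketch of the "exact id first, altid as fallback" strategy. It is written in Python with the standard library rather than in XQuery, so it is only an illustration of the logic and not your actual query. The `find_verse` function, the `<text>` wrapper element and the shortened verse texts are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature documents in the pre-TEI format quoted above;
# the <text> wrapper and the shortened verse texts are illustrative only.
WESTPHALIAN = """<text>
  <seg altid="b.MAT.17.22 b.MAT.17.23">Os soe sik in Galiläa uphoelen</seg>
</text>"""

PLAUTDIETSCH = """<text>
  <seg id="b.MAT.17.22" type="verse">Aus see enn Galilaea eromm jinje</seg>
  <seg id="b.MAT.17.23" type="verse">en dee woare am doot moake</seg>
</text>"""


def find_verse(root, verse_id):
    """Return (match_kind, elements): exact id matches first, altid matches otherwise."""
    exact = [el for el in root.iter() if el.get("id") == verse_id]
    if exact:
        return "id", exact
    # altid holds a whitespace-separated list of verse identifiers.
    loose = [el for el in root.iter() if verse_id in el.get("altid", "").split()]
    if loose:
        return "altid", loose
    return "none", []


for name, doc in [("Westphalian", WESTPHALIAN), ("Plautdietsch", PLAUTDIETSCH)]:
    kind, hits = find_verse(ET.fromstring(doc), "b.MAT.17.23")
    print(name, kind, [hit.text for hit in hits])
```

Note that a fallback hit reports only that some altid matched. It cannot tell whether the hit is a loosely translated multi-verse segment or a cross-reference to a parallel passage, which is the ambiguity raised in my first remark.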
> In TEI, the id would correspond to an xml:id, but what would be a good
> strategy to preserve the altid information without creating a large
> overhead (as using the index element would entail)?
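As an illustration of the namespaced-attribute option, the sketch below shows a TEI fragment in which the verse id becomes xml:id and altid lives in a project namespace, together with a few lines of Python (standard library only) that read both back. The namespace URI `http://example.org/ns/project`, the `project` prefix and the bare `<body>` fragment are placeholders I made up; validating such a document against a TEI schema would presumably require a customization that admits the extra attribute.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
PROJECT_NS = "http://example.org/ns/project"  # hypothetical project namespace

# Fragment only, not a complete TEI document; verse texts are shortened.
DOC = f"""<body xmlns="{TEI_NS}" xmlns:project="{PROJECT_NS}">
  <seg type="verse" xml:id="b.EXO.20.12" project:altid="b.DEU.5.16">Honour thy father and thy mother</seg>
  <seg project:altid="b.MAT.17.22 b.MAT.17.23">Os soe sik in Galiläa uphoelen</seg>
</body>"""

root = ET.fromstring(DOC)
for seg in root.iter(f"{{{TEI_NS}}}seg"):
    # ElementTree exposes namespaced attributes as "{namespace-uri}local-name".
    xml_id = seg.get(f"{{{XML_NS}}}id")
    altids = seg.get(f"{{{PROJECT_NS}}}altid", "").split()
    print(xml_id, altids)
```

The attribute value stays a plain whitespace-separated token list, as in the current format, so the existing lookup logic should carry over with only the attribute names changed.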
> Thanks a lot,
> Christian Chiarcos