Several translation environments rely on translation memories (lists of aligned text segments in two languages), each stored in its own proprietary format. An open XML format, TMX, allows memories to be exchanged between these programs, and one open-source tool, OmegaT, uses TMX as its base format. This makes TMX a good candidate model for a format for two-language texts.
I am an amateur in translation, in TMX and OmegaT (and in TEI as well), but I have published some notes on TMX (in French: http://www.d-meeus.be/linux/traduire.html) and even an export filter for LibreOffice that saves a two-column spreadsheet of aligned segments as TMX.
I feel that TEI should follow TMX closely in how it encodes two language versions of a text. This would make XSLT transformations between the two easier.
Characteristic of TMX is that the correspondence between two segments (<tuv>) results from their being children of the same element (the translation unit, <tu>). Here is a short file reduced to a single segment:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE tmx SYSTEM "tmx11.dtd">
<tmx version="1.1">
  <header creationtool="OmegaT" o-tmf="OmegaT TMX" adminlang="EN-US" datatype="plaintext" creationtoolversion="3.0.6" segtype="sentence" srclang="NL-BE"/>
  <body>
    <tu>
      <tuv lang="NL-BE">
        <seg>Alvast bedankt voor uw hulp,</seg>
      </tuv>
      <tuv lang="FR-BE" changeid="mic" changedate="20140225T211611Z" creationid="mic" creationdate="20140225T211611Z">
        <seg>Merci d’avance de votre aide.</seg>
      </tuv>
    </tu>
  </body>
</tmx>
There is no @xml:id in the first <tuv>. Its priority results from its being in the source language, declared as @srclang in the <header>. In TEI, one could think of combining the @corresp solution with @xml:lang:
<div>
  <ab xml:id="so-en-so" xml:lang="old">…</ab>
  <ab corresp="#so-en-so" xml:lang="new">…</ab>
</div>
with one <div> corresponding to the <tu> container for the two languages, rather than a separate <div> for each language.
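To make the pairing rule concrete, here is a minimal Python sketch of how a script might recover aligned segments from such a <div>, by resolving each @corresp against the @xml:id of a sibling <ab>. The sample fragment, its language codes and the function name aligned_pairs are assumptions made for illustration; the fragment carries no TEI namespace, just like the snippet above.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined, so ElementTree expands it to this namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_ID = f"{{{XML_NS}}}id"
XML_LANG = f"{{{XML_NS}}}lang"

# Hypothetical fragment: ids, language codes and text are illustrative only.
SAMPLE = """<body>
  <div>
    <ab xml:id="s1" xml:lang="nl-BE">Alvast bedankt voor uw hulp,</ab>
    <ab corresp="#s1" xml:lang="fr-BE">Merci d’avance de votre aide.</ab>
  </div>
</body>"""


def aligned_pairs(root):
    """Yield (source_ab, target_ab) for each <ab> whose @corresp
    points at the @xml:id of an <ab> in the same <div>."""
    for div in root.iter("div"):
        blocks = div.findall("ab")
        by_id = {ab.get(XML_ID): ab for ab in blocks if ab.get(XML_ID)}
        for ab in blocks:
            ref = ab.get("corresp", "")
            if ref.startswith("#") and ref[1:] in by_id:
                yield by_id[ref[1:]], ab


pairs = [
    (src.get(XML_LANG), src.text, tgt.get(XML_LANG), tgt.text)
    for src, tgt in aligned_pairs(ET.fromstring(SAMPLE))
]
print(pairs)
```

Because the lookup table is rebuilt for every <div>, a @corresp that points outside its own <div> is simply ignored, which mirrors the idea that the <div> plays the role of the <tu>.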
Of course, the aim of TMX is to reuse segments, not to store text. OmegaT, for example, sorts segments alphabetically (by their source-language text). In TEI, the order of the elements reflects the directed flow of the text. One could therefore export from TEI to TMX; the converse would work only with ‘safe’, linear TMX, in which the segments are still in text order.
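The export direction could look roughly like the following Python sketch, which builds a TMX 1.1-style tree from a list of (source, target) text pairs and keeps them in the order given. The header attributes are the same ones used in the example file above; the function name pairs_to_tmx, the tool name and the o-tmf value are hypothetical placeholders, and nothing here is validated against tmx11.dtd.

```python
import xml.etree.ElementTree as ET


def pairs_to_tmx(pairs, srclang, tgtlang):
    """Build a minimal TMX 1.1-style tree from (source, target) text
    pairs, one <tu> per pair, in the order the pairs are given."""
    tmx = ET.Element("tmx", version="1.1")
    ET.SubElement(tmx, "header", {
        "creationtool": "tei2tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",      # hypothetical version
        "o-tmf": "TEI",                    # assumed label for the original format
        "adminlang": "EN-US",
        "datatype": "plaintext",
        "segtype": "sentence",
        "srclang": srclang,
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        # The source-language <tuv> comes first; TMX 1.1 uses a plain @lang.
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", lang=lang)
            ET.SubElement(tuv, "seg").text = text
    return tmx


tmx = pairs_to_tmx(
    [("Alvast bedankt voor uw hulp,", "Merci d’avance de votre aide.")],
    srclang="NL-BE",
    tgtlang="FR-BE",
)
print(ET.tostring(tmx, encoding="unicode"))
```

The order of the <tu> elements is the only trace of the text flow in the result, so it would be lost as soon as a tool re-sorts the memory.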