> I have a question about transforming competing hierarchies.
>
> I have a runic metrical text found on a stone monument which amongst
> other interesting things is broken up into very small physical lines of
> often only two or three characters. In addition, the text is laid out in
> three discrete physical sections that ignore metrical or grammatical
> boundaries (there is a short broad section across the top, a long thin
> portion top-to-bottom down the right hand side, and then a long thin
> portion top to bottom down the left hand side).
>
> My transcription is metrico-diplomatic in the way only TEI allows: I'm
> using tei:l as my basic chunk level element, using milestones to record
> the beginning of each physical section, and using tei:lb to indicate
> location of line breaks. I've also arranged the text so that it makes
> grammatical sense: top, then down the right column, then back up and
> down the left column:
>
> <l n="1">
>   <milestone n="W1" unit="location" xml:id="west.top"/>
>   Fee fi fo <lb/><milestone n="WS1" unit="location"
> xml:id="west.south.1"/>fum <lb/>
> <l>
> <l n="2">
>   I <lb/>sm<lb/>e<lb/>ll <lb/>th<lb/>e b<lb/>loo<lb/>d
> </l>
> <l n="3">
>   Of <lb/>a<lb/>n E<lb/>ngl<lb/>ish<lb/>m<lb/>an <lb/>
> </l>
> <l>... <lb/><milestone n="WN1" unit="location"
> xml:id="west.north.1"/>...
> etc.
>
> Now I want to produce two views of this: metrical without the diplomatic
> information, and diplomatic without the metrical information. I.e. I'd
> like to produce output as if I had two TEI texts:
>
> <l n="1">
>   Fee fi fo fum
> </l>
> <l n="2">
>   I smell the blood
> </l>
> <l n="3">
>  Of an Englishman
> </l>
> etc.
>
> And
>
> <div type="textblock" xml:id="west.top" n="West Top">
>   <ab>Fee fi fo</ab>
> </div>
> <div type="textblock" xml:id="west.south.1" n="West South Column">
>   <ab>fum</ab>
>   <ab>I</ab>
>   <ab>sm</ab>
>   <ab>e</ab>
>   <ab>ll</ab>
>   <ab>th</ab>
>   <ab>e b</ab>
>   <ab>loo</ab>
>   <ab>d</ab>
>   <ab>Of</ab>
>   <ab>a</ab>
>   <ab>n E</ab>
>   <ab>ngl</ab>
>   <ab>ish</ab>
>   <ab>m</ab>
>   <ab>an</ab>
> ...
> </div>
> <div type="textblock" xml:id="west.north.1" n="West North Column">
> ...
> </div>
>
> The divs are the hard bit, since tei:lb can always be transformed to the
> equivalent html:br. But if I want to reproduce the physical layout of
> the sections (as opposed to their metrical order), I'm going to have to
> identify the top, south, and north columns as separate divisions, as far
> as I can tell.
>
> Is there a way of doing this without escape-disable in XSL (or whatever
> it is called)? Is there a different way of encoding the original text to
> preserve information about the multiple hierarchies?

I have something that may be useful for dealing with complex
milestones like this. At http://panini.u-paris10.fr/~sloiseau/CR you
can download an application
(http://panini.u-paris10.fr/~sloiseau/CR/download/CR.zip; the
documentation is incomplete and in French) that I use for this kind of
transformation of intersecting hierarchies. The basic idea is to work
at the stream level (with the SAX API): a streaming API reports events
in document order, a kind of "precedence first" walk of the tree, so
everything between two milestones arrives as one contiguous run. Tree
APIs and tree-based languages are organised around dominance, which
makes milestone-delimited spans awkward to express. Here is what it does:

Assuming your corpus is the following (I've closed the first "l" and copied and pasted some data):

8<-----8<-----8<-----8<-----8<-----8<-----

<TEI.2>
  <text>
    <body>
      <l n="1">
	<milestone n="W1" unit="location" xml:id="west.top"/>
	Fee fi fo <lb/><milestone n="WS1" unit="location"
	xml:id="west.south.1"/>fum <lb/>
      </l>
      <l n="2">
	I <lb/>sm<lb/>e<lb/>ll <lb/>th<lb/>e b<lb/>loo<lb/>d
      </l>
      <l n="3">
	Of <lb/>a<lb/>n E<lb/>ngl<lb/>ish<lb/>m<lb/>an <lb/>
      </l>
      <l>... <lb/><milestone n="WN1" unit="location"
      xml:id="west.north.1"/></l>
      <l n="3">
	Of <lb/>a<lb/>n E<lb/>ngl<lb/>ish<lb/>m<lb/>an <lb/>
      </l>
    </body>
  </text>
</TEI.2>

8<-----8<-----8<-----8<-----8<-----8<-----

Here is the output. It could be better: some 'ab' elements are empty,
and each "milestone" does not immediately follow the start of the "div"
created for it (it ends up inside the first 'ab'). But it may be easier
to process with XSLT now, since the segmentation is a tree:

8<-----8<-----8<-----8<-----8<-----8<-----

<?xml version="1.0" standalone="yes"?>

<TEI.2>
  <text>
    <body><div><ab><milestone n="W1" unit="location" 
xml:id="west.top"></milestone>
	Fee fi fo </ab><ab></ab></div><div><ab><milestone n="WS1" 
unit="location" xml:id="west.south.1"></milestone>fum </ab><ab>
           	I </ab><ab>sm</ab><ab>e</ab><ab>ll </ab><ab>th</ab><ab>e 
b</ab><ab>loo</ab><ab>d
           	Of </ab><ab>a</ab><ab>n 
E</ab><ab>ngl</ab><ab>ish</ab><ab>m</ab><ab>an </ab><ab>
           ... </ab><ab></ab></div><div><ab><milestone n="WN1" 
unit="location" xml:id="west.north.1"></milestone>
      	Of </ab><ab>a</ab><ab>n 
E</ab><ab>ngl</ab><ab>ish</ab><ab>m</ab><ab>an </ab><ab>
         </ab></div></body>
  </text>
</TEI.2>

8<-----8<-----8<-----8<-----8<-----8<-----

Here is the parameter file used for launching the program (with comments).

(Assuming this document is saved in a file "query.cr", you can launch the
program with "java -jar CR.jar query.cr".)

8<-----8<-----8<-----8<-----8<-----8<-----

<?xml version="1.0" encoding="iso-8859-1"?>
<query>

  <corpus inURI="tei.test.xml" outURI="tei.test.out.xml"/>

  <filterList>
    <!-- optionally, remove the "l" elements -->
    <filter javaClass="tei.cr.filters.RemoveElement">
      <args>
	<element elxpath="l" />
      </args>
    </filter>

    <!-- transform each <milestone> into a <div> element -->
    <split localName="body">
      <filterList>
	<filter javaClass="tei.cr.filters.ExtractMilestone">
	  <args>
	    <startBoundary elxpath="milestone" />
	    <treesRootElement localName="div" />
	  </args>
	</filter>
      </filterList>
    </split>

    <!-- transform each <lb> milestone into an <ab> element -->
    <split localName="div">
      <filterList>
	<filter javaClass="tei.cr.filters.ExtractMilestone">
	  <args>
	    <startBoundary elxpath="lb" />
	    <treesRootElement localName="ab" />
	  </args>
	</filter>
      </filterList>
    </split>

    <!-- optionally, remove the "lb" elements -->
    <filter javaClass="tei.cr.filters.RemoveElement">
      <args>
	<element elxpath="lb" />
      </args>
    </filter>

  </filterList>
</query>

8<-----8<-----8<-----8<-----8<-----8<-----
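
For readers who cannot run the CR application, the streaming idea can be
sketched with Python's standard xml.sax module. This is not the CR
implementation, only a minimal illustration under stated assumptions: the
names MilestoneSplitter, physical_view and SAMPLE are invented for this
example, the sample is a shortened version of the corpus above (with an
extra "lb" after "d" so that every physical line is closed), each
"milestone" opens a new "div", each "lb" closes the current "ab", and
the "l" elements are simply ignored.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr

# Shortened, hypothetical sample based on the corpus shown above.
SAMPLE = """<body>
<l n="1"><milestone n="W1" unit="location" xml:id="west.top"/>
Fee fi fo <lb/><milestone n="WS1" unit="location" xml:id="west.south.1"/>fum <lb/></l>
<l n="2">I <lb/>sm<lb/>e<lb/>ll <lb/>th<lb/>e b<lb/>loo<lb/>d <lb/></l>
</body>"""


class MilestoneSplitter(xml.sax.ContentHandler):
    """Rebuild the physical hierarchy from the event stream:
    milestone -> div, lb -> ab; all other elements are dropped."""

    def __init__(self):
        super().__init__()
        self.out = []        # output fragments, joined at the end
        self.text = []       # text of the physical line being collected
        self.in_div = False  # True once the first milestone has been seen

    def _flush_ab(self):
        # Emit the pending physical line, skipping whitespace-only runs
        # so that no empty <ab> is produced.
        line = " ".join("".join(self.text).split())
        if line:
            self.out.append("  <ab>%s</ab>\n" % escape(line))
        self.text = []

    def _close_div(self):
        self._flush_ab()
        if self.in_div:
            self.out.append("</div>\n")
            self.in_div = False

    def startElement(self, name, attrs):
        if name == "milestone":
            # A milestone ends the previous physical section and starts a new one.
            self._close_div()
            self.out.append('<div type="textblock" xml:id=%s n=%s>\n' % (
                quoteattr(attrs.get("xml:id", "")),
                quoteattr(attrs.get("n", ""))))
            self.in_div = True
        elif name == "lb":
            # A line break ends the current physical line.
            self._flush_ab()

    def characters(self, content):
        # Text before the first milestone belongs to no section and is ignored.
        if self.in_div:
            self.text.append(content)

    def endDocument(self):
        self._close_div()


def physical_view(tei_xml):
    """Return the div/ab fragment for a TEI string (no wrapper element)."""
    handler = MilestoneSplitter()
    xml.sax.parseString(tei_xml.encode("utf-8"), handler)
    return "".join(handler.out)


if __name__ == "__main__":
    print(physical_view(SAMPLE))
```

Because the handler only reacts to events in document order, it never
needs to know that a physical section starts in the middle of one "l" and
ends in the middle of another. Note that where the source has no "lb"
between two metrical lines (as between "d" and "Of" in the corpus above),
the two pieces of text end up in the same "ab", exactly as in the CR
output.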

Hope this helps,
Sylvain


-- 
Sylvain Loiseau
http://panini.u-paris10.fr/~sloiseau

"Our society produces schizos the way it produces Dop shampoo or
Renault cars, with the only difference that they cannot be sold."
Deleuze, Anti-Oedipus
