Daniel,

Please see my Extreme Markup Languages paper of 2004, at

http://www.mulberrytech.com/Extreme/Proceedings/html/2004/Piez01/EML2004Piez01.html

The quick answer: what you want to do is possible, using XSLT 2.0. 
Not all the details of how to make it work have been hammered out -- 
for me, this is an interesting research area but not a problem of 
great practical importance. (That would change if I had a dataset 
like yours!) Since that paper was written, I've refined some of the 
ideas in my head, but I haven't actually done much more about 
implementing any of them.

That having been said, it turns out the hard part isn't actually 
transforming from one hierarchy into the other. Basically the problem 
is solved by:
1. Flattening all hierarchies
2. Using grouping methods to induce the hierarchy you want from the 
flattened representation

(It's #2 that I intend to refine further if I can ever get time to 
prioritize this.)
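
To make steps 1 and 2 concrete for your diplomatic view, here is a minimal sketch in XSLT 2.0, not a finished stylesheet. It rests on several assumptions that are not in your message: that the elements are in the TEI P5 namespace, that the verse lines sit inside a wrapper element (tei:lg here, purely hypothetical), and that each lb marks the end of a physical line, as it appears to in your sample. Any text before the first milestone is dropped, and the divs come out in document order, so reordering them to match the physical layout would be a separate step.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:output method="xml" indent="yes"/>

  <!-- tei:lg is an assumed wrapper around the tei:l elements -->
  <xsl:template match="tei:lg">
    <!-- Step 1: flatten. Take text, line breaks and location milestones
         in document order, ignoring the tei:l hierarchy altogether. -->
    <xsl:variable name="flat"
        select=".//text() | .//tei:lb | .//tei:milestone[@unit = 'location']"/>

    <!-- Step 2: regroup. Each location milestone starts a new textblock. -->
    <xsl:for-each-group select="$flat" group-starting-with="tei:milestone">
      <!-- Skip a leading group that does not begin with a milestone -->
      <xsl:if test="self::tei:milestone">
        <div type="textblock" xml:id="{@xml:id}" n="{@n}">
          <!-- Within a textblock, each lb closes one physical line -->
          <xsl:for-each-group
              select="current-group()[not(self::tei:milestone)]"
              group-ending-with="tei:lb">
            <xsl:variable name="content"
                select="normalize-space(string-join(current-group()[self::text()], ''))"/>
            <!-- Suppress groups holding nothing but whitespace -->
            <xsl:if test="$content">
              <ab><xsl:value-of select="$content"/></ab>
            </xsl:if>
          </xsl:for-each-group>
        </div>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Note that the ab boundaries follow the lb elements only, so where a metrical line ends without an lb, its last fragment is joined to the start of the next line. No disable-output-escaping is needed, because the divs are built by grouping rather than by writing start and end tags as text.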

The harder problems are:
3. Determining when a target hierarchy is spurious or broken (i.e. 
assuring well-formed output)
4. Determining where potential target hierarchies exist
5. Modelling possible relationships between hierarchies, etc.

Using XSLT 2.0, I believe even these problems are tractable (though 
#5 is rather open-ended).

Note, however, that this approach basically works by removing the 
hierarchy from the XML (step 1), thus treating XML like a "streaming" 
(SAX-like) data model. This is a large part of why I've done this 
work under the LMNL umbrella. LMNL is fundamentally a data model, 
useful here because it provides a terminology for dealing with some 
of the issues that come up, like overlap. Because I didn't solve 
problem #3 in the paper cited above, I used a LMNL syntax to express 
output. But if I knew going in that #3 wasn't going to be a problem, 
I could serialize the output as XML just as easily.

Cheers,
Wendell


At 10:12 AM 3/14/2006, you wrote:
>I have a question about transforming competing hierarchies.
>
>I have a runic metrical text found on a stone monument which amongst
>other interesting things is broken up into very small physical lines of
>often only two or three characters. In addition, the text is laid out in
>three discrete physical sections that ignore metrical or grammatical
>boundaries (there is a short broad section across the top, a long thin
>portion top-to-bottom down the right hand side, and then a long thin
>portion top to bottom down the left hand side).
>
>My transcription is metrico-diplomatic in the way only TEI allows: I'm
>using tei:l as my basic chunk level element, using milestones to record
>the beginning of each physical section, and using tei:lb to indicate
>location of line breaks. I've also arranged the text so that it makes
>grammatical sense: top, then down the right column, then back up and
>down the left column:
>
><l n="1">
>   <milestone n="W1" unit="location" xml:id="west.top"/>
>   Fee fi fo <lb/><milestone n="WS1" unit="location"
>xml:id="west.south.1"/>fum <lb/>
></l>
><l n="2">
>   I <lb/>sm<lb/>e<lb/>ll <lb/>th<lb/>e b<lb/>loo<lb/>d
></l>
><l n="3">
>   Of <lb/>a<lb/>n E<lb/>ngl<lb/>ish<lb/>m<lb/>an <lb/>
></l>
><l>... <lb/><milestone n="WN1" unit="location"
>xml:id="west.north.1"/>...
>etc.
>
>Now I want to produce two views of this: metrical without the diplomatic
>information, and diplomatic without the metrical information. I.e. I'd
>like to produce output as if I had two TEI texts:
>
><l n="1">
>   Fee fi fo fum
></l>
><l n="2">
>   I smell the blood
></l>
><l n="3">
>  Of an Englishman
></l>
>etc.
>
>And
>
><div type="textblock" xml:id="west.top" n="West Top">
>   <ab>Fee fi fo</ab>
></div>
><div type="textblock" xml:id="west.south.1" n="West South Column">
>   <ab>fum</ab>
>   <ab>I</ab>
>   <ab>sm</ab>
>   <ab>e</ab>
>   <ab>ll</ab>
>   <ab>th</ab>
>   <ab>e b</ab>
>   <ab>loo</ab>
>   <ab>d</ab>
>   <ab>Of</ab>
>   <ab>a</ab>
>   <ab>n E</ab>
>   <ab>ngl</ab>
>   <ab>ish</ab>
>   <ab>m</ab>
>   <ab>an</ab>
>...
></div>
><div type="textblock" xml:id="west.north.1" n="West North Column">
>...
></div>
>
>The divs are the hard bit, since tei:lb can always be transformed to the
>equivalent html:br. But if I want to reproduce the physical layout of
>the sections (as opposed to their metrical order), I'm going to have to
>identify the top, south, and north columns as separate divisions, as far
>as I can tell.
>
>Is there a way of doing this without escape-disable in XSL (or whatever
>it is called)? Is there a different way of encoding the original text to
>preserve information about the multiple hierarchies?


======================================================================
Wendell Piez                            mailto:[log in to unmask]
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
   Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================