On Wed, 2 May 2007, Patrik Nyman wrote:

> This isn't a specific TEI question, but some of the people on this list
> presumably have done this already, so here it goes.
>
> We're putting our texts into the eXist database. We want to be able
> to show one page of the original text at a time, so we'll be needing
> an XQuery expression to retrieve the specific portion of a text, namely
> between two specific <pb/>'s. Problems arise when the <pb/> falls in
> the middle of another element, say, a <p>. There might also be a
> <div>-break in the middle of the page, etc.
>
> If anyone has had any experience with this sort of thing, perhaps you've
> got some advice or pointers (or solutions!) to share?

At the bottom of this message is an XQuery function that will do this.
It could be translated to an XSLT 2.0 function fairly easily if desired.

I'd upload it to the TEI Wiki, but need some feedback first: there is a
top level "Stylesheets" category that contains two subcategories, XSLT
and CSS. I could add an "XQuery" category but there's no such thing as
an "XQuery stylesheet". Should we rename "Stylesheets" to "Stylesheets
and Functions" or "Stylesheets and Program Code"? Sebastian?

=========

Explanation of attached milestone-chunk() function:

Given an ancestor element containing the two milestone elements bounding
the content you want to return, it returns an element that essentially
reproduces the tree structure of the input element but containing only
the nodal content between the two milestones. For example, suppose you
had a document that included the following content:

  <div2 xml:id="d1">
    <p>An example<pb n="3"/>of a <i>very</i> short page<pb n="4"/>here.</p>
  </div2>

Then if the two <pb> elements numbered 3 and 4 are passed to the
function, with /TEI/text as the ancestor node, the function would
return something like:

  <text>
     <body>
        <div1>
           <div2 xml:id="d1">
              <p><pb n="3"/>of a <i>very</i> short page</p>
           </div2>
        </div1>
     </body>
  </text>

(The first milestone element is included, but not the second, for
reasons too obscure to go into.)
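
For the flat case, where both milestones are children of the same element,
the selection reduces to a document-order comparison with the << and >>
operators. This is the same test the function applies to text nodes and to
elements that contain neither milestone. A minimal sketch, assuming the
un-namespaced sample paragraph from above (the variable names are
illustrative only):

```xquery
(: Sample paragraph from the explanation, built inline so the query is
   self-contained. :)
let $p   := <p>An example<pb n="3"/>of a <i>very</i> short page<pb n="4"/>here.</p>
let $ms1 := $p/pb[1]   (: starting milestone :)
let $ms2 := $p/pb[2]   (: ending milestone :)
(: Keep only the children that follow $ms1 and precede $ms2 in document order. :)
return $p/node()[. >> $ms1 and . << $ms2]
```

This should return the text node "of a ", the <i> element, and the text
node " short page". It does not rebuild any ancestor structure, which is
what the recursive function adds for milestones nested at different depths.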

[code follows, cut here: ]

(:  ================================================================================
Function: local:milestone-chunk()
Author:   David Sewell
Version:  2007-05-02

Usage: The first two parameters are the starting and ending milestone elements
in a document or document fragment. Example: $doc//pb[@n='1'], $doc//pb[@n='2'].
The third parameter should be an element known to be a parent or ancestor of the
milestone elements, such as /tei:TEI/tei:text or /tei:TEI/tei:text/tei:body.
The node returned by the function will start with this element.

The function returns a single node containing all content between the two
milestones. To return, for example, the content of every page in a document as a
separate XML fragment, call the function repeatedly like so:

  local:milestone-chunk(//pb[@n='1'], //pb[@n='2'], /tei:TEI/tei:text)
  local:milestone-chunk(//pb[@n='2'], //pb[@n='3'], /tei:TEI/tei:text)
   . . .

(If there is no final milestone, as is usually the case with <pb>, the second
argument can be a pseudo-milestone such as the last node in your input data.
For example, suppose that the last page in your document is p. 454; this
function call should return its content:

  local:milestone-chunk(
    //pb[@n='454'],
    (/tei:TEI/tei:text//node())[last()],
    /tei:TEI/tei:text
   )

That is, it returns content between <pb n="454"/> and the final node in
your source.)

 ============================================================================== :)
declare function local:milestone-chunk(
  $ms1 as node(),
  $ms2 as node(),
  $node as node()*
) as node()*
{
  (: Elements named like either milestone: keep only the starting milestone. :)
  if ($node instance of element() and local-name($node) = (local-name($ms1), local-name($ms2)))
  then (
      if ( $node is $ms1 ) then $node
      else ()
  )
  else if ($node instance of element() ) then
      (: Element containing a milestone: rebuild it and recurse into its
         attributes and children. node-name() preserves the namespace. :)
      if ( some $n in $node/descendant-or-self::* satisfies ($n is $ms1 or $n is $ms2) )
      then
          element { node-name($node) }
          { for $i in ( $node/node() | $node/@* ) return local:milestone-chunk($ms1, $ms2, $i) }
      (: Element between the milestones: copy it unchanged. :)
      else if ( $node >> $ms1 and $node << $ms2 ) then $node
      else ()
  else if ($node instance of text()) then
      (: Text node: keep it only if it falls between the milestones. :)
      if ( $node >> $ms1 and $node << $ms2 ) then $node
      else ()
  else if ($node instance of attribute()) then
    attribute { node-name($node) } { data($node)  }
  else ()
};
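
To pull out pages in one pass rather than writing one call per page, the
function can be driven by a loop that pairs each <pb> with the one after
it. The following is only a sketch: it assumes the declaration above comes
first in the same main module, it uses the un-namespaced sample from the
explanation wrapped in a made-up TEI/text/body/div1 skeleton, and the
<page> wrapper element is hypothetical.

```xquery
(: Query body; assumes the local:milestone-chunk() declaration above
   precedes it in the same main module. :)
let $doc :=
  <TEI>
    <text>
      <body>
        <div1>
          <div2 xml:id="d1">
            <p>An example<pb n="3"/>of a <i>very</i> short page<pb n="4"/>here.</p>
          </div2>
        </div1>
      </body>
    </text>
  </TEI>
let $root := $doc/text   (: ancestor element passed as the third argument :)
let $pbs  := $root//pb   (: all page breaks, in document order :)
(: Every <pb> except the last is paired with its successor. :)
for $pb at $i in $pbs[position() lt last()]
let $next := $pbs[$i + 1]
return
  <page n="{ $pb/@n }">{ local:milestone-chunk($pb, $next, $root) }</page>
```

For the sample this yields a single <page n="3"> element wrapping the
<text> fragment shown in the explanation. The final page is left out of
the loop on purpose; it needs the pseudo-milestone approach described in
the function's header comment.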

-- 
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 801079, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/