Dear Markus,

I am putting this thread back on TEI-L as there 
will be interested parties there.

At 10:39 AM 8/26/2010, you wrote (to me, off list):
>The anticipated use case looks something like this: some of the Papers
>projects are now starting to add corrections to texts in the digital
>edition based on new copy texts accessioned post-print-publication.
>Assume a recipient's copy is found (superseding a printed copy as
>published in the letterpress edition). Contrived example:
>
><div type="letter">
>...
><p>Blah… <delSpan spanTo="001"/>No way I ever wrote that.</p>
><closer><signed>Tho: Jefferson</signed></closer><anchor xml:id="001"/>
></div>
>
>I.e., last sentence and closer from printed copy don't occur in MS
>recipient's copy.

Given the limitations of XML, this seems reasonable.

>Anyways, the projects I'm currently involved in here at Rotunda are
>fairly straightforward, and for now, at the HTML end of things, I
>think I might be able to get away with abusively rendering delSpans as
>blocks with a certain behavior (plus some kind of content
>repair/tidying function).

Okay.

>But I am very grateful for your advice and pointers—good to know where
>to start looking once I get to work on an editorially more complex
>project. (One of those might be just around the corner, actually.)

Collectively, I think users of XML/XSLT are now 
figuring out how to get a handle on these issues. 
But having a technique isn't by itself enough. We 
also have to keep an eye on performance and on 
developing the techniques into generic routines 
to minimize both the analysis and coding we have to do with every new case.

Nevertheless, XSLT 2.0 gives us some pretty 
strong resources. In your case, for example, we 
could write a function to return a delSpan 
element from any text node that appears "inside" 
it. (And a corresponding function for addSpan.) 
Given a definition of "inside" for these purposes 
(I'll say a node is "inside" a delSpan if the 
delSpan milestone appears before the node starts 
and its end anchor appears after the node ends), 
it might not be so bad, even if the traversals 
involved are potentially expensive.

To wit, something like:

<xsl:function name="v:fetch-delSpan" as="element(delSpan)?">
   <xsl:param name="n" as="node()"/>
   <!-- del will be the most recent delSpan milestone -->
   <xsl:variable name="del" select="$n/preceding::delSpan[1]"/>
   <!-- $del/id(@spanTo) will be its end anchor -->
   <!-- return $del if its end anchor appears after the argument node -->
   <xsl:sequence select="$del[id(@spanTo) >> $n]"/>
</xsl:function>

Call v:fetch-delSpan on any node: if the node (as 
a whole) is "inside" a delSpan, you'd get the 
delSpan milestone back. If not, you'd get an empty sequence back.
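
The corresponding function for addSpan could follow the same 
pattern. This is only a sketch: the name v:fetch-addSpan is 
hypothetical, and it assumes, as in your contrived example, that 
@spanTo holds a bare ID that id() can resolve. (TEI pointers are 
normally written with a leading "#", which would have to be 
stripped first.)

<xsl:function name="v:fetch-addSpan" as="element(addSpan)?">
   <xsl:param name="n" as="node()"/>
   <!-- add will be the most recent addSpan milestone -->
   <xsl:variable name="add" select="$n/preceding::addSpan[1]"/>
   <!-- return $add if its end anchor appears after the argument node -->
   <xsl:sequence select="$add[id(@spanTo) >> $n]"/>
</xsl:function>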

This enables something like this:

<xsl:template match="text()[exists(v:fetch-delSpan(.))]">
   <span class="deleted">
     <xsl:next-match/>
   </span>
</xsl:template>

... and you're almost there.
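
To try it, the pieces need a stylesheet around them. The following 
is a minimal sketch, not a finished solution. The namespace URI 
bound to the v: prefix is a placeholder of my own, and the 
xpath-default-namespace attribute assumes the source is in the TEI 
namespace (drop it if your data is in no namespace). Everything not 
matched explicitly falls through to the built-in template rules.

<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:v="http://example.org/ns/functions"
   xpath-default-namespace="http://www.tei-c.org/ns/1.0"
   exclude-result-prefixes="v">

   <!-- return the delSpan milestone that a node is "inside", if any -->
   <xsl:function name="v:fetch-delSpan" as="element(delSpan)?">
      <xsl:param name="n" as="node()"/>
      <!-- the most recent delSpan milestone before the node -->
      <xsl:variable name="del" select="$n/preceding::delSpan[1]"/>
      <!-- keep it only if its end anchor appears after the node -->
      <xsl:sequence select="$del[id(@spanTo) >> $n]"/>
   </xsl:function>

   <!-- wrap text that is inside a delSpan, then process it as usual -->
   <xsl:template match="text()[exists(v:fetch-delSpan(.))]">
      <span class="deleted">
         <xsl:next-match/>
      </span>
   </xsl:template>

</xsl:stylesheet>

Two caveats. The whitespace-only text nodes between elements inside 
the span (between </p> and <closer> in your example) match the 
template too, so they get wrapped unless you filter them out. And 
id() only finds the anchor if the processor recognizes its xml:id 
as an ID: a value such as "001" starts with a digit, so it is not 
a valid XML name and may not be picked up. Real data would want 
something like "d001".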

How would it perform? I don't know. It would 
depend on your processor and data set.

Cheers,
Wendell



======================================================================
Wendell Piez                            mailto:[log in to unmask]
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
   Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================