Thanks again, Wendell (and everybody else).
Note: people tinkering with this problem may also be interested in the
thread "structure and page breaks" in our archives, which discusses
a similar problem and approaches in both XQuery and XSLT.
On Thu, Aug 26, 2010 at 7:07 PM, Lou Burnard <[log in to unmask]> wrote:
> This is all very encouraging, and I can foresee plenty of applications for
> it, especially as there's a proposal to make all milestone-like elements
> (Syd says they're not milestones, but they are jolly *like* milestones)
> spanning... which means potentially lots of things like
> <milestone unit="whatever" spanTo="#overthere"/>
> On 26/08/10 23:07, Wendell Piez wrote:
>> Dear Markus,
>> I am putting this thread back on TEI-L as there
>> will be interested parties there.
>> At 10:39 AM 8/26/2010, you wrote (to me, off list):
>>> The anticipated use case looks something like this: some of the Papers
>>> projects are now starting to add corrections to texts in the digital
>>> edition based on new copy texts accessioned post-print-publication.
>>> Assume a recipient's copy is found (superseding a printed copy as
>>> published in the letterpress edition). Contrived example:
>>> <div type="letter">
>>> <p>Blah…<delSpan spanTo="001"/>No way I ever wrote that.</p>
>>> <closer><signed>Tho: Jefferson</signed></closer><anchor xml:id="001"/>
>>> </div>
>>> I.e., last sentence and closer from printed copy don't occur in MS
>>> recipient's copy.
>> Given the limitations of XML, this seems reasonable.
>>> Anyways, the projects I'm currently involved in here at Rotunda are
>>> fairly straightforward, and for now, at the HTML end of things, I
>>> think I might be able to get away with abusively rendering delSpans as
>>> blocks with a certain behavior (plus some kind of content
>>> repair/tidying function).
>>> But I am very grateful for your advice and pointers—good to know where
>>> to start looking once I get to work on an editorially more complex
>>> project. (One of those might be just around the corner, actually.)
>> Collectively, I think users of XML/XSLT are now
>> figuring out how to get a handle on these issues.
>> But having a technique isn't by itself enough. We
>> also have to keep an eye on performance and on
>> developing the techniques into generic routines
>> to minimize both the analysis and coding we have to do with every new
>> Nevertheless, XSLT 2.0 gives us some pretty
>> strong resources. In your case, for example, we
>> could write a function to return a delSpan
>> element from any text node that appears "inside"
>> it. (And a corresponding function for addSpan.)
>> Given a definition of "inside" for these purposes
>> (I'll say a node is "inside" a delSpan if the
>> delSpan milestone appears before the node starts
>> and its end anchor appears after the node ends),
>> it might not be so bad, even if the traversals
>> involved are potentially expensive.
>> To wit, something like:
>> <xsl:function name="v:fetch-delSpan" as="element(delSpan)?">
>>   <xsl:param name="n" as="node()"/>
>>   <!-- del will be the most recent delSpan milestone -->
>>   <xsl:variable name="del" select="$n/preceding::delSpan[1]"/>
>>   <!-- $del/id(@spanTo) will be its end anchor -->
>>   <!-- return $del if its end anchor appears after the argument node -->
>>   <xsl:sequence select="$del[id(@spanTo) >> $n]"/>
>> </xsl:function>
>> Call v:fetch-delSpan on any node: if the node (as
>> a whole) is "inside" a delSpan, you'd get the
>> delSpan milestone back. If not, you'd get an empty sequence back.
>> This enables something like this:
>> <xsl:template match="text()[exists(v:fetch-delSpan(.))]">
>>   <span class="deleted"><xsl:value-of select="."/></span>
>> </xsl:template>
>> ... and you're almost there.
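For anyone who wants to try this, here is a minimal sketch of how the pieces above might be wired into one stylesheet, with the delSpan and addSpan cases folded into a single function. Several things here are my assumptions, not Wendell's: the names v:fetch-span and v:span-end, the v: namespace URI (a placeholder), the TEI namespace on the source, and a key lookup on xml:id in place of id(), so that both spanTo="001" and spanTo="#001" resolve. I have not run it against real data.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:v="http://example.org/ns/span-functions"
    exclude-result-prefixes="tei v">

  <!-- Index every element that carries an xml:id, so end anchors can be
       found without depending on DTD- or schema-declared ID types. -->
  <xsl:key name="by-id" match="*[@xml:id]" use="@xml:id"/>

  <!-- Hypothetical helper: the end anchor a spanning milestone points to.
       A leading "#" in @spanTo is stripped before the lookup. -->
  <xsl:function name="v:span-end" as="element()?">
    <xsl:param name="start" as="element()"/>
    <xsl:sequence
        select="key('by-id', replace($start/@spanTo, '^#', ''), root($start))[1]"/>
  </xsl:function>

  <!-- Hypothetical generalization of v:fetch-delSpan: the nearest preceding
       delSpan or addSpan whose end anchor comes after $n in document order
       (&gt;&gt; is the XPath 2.0 "follows" operator). -->
  <xsl:function name="v:fetch-span" as="element()?">
    <xsl:param name="n" as="node()"/>
    <xsl:sequence
        select="($n/preceding::*[self::tei:delSpan or self::tei:addSpan]
                                [v:span-end(.) &gt;&gt; $n])[last()]"/>
  </xsl:function>

  <!-- Text inside a span: wrap it, using the milestone's own name
       ("delSpan" or "addSpan") as the class. -->
  <xsl:template match="text()[exists(v:fetch-span(.))]">
    <span class="{local-name(v:fetch-span(.))}">
      <xsl:value-of select="."/>
    </span>
  </xsl:template>

  <!-- Everything else: identity copy. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Unlike the [1]-style "most recent milestone" test, this filters on the end anchor first and then takes the nearest match, so a text node still counts as "inside" an earlier span when a later, shorter span has already closed. Whether that is the behavior you want for overlapping spans is an editorial decision.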
>> How would it perform? I don't know. It would
>> depend on your processor and data set.
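On the performance question, one thing that might be worth measuring is turning the lookup around: rather than walking backwards from every text node, collect once the text nodes that lie between each delSpan and its end anchor, and test membership in that set. The sketch below assumes a single TEI-namespaced source document; the key name by-id and the variable name deleted-text are my own, hypothetical choices. Whether it is actually faster will depend on how a given processor implements intersect, so treat it as something to benchmark rather than a fix.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <!-- Look up end anchors by xml:id. -->
  <xsl:key name="by-id" match="*[@xml:id]" use="@xml:id"/>

  <!-- Computed once for the principal source document: every text node
       that follows some delSpan and precedes that delSpan's end anchor.
       A delSpan whose anchor cannot be found contributes nothing. -->
  <xsl:variable name="deleted-text" as="text()*"
      select="for $d in //tei:delSpan,
                  $end in key('by-id', replace($d/@spanTo, '^#', ''))[1]
              return $d/following::text() intersect $end/preceding::text()"/>

  <!-- A text node is "deleted" if it is one of the collected nodes. -->
  <xsl:template match="text()[exists(. intersect $deleted-text)]">
    <span class="deleted">
      <xsl:value-of select="."/>
    </span>
  </xsl:template>

  <!-- Everything else: identity copy. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```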
>> Wendell Piez                     mailto:[log in to unmask]
>> Mulberry Technologies, Inc.      http://www.mulberrytech.com
>> 17 West Jefferson Street         Direct Phone: 301/315-9635
>> Suite 207                        Phone: 301/315-9631
>> Rockville, MD 20850              Fax: 301/315-8285
>> Mulberry Technologies: A Consultancy Specializing in SGML and XML