TEI-SOM@LISTSERV.BROWN.EDU

Subject: Re: TEI XPointer scheme proposal
From: Martin Holmes <[log in to unmask]>
Reply-To: [log in to unmask]
Date: Fri, 28 Sep 2012 08:29:17 -0700

Hi Hugh,

On 12-09-27 06:22 PM, Hugh Cayless wrote:

>> I should clarify my position here. I really don't think there's
>> anything wrong with the proposal at all; I just think that if we
>> have xpath2(), most ordinary use-cases will be covered, and many of
>> the other schemes are expressible in terms of XPath 2 or realizable
>> as XPath functions.
>
> I agree that (at least in an XSLT or XQuery context) all of these can
> be implemented using XPath functions. I do see a gap in coverage
> where XPath is incapable of addressing anything below the node level
> (e.g. a text substring or text spanning elements)[1]. And this is
> where I think XPointers can be useful.

I've been puzzling over this for a long time, and I still don't know 
exactly what this means. If we think of retrieving a node -- say an 
element -- but only a subset of its text content, then we're actually 
constructing a new node with the same name, but with different children. 
To take a simple case, extending your example below, given this:

<title xml:id="t1">The Bible</title>

using this XPath: substring(//title, 1, 3)

we get an atomic value consisting of a string containing "The". Using a 
Pointer mechanism:

<seg corresp="#string-range(t1,0,3)"/>

we would get not only the text "The", but also, somehow, its entire 
context -- almost something like this:

<title xml:id="t1">The</title>

because somehow we could retrieve the fact that the original title 
element was the parent of the text node from which the range came.
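
As a rough illustration of the difference, here is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`. It assumes the zero-based offset/length reading of `string-range()` used in the examples in this thread, and it only mimics XPath `substring()` for start positions of 1 or more. The names `xpath_substring`, `StringRange` and `resolve_string_range` are hypothetical, not part of the proposal or of any library. The point is that `substring()` hands back a bare string, whereas a resolved range can be kept as a reference to the original element plus offsets, so both the text and the element it came from remain reachable.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# ElementTree exposes xml:id under its Clark-notation name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical grammar for this sketch: #string-range(IDREF,OFFSET,LENGTH)
POINTER = re.compile(r"#string-range\(\s*([^,\s]+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)")


def xpath_substring(value: str, start: int, length: int) -> str:
    """Mimic XPath substring() for start >= 1: 1-based, returns an atomic string."""
    return value[start - 1:start - 1 + length]


@dataclass
class StringRange:
    """Hypothetical result type: a range kept *in place* inside the tree."""
    element: ET.Element  # the element in the original document, not a copy
    offset: int          # zero-based start position in its string value
    length: int          # number of characters

    @property
    def text(self) -> str:
        # String value of the element: all descendant text in document order.
        value = "".join(self.element.itertext())
        return value[self.offset:self.offset + self.length]


def resolve_string_range(root: ET.Element, pointer: str) -> StringRange:
    """Resolve a string-range pointer against the tree rooted at root."""
    m = POINTER.fullmatch(pointer)
    if m is None:
        raise ValueError(f"not a string-range pointer: {pointer!r}")
    ident, offset, length = m.group(1), int(m.group(2)), int(m.group(3))
    for el in root.iter():
        if el.get(XML_ID) == ident:
            return StringRange(el, offset, length)
    raise KeyError(ident)


doc = ET.fromstring('<text><title xml:id="t1">The Bible</title></text>')
title = doc.find("title")

atomic = xpath_substring("".join(title.itertext()), 1, 3)  # just the string "The"
in_place = resolve_string_range(doc, "#string-range(t1,0,3)")
print(atomic, in_place.text, in_place.element.tag)
```

Note the off-by-one between the two conventions: XPath's `substring(..., 1, 3)` and `string-range(t1,0,3)` select the same characters. Applied to Laurent's MAF example quoted further down, the same resolver yields "The", "victim", "'s" and "friends".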

However, in order to do this retrieval, the return value from the 
Pointer implementation must somehow encapsulate the original context. 
The example in the paper (using <unclear> etc.) seems to suggest two 
possibilities: either a new fragment is constructed, in which all 
elements from which partial content is included are rebuilt in 
truncated form (as above), or no such fragment is constructed. At the 
same time, the text of the paper expresses some doubt about whether 
construction is advisable at all, or whether it should happen in some 
contexts (such as XInclude) but not in others, because it's "a deviation 
from the specification". It's also not clear to me whether attributes on 
a truncated element should be returned in contexts where the element is 
constructed.

But it's even more complicated than that. If what we're returning is a 
constructed reduced version of the parent node (<title> above), then we 
could easily decide that this includes all its attributes and other 
content captured by the range. However, what about its broader document 
context? If what we're doing is constructing a new node which is a 
reduced copy of the original, it's not "in place" in the document in any 
sense that would enable us to get its parent element, or its 
preceding-sibling. As you say below, what's important is to encapsulate 
information about the text node *in place*, but I don't see any way such 
a thing could be encapsulated in the return value from any actual 
implementation. An implementation can only return a value of some kind; 
that value may be as complex as a document fragment, but it can never be 
the entire document. In other words, I don't see how any implementation 
can simultaneously return a value which consists of the meaning 
expressed by the string-range, and the entire context from which all of 
its components have been extracted.
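
The problem can be seen in a small Python sketch, again using only `xml.etree.ElementTree`. The helper `truncated_copy` is hypothetical: it builds the kind of reduced `<title>` described above, optionally keeping the original attributes, and for simplicity assumes the element contains only text. ElementTree elements carry no parent pointer, so `parent_map` builds the usual child-to-parent lookup over the original tree. The original element has a parent in that lookup; the constructed copy does not, because it exists outside the document.

```python
import xml.etree.ElementTree as ET


def truncated_copy(element: ET.Element, offset: int, length: int,
                   keep_attributes: bool = True) -> ET.Element:
    """Build a NEW element with the same tag whose only content is a slice of
    the original's text. Assumes the element has no child elements."""
    attrib = dict(element.attrib) if keep_attributes else {}
    new = ET.Element(element.tag, attrib)
    new.text = (element.text or "")[offset:offset + length]
    return new


def parent_map(root: ET.Element) -> dict:
    """Map each element in the tree to its parent (ElementTree has no .parent)."""
    return {child: parent for parent in root.iter() for child in parent}


doc = ET.fromstring(
    '<div><head>Books</head><title xml:id="t1">The Bible</title></div>')
title = doc.find("title")
piece = truncated_copy(title, 0, 3)

parents = parent_map(doc)
print(parents[title].tag)  # the original has a parent element
print(piece in parents)    # the copy is detached from the document
```

Whichever attribute policy is chosen, the copy on its own cannot answer "what is your parent?" or "what precedes you?"; some reference back into the original tree would have to travel with it.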

Then we have the broader issue here:

"TEI XPointer functions comprise a declarative mini-language. As such, 
they do not directly return values and cannot be executed. They address 
pieces of XML documents, but what is to be done with those pieces is 
wholly dependent on the context in which they are being interpreted."

What can it mean to address parts of a document without a mechanism for 
returning them? In order to make any use of the pointer, there must be a 
way to process it and get something; and if the something we get is 
wholly implementation-specific and not definitively explained (sometimes 
including truncated nodes and sometimes not, for instance), then the 
meaning of the Pointer itself is surely ambiguous. So I think we have to 
decide once and for all on what the return value of any given Pointer 
should be, as if it's a traditional function.
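
To make the ambiguity concrete, the following hypothetical Python sketch treats a parsed pointer as pure data (the "declarative" reading) and hands it to two interpreters of the kind the paper alludes to: one returns plain text, the other an XInclude-style truncated element. None of this comes from the proposal; `Pointer`, `parse`, `as_text` and `as_fragment` are illustrative names only.

```python
import re
import xml.etree.ElementTree as ET
from typing import NamedTuple

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


class Pointer(NamedTuple):
    """A parsed #string-range(id,offset,length): data only, nothing evaluated."""
    ident: str
    offset: int
    length: int


def parse(pointer: str) -> Pointer:
    m = re.fullmatch(r"#string-range\(([^,\s]+),(\d+),(\d+)\)", pointer)
    if m is None:
        raise ValueError(f"not a string-range pointer: {pointer!r}")
    return Pointer(m.group(1), int(m.group(2)), int(m.group(3)))


def find_by_id(root: ET.Element, ident: str) -> ET.Element:
    for el in root.iter():
        if el.get(XML_ID) == ident:
            return el
    raise KeyError(ident)


def as_text(root: ET.Element, p: Pointer) -> str:
    """Interpretation 1: the pointer denotes a plain string."""
    value = "".join(find_by_id(root, p.ident).itertext())
    return value[p.offset:p.offset + p.length]


def as_fragment(root: ET.Element, p: Pointer) -> ET.Element:
    """Interpretation 2 (XInclude-like): a truncated copy of the element."""
    el = find_by_id(root, p.ident)
    frag = ET.Element(el.tag, dict(el.attrib))
    frag.text = as_text(root, p)
    return frag


doc = ET.fromstring('<text><title xml:id="t1">The Bible</title></text>')
p = parse("#string-range(t1,0,3)")
print(type(as_text(doc, p)).__name__, type(as_fragment(doc, p)).__name__)
```

The same pointer yields values of different types depending on who interprets it; fixing one return value, as argued above, would remove that choice.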

I apologise for the incoherence of this; it basically reflects my own 
puzzlement about exactly what can be done here.

Cheers,
Martin



>> I'm also really unsure what would be achieved by creating a
>> detailed specification for these functions, when we haven't (have
>> we?) come up with any realistic context in which those functions
>> might be realized. If every user is faced with implementing support
>> for these pointer schemes through their own code (by writing, for
>> instance, extension functions for Saxon), then I don't see them
>> being used very much, and the significant expansion in the
>> Guidelines text and examples which will be required to make them
>> comprehensible will be largely wasted.
>
> Here I agree 1000%. Without a reference implementation of some sort, and
> examples, this is all totally useless. I plan on building at least an
> XSLT-based implementation as soon as it's clear enough what the
> target is :-)
>
> Best, Hugh
>
> [1] By this I mean that, while XPath can retrieve the value of a text
> substring, it cannot do so without losing the context of the source
> node. A function call like substring(//title, 1, 3) returns an atomic
> value, not a node in the document, so it isn't possible to use XPath
> to refer to parts of a text node *in place*.
>>
>> Your specification gives us a very sound starting point, and if we
>> are going to commit to building usable implementations based on
>> them, then I think the project is worthwhile, but if we're not
>> going to provide implementations of any kind, I think it would be
>> cleaner to limit ourselves to xpath2() along with some explanation
>> of how one might make use of an xpath2() pointer in real document
>> processing. And that's something I'm not really sure of myself, at
>> the moment.
>>
>> Cheers, Martin
>>
>>> I'd much rather sort problems out now than after the
>>> proposal has been submitted to the council.
>>>>
>>>> I have basically two comments:
>>>>
>>>> - I would keep the notion of default namespace generic and not
>>>> tied to the TEI one. We keep making sure that there is no magic
>>>> in the TEI infrastructure (cf. ODD discussion in Hamburg) and
>>>> we would not want to say here that TEI is intended when nothing
>>>> is said. Remember that my use case involves using the
>>>> pointer spec within another vocabulary (MAF).
>>>
>>> Yeah, I sympathize with this viewpoint, but on the other hand,
>>> that means an extra xmlns() for every single XPointer used in a
>>> TEI context that employs an XPath—which seems like too much
>>> overhead to me. I wonder whether we couldn't settle on some sort
>>> of reasonable default that avoids "magic"…maybe the default
>>> namespace for the pointer == the default namespace of the
>>> containing document (if there is one)?
>>>>
>>>> - as the spec moves forward, we should always keep a couple
>>>> of examples expressed according to the latest version we
>>>> have. I definitely think that the MAF example I gave the other
>>>> day could be one of them:
>>>>
>>>> <s xml:id="s1">The victim's friends</s>
>>>> <w corresp="#string-range(s1,0,3)"/>
>>>> <w corresp="#string-range(s1,4,6)"/>
>>>> <w corresp="#string-range(s1,10,2)"/>
>>>> <w corresp="#string-range(s1,13,7)"/>
>>>
>>> Absolutely. I'm starting to work on a test/implementation
>>> framework that we can use to track the development of the spec
>>> and plug these kinds of examples into.
>>>
>>> All the best, Hugh
>>>
>>>>
>>>> Cheers, Laurent
>>>>
>>>>
>>>>
>>>>
>>>> On 25 Sep 2012, at 19:51, Gabriel Bodard wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> Response to my last email on this subject having been
>>>>> deafening in its silence, we didn't get much further in
>>>>> discussion of Hugh's XPointer proposal
>>>>> <http://docs.google.com/document/d/1JsMA-gOGrevyY-crzHGiC7eZ8XdV5H_wFTlUGzrf20w/edit>
>>>>> at the TEI Council meeting last week. There is an action on me in
>>>>> the minutes of that meeting to push this list to discuss the
>>>>> proposal further.
>>>>>
>>>>> I understand that both Martin and Piotr had spotted some
>>>>> factual inconsistencies in the proposal, and/or had concrete
>>>>> suggestions for ways that a simpler (or existing) scheme
>>>>> could address the use-cases suggested (especially if xpath2()
>>>>> is available). As a matter of priority we need specific
>>>>> feedback from both of them on which cases they're talking
>>>>> about, highlighting issues and offering suggestions for
>>>>> improvement/rationalization, so Hugh and others have
>>>>> something to respond to.
>>>>>
>>>>> If it turns out that everything we can imagine wanting to
>>>>> do--including Stuart's use-cases, Hugh's Papyri examples,
>>>>> and Laurent's projects--is logically addressable using
>>>>> existing XPointer schemes, well that would be great! But we
>>>>> still need to write an actionable spec of those schemes so
>>>>> that they can be properly implemented, right? So this
>>>>> discussion is still necessary.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Gabby
>>>>>
>>>>> -- Dr Gabriel BODARD (Research Associate in Digital
>>>>> Epigraphy)
>>>>>
>>>>> Department of Digital Humanities King's College London 26-29
>>>>> Drury Lane London WC2B 5RL
>>>>>
>>>>> Email: [log in to unmask] Tel: +44 (0)20 7848 1388 Fax:
>>>>> +44 (0)20 7848 2980
>>>>>
>>>>> http://www.digitalclassicist.org/
>>>>> http://www.currentepigraphy.org/
>>>>
>>>> Laurent Romary INRIA & HUB-IDSL [log in to unmask]
>>> .
>>>
>>
>> -- Martin Holmes University of Victoria Humanities Computing and
>> Media Centre ([log in to unmask])
> .
>

-- 
Martin Holmes
University of Victoria Humanities Computing and Media Centre
([log in to unmask])