I'd like to thank all who've helped me out so far. An almost-due grant
is keeping me from trying the suggestions out as they come in, but I hope to get
to them tomorrow.
As to XSLT 1.0/2.0: I have no preference. I'll probably be using PHP for
processing them, so it depends on whether 2.0 is easily supported there for
now. I'll look.
On Fri, 2007-09-28 at 15:00 -0400, Wendell Piez wrote:
> You are restricted to using XSLT 1.0?
> If using XSLT 2.0, you should use the
> xsl:for-each-group construct, as Torsten has suggested.
> If restricted to XSLT 1.0, this is still doable.
> Jeni's documentation of common grouping idioms is
> useful, but you'll probably have to adapt the
> code there, which of course means disentangling exactly what it does.
> And of course, you don't exactly need grouping.
> What you need is closely related to it, since
> grouping also involves de-duplication (and frequently also sorting).
> To pull a single de-duplicated list, the common
> way to do it is with a key. (It can be done
> without the key, but performance degrades
> quadratically without it, for reasons I hope
> you'll see. Incidentally, improving performance
> by means of a key was what Steve Muench figured
> out, earning him semi-immortality with the label "Muenchian grouping".)
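>
> For comparison, a key-less version might look like the fragment below.
> It is only a sketch, assuming the foreign[@lang='en'] elements described
> next and an HTML list as output. Each candidate is kept only if no
> earlier foreign[@lang='en'] has the same string value; because that test
> rescans the preceding axis for every candidate, the work grows roughly
> with the square of the number of elements.
>
> ```xslt
> <!-- Key-less de-duplication: keep a foreign[@lang='en'] only if no
>      earlier foreign[@lang='en'] has the same string value.
>      Each predicate scans the preceding axis, hence roughly n*n work. -->
> <xsl:for-each select="//foreign[@lang='en'][not(. = preceding::foreign[@lang='en'])]">
>   <xsl:sort select="."/>
>   <li><xsl:value-of select="."/></li>
> </xsl:for-each>
> ```
>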
> Let's say you have a pile of <foreign lang="en">
> elements scattered throughout your document, and
> you want just a de-duplicated subset of them. You
> are de-duplicating them based on their values.
> (This is important. If you have some other
> criterion, all the following would be adapted
> thereto. For example, a particular attribute value could be used instead.)
> <xsl:key name="en-by-value" match="foreign[@lang='en']" use="."/>
> sets up a key retrieval mechanism whereby, given
> a value, you can retrieve all foreign[@lang='en'] elements with that value.
> So key('en-by-value','Boo') will retrieve all
> foreign elements with @lang='en' and the value
> 'Boo': <foreign lang="en">Boo</foreign>
> Thus, given any particular foreign[@lang='en']
> element, you can retrieve all of its occurrences using "key('en-by-value',.)".
> And given any of these sets, you can grab just
> one -- let's say the first, since it's always
> there -- as "key('en-by-value',.)[1]".
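>
> As a rough illustration (the template below is a hypothetical fragment,
> not complete code), the key is declared once at the top level and can
> then be queried from any foreign[@lang='en'] context, both for all
> matching elements and for the first of them:
>
> ```xslt
> <!-- Top level: index foreign[@lang='en'] elements by string value -->
> <xsl:key name="en-by-value" match="foreign[@lang='en']" use="."/>
>
> <xsl:template match="foreign[@lang='en']">
>   <!-- all elements in the document with the same value as this one -->
>   <xsl:variable name="same" select="key('en-by-value', .)"/>
>   <!-- the first of them in document order -->
>   <xsl:variable name="first" select="$same[1]"/>
>   <xsl:value-of select="concat(., ': ', count($same), ' occurrence(s)')"/>
>   <xsl:if test="generate-id() = generate-id($first)"> (first)</xsl:if>
> </xsl:template>
> ```
>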
> The next bit is a bit tricky, due to the fact
> that in XPath 1.0 there is no obvious way to see
> if two given nodes are the same. There are a
> couple of non-obvious ways. The easiest for our
> purposes here is, if we have a node $a and a node
> $b, generate-id($a)=generate-id($b) is true only
> if they are the same node. (In XPath 2.0 you
> would test this with "$a is $b", which is much nicer.)
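>
> In stylesheet terms, the identity tests might be written as below. This
> is a sketch: $a and $b are hypothetical variables, each assumed to hold
> exactly one node (if either is empty, both 1.0 tests can give misleading
> answers). The count() form is a second non-obvious XPath 1.0 way, shown
> only for comparison.
>
> ```xslt
> <!-- XPath 1.0: two ways to test whether $a and $b are the same node -->
> <xsl:if test="generate-id($a) = generate-id($b)">same node</xsl:if>
> <xsl:if test="count($a | $b) = 1">same node</xsl:if>
>
> <!-- XPath 2.0 only; an XSLT 1.0 processor will reject this -->
> <xsl:if test="$a is $b">same node</xsl:if>
> ```
>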
> This is useful since it enables us to write
> foreign[@lang='en'][generate-id()=generate-id(key('en-by-value',.)[1])]
> as a way of grabbing just single occurrences of
> foreign[@lang='en'], de-duplicated by their values.
> It translates as "Please get me all the 'foreign'
> elements whose 'lang' attribute is 'en' and whose
> unique generated ID is the same as the unique
> generated ID of the first element, in document
> order, of the set of foreign elements with
> lang='en' that have the same value as me".
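> If you can grab this set, you can process it -- and sort it while doing so:
> <xsl:for-each select="//foreign[@lang='en'][generate-id()=generate-id(key('en-by-value',.)[1])]">
> <xsl:sort data-type="text"/>
> ... do your worst ...
> </xsl:for-each>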
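>
> Putting the pieces together, a complete XSLT 1.0 stylesheet might look
> like this. It is a sketch, assuming input with no namespace and HTML list
> output; the occurrence count in parentheses is an optional extra.
>
> ```xslt
> <xsl:stylesheet version="1.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>   <xsl:output method="html" indent="yes"/>
>
>   <!-- index every foreign[@lang='en'] by its string value -->
>   <xsl:key name="en-by-value" match="foreign[@lang='en']" use="."/>
>
>   <xsl:template match="/">
>     <ul>
>       <!-- keep only the first element, in document order, for each value -->
>       <xsl:for-each select="//foreign[@lang='en'][generate-id() = generate-id(key('en-by-value', .)[1])]">
>         <xsl:sort data-type="text"/>
>         <li>
>           <xsl:value-of select="."/>
>           <!-- how many times this value occurs in the document -->
>           <xsl:text> (</xsl:text>
>           <xsl:value-of select="count(key('en-by-value', .))"/>
>           <xsl:text>)</xsl:text>
>         </li>
>       </xsl:for-each>
>     </ul>
>   </xsl:template>
> </xsl:stylesheet>
> ```
>
> Run against <doc><foreign lang="en">Boo</foreign><foreign
> lang="en">Apple</foreign><foreign lang="en">Boo</foreign></doc>, this
> should produce one list item each for "Apple (1)" and "Boo (2)".
>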
> In XSLT 2.0, the same can be done with:
> <xsl:for-each-group select="//foreign[@lang='en']" group-by=".">
> <xsl:sort lang="en" select="current-grouping-key()"/>
> ... in here, current-grouping-key() is the value,
> "current-group()" is the whole set of nodes in the group,
> and "." is the first node in the group ...
> </xsl:for-each-group>
> which is somewhat more flexible and does not
> require expert understanding of XPath 1.0 idioms to interpret.
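>
> A complete XSLT 2.0 version might look like the sketch below (HTML
> output again). It needs a 2.0 processor such as Saxon and will not run
> under an XSLT 1.0 engine.
>
> ```xslt
> <xsl:stylesheet version="2.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>   <xsl:output method="html" indent="yes"/>
>
>   <xsl:template match="/">
>     <ul>
>       <!-- one group per distinct string value -->
>       <xsl:for-each-group select="//foreign[@lang='en']" group-by=".">
>         <xsl:sort lang="en" select="current-grouping-key()"/>
>         <li>
>           <xsl:value-of select="current-grouping-key()"/>
>           <xsl:text> (</xsl:text>
>           <xsl:value-of select="count(current-group())"/>
>           <xsl:text>)</xsl:text>
>         </li>
>       </xsl:for-each-group>
>     </ul>
>   </xsl:template>
> </xsl:stylesheet>
> ```
>
> If spellings differing only in case should count as one (as in the
> question below), group-by="lower-case(normalize-space(.))" is one way to
> fold them together; current-grouping-key() then returns the lower-cased
> form.
>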
> I hope this helps.
> As to having code already written -- this was the
> other part of the headache in XSLT 1.0 (that is,
> once understanding it had stopped hurting), since
> the fact that the keys have to be wired up for
> particular cases (of nodes to be selected and
> their de-duplicating criteria) means that the
> code can't be easily generalized and packaged.
> This, in turn, combined with the ubiquity of the
> requirement, is what accounts for the better
> features in version 2.0 of the language.
> Defenders of XSLT 1.0 will point out that version
> 1 of the language was probably going to have a
> hole deeper than any of the others; this just happens to be it.
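>
> Adapting the 1.0 idiom to the case in the question below, one possible
> sketch for the appendix follows. Its assumptions are hypothetical: the
> input is in the TEI namespace, every tei:foreign is a Blackfoot word,
> case folding only needs to cover ASCII letters (translate() leaves other
> characters untouched), and link targets come from the nearest ancestor
> carrying an xml:id. XSLT 1.0 forbids variable references in xsl:key/@use,
> so the case-folding strings are spelled out there.
>
> ```xslt
> <xsl:stylesheet version="1.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     xmlns:tei="http://www.tei-c.org/ns/1.0">
>   <xsl:output method="html" indent="yes"/>
>
>   <!-- ASCII case folding; other characters pass through unchanged -->
>   <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
>   <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
>
>   <!-- Index each tei:foreign by its normalised, lower-cased value.
>        Variables are not allowed here in 1.0, so the strings repeat. -->
>   <xsl:key name="bf-by-value" match="tei:foreign"
>       use="translate(normalize-space(.), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')"/>
>
>   <xsl:template match="/">
>     <div class="appendix">
>       <!-- first occurrence of each folded spelling, sorted by that spelling -->
>       <xsl:for-each select="//tei:foreign[generate-id() = generate-id(key('bf-by-value', translate(normalize-space(.), $upper, $lower))[1])]">
>         <xsl:sort select="translate(normalize-space(.), $upper, $lower)"/>
>         <p>
>           <!-- shows the spelling of the first occurrence as written -->
>           <xsl:value-of select="normalize-space(.)"/>
>           <xsl:text> </xsl:text>
>           <!-- one link per occurrence, in document order -->
>           <xsl:for-each select="key('bf-by-value', translate(normalize-space(.), $upper, $lower))">
>             <!-- hypothetical target: nearest ancestor with an xml:id -->
>             <xsl:variable name="target" select="ancestor::*[@xml:id][1]/@xml:id"/>
>             <a href="#{$target}"><xsl:value-of select="$target"/></a>
>             <xsl:if test="position() != last()">, </xsl:if>
>           </xsl:for-each>
>         </p>
>       </xsl:for-each>
>     </div>
>   </xsl:template>
> </xsl:stylesheet>
> ```
>
> Since PHP's XSL extension is built on libxslt, a 1.0 stylesheet like this
> one is the safer choice there.
>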
> At 07:48 PM 9/22/2007, you wrote:
> >Hi all,
> >A TEI-related XSLT question: I'm hoping somebody already has code written for this.
> >A project I am consulting on involves a text full of Blackfoot words
> >that have been spelled phonetically using an unusual spelling system.
> >The investigator wants to give the text to an informant with an appendix
> >containing a list of all the Blackfoot words in the text, preferably
> >followed by links to the original location of the lemma. In some cases
> >the same spelling (identical or differing only in the use of case)
> >appears tens of times. The informant is going to provide normalisations
> >for the spellings in the text, which we will be adding to the encoding
> >along with the original.
> >The unusual Blackfoot spellings are currently marked up with a bare
> >tei:foreign. So what I want to do is
> >a) extract all the Blackfoot words
> >b) sort them alphabetically and eliminate duplicates
> >c) place them at the back of the document. If I can do this, I'd be happy.
> >Even better would be to follow the lemma with a link to the actual
> >occurrences in the text (e.g. in HTML terms <p>NA PE §§ <a
> >href="#s1.1">1.1</a>, <a href="#s1.4">1.4</a>, <a href="#s3.3">3.3</a>).
> >At this point, though, even a simple sorted list with no duplicates
> >would do.
> >I can do a) and c). I've been experimenting with accomplishing b) using the
> >various ways of grouping listed at Jeni Tennison's site, but can't seem
> >to get any of them to work. Has anybody got some plug-and-play code or tips I
> >might be able to use?
> Wendell Piez
> Mulberry Technologies, Inc. http://www.mulberrytech.com
> 17 West Jefferson Street Direct Phone: 301/315-9635
> Suite 207 Phone: 301/315-9631
> Rockville, MD 20850 Fax: 301/315-8285
> Mulberry Technologies: A Consultancy Specializing in SGML and XML
Daniel Paul O'Donnell, PhD
Chair, Text Encoding Initiative <http://www.tei-c.org/>
Director, Digital Medievalist Project <http://www.digitalmedievalist.org/>
Associate Professor and Chair of English
University of Lethbridge
Lethbridge AB T1K 3M4
Vox: +1 403 329 2378
Fax: +1 403 382-7191