Are you restricted to using XSLT 1.0?
If using XSLT 2.0, you should use the
xsl:for-each-group construct, as Torsten has suggested.
If restricted to XSLT 1.0, this is still doable.
Jeni's documentation of common grouping idioms is
useful, but you'll probably have to adapt the
code there, which of course means disentangling exactly what it does.
And of course, you don't exactly need grouping.
What you need is closely related to it, since
grouping also involves de-duplication (and frequently also sorting).
To pull a single de-duplicated list, the common
way to do it is with a key. (It can be done
without the key, but performance degrades
quadratically without it, for reasons I hope
you'll see. Incidentally, improving performance
by means of a key was what Steve Muench figured
out, earning him semi-immortality with the label "Muenchian grouping".)
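For comparison, here is a rough sketch of the keyless approach. It assumes un-namespaced foreign elements with lang="en", as in the examples in this message (real TEI P5 data would need a namespace prefix). Each candidate is compared against every earlier one, which is where the quadratic cost comes from.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Keep an element only if no earlier foreign[@lang='en'] in
         document order has the same string value. The preceding::
         axis is re-walked for every candidate. -->
    <xsl:for-each
        select="//foreign[@lang='en'][not(. = preceding::foreign[@lang='en'])]">
      <xsl:sort select="."/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This works on small documents, but the key-based version is usually the better choice once the text gets long.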
Let's say you have a pile of <foreign lang="en">
elements scattered throughout your document, and
you want just a de-duplicated subset of them. You
are de-duplicating them based on their values.
(This is important. If you have some other
criterion, all the following would be adapted
thereto. For example, a particular attribute value could be used instead.)
<xsl:key name="en-by-value" match="foreign[@lang='en']" use="."/>
sets up a key retrieval mechanism whereby, given
a value, you can retrieve all foreign[@lang='en'] elements with that value.
So key('en-by-value','Boo') will retrieve all
foreign elements with @lang='en' and the value
'Boo': <foreign lang="en">Boo</foreign>
Thus, given any particular foreign[@lang='en']
element, you can retrieve all of its occurrences using "key('en-by-value',.)".
And given any of these sets, you can grab just
one -- let's say the first, since it's always
there -- as "key('en-by-value',.)[1]".
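To see the key at work before any de-duplication, a minimal sketch along these lines may help. The text output format is only an illustration, and the un-namespaced foreign elements are an assumption carried over from the examples here.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every foreign[@lang='en'] element by its string value. -->
  <xsl:key name="en-by-value" match="foreign[@lang='en']" use="."/>

  <xsl:template match="/">
    <xsl:for-each select="//foreign[@lang='en']">
      <xsl:value-of select="."/>
      <xsl:text>: </xsl:text>
      <!-- key('en-by-value', .) returns all elements sharing this value;
           key('en-by-value', .)[1] would be the first of them. -->
      <xsl:value-of select="count(key('en-by-value', .))"/>
      <xsl:text> occurrence(s)&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Note that a value occurring three times is reported three times here; removing those repeats is what the rest of this message is about.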
The next part is a bit tricky, because XPath 1.0
has no obvious way to test whether two given
nodes are the same node. There are a
couple of non-obvious ways. The easiest for our
purposes here is, if we have a node $a and a node
$b, generate-id($a)=generate-id($b) is true only
if they are the same node. (In XPath 2.0 you
would test this with "$a is $b", which is much nicer.)
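As an illustration of the identity test, the following sketch labels each element as either the first occurrence of its value or a repeat. The labels and the variable name "first" are invented for this example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:key name="en-by-value" match="foreign[@lang='en']" use="."/>

  <xsl:template match="/">
    <xsl:for-each select="//foreign[@lang='en']">
      <!-- "first" is a hypothetical name: the first element, in
           document order, that shares the current element's value. -->
      <xsl:variable name="first" select="key('en-by-value', .)[1]"/>
      <xsl:value-of select="."/>
      <xsl:choose>
        <!-- Same node if and only if the generated IDs match.
             Another XPath 1.0 idiom for the same test is
             count(. | $first) = 1. -->
        <xsl:when test="generate-id(.) = generate-id($first)"> (first occurrence)</xsl:when>
        <xsl:otherwise> (repeat)</xsl:otherwise>
      </xsl:choose>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```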
This is useful since it enables us to write
//foreign[@lang='en'][generate-id(.)=generate-id(key('en-by-value',.)[1])]
as a way of grabbing just single occurrences of
foreign[@lang='en'], de-duplicated by their values.
It translates as "Please get me all the 'foreign'
elements whose 'lang' attribute is 'en' and whose
unique generated ID is the same as the unique
generated ID of the first element, in document
order, of the set of foreign elements with
lang='en' that have the same value as me".
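If you can grab this set, you can process it -- and sort it while doing so:
<xsl:for-each select="//foreign[@lang='en'][generate-id(.)=generate-id(key('en-by-value',.)[1])]">
<xsl:sort select="."/>
... do your worst ...
</xsl:for-each>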
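Put together as a complete XSLT 1.0 stylesheet, the technique might look something like the sketch below. It also uses the same key a second time to list where each value occurs. That part rests on an assumption that is not in the original question: that every occurrence sits somewhere inside an element carrying an id attribute. Adjust the ancestor step to whatever your sections actually use.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:key name="en-by-value" match="foreign[@lang='en']" use="."/>

  <xsl:template match="/">
    <!-- One element per distinct value: the first in document order. -->
    <xsl:for-each
        select="//foreign[@lang='en'][generate-id(.) =
                generate-id(key('en-by-value', .)[1])]">
      <xsl:sort select="."/>
      <xsl:value-of select="."/>
      <xsl:text>:</xsl:text>
      <!-- All occurrences of this value, via the same key.
           Assumption: the nearest ancestor with an id attribute
           identifies the location. -->
      <xsl:for-each select="key('en-by-value', .)">
        <xsl:text> #</xsl:text>
        <xsl:value-of select="ancestor::*[@id][1]/@id"/>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Swapping the text output for HTML links is then a matter of replacing the inner xsl:text and xsl:value-of with an a element whose href is built from the same id.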
In XSLT 2.0, the same can be done with:
<xsl:for-each-group select="//foreign[@lang='en']" group-by=".">
<xsl:sort lang="en" select="current-grouping-key()"/>
... in here, current-grouping-key() is the value,
"current-group()" is the full set of nodes in the group,
and "." is the first node in the group ...
</xsl:for-each-group>
which is somewhat more flexible and does not
require expert understanding of XPath 1.0 idioms to interpret.
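For completeness, a self-contained XSLT 2.0 version might look roughly like this. The text output and the occurrence count are only illustrative, and the un-namespaced element names are again an assumption.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- One group per distinct string value. To fold case together,
         group-by="lower-case(.)" would be one option. -->
    <xsl:for-each-group select="//foreign[@lang='en']" group-by=".">
      <xsl:sort select="current-grouping-key()" lang="en"/>
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>: </xsl:text>
      <!-- current-group() holds every element in this group. -->
      <xsl:value-of select="count(current-group())"/>
      <xsl:text> occurrence(s)&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```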
I hope this helps.
As to having code already written -- this was the
other part of the headache in XSLT 1.0 (that is,
once understanding it had stopped hurting), since
the fact that the keys have to be wired up for
particular cases (of nodes to be selected and
their de-duplicating criteria) means that the
code can't be easily generalized and packaged.
This, in turn, combined with the ubiquity of the
requirement, is what accounts for the better
features in version 2.0 of the language.
Defenders of XSLT 1.0 will point out that version
1 of the language was probably going to have a
hole deeper than any of the others; this just happens to be it.
At 07:48 PM 9/22/2007, you wrote:
>A TEI related XSLT question: I'm hoping somebody has code already
>A project I am consulting on involves a text full of Blackfoot words
>that have been spelled phonetically using an unusual spelling system.
>The investigator wants to give the text to an informant with an appendix
>containing a list of all the Blackfoot words in the text, preferably
>followed with links to the original location of the lemma. In some cases
>the same spelling (identical or differing only in the use of case)
>appears tens of times. The informant is going to provide normalisations
>for the spellings in the text, which we will be adding to the encoding
>along with the original.
>The unusual Blackfoot spellings are currently marked up with a bare
>tei:foreign. So what I want to do is
>a) extract all the blackfoot words
>b) sort them alphabetically and eliminate duplicates
>c) place at the back of the document. If I can do this, I'd be happy.
>Even better would be to follow the lemma with a link to the actual
>occurrences in the text (e.g. in HTML terms <p>NA PE §§ <a
>href="#s1.1">1.1</a>, <a href="#s1.4">1.4</a>, <a href="#s3.3">3.3</a>).
>At this point, though, even a simple sorted list with no duplicates
>I can do a) and c). I've been experimenting accomplishing b) using the
>various ways of grouping listed at Jenni Tennison's site, but don't seem
>to get one to work. Has anybody got some plug and play code or tips I
>might be able to use?
Wendell Piez
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
Mulberry Technologies: A Consultancy Specializing in SGML and XML