This is a fascinating, if gnarly, problem that has been around since
the dawn of XML. And some hold that this is really an XML problem,
not a TEI problem, since (obviously) DocBook and MODS and all sorts
of other XML vocabularies have the same problem.
That said, even though the issue was addressed in early drafts of
Canonicalization, the W3C (IIRC) explicitly dodged this bullet when
it set up the C14N format. (Hang on ... there it is: see
I just looked at that paragraph, and it does point out that the XPath
data model requires NFC *when the input is not UCS-based*. I bet 90+%
of TEI documents are UCS-based (e.g., UTF-8), though. Not sure what
this means for those documents.
Part of me thinks that search engines simply should know how to
handle this. They already have an option for case-folding ("A" vs.
"a"), so why not one for pre-composition?
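
To make the comparison with case-folding concrete, here is a minimal
Python sketch of what such an option might do. It is only an
illustration: `search_key` is a hypothetical helper of my own, not
any real search engine's API, and it assumes that normalizing both
the query and the indexed text to NFC is an acceptable way to treat
pre-composed and decomposed forms as equal.

```python
import unicodedata


def search_key(text: str, fold_case: bool = True) -> str:
    """Hypothetical helper: reduce text to a form suitable for comparison.

    Normalizes to NFC so pre-composed and decomposed sequences compare
    equal, and optionally case-folds as well.
    """
    key = unicodedata.normalize("NFC", text)
    if fold_case:
        # Case-folding can in principle de-normalize a string,
        # so normalize once more afterwards.
        key = unicodedata.normalize("NFC", key.casefold())
    return key


precomposed = "caf\u00e9"   # "é" as the single code point U+00E9
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# A naive comparison treats the two spellings as different strings.
print(precomposed == decomposed)

# Compared through the key, they match, with or without a case difference.
print(search_key(precomposed) == search_key(decomposed))
print(search_key("CAFE\u0301") == search_key(precomposed))
```

The same key would have to be applied both when indexing and when
querying, which is exactly how case-folding is usually handled.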
In any case, it is a worthy enough idea (IMHO) that it should
certainly be addressed, so I think you should put in a feature
request ticket for this. (If you don't want to fight with the
SourceForge interface to do that, just say so, and I'll be happy to
put the ticket in for you.)