On 15-02-19 02:41 PM, Jens Østergaard Petersen wrote:
> On 19 Feb 2015 at 22:40:36, Martin Holmes wrote:
>> Sorting depends on collations, and collations start with normalization:
> You can sort according to Unicode code point order and this does not
> involve any collation.
That _is_ a collation, surely; it's just not one that would be useful
for most human language contexts.
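Here is a minimal Python illustration of the point at issue. The example strings are my own and do not come from the thread. Python compares str values code point by code point, so a plain sorted() gives Unicode code point order with no locale-specific collation involved. The two canonically equivalent spellings of "é" still sort apart unless the text is normalized first, and the form you choose decides where "é" ends up relative to "f":

```python
import unicodedata

def codepoints(s):
    """Render a string as U+XXXX code points so the sort order is visible."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# "\u00e9" (precomposed) and "e\u0301" (e + combining acute) are two
# canonically equivalent spellings of "é".
words = ["f", "\u00e9", "e\u0301", "e"]

# Plain sorted() compares code point by code point, which is code point order.
for w in sorted(words):
    print(codepoints(w))  # U+0065, U+0065 U+0301, U+0066, U+00E9 (one per line)

# Normalizing the sort key makes equivalent spellings compare equal,
# but the chosen form moves "é" to a different place relative to "f".
nfc_sorted = sorted(words, key=lambda w: unicodedata.normalize("NFC", w))
nfd_sorted = sorted(words, key=lambda w: unicodedata.normalize("NFD", w))
# nfc_sorted: e, f, then both "é" spellings (their key U+00E9 sorts after "f")
# nfd_sorted: e, both "é" spellings, then f (their key starts with U+0065)
```

So even raw code point order groups equivalent strings together only after a normalization form has been fixed.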
>> I think we have to be careful not to use "normalized" to mean one
>> specific normalization form, though, don't we? NFD, NFC, NFKD and NFKC
>> are all normalization forms.
> By "a normalized text" I would suggest we mean a text that completely
> conforms to one (or more) of the four normalization forms.
Ah, that makes sense.
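A sketch of what "conforms to one (or more) of the four normalization forms" can look like in practice. It assumes Python 3.8 or later, where unicodedata.is_normalized is available; the name conforming_forms is hypothetical. Plain ASCII text conforms to all four forms at once, while a text that mixes precomposed and decomposed accents conforms to none of them:

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def conforming_forms(text):
    """Return the normalization forms that `text` already conforms to."""
    # unicodedata.is_normalized requires Python 3.8+.
    return [form for form in FORMS if unicodedata.is_normalized(form, text)]

print(conforming_forms("abc"))                  # ['NFC', 'NFD', 'NFKC', 'NFKD']
print(conforming_forms("caf\u00e9"))            # ['NFC', 'NFKC']
print(conforming_forms("cafe\u0301"))           # ['NFD', 'NFKD']
print(conforming_forms("\ufb01"))               # ['NFC', 'NFD'] (ligature survives only canonical forms)
print(conforming_forms("caf\u00e9 cafe\u0301")) # [] (not "normalized" in this sense)
```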
> I notice that the standard Apple text editor TextEdit and the Mac word
> processor Nisus Writer Pro normalize when searching while leaving the
> input unnormalized, not just with accents but also with ligatures. This shows
> that it is possible to have an app which searches for canonically
> equivalent strings in unnormalized input (what I dreamt about for oXygen
> and what Syd wanted).
It's a noble goal. But as someone pointed out, given any substantial
quantity of text, the combinatorial explosion of possibilities would be