I've just created a category on the TEI wiki for user-contributed
In this category, I've created a page that references Joel's post to
TEI-L. A copy placed directly in the wiki would be better, but since I don't
own the copyright in the code, I can't put it there. However, I
encourage Joel to contribute it himself so that others can improve it
in the wiki!
On 5/13/15 12:41 PM, Kalvesmaki, Joel wrote:
> Jens, my working code is below. Hope this saves you (and others) some time
> hunting through the Unicode charts.
> ====
> Schematron rule (with companion xsl function) to locate and identify all
> non-normalized Unicode characters, and to offer a quick fix to normalize
> them. Code must be part of a valid Schematron file. The prefix sqf must be
> bound to the namespace
> http://www.schematron-quickfix.com/validator/process. The prefix func can
> be bound to any namespace.
> <rule context="text()">
> <let name="this-raw-char-seq"
> value="tokenize(replace(.,'(.)','$1 '),' ')"/>
> <let name="this-nfc-char-seq"
> value="tokenize(replace(normalize-unicode(.),'(.)','$1 '),' ')"/>
> <let name="this-non-nfc-seq"
> <assert test=". = normalize-unicode(.)"
> sqf:fix="normalize-unicode">All text needs to be
> normalized (NFC). Errors: <value-of
> select="for $i in $this-non-nfc-seq return concat($i,' (U+',
> func:dec-to-hex(string-to-codepoints($i)),') at ',
> string-join(for $j in index-of($this-raw-char-seq,$i) return
> string($j),' ')),' '"
> <sqf:fix id="normalize-unicode">
> <sqf:title>Convert to normalized (NFC) Unicode</sqf:title>
> <sqf:stringReplace match="." regex=".+"><value-of
> <xsl:function name="func:dec-to-hex" as="xs:string">
> <!-- Input: Integer. Output: Hexadecimal equivalent string. -->
> <xsl:param name="in" as="xs:integer"/>
> <xsl:sequence
> select="if ($in eq 0)
> then '0'
> else
> concat(if ($in ge 16)
> then func:dec-to-hex($in idiv 16)
> else '',
> substring('0123456789ABCDEF',
> ($in mod 16) + 1, 1))"/>
> </xsl:function>
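For anyone who wants to try the logic outside oXygen, here is a rough Python sketch of what I understand the rule to report. It is not Joel's code. The names dec_to_hex and non_nfc_report are my own, and I am assuming that a "non-NFC" character is one that occurs in the raw text but not in its NFC form. Note that the recursion has to fire at 16 and above, so that 16 comes out as '10'.

```python
import unicodedata


def dec_to_hex(n):
    """Return the uppercase hexadecimal string for a non-negative integer."""
    if n == 0:
        return "0"
    digits = "0123456789ABCDEF"
    # Recurse on the higher-order digits; >= 16 so that 16 becomes "10".
    prefix = dec_to_hex(n // 16) if n >= 16 else ""
    return prefix + digits[n % 16]


def non_nfc_report(text):
    """List each character absent from the NFC form, with code point and positions.

    Assumption: "non-NFC" means "occurs in the raw text but not in its
    NFC-normalized form". Positions are 1-based character offsets.
    """
    nfc_chars = set(unicodedata.normalize("NFC", text))
    offenders = []
    for ch in text:
        if ch not in nfc_chars and ch not in offenders:
            offenders.append(ch)
    report = []
    for ch in offenders:
        positions = [str(i) for i, c in enumerate(text, start=1) if c == ch]
        report.append(f"{ch} (U+{dec_to_hex(ord(ch))}) at {' '.join(positions)}")
    return report


if __name__ == "__main__":
    decomposed = "A\u030Astro\u0308m"  # A + combining ring, o + combining diaeresis
    for line in non_nfc_report(decomposed):
        print(line)
    # The "quick fix" itself is just NFC normalization of the text.
    print(unicodedata.normalize("NFC", decomposed))
```

As in the XSLT function, the hex value is not zero-padded, so U+030A is printed as U+30A. Base letters that get absorbed into a precomposed character are reported too, which is a consequence of the assumption above.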
> From: Jens Østergaard Petersen
> Date: Wed, 13 May 2015 08:10:54 +0200
> To: Kalvesmaki, Joel
> Subject: Re: oXygen support for Schematron Quick Fixes
> This sounds very interesting. Could you publish your QuickFixes for
> normalising Unicode?
> In this connection note also that oXygen 17 has added the possibility to
> search for canonically equivalent strings. This allows one to search for
> precomposed and decomposed characters at the same time, but as far as I
> can see, it does not include compatibility distinctions, so (in the terms
> of our earlier discussion), it works with “Åström”/“Åström”, but not with
> On 13 May 2015 at 02:16:13, Kalvesmaki, Joel wrote:
> TEI community,
> Since it hasn’t yet been mentioned, I thought it worthwhile to highly
> recommend oXygen 17’s new feature providing support for Schematron Quick
> Fixes (http://www.schematron-quickfix.com).
> Some of you may recall an earlier discussion on normalizing Unicode, and
> the Schematron pattern I offered. That was fine insofar as it identified
> and located the problem, but it offered no fixes. SQF does just that.
> Tonight I put together a very simple but powerful SQF that allows a user
> with two mouse clicks in oXygen to change the errant text of an element
> into normalized Unicode. I wrote three more SQF patterns to fix edits
> that previously took around a minute per change (to look up a value,
> copy it, return to where I was, and paste it). I think the potential
> benefit to TEI projects, especially in communicating choices and options
> to project participants, is quite impressive.
> Kudos to Syncro Soft! (See their demo video here:
> http://www.oxygenxml.com/demo/Schematron_Quick_Fixes.html )
> Best wishes,
> Joel Kalvesmaki
> Editor in Byzantine Studies
> Dumbarton Oaks
> 202 339 6435