I've just created a category on the TEI wiki for user-contributed 
Schematron rules:

http://wiki.tei-c.org/index.php/Category:Schematron

In this category, I've created a page that references Joel's post to 
TEI-L.  A copy directly in the wiki would be better, but since I don't 
own the copyright in the code, I can't put it there.  However, I 
encourage Joel to contribute it himself so that others can improve it 
in the wiki!

--Kevin

On 5/13/15 12:41 PM, Kalvesmaki, Joel wrote:
> Jens, my working code is below. Hope this saves you (and others) some time
> hunting through the Unicode charts.
> jk
>
> ====Schematron rule (with companion xsl function) to locate and identify all
> non-normalized Unicode characters, and to offer a quick fix to normalize
> it. Code must be part of a valid Schematron file. The prefix sqf must be
> bound to the namespace
> http://www.schematron-quickfix.com/validator/process. The prefix func can
> be bound to any namespace.
>
>     <rule context="text()">
>        <let name="this-raw-char-seq"
>           value="tokenize(replace(.,'(.)','$1 '),' ')"/>
>        <let name="this-nfc-char-seq"
>           value="tokenize(replace(normalize-unicode(.),'(.)','$1 '),' ')"/>
>        <let name="this-non-nfc-seq"
>           value="distinct-values($this-raw-char-seq[not(.=$this-nfc-char-seq)])"/>
>        <assert test=". = normalize-unicode(.)"
>           sqf:fix="normalize-unicode">All text needs to be
>           normalized (NFC). Errors: <value-of
>              select="for $i in $this-non-nfc-seq return concat($i,' (U+',
>              func:dec-to-hex(string-to-codepoints($i)),') at ',
>              string-join(for $j in index-of($this-raw-char-seq,$i) return
>              string($j),' ')),' '"
>           /></assert>
>        <sqf:fix id="normalize-unicode">
>           <sqf:description>
>              <sqf:title>Convert to normalized (NFC) Unicode</sqf:title>
>           </sqf:description>
>           <sqf:stringReplace match="." regex=".+"><value-of
>              select="normalize-unicode(.)"
>              /></sqf:stringReplace>
>        </sqf:fix>
>     </rule>
>
>     <xsl:function name="func:dec-to-hex" as="xs:string"
>        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>        <!-- Input: Integer. Output: Hexadecimal equivalent string. -->
>        <!-- Recurses while $in has more than one hex digit (ge 16, so 16 gives '10'). -->
>        <xsl:param name="in" as="xs:integer"/>
>        <xsl:sequence
>           select="if ($in eq 0)
>           then '0'
>           else
>           concat(if ($in ge 16)
>           then func:dec-to-hex($in idiv 16)
>           else '',
>           substring('0123456789ABCDEF',
>           ($in mod 16) + 1, 1))"
>        />
>     </xsl:function>
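
For anyone assembling the pieces, a minimal sketch of the enclosing schema might look like the following. It assumes the xslt2 query binding and a processor that passes foreign xsl:function elements through (oXygen's Schematron support is assumed to do so). The func namespace URI is a made-up placeholder, since the post says any namespace will do.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
        xmlns:sqf="http://www.schematron-quickfix.com/validator/process"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:func="http://example.org/ns/functions"
        queryBinding="xslt2">
   <!-- Makes the func prefix available inside XPath expressions.
        The URI is a hypothetical placeholder; it only has to match
        the xmlns:func declaration above. -->
   <ns prefix="func" uri="http://example.org/ns/functions"/>

   <!-- The <xsl:function name="func:dec-to-hex"> element from the post
        goes here, as a child of <schema>. -->

   <pattern>
      <!-- The <rule context="text()"> element from the post goes here. -->
   </pattern>
</schema>
```

The default namespace is set to Schematron because the rule in the post uses unprefixed rule, let, assert and value-of elements.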
>
>
>
> From:  Jens Østergaard Petersen <[log in to unmask]>
> Date:  Wed, 13 May 2015 08:10:54 +0200
> To:  <[log in to unmask]>, <Kalvesmaki>, Joel <[log in to unmask]>
> Cc:  <[log in to unmask]>
> Subject:  Re: oXygen support for Schematron Quick Fixes
>
>
> This sounds very interesting. Could you publish your QuickFixes for
> normalising Unicode?
>
> In this connection note also that oXygen 17 has added the possibility to
> search for canonically equivalent strings. This allows one to search for
> precomposed and decomposed characters at the same time, but as far as I
> can see, it does not include compatibility distinctions, so (in the terms
> of our earlier discussion), it works with "Åström"/"Åström", but not with
> "woffle"/"woffle".
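
The canonical/compatibility distinction can also be tested in Schematron itself. The rule below is only a sketch under assumptions: it relies on the two-argument form of the XPath 2.0 function normalize-unicode, and on the processor supporting the NFKC form, which the XPath specification leaves optional (only NFC is required). The message wording is illustrative.

```xml
<rule context="text()" xmlns="http://purl.oclc.org/dsdl/schematron">
   <!-- Comparing the NFC and NFKC forms isolates compatibility
        differences (ligatures, for example) from purely canonical
        ones, which NFC already resolves. -->
   <report test="normalize-unicode(., 'NFC') ne normalize-unicode(., 'NFKC')"
      role="warning">This text contains compatibility characters that
      NFC normalization leaves untouched.</report>
</rule>
```

A report is used here so that the check flags text for review without implying that compatibility characters are always errors.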
>
> Jens
>
> On 13 May 2015 at 02:16:13, Kalvesmaki, Joel ([log in to unmask]) wrote:
>
> TEI community,
>
>
> Since it hasn't yet been mentioned, I thought it worthwhile to highly
> recommend oXygen 17's new feature providing support for Schematron Quick
> Fixes (http://www.schematron-quickfix.com).
>
>
> Some of you may recall an earlier discussion on normalizing Unicode, and
> the Schematron pattern I offered. That was fine insofar as it identified
> and located the problem, but it offered no fixes. SQF does just that.
> Tonight I put together a very simple but powerful SQF that allows a user
> with two mouse clicks in oXygen to change the errant text of an element
> into normalized Unicode. I wrote three more SQF patterns to speed up edits
> that previously took around a minute per change (to look up a value,
> copy it, return to where I was, and paste it). I think the potential
> benefit to TEI projects, especially in communicating choices and options
> to project participants, is quite impressive.
>
>
> Kudos to Syncro Soft! (See their demo video here:
> http://www.oxygenxml.com/demo/Schematron_Quick_Fixes.html )
>
>
> Best wishes,
>
>
> jk
>
> --
>
> Joel Kalvesmaki
>
> Editor in Byzantine Studies
>
> Dumbarton Oaks
>
> 202 339 6435
>