Kevin,

I’ve just posted the code, and it gave me the opportunity to rectify a
quote that didn’t come through.
http://wiki.tei-c.org/index.php/Unicode_normalization
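
On Jens’s point about compatibility distinctions, which NFC deliberately leaves alone: a companion rule along these lines might flag them. This is a rough sketch, not the posted code. It assumes queryBinding="xslt2", an XSLT processor whose normalize-unicode() accepts 'NFKC' (XPath 2.0 only requires 'NFC'; other forms are implementation-defined), and a made-up fix id, normalize-unicode-nfkc.

```xml
<pattern>
   <rule context="text()">
      <!-- Fires when compatibility normalization (NFKC) would change the
           text, e.g. the ligature U+FB04 becomes the three letters "ffl".
           NFKC also composes, so canonical differences are caught too. -->
      <report test=". != normalize-unicode(., 'NFKC')" role="warning"
         sqf:fix="normalize-unicode-nfkc">Text changes under
         compatibility normalization (NFKC).</report>
      <sqf:fix id="normalize-unicode-nfkc">
         <sqf:description>
            <sqf:title>Convert to compatibility-normalized (NFKC) Unicode</sqf:title>
         </sqf:description>
         <!-- Same replacement pattern as the NFC fix, but with form 'NFKC' -->
         <sqf:stringReplace match="." regex=".+"><value-of
            select="normalize-unicode(., 'NFKC')"/></sqf:stringReplace>
      </sqf:fix>
   </rule>
</pattern>
```

It sits in its own <pattern> because, within one pattern, only the first rule whose context matches a given node fires; a second text() rule next to the NFC one would never be reached. Whether to apply the fix is an editorial call, since NFKC also flattens distinctions an edition may want to keep, such as superscript digits or ligatures.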

The structure you’ve set up is very nice, and I’ll post more Schematron +
sqf that I think might be helpful. To get things kicked off, here’s one
that will insert today’s date or dateTime (your choice) in ISO-compliant
format. This is intended for TEI users who keep an active <change> log and
get tired of entering the current date or date and time.
http://wiki.tei-c.org/index.php/Set_when-iso_to_today
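
In outline, a fix of this kind can be quite small. The sketch below is not the code on the wiki page. It assumes queryBinding="xslt2", the prefix tei bound to http://www.tei-c.org/ns/1.0, and the standard @when attribute rather than @when-iso; the fix ids set-when-today and set-when-now are invented for illustration.

```xml
<rule context="tei:change">
   <!-- Warn when a change entry has no date, and offer two fixes -->
   <assert test="@when" role="warning"
      sqf:fix="set-when-today set-when-now">Each change should carry a @when date.</assert>
   <sqf:fix id="set-when-today">
      <sqf:description>
         <sqf:title>Set @when to today's date</sqf:title>
      </sqf:description>
      <!-- Adds e.g. when="2015-05-14" to the context <change> -->
      <sqf:add node-type="attribute" target="when"
         select="format-date(current-date(), '[Y0001]-[M01]-[D01]')"/>
   </sqf:fix>
   <sqf:fix id="set-when-now">
      <sqf:description>
         <sqf:title>Set @when to the current date and time</sqf:title>
      </sqf:description>
      <!-- Adds e.g. when="2015-05-14T10:04:00", in the processor's
           implicit time zone, with no offset written out -->
      <sqf:add node-type="attribute" target="when"
         select="format-dateTime(current-dateTime(), '[Y0001]-[M01]-[D01]T[H01]:[m01]:[s01]')"/>
   </sqf:fix>
</rule>
```

@when expects W3C date and time values (a profile of ISO 8601), so the pictures passed to format-date() and format-dateTime() spell out each field explicitly instead of relying on the default string form of current-date(), which includes a time-zone offset.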

Best wishes,

jk


On 5/14/15, 10:04 AM, "Kevin Hawkins" <[log in to unmask]> wrote:

>I've just created a category on the TEI wiki for user-contributed
>Schematron rules:
>
>http://wiki.tei-c.org/index.php/Category:Schematron
>
>In this category, I've created a page that references Joel's post to
>TEI-L.  A copy directly in the wiki would be better, but since I don't
>own the copyright in the code, I can't put it there.  However, I
>encourage Joel to contribute it himself so that others can improve it
>in the wiki!
>
>--Kevin
>
>On 5/13/15 12:41 PM, Kalvesmaki, Joel wrote:
>> Jens, my working code is below. Hope this saves you (and others) some
>> time hunting through the Unicode charts.
>> jk
>>
>> ====Schematron rule (with companion XSLT function) to locate and identify
>> all non-normalized Unicode characters, and to offer a quick fix that
>> normalizes the text. The code must be part of a valid Schematron file
>> that uses queryBinding="xslt2" (or later), since it relies on XPath 2.0
>> functions. The prefix sqf must be bound to the namespace
>> http://www.schematron-quickfix.com/validator/process, and xs to
>> http://www.w3.org/2001/XMLSchema. The prefix func can be bound to any
>> namespace.
>>
>>     <rule context="text()">
>>        <!-- Split the text into single characters, one item per codepoint,
>>             so that spaces and line breaks are counted too -->
>>        <let name="this-raw-char-seq"
>>           value="for $c in string-to-codepoints(.) return codepoints-to-string($c)"/>
>>        <let name="this-nfc-char-seq"
>>           value="for $c in string-to-codepoints(normalize-unicode(.))
>>           return codepoints-to-string($c)"/>
>>        <!-- Characters that do not survive NFC normalization -->
>>        <let name="this-non-nfc-seq"
>>           value="distinct-values($this-raw-char-seq[not(. = $this-nfc-char-seq)])"/>
>>        <assert test=". = normalize-unicode(.)"
>>           sqf:fix="normalize-unicode">All text needs to be
>>           normalized (NFC). Errors: <value-of
>>              select="string-join(for $i in $this-non-nfc-seq return concat($i, ' (U+',
>>              func:dec-to-hex(string-to-codepoints($i)), ') at ',
>>              string-join(for $j in index-of($this-raw-char-seq, $i)
>>              return string($j), ' ')), '; ')"
>>           /></assert>
>>        <sqf:fix id="normalize-unicode">
>>           <sqf:description>
>>              <sqf:title>Convert to normalized (NFC) Unicode</sqf:title>
>>           </sqf:description>
>>           <sqf:stringReplace match="." regex=".+"><value-of
>>              select="normalize-unicode(.)"
>>              /></sqf:stringReplace>
>>        </sqf:fix>
>>     </rule>
>>
>>     <xsl:function name="func:dec-to-hex" as="xs:string"
>>        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>>        <!-- Input: Integer. Output: Hexadecimal equivalent string. -->
>>        <xsl:param name="in" as="xs:integer"/>
>>        <!-- Recurse while more than one hex digit remains ("ge", not "gt",
>>             so that e.g. 16 yields "10" rather than "0") -->
>>        <xsl:sequence
>>           select="if ($in eq 0)
>>           then '0'
>>           else
>>           concat(if ($in ge 16)
>>           then func:dec-to-hex($in idiv 16)
>>           else '',
>>           substring('0123456789ABCDEF',
>>           ($in mod 16) + 1, 1))"
>>        />
>>     </xsl:function>
>>
>>
>>
>> From:  Jens Østergaard Petersen <[log in to unmask]>
>> Date:  Wed, 13 May 2015 08:10:54 +0200
>> To:  <[log in to unmask]>, "Kalvesmaki, Joel" <[log in to unmask]>
>> Cc:  <[log in to unmask]>
>> Subject:  Re: oXygen support for Schematron Quick Fixes
>>
>>
>> This sounds very interesting. Could you publish your QuickFixes for
>> normalising Unicode?
>>
>> In this connection note also that oXygen 17 has added the possibility to
>> search for canonically equivalent strings. This allows one to search for
>> precomposed and decomposed characters at the same time, but as far as I
>> can see, it does not include compatibility distinctions, so (in the terms
>> of our earlier discussion), it works with “Åström”/“Åström”, but not with
>> “woffle”/“woffle”.
>>
>> Jens
>>
>> On 13 May 2015 at 02:16:13, Kalvesmaki, Joel ([log in to unmask]) wrote:
>>
>> TEI community,
>>
>>
>> Since it hasn’t yet been mentioned, I thought it worthwhile to highly
>> recommend oXygen 17’s new feature providing support for Schematron Quick
>> Fixes (http://www.schematron-quickfix.com).
>>
>>
>> Some of you may recall an earlier discussion on normalizing Unicode, and
>> the Schematron pattern I offered. That was fine insofar as it identified
>> and located the problem, but it offered no fixes. SQF does just that.
>> Tonight I put together a very simple but powerful SQF that lets a user,
>> with two mouse clicks in oXygen, change the errant text of an element
>> into normalized Unicode. I wrote three more SQF patterns for edits that
>> previously took around a minute each (looking up a value, copying it,
>> returning to where I was, and pasting it). I think the potential
>> benefit to TEI projects, especially in communicating choices and options
>> to project participants, is quite impressive.
>>
>>
>> Kudos to Syncro Soft! (See their demo video here:
>> http://www.oxygenxml.com/demo/Schematron_Quick_Fixes.html )
>>
>>
>> Best wishes,
>>
>>
>> jk
>>
>> --
>>
>> Joel Kalvesmaki
>>
>> Editor in Byzantine Studies
>>
>> Dumbarton Oaks
>>
>> 202 339 6435
>>