Hi Syd,

About your small examples:

Indeed, with XML version 1.1 the document is reported as well-formed,
but the Oxygen syntax highlighting shows the character as invalid. We'll
try to fix this on our side.

With XML 1.0, the Xerces XML parser does report the same construct as
not well-formed. The spec does seem to say that the [#x10000-#xEFFFF]
range is valid for name start characters, and I cannot find that
character in any of the reserved character ranges either:

> http://www.w3.org/TR/REC-xml/#charsets

so I'm not sure why the Xerces parser we use for validation reports the 
character as invalid.
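
For what it's worth, the range check can be sketched in a few lines of
Python. This is only an illustration: the ranges are copied from
production 4 (NameStartChar) of XML 1.0, fifth edition, and the names
NAME_START_RANGES and is_name_start_char are made up for this example.
It says nothing about how Xerces or Oxygen implement the check.

```python
# NameStartChar ranges as listed in production 4 of XML 1.0 (fifth edition).
# The names below are hypothetical, chosen only for this sketch.
NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6),
    (0xD8, 0xF6),
    (0xF8, 0x2FF),
    (0x370, 0x37D),
    (0x37F, 0x1FFF),
    (0x200C, 0x200D),
    (0x2070, 0x218F),
    (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF),
    (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),  # the last range, which covers U+1F610
]


def is_name_start_char(ch: str) -> bool:
    """Return True if the single character ch falls in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NAME_START_RANGES)


# U+1F610 is the emoji used as the element name in the small examples.
print(hex(ord("\U0001F610")), is_name_start_char("\U0001F610"))
# prints: 0x1f610 True
```

By that reading of the production, U+1F610 is allowed as the first
character of a name.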

Regards,
Radu

Radu Coravu
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

On 11/2/2015 1:26 PM, Syd Bauman wrote:
> Ha!
>
> Thank you, Peter. Despite the fact that I'm worn, buried, and
> jet-lagged you got a big grin out of me on that one.
>
> But another good question, is why does oXygen[1] consider
>
> | <?xml version="1.0" encoding="UTF-8"?>
> | <😐/>
>
> to be ill-formed, even though U+1F610 (what that emoji is, in case it
> doesn't make it through mail gateways) is within the last range
> specified as a Namestart character by XML 1.0, production 4?[2]
>
> Note that oXygen correctly says that
>
> | <?xml version="1.1" encoding="UTF-8"?>
> | <😐/>
>
> is well-formed, but incorrectly colors from emoji on as red.
>
> Notes
> -----
> [1] XML Developer 17.0, build 2015051321 running on GNU/Linux
> [2] http://www.w3.org/TR/REC-xml/#NT-NameStartChar
>
> Peter Flynn writes:
>
>
>> At least rxp considers this well-formed :-)
>>
>> <TEI.42>
>>    <😐 TEIform="teiHeader">
>>      <🗃 TEIform="fileDesc">
>>        <📛💭 TEIform="titleStmt">
>> 	<📛 TEIform="title">TEI using Emojis</📛>
>>        </📛💭>
>>        <📢 TEIform="publicationStmt">
>>          <🇵🇾 TEIform="p">Demo</🇵🇾>
>>        </📢>
>>        <ℹ️ TEIform="sourceDesc">
>>          <🇵🇾 TEIform="p">Hand-made</🇵🇾>
>>        </ℹ️>
>>      </🗃>
>>    </😐>
>>    <🖨 TEIform="text">
>>      <💪 TEIform="body">
>>        <🗂 TEIform="div">
>> 	<🗣 TEIform="head">XML with Emojis in element type names</🗣>
>> 	<🇵🇾 TEIform="p">Para(guay)</🇵🇾>
>>        </🗂>
>>      </💪>
>>    </🖨>
>> </TEI.42>
>