On Mon, 20 Mar 1995 09:16:28 CST Patrick Durusau said:
>I am working on an entity set for use with a Hebrew manuscript and wanted
>to check my understanding of WSD semantics in the following instance:
>The WSD for ISO 8859-8 defines the Hebrew letter PE as follows:
><form string='ô' entityStd="fehhb" ucs-4="05E4" afiicode="E154">
><desc>HEBREW LETTER PE</desc></form></character>
>I am concerned with creating an entity for the Hebrew letter PE which
>is used as an abbreviation for Petuha (open paragraph) in Hebrew Bible
>manuscripts. If this instance of PE is treated as having the character
>class value of punc, does merger occur between the local defined
exception and the WSD base?

An excellent question. If I read the chapter right, no merger will take
place, because class=lexical and class=punc are inherently incompatible.
What actually does happen (or should happen, in properly WSD-aware
software) will vary depending on exactly how you invoke things.

If you invoke the ISO 8859-8 character-set WSD and the WSD documenting
your entity set both as base WSDs, and if your entity-set WSD gives the
same entity name or expansion string ('ô') as the ISO 8859-8 WSD,
then an error should be signaled, because the base WSDs are incompatible
(they classify the same character differently) and should not be invoked
together.

If you invoke them both as base WSDs, but your special entity set gives
a different expansion string and entity name for the open-paragraph use
of PE, then the two WSDs are compatible, and the two uses of PE will be
treated, formally, as two distinct characters (one lexical, one
punctuation).

If you invoke the ISO 8859-8 material as a base WSD, and define
open-paragraph PE as an EXCEPTION, then your declaration for PE will
override that given in the base WSD, if you give the same entity name or
literal string; it will create a new CHARACTER element if you don't. In
either case, no merger.
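
The three cases above can be modelled in a few lines of Python. This is
only a toy sketch of the reading given above, not the behaviour of any
actual WSD-aware software. The names Character, combine_bases and
apply_exceptions, the entity name "petuha", and the placeholder strings
"PE" and "{P}" are all invented for illustration, and it assumes that two
declarations denote the same character when they share an entity name or
an expansion string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    entity: str      # entity name, e.g. "fehhb"
    string: str      # literal expansion string (placeholder values here)
    char_class: str  # e.g. "lexical", "punc", "lexpunc"


class IncompatibleBases(Exception):
    """Two base WSDs classify the same character differently."""


def same_character(a: Character, b: Character) -> bool:
    # Assumption: a shared entity name or expansion string is what makes
    # two declarations refer to the same character.
    return a.entity == b.entity or a.string == b.string


def combine_bases(first, second):
    """Combine two base WSDs, refusing conflicting classifications."""
    combined = list(first)
    for new in second:
        clashes = [old for old in combined if same_character(old, new)]
        for old in clashes:
            if old.char_class != new.char_class:
                raise IncompatibleBases(
                    f"{new.entity}: {old.char_class} vs {new.char_class}"
                )
        if not clashes:
            combined.append(new)  # treated as a formally distinct character
    return combined


def apply_exceptions(base, exceptions):
    """Apply EXCEPTION declarations: override on a match, otherwise add."""
    result = list(base)
    for exc in exceptions:
        for i, old in enumerate(result):
            if same_character(old, exc):
                result[i] = exc  # replaces the base declaration; no merger
                break
        else:
            result.append(exc)  # no match: a new character is created
    return result


iso_8859_8 = [Character("fehhb", "PE", "lexical")]
same_name = [Character("fehhb", "PE", "punc")]
new_name = [Character("petuha", "{P}", "punc")]  # hypothetical entity

# Case 1: both as base WSDs, same name or string -> error.
try:
    combine_bases(iso_8859_8, same_name)
    conflict = False
except IncompatibleBases:
    conflict = True

# Case 2: both as base WSDs, different name and string -> two characters.
two_bases = combine_bases(iso_8859_8, new_name)

# Case 3: base plus exception -> override, or a new character.
overridden = apply_exceptions(iso_8859_8, same_name)
extended = apply_exceptions(iso_8859_8, new_name)
```

In none of the three cases are the lexical and punc declarations blended
into a single entry: the result is an error, two separate characters, or
an outright replacement.
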
If you don't want to use a distinct entity name and expansion for the
punctuating use of PE, I think you will be within the bounds of good
practice to use the standard PE: the fact is that in many scripts,
normal lexical characters sometimes are used with non-lexical functions.
The use of x, i, v, etc. in Roman numerals, the use of Greek (and
Hebrew) characters in numbers, and so on, are commonly accepted examples.
The use of the string 'Chapter', followed by a number, to mark a
chapter division in a modern novel, might similarly be argued to be
fundamentally punctuation, rather than lexical content. (I won't
make that argument here: I merely observe that it could be made.)
So we can similarly allow PE to function as punctuation, even if
the WSD observes (correctly) that it's basically a lexical character.

If you regard the punctuating use as particularly important, of course,
you could also re-classify PE as lexpunc --- it's not quite the same as
the examples given there, but classifying it thus would get across
quite effectively the fact that the character is sometimes lexical and
sometimes punctuation.

I hope this helps a little.

-C. M. Sperberg-McQueen
ACH / ACL / ALLC Text Encoding Initiative
University of Illinois at Chicago
u35395@uicvm
"Clarity, Precision and Ease of use does not mean Confinement, Verbosity
and Futility." -Jean Pierre Gaspart