
----------------------------Original message----------------------------
 
   From: Erik Naggum
   Date: 02 Oct 1992 19:15:19 +0100 (19921002181519)
 
   Having done this, and having an application character set, or code
   sequences that render a given glyph on the display device, it's
   trivial to produce a mapping by name lookup.  E.g., if TeX is used as
   the processing back-end:
 
   Definition:
	   <!ENTITY oe SDATA "LATIN SMALL LETTER O WITH STROKE">
 
   Local mapping:
	   "LATIN SMALL LETTER O WITH STROKE" = "{\o}"
 
   Produces a display version:
	   <!ENTITY oe SDATA "{\o}">
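As a rough illustration of the name-lookup step, here is a minimal Python sketch. The table `TEX_MAP` and the function `to_display` are hypothetical; the regular expression recognizes only the simple `<!ENTITY name SDATA "text">` form shown above, not the full SGML entity declaration syntax. Character names with no local mapping pass through unchanged.

```python
import re

# Hypothetical local mapping for a TeX back-end: ISO character names
# (as used in the SDATA entity text) -> TeX input sequences.
TEX_MAP = {
    "LATIN SMALL LETTER O WITH STROKE": r"{\o}",
    "LATIN CAPITAL LETTER O WITH STROKE": r"{\O}",
    "LATIN SMALL LETTER A WITH RING ABOVE": r"{\aa}",
}

# Matches only the simple form <!ENTITY name SDATA "text">.
ENTITY_RE = re.compile(r'<!ENTITY\s+(\S+)\s+SDATA\s+"([^"]*)"\s*>')


def to_display(declaration: str, mapping: dict[str, str]) -> str:
    """Rewrite one SDATA entity declaration by looking up its character name.

    Names missing from `mapping` are left unchanged.
    """
    m = ENTITY_RE.fullmatch(declaration.strip())
    if m is None:
        raise ValueError(f"not an SDATA entity declaration: {declaration!r}")
    name, text = m.groups()
    replacement = mapping.get(text, text)
    return f'<!ENTITY {name} SDATA "{replacement}">'


print(to_display('<!ENTITY oe SDATA "LATIN SMALL LETTER O WITH STROKE">', TEX_MAP))
```

Swapping in a different table (for a terminal's code page, say) changes the display target without touching the entity definitions.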
 
I think this model doesn't take into account the full generality of
character-to-glyph mappings.  Assuming that SGML is specifying characters
(and not glyphs), I gather that, during display, these characters will
be mapped to glyphs.  But this should not be confused with the use of
entity names to represent characters.  How will this model handle the
1-N mappings of Arabic and Indic scripts, and the N-1 mappings to
ligature glyphs?  May I assume that this char->glyph mapping is outside
the scope of SGML?  [I've seen early drafts of DSSSL in which this process
is made explicit; however, I saw no evidence that those drafts take into
account the generality of the char <-> glyph relationship.  Perhaps that
has been rectified by now?]
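To make the 1-N and N-1 cases concrete, here is a minimal table-driven sketch in Python. The glyph table and glyph names (`f_i`, `uni0BC6`, etc.) are hypothetical, not taken from any particular font. Greedy longest-match lookup lets an N-1 ligature (f + i -> fi) and a 1-N split (TAMIL VOWEL SIGN O, U+0BCA, drawn as two parts) come out of the same mechanism. It deliberately ignores glyph reordering and Arabic contextual form selection, which depend on neighbouring characters rather than a fixed table.

```python
# Hypothetical glyph table: character sequences -> glyph-name sequences.
GLYPH_TABLE: dict[tuple[str, ...], tuple[str, ...]] = {
    ("f", "i"): ("f_i",),                 # N-1: ligature
    ("f",): ("f",),
    ("i",): ("i",),
    ("\u0BCA",): ("uni0BC6", "uni0BBE"),  # 1-N: two-part Tamil vowel sign
}

MAX_KEY = max(len(k) for k in GLYPH_TABLE)


def chars_to_glyphs(text: str) -> list[str]:
    """Map a character string to glyph names by greedy longest match."""
    glyphs: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest key first so ligatures win over single characters.
        for n in range(min(MAX_KEY, len(text) - i), 0, -1):
            key = tuple(text[i:i + n])
            if key in GLYPH_TABLE:
                glyphs.extend(GLYPH_TABLE[key])
                i += n
                break
        else:
            # No table entry: fall back to a default glyph name for the character.
            glyphs.append(f"uni{ord(text[i]):04X}")
            i += 1
    return glyphs


print(chars_to_glyphs("fi"), chars_to_glyphs("\u0BCA"))
```

The point of the sketch is that the character stream SGML delivers and the glyph stream the display consumes need not have the same length, in either direction.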
 
   Therefore, we need a "character set manager" which can read any
   character data stream, compliant with ISO 2022 or IBM CDRA, or
   whatever, and let the parser see it as pure and undiluted ISO
   10646.  When passing data to the application, we need to invoke the
   character set manager once again to convert the internal
   representation (ISO 10646) to whatever the application will understand.
 
I completely agree with this model.  Applications will continue to
use old character sets forever.  10646 is a good choice as a canonical
representation for a parser, with conversion to/from extant charsets
at the boundary.  I think many systems will use this model.
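The boundary conversion can be sketched in Python, whose `str` type holds Unicode code points (the repertoire Unicode shares with ISO 10646) and whose standard codecs include ISO 2022 variants such as `iso2022_jp` and EBCDIC code pages such as `cp037`. The CCSID table and the function names below are hypothetical; this illustrates the convert-at-the-edges idea, not CDRA itself, and covers only three CCSIDs.

```python
# Hypothetical table: IBM CDRA coded character set identifiers (CCSIDs)
# mapped to Python codec names. Only a few entries, for illustration.
CCSID_CODECS: dict[int, str] = {
    37: "cp037",     # EBCDIC, US/Canada
    500: "cp500",    # EBCDIC, International
    819: "latin-1",  # ISO 8859-1
}


def codec_for(ccsid: int) -> str:
    """Look up the Python codec name for a CDRA CCSID."""
    try:
        return CCSID_CODECS[ccsid]
    except KeyError:
        raise ValueError(f"no codec known for CCSID {ccsid}") from None


def to_canonical(data: bytes, encoding: str) -> str:
    """Input boundary: external bytes -> canonical Unicode text for the parser."""
    return data.decode(encoding)


def from_canonical(text: str, encoding: str, errors: str = "strict") -> bytes:
    """Output boundary: canonical text -> the application's character set.

    With errors="strict", characters the target set cannot represent raise
    UnicodeEncodeError; errors="replace" substitutes "?" instead.
    """
    return text.encode(encoding, errors=errors)


# ISO 2022 input in, EBCDIC (CCSID 37) out for a legacy application.
canonical = to_canonical("Hello".encode("iso2022_jp"), "iso2022_jp")
ebcdic = from_canonical(canonical, codec_for(37))
```

With one canonical form inside, supporting another external character set means adding a codec or table entry at the edges rather than changing the parser.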
 
Glenn