We have a philosophical difference at this point (in your reply to
Keld), one which I think it's important to bring up.
| Sure, shorter character names will be more efficient IF ONE HAS TO
| TYPE THEM IN AS NAMES.
Not necessarily only for this reason. A character and a glyph are not
the same thing. For a detailed exposition of the difference, see the
Character-Glyph Model Discussion cited at the end of this message.
A user must be able to access glyphs, not just characters. The SGML
entity mechanism supports this. ISO 10646 does not support this
distinction. Indeed, it's the opinion of SC 18 (1) that ISO 10646
confuses it.
| While this may be necessary for a transition period until 10646-
| based editors are available, it definitely should not be seen as a
| long term solution.
We have to realize that the present plethora of character sets will
survive for decades, in documents, in text entry devices, in software
systems, in display devices, etc, etc, and making them inaccessible
and/or useless after some time T is counter-productive in a strong
sense. A long-term solution will need to address this problem. It
should be noted that the solution is _not_ to have mapping tables, and
code-point to code-point roundtrip conversion, as character coding is
*much* more complex than this simple model affords.
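To make the last point concrete, here is a minimal Python sketch (mine,
not taken from any standard) of two ways a code-point to code-point
table falls short. It uses only Python's standard codecs and the
unicodedata module. The sample strings are arbitrary, and Unicode
normalization merely stands in for any coding scheme that builds a
character from a sequence, such as the non-spacing diacritics of
ISO 6937.

```python
import unicodedata

# Case 1: the target set lacks the character, so conversion has to
# substitute something, and the way back is lost.
original = "\u0152uvre"  # U+0152 (OE ligature) is not in ISO 8859-1
as_latin1 = original.encode("latin-1", errors="replace")
round_trip = as_latin1.decode("latin-1")
print(repr(original), "->", repr(round_trip))

# Case 2: what is one coded character in one scheme is a sequence of
# coded characters in another, so a one-to-one table cannot express it.
composed = "\u00e9"  # e with acute as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # e + combining acute
print(len(composed), "code point vs", len(decomposed), "code points")
```

Neither case is repaired by a bigger table: the first needs a way to
name what was lost, the second needs sequences on at least one side.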
| It is completely absurd to require human users to enter characters
| as names of characters.
I don't think anybody has advocated that, either. However, SGML (and
Keld, too) realizes that we have to access a character somehow, and
whether it be produced by a sequence of keystrokes which result in one
character number from the text entry device, or many, is immaterial.
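A small Python sketch of what I mean, under stated assumptions: the
table below is a hand-picked subset of entity names (aelig, oslash,
aring) mapped to ISO 10646 code points, the function name
expand_entities is hypothetical, and the sketch requires the closing
semicolon, which is a simplification of real SGML entity references.
The point is only that text typed directly and text typed as entity
references can end up as the same characters.

```python
import re

# Assumed, hand-picked subset of an entity set: name -> character.
ENTITIES = {
    "aelig": "\u00e6",
    "oslash": "\u00f8",
    "aring": "\u00e5",
}

# Simplified entity reference syntax: "&name;" with a mandatory semicolon.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_entities(text, entities=ENTITIES):
    """Replace known entity references; leave unknown ones untouched."""
    def replace(match):
        return entities.get(match.group(1), match.group(0))
    return ENTITY_REF.sub(replace, text)


typed_directly = "bl\u00e5b\u00e6r"         # from a keyboard that has the keys
typed_with_entities = "bl&aring;b&aelig;r"  # from one that does not
print(expand_entities(typed_with_entities) == typed_directly)
```

How many keystrokes went into each form is invisible once the
references have been expanded.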
| Nobody in their right mind is going to do this for a Japanese or
| Chinese text, or a Thai or Arabic text for that matter.
Not if you write lots of text in that writing system, because then you
have text entry devices which support your natural habitat, so to speak.
However, if we allow for the existence of several habitats and diverse
populations of them, we find that some of them are more likely to aid
the survival of large character sets than others. The Japanese and
Chinese already have multiple-key-stroke-one-character entry schemes.
Arabic has many problems because of its incredibly out-dated focus on
handwritten characters which tie together and are weakly adapted to the
idea of individual "characters" to begin with (i.e. the "smallest freely
combining units of [a writing] system" is fuzzy).
| Instead of coming up with ineffective interim solutions that will
| never be used, ...
Objection! There are already more than a handful of "interim" solutions
used by millions of people, who won't stop now, unless they can get
_more_ with the new technology than they can with the present. I don't
need to mention more than TeX, SGML and Word Perfect.
| ... we should be building fully-enabled 10646 systems. A task which
| I have been working at now for the last three years.
Excellent! However, the rest of the world has been working against you,
if you view your work as saving the world for the future (which is how
I'm inclined to view it). We need to reach out and capture someone's
interest before he will let go of his millions of characters' worth of
extant documents. Matter of fact, we have enough problems making people
realize that it's a good idea to use standardized 8-bit character sets,
and I won't even mention the problems we have trying to have people
identify the character sets they do choose. We can't demand that the
world adopt ISO 10646 unless the changeover will be less painful than
continuing with what they're doing, no matter how much better off they
will be afterwards. After all, most users are cowards who shy away
from any short-term pain although we who know better know that they will
be happier afterwards. It's the "software dentist problem".
| For that matter, I would also argue that no human user should be
| forced to learn anything about SGML to use it.
This is the heart of the philosophical difference. SGML is much more a
philosophy of information representation than an actual language. The
idea that we represent structure and identify attributes (one of which
is the generic identifier) with (element) contents, is very much
different from the immediately gratifying visual presentation that users
have been brought up to think is the solution to their information
processing needs, not the means to _present_ the solution.
Therefore, it's important that the users think in terms of structure, of
elements and element types, of data attributes, of notations and special
formats, of the separation of information from presentation. This
doesn't have to mean that they will see SGML source documents, but, to
use the words of Yuri Rubinsky, "[the users] will be invited to abandon
their worst habits" (in the Foreword cited below). Bad habits don't go
away by themselves. It's important that Yuri stresses "invite". That's
also what ISO 10646 does.
We can't force them to come to the party if they don't want to, however
much we would like to see them there.
| It should simply be an underlying serial representation like RTF.
RTF is an encoding of the past, when presentation and information were
inseparable. SGML is addressing the future, when "a gigabyte is a small
amount of information", and information _management_ will completely
overshadow other information technology disciplines. (Key question:
what can you do with an RTF document, apart from looking at it?)
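As a rough illustration of the key question, here is a Python sketch
that asks things of a document's structure rather than its appearance.
The assumptions are loud ones: an XML-style document stands in for an
SGML instance so that Python's standard parser can read it, and the
element names (report, section, heading, para) are hypothetical.

```python
import xml.etree.ElementTree as ET

# A tiny structured document; the markup says what things are,
# not what they should look like.
DOCUMENT = """<report>
  <title>Character sets in transition</title>
  <section>
    <heading>Glyphs</heading>
    <para>A glyph is a shape.</para>
  </section>
  <section>
    <heading>Legacy</heading>
    <para>Old documents stay around.</para>
  </section>
</report>"""

root = ET.fromstring(DOCUMENT)

# Build an outline from the headings alone.
headings = [h.text for h in root.iter("heading")]

# Count words in body paragraphs only, ignoring titles and headings.
word_count = sum(len(p.text.split()) for p in root.iter("para"))

print(headings, word_count)
```

An outline, a word count over body text only, or a list of sections
missing a heading all fall out of the structure; with a
presentation-oriented encoding, each would have to be guessed from
fonts and spacing.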
| Applications should provide user interfaces that present the
| functional abstraction of SGML without requiring any specific
| knowledge of SGML syntax or representations.
I'm not sure this is possible, precisely because the abstractions are
_very_ hard to communicate to someone who isn't already accustomed to
syntax and representations. After all, SGML affords abstractions over
representations of information, into very high-level concepts. Many
people who use SGML daily aren't aware of them, because they don't know
what it is SGML _can_ do, if put to it. It's _very_ unlikely that a
user interface will be able to communicate a functional abstraction
where the information necessary to make the abstraction is absent or
weakly refined in the user.
(1) I'm not speaking officially on behalf of ISO/IEC JTC 1/SC 18, but
this is the message in the liaison statement cited below, and in
ongoing discussions.
 Character-Glyph Model Discussion. Attachment 1 to ISO/IEC JTC 1/
SC 18 N3592 Rev. "Liaison statement to JTC 1/SC 2 from JTC 1/SC 18
on ISO/IEC DIS 10646-1.2" (1992-05-26)
 Yuri Rubinsky, in the Foreword to Charles F. Goldfarb: The SGML
Handbook. Oxford University Press, 1991. ISBN 0-19-853737-9.
Erik Naggum | ISO 8879 SGML | +47 295 0313
| ISO 10744 HyTime |
| ISO 10646 UCS | Memento, terrigena.
| ISO 9899 C | Memento, vita brevis.