----------------------------Original message----------------------------
David, and all the rest of you, too:
 
I can sympathize with your desire to not engage in the standards process
and instead get on with the real work, but I have no sympathy if you
don't carefully read messages about such processes before you debunk
what isn't even under consideration.  (No, nobody has tried to push very
long entity names in any way.)
 
Please _read_ the following, before you decide to cast a vote of
disapproval on a proposal you have no interest in before it's a fact
that will annoy you in practical applications.  Trust me, I wouldn't
work on this if I weren't really, really, deeply annoyed by the fact
that the present public entity sets are almost completely useless, and
require hours of manual labor to change into something useful for a
given display device.  _You_ don't do this work, so _you_ may not care,
but you have to pay (and trust) the moron who is set to do such menial
labor, because SGML gurus sure try to spend their time on other things.
 
OK, bear with me, and follow me beneath the cover.  This will be a
fairly short trip, and it won't hurt.  Take a "schwa", or some other
favorite IPA character, with you, and come along.
 
In a document you type, you would rather have a "schwa" if you could,
and if you can, more power to your editing tool.  I assume, however,
that you can't, or you wouldn't be here.  We already know that if we can
associate a name with this symbol, we can reference the name and get the
symbol.  So far so good.  SGML even provides the syntax for these entity
references, so we don't have that wheel to invent, too.
 
The problem you still face is the one of finding out what your "schwa"
is called this week, and whether the application of the month can handle
it in the new version.  If you're always using documents with the same
application on the same old system, you can be satisfied with a local
solution (a.k.a. "hack"), and let others have their problems.  However,
there are standards bozos like me who care about the general case.  I
care about two things: (1) how you can find the name of your "schwa" in
a definitional entity set (which merely lists the entity names and what
they "mean"), (2) how the parser can select the right display version
(which lists entity names and the associated magic to have the desired
character end up in the resulting document).
 
I propose that we use the facilities of SGML to differentiate between
_definitional_ and _display_ character entity sets, and that we use the
full name of the _character_ the entity is intended to capture.  Note: SGML
already has this differentiation built-in, it's just that most parsers
don't use it, because it hasn't added value to do so.  (Until now.)
Instead, SGML gurus waste their time on moving public entity sets
between machines and applications, and they hate it.  If you don't see
many public entity sets that support many different display devices,
this will give you a clue why.
 
With ISO 10646 (the huge character set standard), we got a list of names
of characters for free (or as free as anything published by ISO).  This was
just a means to an end to the people who wrote _that_ standard, but we
can use the same means to another end.  For instance, our schwa is
called "LATIN SMALL LETTER SCHWA".  (You can look this up in ISO 10646,
and you can even gaze at the nice little characteristic glyph and nod in
recognition.)
 
Given a unique name for every conceivable character, and then some, we
can make a mapping from _entity_ to _character_:
 
	<!ENTITY schwa		SDATA "LATIN SMALL LETTER SCHWA">
 
which tells us that &schwa; will yield a schwa, properly parsed.  Note
that I picked this name at random.  You may not think it's particularly
random, and I don't think so either, but I could have chosen "frotz",
and if your SGML system insisted on fixed entity names, you would have
to put up with that, or complain.  Now, "frotz" is much better than
"latin-small-letter-schwa-this-week", and I promise you that you'll
never see the latter, but it still is a pain.
 
That is, you face a choice, as a user, of an entity name, thus:
 
(1) you can use an entity name that has been standardized, in the hopes
    that everybody will use the same entity name for the same thing,
    which is actually a character, or
 
(2) you can use any entity name which refers to the character you really
    want if you properly declare it.
 
The present trend is the former, and we already know that this won't and
can't work.  The reasons are very simple: People already differ in their
tastes, and there are only so many characters that can be named with
short entity names.  Also, there are at least two widely disseminated
public entity sets which differ _radically_ in entity names for the same
characters.
 
I propose the second solution, which will enable us to map _any_
definitional entity set onto a display entity set for a given
environment (i.e., display device) by _name_ (and that's the long
_character_ name, as opposed to whatever entity name you choose,
probably very short if I get your message, something like "e", right?).
 
That is, given
 
	<!ENTITY schwa		SDATA "LATIN SMALL LETTER SCHWA">
 
and a display device which will render the schwa if we give it the
decimal code 159, we can have a mapping (which "happens" to look very
much like a character declaration):
 
	159 -- 009F --	1 "LATIN SMALL LETTER SCHWA"
 
to produce the resulting display entity declaration:
 
	<!ENTITY schwa	SDATA "&#159;">
 
Note, again, that this is completely irrespective of the _entity_ name,
which can be everything from one character to your local maximum number
of characters long.
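
To make the mapping step concrete, here is a minimal sketch in Python of
how a display entity set could be generated from a definitional one.  It
is an illustration only, not the software mentioned in this message: the
function names and the regular expressions are hypothetical, the mapping
line format is assumed to be exactly the one shown above, and the sketch
simply skips any character the device cannot render.

```python
import re

# Matches declarations like: <!ENTITY schwa SDATA "LATIN SMALL LETTER SCHWA">
ENTITY_RE = re.compile(r'<!ENTITY\s+(\S+)\s+SDATA\s+"([^"]*)"\s*>')

# Matches mapping lines like: 159 -- 009F -- 1 "LATIN SMALL LETTER SCHWA"
# (decimal code, optional comment between "--" pairs, a count, a name).
MAPPING_RE = re.compile(r'^\s*(\d+)\s+(?:--.*?--\s*)?(\d+)\s+"([^"]+)"\s*$')


def parse_definitional(text):
    """Return {entity name: character name} for each SDATA declaration."""
    return {name: char for name, char in ENTITY_RE.findall(text)}


def parse_device_map(text):
    """Return {character name: decimal code} from the mapping lines."""
    codes = {}
    for line in text.splitlines():
        m = MAPPING_RE.match(line)
        if m:
            codes[m.group(3)] = int(m.group(1))
    return codes


def display_declarations(definitional, device_map):
    """Join the two tables on the character name, not the entity name."""
    codes = parse_device_map(device_map)
    out = []
    for name, char in parse_definitional(definitional).items():
        if char in codes:  # assumption: unsupported characters are skipped
            out.append('<!ENTITY %s SDATA "&#%d;">' % (name, codes[char]))
    return out


definitional = '<!ENTITY schwa SDATA "LATIN SMALL LETTER SCHWA">'
device_map = '159 -- 009F -- 1 "LATIN SMALL LETTER SCHWA"'
for declaration in display_declarations(definitional, device_map):
    print(declaration)
```

Because the join is made on the character name, a document that calls
the same character "frotz" would get an equivalent display declaration
from the same device table, with no change to the table itself.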
 
What will this buy you?  If you have an in-house SGML guru, it will buy
you a friendlier in-house SGML guru, because he has to do that anal-
retentive work on those stupid tables only once, and he doesn't have to
do a lot of boring work if you get a document from somebody with a
different choice for entity names.  If you don't have an in-house SGML
guru, you'll get better telephone support.  Somewhere down the line,
you'll even get SGML support for a wider variety of output devices, and
perhaps more useful and more easily obtainable public entity sets which
will actually work with your application, _without_ calling said SGML
guru (at night).
 
As the user, you'll also be able to select entities by looking at a list
of full names of characters in a published, reliable standard (unlike
some vendor's (missing) documentation), and you can be assured that
whatever public entity set you use, it will come out just like you want,
even after you move it to a different vendor's system.  (At least this
is what I wake up remembering that I dreamt.)  You can even invent local
characters that aren't in _any_ public entity set, if your parser is
smart enough.  (Mine is, of course.)
 
Now, why do I care about this?  I develop solutions that I try to propose
for standards, I develop software to use those solutions in practice, to
field test them, and to gain that all-important feedback, and I use the
results of this stuff to talk to 3 widely different printers in my own
SGML system.  Guess who got sick and tired of fiddling with unreadable
and unmaintainable code tables and other assorted randomness?  That's
me, and I set out to do something about it.
 
Of course, _I_ don't want to type in "LATIN SMALL LETTER SCHWA" every
time I want a schwa, either.  I may be crazy for all this work I do on
SGML, but I'm not a complete idiot.
 
Matter of fact, I'd like to tell my SGML editing system that "Hey, you!
I'm going to use a schwa over and over in this document, and I'd like to
have it accessible with a minimum of fuss."  So, I proceeded to solve
this problem, too.  If you have one of those workstations that can give
you whatever random graphic character you want, and you can map this
character to some code you can input from your keyboard, why shouldn't
you be able to be _saved_ all this "&schwa;" business to begin with?
 
The solution is called "dynamically redefinable character sets", and it
doesn't take very much to have one running on _your_ workstation, even
on your Windows system.  Under this scheme, you would tell your parser
that you'd used code 159 for schwa by mapping 159 to LATIN SMALL LETTER
SCHWA as above.  (Only, I expect you won't have to do this yourself.)
Then your parser and application will be happy, because they know what
to do with a character named LATIN SMALL LETTER SCHWA, even if they
wouldn't have a single clue about "code 159" if it bit them in the rear.
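
As a rough sketch of the idea (not the actual implementation), the
following Python fragment decodes input in which code 159 has been
declared to mean LATIN SMALL LETTER SCHWA.  The function name and the
shape of the redefinition table are hypothetical.  The sketch also
assumes that undeclared codes below 128 are plain ASCII, and it uses
Python's unicodedata module as a stand-in for the ISO 10646 name list,
since the two share character names.

```python
import unicodedata


def decode(data, redefinitions):
    """Turn raw codes into characters by way of their full names.

    redefinitions maps a code (0-255) to a character name, e.g.
    {159: "LATIN SMALL LETTER SCHWA"}.
    """
    chars = []
    for code in data:  # iterating over bytes yields integers
        if code in redefinitions:
            # The code itself means nothing; the declared name does.
            chars.append(unicodedata.lookup(redefinitions[code]))
        elif code < 128:
            chars.append(chr(code))  # assumption: ASCII unless redeclared
        else:
            raise ValueError("code %d has no declared character" % code)
    return "".join(chars)


# The document declares once that code 159 is a schwa, then just uses it.
text = decode(b"s\x9fwa", {159: "LATIN SMALL LETTER SCHWA"})
```

An undeclared code such as 200 is rejected rather than guessed at, which
is the point: the application only ever has to deal with named
characters.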
 
See, as a user, myself, "I'm a little irritated" (that's a trademarked,
prize-winning understatement) at having to use entity names for common
Norwegian letters.  (They occur about as frequently as "q" in English.)
Unlike other users, I'm not docile enough to accept "it's supposed to be
that way".  I rebel, I throw out the rascals, and I solve the problem.
 
If other users don't see the problem, and are willing to wait until they
repeatedly meet it, and end up throwing thousands of dollars' worth of
computing machinery down nine floors, be my guest.  If I solve it before
you get thus aggravated, send me the net worth of the computer you
_didn't_ destroy in frustration and anger, OK?  God knows I'm not
getting paid for this work in _this_ life.
 
I'm trying to solve a problem you have, and if you don't want to listen,
that's your prerogative.  If you recognize the problem, and have any
comments, I'd be very happy to hear about them.  If you don't recognize
the problem, and think you're happy where you are, don't try to obstruct
the solution to problems that every foreign-language SGML user has to
fight every single day, unless they have _very_ friendly SGML guru(s) in
their immediate vicinity.
 
Best regards,
</Erik>
--
Erik Naggum             :  ISO  8879 SGML     :      +47 295 0313
                        :  ISO 10744 HyTime   :
<[log in to unmask]>        :  ISO 10646 UCS      :      Memento, terrigena.
<[log in to unmask]>       :  ISO  9899 C        :      Memento, vita brevis.