[ Robin Cover passed this exchange on to me. Though lengthy, it seems of
sufficiently general interest to merit re-posting. I assume it is
worth subscribing this list as a whole to the proposed comp.text.sgml
should it come into being. Any dissenters? - LB ]
Yuri Rubinsky, posted on News
From texbell!sq.sq.com!yuri Wed Aug 1 15:07:33 1990
From: [log in to unmask] (Yuri Rubinsky)
Subject: Re: software tools for SGML, proposed comp.text.sgml (LONG!)
Message-ID: <[log in to unmask]>
Date: 1 Aug 90 20:07:33 GMT
Organization: SoftQuad Inc., Toronto, Canada
In message <[log in to unmask]>
[log in to unmask] (Robin Cover) writes:
> If some SGML experts from among the "major players" are to be attracted to
> the group, the distinctive name "sgml" and focused attention on SGML is
> a clear desideratum. It will be hard enough to get support from SGML
> gurus anyway -- they will have neither time nor patience to muck through
> dozens of postings on unrelated topics....
> For a healthy SGML discussion, I feel it is imperative to have a couple
> SGML experts listening in. Those who have actually read the standard,
> or write DTD's, or build parsers will know what I mean. There is still
> a lot of confusion about what SGML actually *IS* (and is not), and it's
> easy for an unmoderated forum to generate unfortunate "mis-information."
> I would even suggest that several companies or SGML-supporting agencies
> be contacted (e.g., Software Exoterica; SoftQuad; Datalogics) to see if
> they would designate persons to help referee the discussion -- at least
> at moments when mis-information goes unchecked or when technical
> questions cannot be answered by the forum's regular readers.
On behalf of SoftQuad, yes. We will do our best (within our time
constraints) to respond to appropriate information and questions.
Here are some now:
In message <[log in to unmask]>
[log in to unmask] (Anita Eijs) writes/asks:
> 1) Are WYSIWYG-wordprocessors available which can read and write
> SGML ?
Yes and no. SGML encoding is generally considered to be at its purest
when it is free of formatting information. Its job is to interchange
structural data and content in such a way that any number of required
"formats" can be derived.
This makes possible work such as that mentioned
in message <[log in to unmask]> wherein
[log in to unmask] (Victor A. Riley) describes the work of:
> X3V1.8M MUSIC IN INFORMATION PROCESSING STANDARDS (MIPS) COMMITTEE
> operating under the rules and procedures of the
> American National Standards Institute
which is using the syntax of SGML to build a representation language
for hypermedia and time-based documents (music and multimedia events
are two examples). (I mention this because it has relevance later
with respect to the CGM question below.)
SGML is widely used for the storage of text in databases and is being
slowly but surely embraced by the CD-ROM community. [In his keynote address
at the 1988 CD-ROM Conference, Bill Gates announced that he thought it was
pretty clear that SGML was the storage format of choice for CD-ROM publishing.]
All in all, then: The creators of SGML-encoded files will not normally
know or be able to imagine all the uses to which their contents will one
day be put. "What You See is What You Get" is, accordingly, not a phrase
that has much meaning when longevity and multi-purposeness are the goal.
Nonetheless, WYSIWYG has a place in the SGML world. In article
<[log in to unmask]>
[log in to unmask] (Mark Sherman) writes:
> One can define imaging semantics to be associated with SGML. The program
> AuthorEditor from SoftQuad is quite nice in that regard. But its
> conventions are parochial -- an "SGML" system knows nothing about AE's
> semantics, unless the exchanging parties agree to information outside of
> the standard.
Mark is being a little bit mischievous here. Certainly my favourite
dictionary defines parochial as "confined to a narrow area", but the
"but" in his sentence doesn't recognize that very often this
local functionality is indeed a Good Thing.
For example: Just because the footnotes in my final document may be
printed in 6 or 8 point type is no reason why I should have to look
at them in that size on the screen. I'm perfectly comfortable knowing
that a pair of simple SGML tags will allow a text-for-paper formatter to
ensure that the footnotes will appear at the bottom of the page or chapter
end in a small point size, while a text-for-screen formatter may place
them in-line or at the bottom of a screenful of text, or in a thin column
to the left of the text body. Having computer screens imitate a piece
of paper (of all ancient technologies!) hardly does justice to their
capabilities.
Yes, in Author/Editor we DO associate screen formatting with SGML elements.
So too does IBM with its TextWrite product. Both of us do this for a very
good reason: Users take advantage of the screen formatting to build a
working environment in which they are comfortable and where the formatting
helps their tagging intuition. With a simple command in such an editor,
you insert "list item" or "table" tags (for example); screen feedback
assures you this is the element you wanted.
[A word of explanation for those who don't recognize these names: Both
SoftQuad Author/Editor and IBM's TextWrite are conforming SGML editors,
context-sensitive, structured and so forth, with good assistance for
the user encoding an SGML document, and "QUASI-WYG" in the way
described above. There are other SGML editors, Exoterica's Checkmark,
Sobemap's Write-It and Datalogics' WriterStation, which don't do this.]
So, What You See Accurately Represents What You Want, in the model
that suggests that writers are best left to writing, editors to editing,
and designers, later in the process (generally), to designing.
Here's a much shorter answer to the WYSIWYG question and,
simultaneously, perhaps to:
> 2) Are any translators available to convert SGML to troff, TeX,
> MSWord, etc., and vice versa ?
Microsoft has announced (in Government Computer News and the EPSIG
Newsletter, among other places) that it will announce a form of SGML
support by the end of 1990 for delivery in 1991. According to the
EPSIG Newsletter (the journal of the Association of American Publishers'
Electronic Publishing Special Interest Group operated by OCLC in
Columbus Ohio), Microsoft is currently evaluating SGML parsers.
WordPerfect Corporation released a Statement of Direction
in June 1989 saying "We are in the
process of developing a strategy to assist people in creating
WordPerfect documents that can be converted to and from SGML and ODA".
To the best of my knowledge, that company has made no other public
statements on this subject since.
Agfa Compugraphic CAPS, Xyvision, Frame, Intergraph, Interleaf, Context,
Datalogics, Arbortext, SoftQuad and perhaps others (apologies to anyone I've
forgotten in this list) have demonstrated the ability to take
SGML files encoded using specific tagsets (generally CALS 28001)
and show them on the screen matching line-for-line what will be
output to a printer.
Translation from SGML to formatter input is properly the task of an
SGML Parser, a utility which can understand enough about the context
of an SGML element [read "object" such as paragraph, or list item, or
table cell, or figure] to be able to produce an output stream which
is meaningful to a processor which may not understand "context sensitivity".
This is not (except when the SGML elements and their inter-relations are
particularly unsophisticated) a job for sed or awk, or even yacc or lex.
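To make the point about context concrete, here is a minimal Python sketch (not any vendor's parser) in which the same "item" tag yields different formatter input depending on the list that encloses it. The tag names (olist, ulist, item), the toy tokenizer and the ms-style .IP output are all assumptions chosen for illustration; real SGML adds tag omission, attributes and entities, none of which is handled here.

```python
import re

# Toy tokenizer: recognizes only <tag>, </tag> and character data.
# Real SGML (tag omission, attributes, entities, marked sections)
# needs a real parser; this is an illustrative assumption only.
TOKEN = re.compile(r"<(/?)([a-z]+)>|([^<]+)")


def to_troff(text):
    """Translate a fully tagged toy document into troff-like lines."""
    stack = []      # currently open elements, innermost last
    counters = []   # one running item number per open list
    out = []
    for closing, name, data in TOKEN.findall(text):
        if data:
            if data.strip():
                out.append(data.strip())
        elif closing:
            if not stack or stack[-1] != name:
                raise ValueError(f"unexpected </{name}>")
            stack.pop()
            if name in ("olist", "ulist"):
                counters.pop()
        else:
            if name == "item":
                # The output for <item> depends on its parent element,
                # which a line-by-line substitution cannot know.
                parent = stack[-1] if stack else None
                if parent == "olist":
                    counters[-1] += 1
                    out.append(f'.IP "{counters[-1]}."')
                elif parent == "ulist":
                    out.append(r'.IP "\(bu"')
                else:
                    raise ValueError("<item> outside a list")
            elif name in ("olist", "ulist"):
                counters.append(0)
            stack.append(name)
    if stack:
        raise ValueError(f"unclosed <{stack[-1]}>")
    return out


doc = ("<olist><item>one</item><item>two</item></olist>"
       "<ulist><item>dot</item></ulist>")
for line in to_troff(doc):
    print(line)
```

The stack of open elements is what a pattern-substitution tool lacks: a fixed replacement for "item" cannot tell a numbered list from a bulleted one, let alone keep the numbering.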
On the subject of parsers, Mark Sherman writes:
> I believe SoftQuad sells them. Quality, functionality and price unknown
> to me. There are probably more around, although I recall an article by
> Larry Welsch from NIST (ACM document processing conference) claiming
> that some parts of SGML were exceedingly difficult to implement, so you
> should watch out for how much is implemented when someone makes a claim.
A Conformance Testing Initiative led by the Graphic Communications
Association in North America and by the National Computing Centre in the
UK (with the cooperation of the European Community) will, within a
year or so, eliminate this issue. Today, the most popular parsers,
which are generally conceded to also be the most conformant, are those
of Software Exoterica (of Ottawa Canada), licensed by Frame,
Arbortext and Intergraph; and of Sobemap (of Brussels Belgium,
marketed by Yard Software of Chippenham Wiltshire UK), licensed
by Agfa Compugraphic CAPS, Interleaf, Context and Xyvision.
We have made available to our consulting clients
the parser from Author/Editor, which
is optimized to work with our SoftQuad Publishing Software
In Holland, Elsevier Scientific Publishers, as a matter of
course, I believe, use the SGML Parser of the Vrije Universiteit Amsterdam
to convert SGML files to TeX. A number of other sites in Europe perform
the same conversion as did the creators of the terrific
SGML/Structured Text Bibliography compiled by Robin Cover, Nicholas
Duncan and David Barnard [Queen's University at Kingston Ontario Canada,
Technical Report 90-281 still in draft form and available later this
Back to Anita's questions:
> 3) Is an SGML to PostScript converter available ?
Well, yes, though we think of that process not so much as a conversion
as traditional document processing. One could describe any software
product which makes up pages from SGML-to-parser input as performing
SGML to PostScript conversion. Neither SGML nor PostScript alone has
the smarts to know when to break a line or a page, and so on.
> 4) Does SGML support drawings (illustrations) ? How about tables,
> mathematical expressions ?
Yes, certainly, but these two questions have quite different answers.
a) Drawings/Illustrations: Think of SGML, at one level, as process
control. [Stop! SGML is not a procedural language, but nonetheless,
I believe this is the most straightforward way to explain the
functionality ...] The standard formalizes a set of declarations
which associate certain entities with "data content notations".
SGML's job is not to attempt to predict all the ways that any number
of hardware and software systems will store graphic images, video,
sound, smell, voice annotation, and so on. Rather, an SGML
document will contain, in easily recognized constructs, all the
information that a system needs to recognize where parseable text
starts and stops, and where control must be passed to an application
that can deal with the strictly delimited content which is non-SGML
data. [The hypermedia/multimedia work going on in the ANSI committee
mentioned above uses these capabilities very elegantly, even building
in SGML constructs to point to "the interiors" of non-SGML contents.]
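As a rough illustration of those constructs, the Python sketch below holds a handful of SGML notation and external data entity declarations in a string and extracts which entity carries which notation. The entity names, file names and system identifiers are hypothetical, and the regular expressions cover only the simple SYSTEM form shown; they stand in for a real SGML parser and are not one.

```python
import re

# Hypothetical declarations: two notations and two external entities
# whose content is non-SGML data (NDATA) in one of those notations.
DECLS = '''
<!NOTATION cgm  SYSTEM "cgm-viewer">
<!NOTATION tiff SYSTEM "tiff-viewer">
<!ENTITY fig1  SYSTEM "fig1.cgm"  NDATA cgm>
<!ENTITY photo SYSTEM "photo.tif" NDATA tiff>
'''

# Simplified patterns (an assumption): SYSTEM identifiers only,
# double-quoted literals only, no PUBLIC identifiers or attributes.
NOTATION = re.compile(r'<!NOTATION\s+(\S+)\s+SYSTEM\s+"([^"]*)"\s*>')
ENTITY = re.compile(
    r'<!ENTITY\s+(\S+)\s+SYSTEM\s+"([^"]*)"\s+NDATA\s+(\S+?)\s*>')


def external_data_entities(decls):
    """Map each NDATA entity name to (system identifier, notation)."""
    notations = dict(NOTATION.findall(decls))
    result = {}
    for name, system_id, notation in ENTITY.findall(decls):
        if notation not in notations:
            raise ValueError(f"entity {name}: undeclared notation {notation}")
        result[name] = (system_id, notation)
    return result


for name, (system_id, notation) in external_data_entities(DECLS).items():
    print(f"{name}: hand {system_id} to whatever understands {notation}")
```

The declarations tell a receiving system where parseable text stops and which notation the foreign data is in; what happens to that data next is left to the application.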
Mark Sherman writes:
> Now, you and I can make a
> side agreement that whenever we use the tag "my-CGM-byte", the marked
> bytes will be in CGM-compliant format. However, that is an agreement
> outside of the standard and only usable by our local cabal. Ditto for
> tables, mathematical expressions.
This is not true. The standard defines a document as (more or less)
a Document Type Definition -- the set of elements, other constructs,
and their relationships -- followed by an "instance" of that DTD,
content marked up using the semantics rigidly prescribed by the DTD.
An ability to read the DTD is a vital function within any SGML system.
Accordingly, there is a completely standardized, interchangeable
method, within the standard, to pass along the data content notations,
such as CGM, or TIFF, or RIFF, or IGES, or IFF, or anything. It is
not the job of SGML (nor should it be) to dictate how applications
software will respond to the content being passed.
"Our local cabal" has nothing to do with the story. Anyone with
an SGML parser can read any SGML file and be passed a meaningful
output stream.
b) As for tables and mathematics: Both areas are covered in
a "must-read" Technical Report (TR 9573) published by ISO/IEC
and edited by Anders Berglund (now of ISO, ex of CERN), entitled
"SGML Support Facilities: Techniques for Using SGML". The DTDs created by
the Association of American Publishers (which are now an ANSI
standard) and by the US Defense Department under the CALS initiative,
also contain "content models" for tables of varying complexity. It
is now up to software developers to find mechanisms for presenting
these content models to users in as straight-forward a way as is
possible, but there is nothing wrong with the underlying SGML data
representation. [Certainly the content models are complex. And so
they should be: tables can be extremely complicated.]
As far as math goes, for now, the CALS DTDs use the "data content
notation" construct described above, choosing to standardize
on TeX, EQN and IBM's Scientific and Mathematical Formula Format,
with tags to delimit nested math, and expecting the formatter to
handle the formatting.
> 5) Is it possible to use SGML and CGM in combination ? How about the
> availability of CGM-translators ?
See above. A variety of graphics and CAD packages exist which
claim CGM translation ability -- but to other graphics formats,
not to SGML.
The afore-mentioned "Techniques for Using SGML" extends an example given
in Annex E of SGML itself. The CGM clear text encoding in the example
is nested within the SGML document, but attributes associated
with the SGML elements dictate scaling and cropping.
> 6) Are parsers available to check an SGML-document on syntax ?
Yes. Software Exoterica's XGML, Sobemap's Mark-It, NIST's not-yet-complete
public domain utility, the Amsterdam Parser (which I've not seen, however),
and, to SGML sites using sqtroff, SoftQuad's. Datalogics bundles in its
own (built on top of the NIST parser, I believe) with its WriterStation
and Pager products; IBM includes one with TextWrite.
> 7) Are the software tools public domain ? What are the prices of the
> software tools ? What kind of software tools are available ?
There is an extraordinary variety of software tools available, from
all the vendors mentioned above, plus a few more:
Avalanche Development Company (Boulder Colorado) sells FastTag,
an "auto-tagger" which uses a proprietary visual recognition engine
to mark up documents from a variety of wordprocessors and scanner/OCRs.
PraXis Inc (Providence Rhode Island) will soon be showing its
Electronic Book Browser, a system which builds and displays hypertexts
compiled from SGML texts.
OWL (Office Workstations Limited of Edinburgh Scotland and Bellevue
Washington) uses SGML as an input source for its IDEX hypertext/
Other products (along with addresses and phone numbers
for all the companies mentioned throughout this article) are listed
in the SGML Source Guide, a publication of the
Graphic Communications Association
1730 North Lynn Street, Suite 604
Arlington, Virginia 22209-2085 USA
Telephone: 703 841-8160
Fax: 703 841-8144
attn: Marion Ellidge
GCA also publishes <TAG>, the SGML Newsletter,
which, along with the newsletters mentioned below, is a good
source of product descriptions and new product announcements.
GCA also hosts several SGML tutorials each year, as well as
the twice-annual TechDoc Conference [next one: August 20 to 24
in Washington DC] and, co-sponsored with the International
Users' Group, the annual Mark-up conference each May or June.
The EPSIG Newsletter, mentioned above, is available from
6565 Frantz Road
Dublin, Ohio 43017-0702 USA
Telephone: 800 848-5878
attn: Betsy Kaiser
The newsletter and bulletin of the International SGML Users' Group,
as well as a number of other publications, are available from
International SGML Users' Group,
c/o SoftQuad Inc
720 Spadina Avenue
Toronto, Ontario M5S 2T9 Canada
Telephone: 416 963-8337
attn: Steven Downie
A recent posting to this newsgroup described the work and intentions
of an SGML Consortium proposed by Ohio State University with
intentions of making available a variety of public domain SGML tools.
> 8) Will the newsgroup 'comp.text.sgml' be created ?
I suspect that if there was any doubt before, then the outrageous
length of this posting will tip the balance as crowds of comp.text
subscribers say "Get this stuff out of here!" Nonetheless, it seems
to me that there is another point of view on the subject:
Until SGML is taken for granted as a useful and normal part of
the working lives of all who toil with documents,
a national and international standard of this level of
capability might well be usefully discussed in comp.text rather
than in a separate newsgroup. I think that people generally interested
in text issues would do well to follow these discussions, rather
than create a distinct SGML ghetto. With the support of so many
governments, associations, research groups, hardware and software
vendors, as well as electronic and paper publishers of all sorts,
it's not going to go away. Anyone involved with comp.text may be
served by keeping on top of these developments.
Yuri Rubinsky (416) 963-8337
President (800) 387-2777 (from U.S. only)
SoftQuad Inc. uucp: {uunet,utzoo}!sq!yuri
720 Spadina Ave. Internet: [log in to unmask]
Toronto, Ontario, Canada M5S 2T9 Fax: (416) 963-9575
Mark Sherman (prefers ODA) response to Yuri
From texbell!andrew.cmu.edu!mss+ Thu Aug 2 10:27:10 1990
From: [log in to unmask] (Mark Sherman)
Subject: Long nits: Re: software tools for SGML, proposed comp.text.sgml (LONG!)
Message-ID: <[log in to unmask]>
Date: 2 Aug 90 15:27:10 GMT
References: <[log in to unmask]>
Organization: Information Technology Center, Carnegie Mellon, Pittsburgh, PA
In-Reply-To: <[log in to unmask]>
I normally try to refrain from flaming, but there is such an enormous amount
of "SGML can do everything" rhetoric being expounded by SGML vendors
that I feel the need to counterbalance it a bit. (For some truth in
advertising: some people claim that I flame because I have a vested
interest in ODA. As I've said many times, I view the two as incomparable.
No, I will not get into that discussion on a bboard again. I did not
push ODA in the previous message and will not mention it again here.
Call me on the phone.)
I should also preface my comments by saying that Yuri's comments were
very reasonable. But I do have some nits.
Excerpts from netnews.comp.text: 1-Aug-90 Re: software tools for SGML..
Yuri [log in to unmask] (17297)
> Nonetheless, WYSIWYG has a place in the SGML world. In article
> <[log in to unmask]>
> [log in to unmask] (Mark Sherman) writes:
> > One can define imaging semantics to be associated with SGML. The program
> > AuthorEditor from SoftQuad is quite nice in that regard. But its
> > conventions are parochial -- an "SGML" system knows nothing about AE's
> > semantics, unless the exchanging parties agree to information outside of
> > the standard.
> Mark is being a little bit mischievous here. Certainly my favourite
> dictionary defines parochial as "confined to a narrow area", but the
> "but" in his sentence doesn't recognize that very often this
> local functionality is indeed a Good Thing.
I do not want to get into the religious argument as to whether a
document ought to include its appearance along with its representation,
i.e., whether "this local functionality is indeed a Good Thing". My point
is only that the semantics are confined to Author/Editor. If you
send that SGML document to an Interleaf product that supports SGML, it
won't look the same. Historically, our project (Andrew) has taken the
view that life is on the screen and not on paper. Therefore, we followed
the path suggested of making life good on screen at the expense of
paper. For example...
> Just because the footnotes in my final document may be
> printed in 6 or 8 point type is no reason why I should have to look
> at them in that size on the screen.
Hell, if I got a footnote, I just pop it up or collapse it on the screen
as necessary. My citations snap to the window that they cite. Who cares
about 6 or 8 point font? At least that's how we built our software 4
years ago. We too thought:
> I'm perfectly comfortable knowing that a pair of simple SGML tags will
> allow a text-for-paper formatter to ensure that the footnotes will
> appear at the bottom of the page or chapter end in a small point size,
> while a text-for-screen formatter may place them in-line or at the
> bottom of a screenful of text, or in a thin column to the left of the
> text body. Having computer screens imitate a piece of paper (of all
> ancient technologies!) hardly does justice to their capabilities.
For us, read that first sentence as "if you want paper, we'll generate
troff". Well, we now have many megabytes of user complaints that they
use the computers to generate paper and what they see ain't what they
got. Their professional life does not match our whimsies. They use more than
one computer, more than one program and more than one medium (i.e. paper
in addition to files). They want it to be the same everywhere. Please
don't shoot the messenger, I have enough arrows in my back. Note that a
key element of CALS (one of the big SGML motivators in the US) is an
interpretation of display semantics for tags. CALS compliance is not
merely SGML compliance at the DTD level, but also visual compliance.
> With a simple command in such an editor, you insert "list item" or
> "table" tags (for example); screen feedback assures you this is the
> element you wanted. ... A word of explanation for those who don't
> recognize these names: Both SoftQuad Author/Editor and IBM's TextWrite
> are conforming SGML editors, context-sensitive, structured and so forth,
> with good assistance for
> the user encoding an SGML document, and "QUASI-WYG" in the way described
> above. There are other SGML editors, Exoterica's Checkmark, Sobemap's
> Write-It and Datalogics' WriterStation, which don't do this.
Sure, as long as all the editors you use will interpret the "table" tag as a
table. I'll bet that Exoterica's Checkmark only knows that identifier as
a tag. If the file goes to a system that does not interpret "table" as a
table, then you'll just get streams of characters matching whatever
table content encoding Author/Editor or TextWrite use. Which is my
point: you can add a great deal to SGML (some of which has been written
down in the references in your message), but all of those additions are
outside of the SGML standard. I don't care to argue whether that is good
> Agfa Compugraphic CAPS, Xyvision, Frame, Intergraph, Interleaf, Context,
> Datalogics, Arbortext, SoftQuad and perhaps others (apologies to anyone
> I've forgotten in this list) have demonstrated the ability to take SGML
> files encoded using specific tagsets (generally CALS 28001) and ...
A warning to the casual user: just because a vendor says it is CALS
compliant does not mean it has a general SGML system. For example,
one can write an editor that has the CALS DTD and imaging semantics
built in. It will work just fine with other CALS systems, but be close
to useless with any general purpose SGML system or other tag
interpretation. In fact, a salesman from one of the above mentioned
vendors (not SoftQuad) told me that their product worked exactly that
way, "so it was a fool proof CALS system -- the user need never worry
that they were generating some SGML that would not be CALS compliant."
> show them on the screen matching line-for-line what will be output to a
> printer.
Right. Because CALS is not only SGML, but a collection of *other*
standards (e.g., MIL 28001) that define what those tags mean and how to
interpret the content so tagged. Not true for generic SGML.
> The standard defines a document as (more or less)
> a Document Type Definition -- the set of elements, other constructs,
> and their relationships -- followed by an "instance" of that DTD,
> content marked up using the semantics rigidly prescribed by the DTD.
> An ability to read the DTD is a vital function within any SGML system.
> Accordingly, there is a completely standardized, interchangeable
> method, within the standard, to pass along the data content notations,
> such as CGM, or TIFF, or RIFF, or IGES, or IFF, or anything. It is
> not the job of SGML (nor should it be) to dictate how applications
> software will respond to the content being passed.
> "Our local cabal" has nothing to do with the story. Anyone with
> an SGML parser can read any SGML file and be passed a meaningful
> output stream.
We are violently agreeing. I am speaking from the perspective of a user,
not an implementor. When someone asks a question like "I have a document
with a CGM drawing in it and want to send it to a PostScript printer. I
heard that SGML is an interchange medium that supports CGM and
PostScript. Can I use SGML for the conversion?", you *know* they are
asking whether they can print their file, not whether you can write a
file with little tags saying "here are postscript bytes, here are CGM
bytes". They want the drawing converted. For the editor, printer or
other imaging program to work, they have to know (1) that your tags mean
that the data are represented as CGM, PostScript, CALS tables, AAP
equations, or whatever ("anything") and (2) how to process those bytes
that are so tagged. As you say, SGML does not specify anything about how
to process the content. There are lots of ways to say how to process the
content: all outside of the SGML standard. With just SGML and a generic
SGML parser, I can parse the bytes and print a wonderful message: "I
just found some CGM bytes" and then do nothing more with them. Actually,
I can also say that it was legal for those bytes to appear at that
location in the document, and possibly where else I could put those
bytes. Sorry, that is not what most users expect.
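A small Python sketch of that gap, under stated assumptions: the entity table below is what a generic parser might hand over (the names and file names are hypothetical), and the handlers dictionary stands for whatever application software the receiving site happens to have. Nothing in it comes from the SGML standard itself, which is exactly the point being made.

```python
def process(entities, handlers):
    """Report what happens to each tagged chunk of non-SGML data.

    entities: entity name -> (system identifier, notation name)
    handlers: notation name -> function that processes the data
    Both structures are illustrative assumptions, not a real parser API.
    """
    report = []
    for name, (system_id, notation) in entities.items():
        handler = handlers.get(notation)
        if handler is None:
            # All a generic SGML system can do: identify the notation.
            report.append(
                f"{name}: found {notation} data, no application to process it")
        else:
            report.append(f"{name}: {handler(system_id)}")
    return report


# What the parser knows: which entity is in which notation.
entities = {
    "fig1": ("fig1.cgm", "cgm"),
    "eq1": ("eq1.tex", "tex"),
}

# What this particular site can process: CGM only (hypothetical handler).
handlers = {"cgm": lambda system_id: f"rendered {system_id}"}

for line in process(entities, handlers):
    print(line)
```

The parser side is the same at every site; whether the drawing or the equation ever gets imaged depends entirely on the handler table, which each site builds for itself.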
> b) "SGML Support Facilities: Techniques for Using SGML". The DTDs ...
> contain "content models" for tables of varying complexity.
For the layman: this means that if you want to exchange a table or
equation from your system to another system, your translator must
convert from your representation (say SYLK) into the DTD's format in
SGML. By the way, make sure that the receiving SGML system understands
the same DTD, or you will still lose your table at the other end. (Yeah,
yeah, I know: the table is still there, as marked-up SGML, and the DTD
syntax rules can be passed along, but that ain't enough for an editor to
understand the data as a table -- it can only be understood by the
recipient as a collection of structured content.) There are lots of
techniques for using SGML, but saying "SGML" by itself is not enough to
answer most users' questions.