SGML in Europe: a conference report
 
The Dutch SGML Users Group hosted a two-day international
conference in Amsterdam on 16-17 May under the general title `SGML
Update: consultancy, tools, courses'. This attracted over a
hundred delegates, by no means all from the Benelux area, though
mostly from European publishing and software houses. There were
two keynote speakers (Sperling Martin for the AAP, and myself for
the TEI), about a dozen presentations from manufacturers or
consultants and a well-arranged software exhibit in which all the
major SGML software vendors were represented, with the
conspicuous exception of Software Exoterica who had apparently
had to withdraw at the last minute. There was ample opportunity
for discussion and argument between presentations, over an
excellent buffet lunch and in the evenings.
 
Sperling Martin, as one of the chief progenitors of the AAP
standard, was happy to report that it was now in use by more than
25 major publishers, with a further forty planning to adopt it
over the next twelve months. He gave brief overviews of three
particularly successful applications on the fringes of
conventional publishing. Firstly, the Association for Computing
Machinery, which has just developed a five-year strategic plan
with the AAP standard at the centre of several dozen new print
products, on-demand reprint facilities, optically stored
databases, hypertext products etc. Perhaps more interestingly,
the ACM plans to mandate the AAP standard as the interchange
format of preference for its army of unpaid professional
contributors, reviewers and referees in the future. Secondly, the
Society of Automotive Engineers, which is adapting the AAP
standard for use in something called a `Global Mobility
Technology Information Center' or in plainer English, a database
of information about all sorts of transport systems. The
interesting thing here was the convergence between SGML and
object-oriented databases -- as well as manuals of technical
information, SGML was being used as the vehicle for data to be
transferred directly into CAD/CAM systems. Sperling's third AAP
success story was a similarly hybrid development: a new legal
database system developed for the Clark Boardman Company,
providing integrated information services derived from legal
journals, statutes and regulations, a body of case law together
with interpretation and annotation, usable by traditional print
journals or electronic hypertexts. Of course, the AAP project had
not been an unmitigated success: it had begun at a time when SGML
was barely established, and some aspects, notably those concerned
with maths, formulae and tables, have never been finished
properly. Moreover, there are a few deliberate errors in the
standard, introduced (said Sperling ingenuously) as `reader
tests'. He also called attention to some image problems -- all
too familiar to TEI ears -- such as the perceived conflict
between TeX and SGML, or ODA and SGML, and the intimidating
nature of SGML so long as its cause is left to the purists and
the evangelists. Looking to the future, Martin predicted an
increased awareness of SGML within the library community as a
practical means of coping with the explosive growth of published
materials, particularly in Science and Medicine. The AAP standard
was to be assessed for suitability as a `non-proprietary
information exchange vehicle' for electronically networked
journals, by the 110-member Association of Research Libraries,
under a scheme for which the National Science Foundation had
recently provided $0.75m seed funding. His presentation concluded
with some sound advice for those developing a strategic business
plan in which SGML featured (concentrate on the business asset,
don't expect technology to do everything, expect to spend at
least $5 a page to get electronically tractable text...) and some
predictions for future AAP work. A corrected version of the AAP
standard would be re-submitted to ANSI and a summary of needed
corrections to the published DTDs would appear in EPSIG News at
the end of this year.
 
Seamus McCague gave an impressively detailed description of two
practical applications of SGML in work undertaken by his company,
ICPC, a fifteen-year-old Dublin-based specialist typesetting
company. One, for Elsevier, involved the production of about
100,000 pages of high quality camera-ready copy from SGML encoded
text annually; the other, for Delmar, the conversion of an
existing reference book into an electronic resource. Details of
the two projects provided interesting contrasts in production
methods; they also showed how the SGML solution was equally
applicable to two very different scale operations. For Elsevier,
the use of SGML greatly simplified both process and quality
control, by facilitating the automatic extraction of data for the
publisher's control database; for Delmar, it had made possible
significant improvements to the product (a drug handbook) by
automating the production of a variety of indexes.
 
Francois Chahuneau of AIS, the thinking man's Antoine de Caunes,
gave a characteristically ebullient presentation about the
relationship between SGML documents and database systems. He
distinguished four characteristic modes of action: simple storage
of documents in a database, where typically only a limited amount
of header type information is visible to the database; database-
driven document extraction, where documents are synthesized from
information held in a database as a specialised form of report;
tightly coupled systems in which highly volatile document and
database systems share information; and the true document
database in which all the information and structure of a document
are represented by isomorphic database constructs, thus combining
the well-understood strengths of database systems in such matters
as concurrency control, security and resilience with the
flexibility and multiple-indexing capabilities of document
processing systems. As examples of this last mode, he then
described in some detail two products: his own company's SGML-
Search, which is based on PAT, and Electronic Book Technologies'
Dynatext, and also demonstrated a beta-test release of the MS-
Windows version of the latter. Dynatext uses an interesting
scripting language based in part on DSSSL, which enables it to be
configured to look more or less like anything, whereas SGML-
Search is command-line driven, using a fairly rebarbative syntax.
 
The interface between SGML and database systems was also touched
on by Jan Grootenhuis of CIRCE, the doyen of Dutch SGML
consultancies. Speaking of his experience in teaching SGML, he
remarked that people with a typographic background found SGML
almost as difficult to understand as people with a computer
science background found the requirements of typography, which
struck a familiar chord. He then briefly described a recent
project in which documents had been converted automatically into
an Oracle database, using a database model defined by Han
Schouten. The project had shown that database definitions could
be automatically generated from a DTD; the complete suite of
Oracle manuals, created as Ventura or WordPerfect documents, had
been loaded into an Oracle-Freetext database, using SGML as an
intermediary. He noted that the tendency of technical writers to
use descriptive tagging to bring about formatting effects had
made this task unnecessarily difficult, and argued for better
enforcement of descriptive standards. He also outlined some
experiences in using SGML for CD-ROM publication of journals at
Samson, and of legal and other regulations published by the Dutch
government, and the updating problems involved. His conclusion
was that SGML was now past the point of no return. It was no
longer being used in pilot projects only, but as an integral part
of real work. Its use was no longer regarded as worthy of
comment; moreover, because its evangelists were too busy doing
real work to try to publicise it, the task was being taken on by
professional teachers and educators.
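
The observation above that database definitions can be generated
automatically from a DTD can be illustrated with a minimal sketch
in Python. It is not the database model defined by Han Schouten,
which is not described here: the sample DTD, the one-table-per-
element mapping and the column names are all assumptions made for
the purpose of illustration.

```python
import re

# Hypothetical DTD fragment. Tag-omission indicators ("- -", "- O")
# are left out to keep the pattern simple.
DTD = """
<!ELEMENT manual  (title, chapter+)>
<!ELEMENT chapter (title, para+)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT para    (#PCDATA)>
"""

# Matches: <!ELEMENT name (content model)>
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)>")

def tables_from_dtd(dtd_text):
    """Emit one CREATE TABLE statement per element declaration.

    Assumed mapping: every element gets an id, a reference to its
    parent and a sequence number; elements whose content model
    contains #PCDATA also get a text column.
    """
    statements = []
    for name, model in ELEMENT_DECL.findall(dtd_text):
        columns = ["id INTEGER PRIMARY KEY",
                   "parent_id INTEGER",
                   "seq INTEGER"]
        if "#PCDATA" in model:
            columns.append("content TEXT")
        statements.append(f"CREATE TABLE {name} ({', '.join(columns)});")
    return statements

for statement in tables_from_dtd(DTD):
    print(statement)
```

A real conversion would also have to deal with attributes,
entities and mixed content, none of which this sketch attempts.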
 
The first day of the conference concluded with manufacturers'
presentations. Tim Toussaint (MID) and Paul Grosso (Arbortext)
gave a joint presentation. Toussaint revealed that MID, formerly
Dutch and now German, is now 26% French. MID used Arbortext as
an SGML editor, and Exoterica's XTRAN to convert the edited text
for loading into an unspecified relational database. Applications
included
standard reference works such as the Brockhaus Duden and a
database of standards documentation. Grosso gave a good sales
pitch for Arbortext, which is a luxuriously appointed SGML editor
intended for use primarily in an electronic publishing
environment and described as non-intimidating and user-congenial.
It includes a specialised WYSIWYG editor for tables and formulae
from which AAP-conformant marked up text is generated, has good
browsing and outlining facilities and its own script language.
 
Hugo Sleimer, European Sales Director for Verity (a spinoff from
Advanced Decision Systems) gave a classy presentation of a
product called TOPIC, the only relevance of which seemed to be
that it supported a wide variety of document formats, including
SGML. Much of his presentation dealt exhaustively with the
problems of text retrieval by boolean logic, at a level which did
not show much respect for his audience's intelligence. Tibor
Tscheke, from Sturtz Electronic Publishing, was due to talk about
his company's work in creating an electronic version of the
Brockhaus Encyclopedia, but had unfortunately been forbidden to
do so by Brockhaus. He was therefore reduced to some generalities
about the role of information within an enterprise, the
integration of SGML systems into mainstream information
processing and so forth, which was a pity.
 
I opened the second day of the conference by summarising the
current status of the TEI and discussing some of the technical
problem areas we had so far identified, in particular those
raised by historians and linguists for whom any tagging is an
interpretation which must be defensible. This being the second
time I had done it in two weeks, I managed to get through most of
my material within a reasonable approximation to the time
allocated me.
 
Yuri Rubinsky (SoftQuad Inc) gave an entertaining and wide-
ranging talk, picking up some of the technical issues I had
raised rather than simply presenting a product review, though he
did mention in passing (and also demonstrated) that Author/Editor
was now available under Windows and Motif as well as for the
Macintosh. The theme of his talk was that SGML could be used
to describe more than just documents, and that several of its
capabilities were under-used. There was more to an SGML document
than its element structure. Among specific examples he mentioned
were customised publication, for example by extracting `technical
data packages' geared to a specific maintenance task from CALS-
compliant documentation in the Navair database; using attribute
values to generate documentation at different user levels from a
common source; an ingenious use of entity references within
`boiler plate text fragments' in General Motors manuals; and the
assembly of customised DTDs from sets of DTD fragments by a use
of parameter entities strikingly similar to that proposed by the
TEI, or by use of marked sections. For the GM application, this
approach had reportedly saved the cost of its implementation
within six months.
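
The idea of using attribute values to generate documentation at
different user levels from a common source can be sketched
roughly as follows. This is an illustration only, not anything
shown at the conference: the element names, the `level` attribute
and its values are invented, and the sample is written as well-
formed XML so that Python's standard xml.etree.ElementTree module
can read it, since the standard library has no full SGML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical common source. The "level" attribute and the element
# names are assumptions made for this sketch.
SOURCE = """<manual>
<para>Switch the unit off before opening the case.</para>
<para level="expert">The fuse is rated at 2A.</para>
<para level="novice">The on/off switch is the red button.</para>
</manual>"""

def select_level(xml_text, level):
    """Return the text of paragraphs meant for everyone or for `level`.

    A paragraph with no level attribute is treated as common to all
    versions of the document.
    """
    root = ET.fromstring(xml_text)
    return [p.text for p in root.iter("para")
            if p.get("level") in (None, level)]

for line in select_level(SOURCE, "novice"):
    print(line)
```

Calling select_level(SOURCE, "expert") on the same source yields
the common paragraph plus the expert one, so both versions are
derived from a single text.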
 
Pamela Gennusa (Database Publishing Systems) also picked up the
recurrent theme of this conference: that SGML was uniquely
appropriate to database publishing. She gave a good description
of the major issues in preparing text for publication in database
format and of the strengths of SGML as a means of making explicit
the information content of texts in a neutral way, which was
essential given that authors and consumers had different
requirements of it. She also touched on the problems of security,
high volume and time sensitivity which characterise database
publishing as an industry. She then gave a good overview of the
capabilities of the new version of Datalogics' set of SGML
products, notably WriterStation, an impressive authoring tool
with several new facilities, and DMA (Document Management
Architecture), a complex set of object-oriented tools providing
database management facilities for SGML material, which also
includes full text searching facilities like those described
earlier by Chahuneau.
 
Ruud Loth (IBM Netherlands) gave a workmanlike presentation of
IBM's SGML product range, which now includes a context-sensitive
editor for OS/2 called TextWrite, a formatter for VM or MVS
called BookMaster and a new range of products called BookManager
to deal with `softcopy books' (IBMese for `electronic texts').
BookManager Build runs under VM and MVS and generates `softcopy'
from GML or SGML documents; BookManager Read runs additionally
under DOS or OS/2 and has impressive facilities for hypertext-
style browsing, intelligent text retrieval, indexing and
annotation. IBM documentation (47,000 titles, 9 milliard pages)
would soon be available in this new form.
 
Bruce Wolman of Texcel AS then gave a detailed product
description of the Avalanche `FastTag' automatic tagging system
which, it is claimed, can handle almost any kind of text and
automatically insert usable markup into it. The product has two
components, a `visual recognition engine' which searches for
visually distinct entities in a document, as defined by a set of
rules encoded in a language confusingly called Inspec, and
another language, called Louise, which defines the form in which
these objects should be encoded. Things like tables, footnotes,
horizontal lines, running headers or footers or special control
sequences could all be automatically tagged as well as objects
defined by regular expressions or specific keywords in the text.
The product had just been launched in Europe and was available
for MS-DOS, VMS, Ultrix and Macintosh.
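
The general notion of automatically tagging objects defined by
regular expressions can be illustrated with a small Python
sketch. It makes no attempt to reproduce FastTag, Inspec or
Louise, whose syntax is not described here: the rule table, the
element names `date` and `stdref` and the sample sentence are all
invented for illustration.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Hypothetical rules, each pairing a pattern with the element name
# used to wrap whatever the pattern matches.
RULES = [
    (re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b"), "date"),
    (re.compile(r"\bISO \d+\b"), "stdref"),
]

def auto_tag(text):
    """Wrap every match of every rule in start and end tags."""
    for pattern, name in RULES:
        text = pattern.sub(
            lambda m, n=name: f"<{n}>{m.group(0)}</{n}>", text)
    return text

print(auto_tag("The draft of 3 March 1990 cites ISO 8879."))
```

Rules are applied in order, so a later rule sees the tags
inserted by an earlier one; a production system would need to
guard against patterns that match inside existing markup.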
 
John Mackenzie Owen of the Dutch consultancy Pandata gave a brief
description of the SGML handling capabilities of BasisPlus,
stressing however its strengths as a document management system
rather than its admittedly limited SGML features. Bev Nichols of
Shaffstall described the Shaffstall-6000, an all-singing all-
dancing document conversion system based on a package called
CopyMaster which included SGML among its 800,000 claimed
`document-to-document' pairings but which (I had the impression)
would really rather be operating on a proprietary format called
the Shaffstall Document Standard. The last presentation of the
day was from Ian Pirie of Yard Software Systems who described the
successful Protos project carried out by Sema Group and Pandata
for the CEC. The project handled both the proposals for funding
from DG 13, which had to be distributed to member states for
comment, and the comments which ensued. MarkIt had been used to
validate the format of
the messages passed in either direction, its regular expression
facilities being particularly useful in automatically encoding
the content of telex messages, and its application language to
encode the messages for storage in a Basis database. The whole
operation had been carried out with minimal disruption of the
message system.
 
Aside from the presentations, the conference provided an
excellent opportunity to catch up on the expanding world of SGML-
aware software. Among products demonstrated were new versions of
MarkIt and WriteIt from Sema Group, of Author/Editor from
SoftQuad, of Arbortext, and of WriterStation from Datalogics, and
an interesting new product, an SGML editor called EASE from a
Dutch company called E2S. Delegates were also given a copy of the
first fruits of the European Workgroup on SGML, a consortium of
European publishers which has been working on a set of AAP-
inspired DTDs for scientific journals. This took the form of a
very well designed and produced booklet documenting a DTD for
scientific article headers. I came away from the conference
reassured that SGML was alive and well and living somewhere in
Europe.
 
Lou Burnard
Text Encoding Initiative