My pinch of salt on the tool discussion:
| By way of context, I am supervising a project this summer that employs
| four doctoral students in our English department to prepare an
| electronic edition of the Riverside Shakespeare based on the TEI drama
| dtd. The source text is the old WordCruncher text edited and lightly
| tagged by Virginia and Michigan. Houghton Mifflin, which has reclaimed
| the copyright to the WordCruncher text, will market this edition in
| some fashion yet to be determined.
To give it away immediately: my personal tool for *creating* SGML files
is Emacs/psgml, especially since I have to leave my home platform
(Unix) now and then and use DOS/Windows, and because I generally prefer
that type of free software (see signature :-).
As soon as the job is not *creating* an SGML document from scratch, the
situation changes completely, in my opinion, no matter what the platform.
In the situation described above, where there are existing WordCruncher
files, the first step for me would be to create/use a conversion tool
that sucks as much information as possible out of these files and
represents it in SGML. For the Shakespeare example, this would include
act, scene and line divisions and numbering (that's all the markup in
WC), very probably also speaker labelling (and thereby speech
divisions), by patterns like "all caps at the beginning of a line, then a
colon". Recognition of stage directions might be automatable as well
(e.g., "everything in parentheses").
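
To make the two patterns concrete, here is a minimal sketch in Python. It
assumes the input has already been reduced to plain text lines of the form
"SPEAKER: text" and "(stage direction)"; the real WordCruncher format will
differ and needs its own front end. The function name tag_lines and the
two regular expressions are my own assumptions; speaker, sp, l and stage
are the TEI drama element names. Act, scene and line numbering are left out.

```python
import re
from xml.sax.saxutils import escape

# Assumed pattern: all caps at the beginning of a line, then a colon.
SPEAKER = re.compile(r"^([A-Z][A-Z .']*):\s*(.*)$")
# Assumed pattern: the whole line is enclosed in parentheses.
STAGE = re.compile(r"^\((.*)\)$")

def tag_lines(lines):
    """Wrap speeches, verse lines and stage directions in TEI-style tags."""
    out = []
    in_speech = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        stage = STAGE.match(line)
        speaker = SPEAKER.match(line)
        if stage:
            out.append(f"<stage>{escape(stage.group(1))}</stage>")
        elif speaker:
            # A new speaker label closes the previous speech, if any.
            if in_speech:
                out.append("</sp>")
            out.append(f"<sp><speaker>{escape(speaker.group(1))}</speaker>")
            in_speech = True
            if speaker.group(2):
                out.append(f"<l>{escape(speaker.group(2))}</l>")
        else:
            out.append(f"<l>{escape(line)}</l>")
    if in_speech:
        out.append("</sp>")
    return out

if __name__ == "__main__":
    sample = [
        "(Enter HAMLET.)",
        "HAMLET: To be, or not to be, that is the question:",
        "Whether 'tis nobler in the mind to suffer",
    ]
    print("\n".join(tag_lines(sample)))
```

Anything the patterns miss (a speaker label in mixed case, a stage
direction spanning several lines) comes out as an ordinary line and is
left for the human editors.
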
The 'classical' tools for this are Perl or sed/awk, but a lot of others
probably work just as well: TUSTEP for those who know it, OmniMark for
those who pay for it, Emacs functions for those who can program Lisp...
Once this is done (by some technically minded person), there is a richly
tagged SGML file, and the participants of the summer project can
concentrate their editorial wits on the interesting aspects of TEI
tagging instead of wasting them on tasks that computers (and their
programmers) can do for them.
For enhancing existing SGML files, I usually prefer a really dumb editor
(Emacs/fundamental), because its SGML-aware cousins don't appreciate
the intermediate breaking of (DTD-)rules that is almost inevitable in
Another tool I can highly recommend in the situation described above,
where SGML is generated in a first step and then manually enhanced, is a
revision control system of the kind software developers use. The one I
used, RCS, copes very well with the uncomfortable situation when, after
major manual work, you discover that the automatic process still has a
problem and want to rerun it (normally, this would junk all manual work
done on the old output). What it does is carefully merge in the
changes; user intervention is only required when two changes clash (i.e.,
the new auto-conversion produces different output on a spot that was
also changed manually).
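
The merge principle can be sketched as a small line-based three-way
merge in Python. This is only an illustration of the idea, not how RCS
itself is implemented; the names merge3 and sync_regions are made up, and
the conflict markers merely imitate the usual style. "base" is the old
auto-conversion output, "mine" the manually enhanced file, "theirs" the
output of the rerun.

```python
from difflib import SequenceMatcher

def sync_regions(base, mine, theirs):
    """Find runs of base lines that are unchanged in both other versions."""
    ma = SequenceMatcher(None, base, mine, autojunk=False).get_matching_blocks()
    mt = SequenceMatcher(None, base, theirs, autojunk=False).get_matching_blocks()
    regions = []
    i = j = 0
    while i < len(ma) and j < len(mt):
        abase, apos, alen = ma[i]
        tbase, tpos, tlen = mt[j]
        start = max(abase, tbase)
        end = min(abase + alen, tbase + tlen)
        if start < end:
            regions.append((start, end,
                            apos + (start - abase),
                            tpos + (start - tbase)))
        # Advance whichever matching block ends first in base.
        if abase + alen < tbase + tlen:
            i += 1
        else:
            j += 1
    # Sentinel so that trailing changes are handled like any other chunk.
    regions.append((len(base), len(base), len(mine), len(theirs)))
    return regions

def merge3(base, mine, theirs):
    """Merge two edited versions of base; return (lines, conflict count)."""
    out, conflicts = [], 0
    pb = pa = pt = 0
    for bstart, bend, astart, tstart in sync_regions(base, mine, theirs):
        base_chunk = base[pb:bstart]
        mine_chunk = mine[pa:astart]
        theirs_chunk = theirs[pt:tstart]
        if mine_chunk == theirs_chunk:
            out.extend(mine_chunk)        # same change (or none) on both sides
        elif mine_chunk == base_chunk:
            out.extend(theirs_chunk)      # only the rerun changed this spot
        elif theirs_chunk == base_chunk:
            out.extend(mine_chunk)        # only the manual edit changed it
        else:
            conflicts += 1                # both changed it: ask the user
            out.append("<<<<<<< mine")
            out.extend(mine_chunk)
            out.append("=======")
            out.extend(theirs_chunk)
            out.append(">>>>>>> theirs")
        n = bend - bstart
        out.extend(base[bstart:bend])     # lines untouched by either side
        pb, pa, pt = bend, astart + n, tstart + n
    return out, conflicts
```

A manual edit and a conversion fix in different lines merge silently;
only when both touch the same spot do the markers appear in the output.
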
And once you have several programs working on your texts in a sort of
assembly line, "make" comes to mind as a useful helper for minimizing
computer time and scribbled notes with options...
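
The rule that lets "make" save computer time is small enough to sketch
in Python: a target is rebuilt only if it is missing or older than one of
its sources. This shows the principle only, not make itself; the function
name needs_rebuild and the file names in the comment are hypothetical.

```python
import os

def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_time for src in sources)

# Hypothetical use in a conversion pipeline:
#   if needs_rebuild("play.sgml", ["play.wc", "convert.py"]):
#       ... rerun the conversion step ...
```

Listing the conversion program itself among the sources means a fixed
program triggers a rerun, which is exactly the case the merge above is
meant to handle.
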
... enough chat from the workshop.
(_) Tobias Rischer
"==='