Dear Ineke (cc TEI-L),
 
> In a new project we would like to use Author/Editor to create SGML
> version of a 1.000.000 words corpus (Dutch newspapers, ASCII input,
> SunOS 4).
>
> How many manmonths should we allot to it? We don't have any experience
> with Author/Editor at all, so we are completely in the dark with
> respect to the time needed.
 
This question cannot be answered in any general way; it depends on the
structure of your input files and the density of tagging you wish to
employ.
 
A general strategy is to autotag as much as possible by using the file
and format structure already present in your input (is each constituent
document a separate file? are paragraphs separated by blank lines?
etc.). You will do this with custom scripts rather than with
Author/Editor. Author/Editor can then be used to tag manually whatever
cannot be tagged automatically. With a 1M-word corpus, autotagging is
extremely fast and hand tagging extremely slow, so your man-month total
will depend mostly on how much must be tagged by hand, which in turn
reflects your particular tagging needs and goals.
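 
To make the autotagging step concrete, here is a minimal sketch in
Python. It assumes that each input file holds one document and that
paragraphs are separated by blank lines; I do not know whether either is
true of your corpus. The element names (text, p), the id attribute, and
the function names are illustrative choices of mine, not something
Author/Editor or your DTD requires.

```python
import re


def escape_sgml(s):
    # Escape characters that an SGML parser would otherwise read as markup.
    # The ampersand must be replaced first so later entities are not re-escaped.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def autotag(raw, doc_id):
    # Assumption: paragraphs are separated by one or more blank lines
    # (lines that are empty or contain only whitespace).
    blocks = re.split(r"\n\s*\n", raw.strip())
    paras = []
    for block in blocks:
        # Join wrapped lines and collapse runs of whitespace to single spaces.
        text = " ".join(block.split())
        if text:
            paras.append("<p>" + escape_sgml(text) + "</p>")
    # Assumption: doc_id is already a valid identifier (e.g. from the file name).
    return '<text id="%s">\n%s\n</text>' % (doc_id, "\n".join(paras))
```

You would call autotag() once per input file, passing the file's
contents and an identifier derived from its name, and then load the
result into Author/Editor for whatever tagging remains.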
 
I have used two SGML editing packages fairly extensively: Author/Editor
and the PSGML mode for emacs. I find PSGML better for my own work
(faster and easier to use), but Author/Editor does a better job of
protecting the user from making mistakes, and it has been my first
choice for users who are not comfortable with emacs or not reasonably
knowledgeable about SGML.
 
Cheers,
 
David
________________________________________________________________________
 
Professor David J. Birnbaum     email: [log in to unmask]
Department of Slavic Languages  url:   http://clover.slavic.pitt.edu/~djb/
1417 Cathedral of Learning      voice: 1-412-624-5712
University of Pittsburgh        fax:   1-412-624-9714
Pittsburgh, PA 15260 USA