*** PRIMARY CALL FOR PAPERS ***
ACL's SIGDAT and SIGNLL present the
THIRD WORKSHOP ON VERY LARGE CORPORA
WHEN: June 30, 1995 - immediately following ACL-95 (June 27-29)
WHERE: MIT, Cambridge, Massachusetts, USA
As in past years, the workshop will offer a general forum for new research in
corpus-based and statistical natural language processing. Areas of interest
include (but are not limited to): sense disambiguation, part-of-speech tagging,
robust parsing, term and name identification, alignment of parallel text,
machine translation, lexicography, spelling correction, morphological analysis
and anaphora resolution.
This year, the workshop will be organized around the theme of:
Supervised Training vs. Self-organizing Methods
Is annotation worth the effort? Historically, annotated corpora have
made a significant contribution. The tagged Brown Corpus, for example,
led to important improvements in part-of-speech tagging. But annotated
corpora are expensive. Very little annotated data is currently available,
especially for languages other than English. Self-organizing methods offer
the hope that annotated corpora might not be necessary. Do these methods
really work? Do we have to choose between annotated corpora and
unannotated corpora? Can we use both?
The workshop will encourage contributions of innovative research along this
spectrum. In particular, it will seek work in languages and applications
where appropriately tagged training corpora do not currently exist.
It will also explore what new kinds of corpus annotations (such as discourse
structure, co-reference and sense tagging) would be useful to the community,
and will encourage papers on their development and use in experimental
The theme will provide an organizing structure to the workshop, and
offer a focus for debate. However, we expect and will welcome a diverse
set of submissions in all areas of statistical and corpus-based NLP.
Ken Church - AT&T Bell Laboratories
David Yarowsky - University of Pennsylvania
SPONSORS: LEXIS-NEXIS, Division of Reed and Elsevier, Plc.
SIGDAT (ACL's special interest group for linguistic data
and corpus-based approaches to NLP)
SIGNLL (ACL's special interest group for natural language learning)
FORMAT FOR SUBMISSION: Authors should submit a full-length paper
(3500-8000 words), either electronically or in hard copy. Electronic
submissions should be mailed to "[log in to unmask]", and
must be either (a) plain ASCII text, (b) a single PostScript file, or
(c) a single LaTeX file following the ACL-95 stylesheet (no separate
figures or .bib files). Hard copy submissions should be mailed to
Ken Church (address below), and should include four (4) copies of the paper.
REQUIREMENTS: Papers should describe original work. A paper accepted
for presentation cannot be presented or have been presented at any
other meeting. Papers submitted to other conferences will be considered,
as long as this fact is clearly indicated in the submission.
Submission Deadline: March 20, 1995
Notification Date: April 18, 1995
Camera ready copy due: May 11, 1995
Ken Church
Room 2B-421
AT&T Bell Laboratories
600 Mountain Ave.
Murray Hill, NJ 07974 USA
e-mail: [log in to unmask]

David Yarowsky
Dept. of Computer and Info. Science
University of Pennsylvania
200 S. 33rd St.
Philadelphia, PA 19104-6389 USA
e-mail: [log in to unmask]