If I understand you correctly, a lot depends on what your files were before you annotated them. Were they TEI files to begin with? If not, the transformation into TEI should really happen before annotation. Then the process of annotation is just a matter of adding <w> and <pc> elements. Phil Burns' MorphAdorner (http://morphadorner.northwestern.edu) lets you annotate a TEI file, and its 'native' output format is a P5 file in which <w> and <pc> elements with the appropriate POS tags are added to the source file.
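As a rough illustration of what "adding <w> and <pc> elements" could look like on the Python side, here is a minimal sketch that wraps a list of (token, tag) pairs, the shape that NLTK's pos_tag returns, in a TEI <s> element. The function name tagged_to_tei_s, the sample sentence, the punctuation test, and the choice of a pos attribute for the tag are all assumptions made for the example; this is not MorphAdorner's method or output format, and the result still has to be placed inside a full TEI document and validated against your schema.

```python
import string
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Serialize TEI elements with a default namespace rather than an "ns0:" prefix.
ET.register_namespace("", TEI_NS)


def tagged_to_tei_s(tagged_tokens):
    """Build a TEI <s> element from (token, tag) pairs.

    Hypothetical helper: tokens made up only of ASCII punctuation become
    <pc>, everything else becomes <w>. Recording the tag in a "pos"
    attribute is an assumption; check it against your TEI schema.
    """
    s = ET.Element(f"{{{TEI_NS}}}s")
    for token, tag in tagged_tokens:
        is_punct = bool(token) and all(ch in string.punctuation for ch in token)
        name = "pc" if is_punct else "w"
        el = ET.SubElement(s, f"{{{TEI_NS}}}{name}")
        el.set("pos", tag)
        el.text = token
    return s


# Sample data in the (token, tag) shape; in practice this would come from the tagger.
tagged = [("The", "DT"), ("cat", "NN"), ("sleeps", "VBZ"), (".", ".")]
sentence = tagged_to_tei_s(tagged)
print(ET.tostring(sentence, encoding="unicode"))
```

Because ElementTree writes elements without whitespace between them, the original spacing of the text is not preserved by this sketch; how to record spacing is a separate encoding decision.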

MM
Martin Mueller
Professor emeritus of English and Classics
Northwestern University

From: Gabor Toth <[log in to unmask]>
Reply-To: Gabor Toth <[log in to unmask]>
Date: Monday, October 13, 2014 1:33 AM
To: [log in to unmask]
Subject: Annotated Corpus and Python NLTK output

Dear All,

I am using Python NLTK to create annotated corpora. After POS tagging a text with NLTK, I wish to create a corpus in TEI, i.e. I would follow the practice described in the Linguistic Annotation chapter of the Guidelines. I am wondering if anyone has experience transforming NLTK output into valid TEI XML files.

Best wishes,

Gabor