The scripting language Python has good support for XML (notably through the lxml library) and convenient string formatting. I usually start from a 'boilerplate' TEI-XML file, which is then filled in with text strings extracted from plain text files.
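To give an idea of the boilerplate approach, here is a minimal sketch. The template contents, the function name fill_template and the sample strings are made up for illustration, and it uses only the standard library (string.Template and xml.etree.ElementTree) so that it runs without installing anything; lxml could be used for the parsing step instead.

```python
from string import Template
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical boilerplate: a real project would keep this in its own file
# and use a much fuller teiHeader.
TEI_TEMPLATE = Template("""<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>$title</title></titleStmt>
      <publicationStmt><p>Unpublished draft</p></publicationStmt>
      <sourceDesc><p>Converted from a plain text file</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
$body
    </body>
  </text>
</TEI>
""")


def fill_template(title, paragraphs):
    """Return a TEI document (as a string) built from plain text strings."""
    # escape() turns &, < and > into entities so the result stays well formed.
    body = "\n".join("      <p>{}</p>".format(escape(p)) for p in paragraphs)
    return TEI_TEMPLATE.substitute(title=escape(title), body=body)


tei = fill_template("Sample text", ["First paragraph & more.", "Second paragraph."])

# Parsing the result is a cheap well-formedness check; it raises on errors.
ET.fromstring(tei.encode("utf-8"))
print(tei)
```

Parsing the filled-in template at the end catches most slips (an unescaped ampersand, a missing end tag) before the file goes anywhere else.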
In your case, as Stuart and Ron have already remarked, you would indeed also have to use some regular expressions to process the text and convert the letter codes into TEI elements.
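As a rough sketch of what such regular expressions might look like for some of the codes in your message quoted below: this handles only the italics, grave accent, page number and paragraph codes, and leaves &V and &P alone. The TEI elements chosen (hi with rend="italic", pb, p), the function name convert, the sample string, and the reading of § as the start of a paragraph are all my assumptions, not anything defined by DBT.

```python
import re
from xml.sax.saxutils import escape

# Vowel followed by "1" means grave accent (as described in the DBT codes).
GRAVE = {"a": "à", "e": "è", "i": "ì", "o": "ò", "u": "ù",
         "A": "À", "E": "È", "I": "Ì", "O": "Ò", "U": "Ù"}


def convert(dbt_text):
    """Convert a DBT-coded string into a fragment of TEI paragraphs."""
    # Escape first, so that every literal "&" becomes "&amp;" and the
    # letter codes can then be matched as "&amp;C" and "&amp;c".
    text = escape(dbt_text)
    # Accents: a1 -> à, e1 -> è, and so on.
    text = re.sub(r"([aeiouAEIOU])1", lambda m: GRAVE[m.group(1)], text)
    # Page numbers: $0001$ -> <pb n="0001"/> (digits kept as found).
    text = re.sub(r"\$(\d{4})\$", r'<pb n="\1"/>', text)
    # Italics: &C ... &c -> <hi rend="italic"> ... </hi> (assumed encoding).
    text = re.sub(r"&amp;C(.*?)&amp;c", r'<hi rend="italic">\1</hi>',
                  text, flags=re.DOTALL)
    # Assumption: each section sign starts a new paragraph.
    paragraphs = [p.strip() for p in text.split("§")]
    return "\n".join("<p>{}</p>".format(p) for p in paragraphs if p)


sample = "§ &CLa citta1&c e1 bella $0001$ § Secondo paragrafo"
print(convert(sample))
```

The order of the substitutions matters: the accents are resolved before any markup is inserted. An italic span that crosses a § boundary would produce overlapping tags, so real data would probably need an extra check for that.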
On 18-jun-2012, at 10:37, Roberto Rosselli Del Turco wrote:
> Dear all,
> a colleague of mine whom I'm helping with her project has quite a large corpus of texts encoded in DBT (http://www.ilc.cnr.it/viewpage.php/sez=ricerca/id=62/vers=ita Italian only, sorry), i.e. in a pure ASCII form where markup consists of simple letter codes, for instance:
> &C, &c for italics (&C is the opening tag, &c the closing one)
> 1 follows a vowel to mean grave accent
> $####$ to markup a page number (0001 and following)
> &V following text is in verse
> &P following text is in prose
> § marks a paragraph
> What would be the best method to export these texts in TEI XML? I ruled out XSLT since the input text is not a well formed document, perhaps some PERL script, or in a similar language? I wonder if there are already available solutions, of course I'm prepared to write an extensive "conversion table" ...
> Thank you in advance,
> Roberto Rosselli Del Turco roberto.rossellidelturco at unito.it
> Dipartimento di Scienze rosselli at ling.unipi.it
> del Linguaggio Then spoke the thunder DA
> Universita' di Torino Datta: what have we given? (TSE)
> Hige sceal the heardra, heorte the cenre,
> mod sceal the mare, the ure maegen litlath. (Maldon 312-3)