Many people have implemented real-world systems that convert TEI
documents into HTML on the fly. I'll share some fruits of my own
experience, for what they're worth.

First, a general comment. It is useful to think of this as a DTD-to-DTD
conversion, but only to a limited degree. I know I'll ruffle some
feathers when I say this (though perhaps not on _this_ list), but HTML is
to my mind really just a style sheet. If you can accept that, and if you
are amongst those users of the TEI - I hope the majority - who use it to
document text _structure_, then you can draw two conclusions:

1. There is no single TEI to HTML mapping. Some people will want their
paragraphs separated by an empty line, and will map the TEI <p> to an
HTML <p>; others will want a carriage return followed by a tabbed
indent, and will resort to some kludge like <br><pre>\t</pre>.

2. It is generally a waste of time (read: processing overhead) to
employ SGML conversion tools and languages like Omnimark for on-the-fly
conversions. Most people I know use some sort of pattern-matching
technique. If you have to parse the entire TEI DTD every time you
deliver a document section, good night. This would also rule out using
sgmls.pm, as was mentioned.
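
To make the two points above concrete, here is a minimal sketch of the
pattern-matching approach, written in Python for brevity rather than in
the Perl or Lex that people actually use. The two style tables and the
convert() function are hypothetical names invented for this
illustration; nothing is parsed against the DTD, and tags are matched
as literal strings only.

```python
import re

# Two hypothetical house styles for the same TEI markup.
# Both tables are illustrative assumptions, not anyone's real mapping.
BLANK_LINE_STYLE = {"<p>": "<p>", "</p>": "</p>"}
INDENT_STYLE = {"<p>": "<br><pre>\t</pre>", "</p>": ""}


def convert(tei_fragment, style):
    """Replace each tag listed in `style`; leave all other text untouched."""
    # Build one alternation of escaped literal tags, longest first, so a
    # longer tag is never shadowed by a shorter one sharing its prefix.
    tags = sorted(style, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(tag) for tag in tags))
    return pattern.sub(lambda match: style[match.group(0)], tei_fragment)
```

Calling convert("<p>Hello</p>", BLANK_LINE_STYLE) leaves the fragment as
it is, while the same call with INDENT_STYLE swaps the opening tag for
the line-break-and-tab kludge and drops the closing tag. Same TEI, two
different "style sheets."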

Virginia and Michigan both use Perl scripts, if I'm not mistaken. If
you really want to optimize for speed, I would suggest Lex. That's what
I use, and in my initial trials, there was a noticeable improvement in
performance.

Here's how we ended up doing it. For each TEI text base, we write a
style sheet that maps "events" to HTML tags, ASCII text, or both. An
event can be an open or close tag, an attribute, a specific attribute
value, or a combination of these. Since we already had tools for
reading and processing SGML files, we wrote the style sheet in SGML.
Next, a Perl script reads the style sheet and spits out Lex code. Then
you just run Lex, compile the resultant C code, and presto whamo, an
executable.
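
The generator step can be sketched roughly as follows. This is not our
actual script: it is in Python rather than Perl, the style sheet is a
plain dictionary rather than an SGML file, and an "event" is simplified
to a literal tag string. STYLE_SHEET, c_string() and generate_lex() are
all hypothetical names made up for the illustration.

```python
# Hypothetical style sheet: each TEI "event" (here just a literal tag,
# possibly with a specific attribute value) maps to the text to emit.
STYLE_SHEET = {
    "<p>": "<p>",
    "</p>": "</p>",
    '<hi rend="italic">': "<i>",
    "</hi>": "</i>",
}


def c_string(text):
    """Quote text as a C string literal; the same quoting is assumed to
    work for a double-quoted literal pattern in Lex."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    escaped = escaped.replace("\n", "\\n")
    return '"' + escaped + '"'


def generate_lex(style_sheet):
    """Return the text of a Lex specification with one rule per event."""
    lines = ["%%"]  # empty definitions section, then the rules
    for event, output in style_sheet.items():
        # Pattern and action are separated by a tab; the action writes
        # the replacement text to Lex's output stream.
        rule = f"{c_string(event)}\t{{ fputs({c_string(output)}, yyout); }}"
        lines.append(rule)
    lines.append("%%")  # empty user-code section
    return "\n".join(lines) + "\n"
```

Printing generate_lex(STYLE_SHEET) gives a small rules file. Text that
matches no rule falls through to Lex's default action, which copies it
to the output unchanged. To build the result you would typically still
need a main() and yywrap(), for instance from the Lex library.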

This method works well only because all of our text bases are
fully "maximized," which is to say there are no implied closing tags. If
a closing tag is missing, the filter will never know an element was
closed, and total chaos ensues. My favorite disaster was the time all of
Act I of Antony and Cleopatra was transformed into a hypertext link to
nowhere. I think someone is writing an article about that.
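
One cheap safeguard is to check a text for missing closing tags before
it ever reaches the filter. The sketch below is an assumption about how
such a check might look, not a tool we have: unclosed_elements() is a
hypothetical function, it knows nothing about the DTD, and the caller
has to pass in the names of elements declared EMPTY. It also assumes no
attribute value contains a ">" character.

```python
import re

# Matches an open or close tag; declarations and comments start with
# "<!" and are therefore skipped.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*>")


def unclosed_elements(text, empty=frozenset()):
    """Return a list of problems: elements opened but never closed, and
    closing tags that do not match the innermost open element."""
    stack = []
    problems = []
    for match in TAG.finditer(text):
        is_close = match.group(1) == "/"
        name = match.group(2).lower()
        if name in empty:
            continue  # EMPTY elements never have a closing tag
        if not is_close:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append(f"unexpected </{name}>")
    # Anything still on the stack was never closed.
    problems.extend(f"unclosed <{name}>" for name in stack)
    return problems
```

A fully tagged fragment such as "<div><p>ok</p></div>" yields an empty
list, while "<p>one<p>two</p>" reports one unclosed <p>, which is
exactly the kind of input that would send the filter off the rails.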

Hope this helps.

Gregory Murphy, Text Systems Manager
CETH (The Center for Electronic Texts in the Humanities)
WEB PAGE: http://www.princeton.edu/~gjmurphy