On 5/7/06, Sebastian Rahtz wrote:
At the level of encoding you need, I'd have thought an automated upgrade
from HTML to TEI is not beyond the bounds of possibility? If people can
mark up <p> and section headings and italics in HTML, then making
clean (if not sophisticated) TEI XML is just a matter of programming...
For the general fiction books, that isn't much of a problem (we have a bare-bones conversion tool available). It is the more "involved" texts that cause confusion. For instance, just adding footnotes to a text makes the markup confusing for a newbie. Another confusing point is properly nesting <div>s when the text has multiple section layers.
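To make the <div> nesting point concrete, here is a rough Python sketch of how a converter might turn a flat run of HTML headings and paragraphs into nested TEI <div>s. This is not our actual conversion tool; the function name nest_divs and the (tag, text) input format are made up for illustration, and it assumes the HTML has already been reduced to a simple list of blocks.

```python
import xml.etree.ElementTree as ET

# Heading tags mapped to their nesting depth (h1 is the outermost section).
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

def nest_divs(blocks):
    """Build a TEI-style <body> from a flat list of (tag, text) pairs.

    `blocks` is a hypothetical intermediate format, for example
    [("h1", "Part I"), ("p", "Some text.")].
    """
    body = ET.Element("body")
    # Stack of (heading level, element); level 0 stands for <body> itself.
    stack = [(0, body)]
    for tag, text in blocks:
        if tag in HEADINGS:
            level = HEADINGS[tag]
            # Close every open <div> at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            div = ET.SubElement(stack[-1][1], "div")
            ET.SubElement(div, "head").text = text
            stack.append((level, div))
        else:
            # Anything that is not a heading becomes a <p> in the current <div>.
            ET.SubElement(stack[-1][1], "p").text = text
    return body

blocks = [
    ("h1", "Part I"),
    ("h2", "Chapter 1"),
    ("p", "Text."),
    ("h2", "Chapter 2"),
    ("p", "More."),
    ("h1", "Part II"),
    ("p", "End."),
]
print(ET.tostring(nest_divs(blocks), encoding="unicode"))
```

The stack is what a newbie has to keep in their head when doing this by hand: each heading closes every open section at its own level or deeper before opening a new one. A real text needs more care than this (a stray paragraph after a nested section, skipped heading levels, footnotes), which is exactly where people get lost.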
The other problem is conceptual. HTML uses largely presentational tags all over the place, and people love to use them to "match" the page layout of the original book. TEI uses largely semantic tags, where you are more concerned with what something IS than with what it LOOKS like. This subtle but important difference trips up new markup folks that come from an HTML background.
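As a small illustration of that IS versus LOOKS distinction, take a single italic span from HTML. A program can only record the appearance, while a person can say what the italics mean. The sketch below is purely illustrative: the function italic_to_tei and the "meaning" labels are invented for this example, though <hi rend="italic">, <emph>, <title> and <foreign> are ordinary TEI elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical labels an encoder might assign to an italic span,
# mapped to the TEI element that says what the span IS.
SEMANTIC = {
    "emphasis": ("emph", {}),
    "title": ("title", {}),
    "foreign": ("foreign", {}),
}

def italic_to_tei(text, meaning=None):
    """Encode an HTML <i> span as TEI.

    With no `meaning` given, fall back to <hi rend="italic">, which
    only records what the span LOOKS like.
    """
    tag, attrs = SEMANTIC.get(meaning, ("hi", {"rend": "italic"}))
    element = ET.Element(tag, attrs)
    element.text = text
    return ET.tostring(element, encoding="unicode")

print(italic_to_tei("Moby Dick", "title"))
print(italic_to_tei("very"))
```

The fallback branch is all an automated upgrade can safely do; choosing between the other three takes someone who has read the sentence.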