The TEI workgroup on SGML=>XML migration will hold its first meeting in
Chicago in two weeks. The group will discuss some of the challenges
associated with converting legacy TEI data to the P4 XML format, and will
attempt to develop generic conversion tools, methods, and best practices.
To that end, we are interested in learning about current practice in
TEI encoding, especially with respect to SGML-specific features, and about
attitudes toward, and experience with, migration to XML. If you work with TEI
SGML data, please consider taking some time to respond to the following
questions:
- Do you expect benefits from a conversion of your material to XML?
If so, which ones? If not, why not?
- Have you attempted or completed conversion of TEI material from SGML to
XML? With what procedures and software? How did you fare?
- Does your material make use of SGML features that are not
supported in XML, and cannot be trivially converted? For example:
  - ambiguous missing end tags
  - case insensitivity
  - the CONCUR feature
  - SUBDOCs (e.g., FSDs or WSDs)
  - comments in places where they are not allowed in XML
  - the rules for ignoring record-ends by the SGML parser
  - SDATA entities that don't map to Unicode characters
  - DTD extensions using techniques not supported in XML
    + ampersand connectors
    + inclusion exceptions
    + exclusion exceptions
    + declaration of multiple GIs at once
    + #CURRENT attributes
- Would you accept converted XML that is technically (ESIS-) equivalent
to, but cosmetically different from, the original SGML, or are there
specific organizational and formatting features that you need to retain
(indentation, tag renaming, etc.)?
- Are you able to do some conversion work manually, or is your
body of material prohibitively large? How much effort would you be willing
to invest per document? Must automatic conversion methods be
"perfect" for you to use them?
Your responses will help guide the efforts of the workgroup.
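If you are unsure whether your DTD extensions use any of the constructs
listed under "DTD extensions using techniques not supported in XML", a rough
check along the following lines may help. This is only a sketch: the function
name find_sgml_only_features and the sample declarations are hypothetical, and
the regular expressions are heuristics, not an SGML parser. It assumes that
parameter entities have already been expanded and that the declarations
contain no embedded comments.

```python
import re

# Heuristic patterns for <!ELEMENT ...> and <!ATTLIST ...> declarations.
# Assumption: no ">" occurs inside a declaration (e.g. in a comment).
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(.*?)>", re.IGNORECASE | re.DOTALL)
ATTLIST_DECL = re.compile(r"<!ATTLIST\s+(.*?)>", re.IGNORECASE | re.DOTALL)


def find_sgml_only_features(dtd_text):
    """Return a sorted list of SGML-only DTD constructs that appear to be used."""
    found = set()
    for match in ELEMENT_DECL.finditer(dtd_text):
        body = match.group(1)
        # A name group in place of a single element name: several GIs at once.
        if body.lstrip().startswith("("):
            found.add("multiple GIs")
        # "&" in an element declaration is the SGML "and" connector.
        if "&" in body:
            found.add("ampersand connector")
        # "+(" introduces an inclusion exception, "-(" an exclusion exception.
        if re.search(r"\+\(", body):
            found.add("inclusion exception")
        if re.search(r"-\(", body):
            found.add("exclusion exception")
    for match in ATTLIST_DECL.finditer(dtd_text):
        if "#CURRENT" in match.group(1).upper():
            found.add("#CURRENT attribute")
    return sorted(found)


# Hypothetical sample declarations, for illustration only.
sample = """
<!ELEMENT (head|trailer) - O (#PCDATA)>
<!ELEMENT div - O ((head & byline), p+) +(note) -(div)>
<!ATTLIST div type CDATA #CURRENT>
"""

for feature in find_sgml_only_features(sample):
    print(feature)
```

A hit from this check only suggests that a declaration deserves a closer look;
it does not by itself show that conversion will be difficult.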
Thank you,
Chris Ruotolo
University of Virginia Library