You've clearly done a huge amount of homework before asking this question!
For me, the single biggest advantage of using TEI as a base format in a
scenario such as the one you outline is that it covers a huge range of
use-cases. You have a wide variety of document types, many born-digital
but some presumably not, and while some of the schemas you list (XHTML
for instance) are general-purpose schemas, some are highly specialized
(NLM is for journal articles; DITA has a very specific set of use-cases;
and DAISY and ePub are more publication formats than encoding schemes).
TEI, on the other hand, is ready to handle everything from born-digital
journal articles to medieval manuscripts, dictionaries/glossaries,
prosopographies, gazetteers, and a wealth of other material types and
forms. So the TEI may make a good base format in which to embed other
content in other namespaces where necessary; it may even provide
everything you need, enabling you to generate other outputs (XHTML,
ePub, talking books, DITA) from it.
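To make the "TEI root with other namespaces embedded" idea a little more concrete, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree module. The sample document and the helper names namespace_of and summarize are invented for illustration. Note that putting MathML inside a TEI formula element would normally need a schema customization before it validates. The sketch only checks which namespace the root element is in (your "base format") and which foreign namespaces appear inside it.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical sample: a TEI document whose <formula> holds MathML content.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample note on compound interest</title></titleStmt>
      <publicationStmt><p>Unpublished illustration.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Growth over one period is
        <formula notation="mathml">
          <m:math xmlns:m="http://www.w3.org/1998/Math/MathML">
            <m:mi>P</m:mi><m:mo>(</m:mo><m:mn>1</m:mn>
            <m:mo>+</m:mo><m:mi>r</m:mi><m:mo>)</m:mo>
          </m:math>
        </formula>
      </p>
    </body>
  </text>
</TEI>"""


def namespace_of(tag):
    """Return the namespace URI from an ElementTree '{uri}local' tag."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def summarize(xml_text):
    """Return (root namespace, sorted list of other namespaces used)."""
    root = ET.fromstring(xml_text)
    root_ns = namespace_of(root.tag)
    used = {namespace_of(el.tag) for el in root.iter()}
    return root_ns, sorted(used - {root_ns})


root_ns, foreign = summarize(SAMPLE)
print("base format namespace:", root_ns)
print("embedded namespaces:", foreign)
```

Run across a whole collection, a check like this would show at a glance which documents are pure TEI and which lean on other vocabularies.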
I hope this helps,
On 15-08-04 06:59 AM, Amy Mack wrote:
> I am currently contemplating a new text-encoding project for private
> purposes. In case this is relevant to this discussion, I have a
> collection of digital texts in the fields of economics, investment,
> finance, markets, risk, etc, and in a range of different file
> formats. I intend encoding these texts to define document structure
> and content to assist with indexing, searching,
> cross-referencing/linking, data extraction, and any other uses I can
> think of in the future for general learning and content re-use.
> I have over the past couple of months familiarized myself with the
> TEI P5 guidelines as well as other XML schema including DAISY/DTBook,
> DITA, DocBook, EPUB, XHTML (aware of NLM but haven't yet looked at
> it), and XML generally, so I feel I now broadly understand the XML
> concepts of markup, schema modularity/customization, conformance,
> validation, format conversion/transformation and so on. I have come
> across the idea of single source publishing and found this
> presentation interesting
> amongst others that got me thinking about what may be the best
> base/master/root format for my purposes.
> For the sake of clarity (and I hope this does clarify as intended),
> by "base/master/root format" I mean if the project involves a
> combination of a number of existing schema (eg TEI, MathML, SVG,
> DTBook, DITA, XHTML, etc), and perhaps others yet to be defined,
> across a collection such that any single encoded document may include
> elements from any combination of schema, the base/master/root format
> is represented by the common root element of each encoded (single
> source) document. If all encoded documents have a TEI root element,
> TEI has been chosen/used as the base/master/root format.
> With that in mind, how can a new text-encoding project best go about
> determining which format to settle on as a base/master/root format
> for a text encoding project?
> If those who are now experienced with TEI were starting their first
> encoding project today, how would you go about assessing if TEI were
> the most appropriate schema to build around given the range of other
> alternatives now available? What process/methodology might you use?
> Perhaps put another way, what advantages exist to using TEI as the
> base format for encoding a collection of digital texts over other
> formats/schema (eg DITA, DocBook, NLM, XHTML, etc.)?
> For general use cases (ie defining basic structure of a document),
> are there any clearly identifiable reasons to use TEI? Are there
> specific use cases that only TEI can address?
> Given W3C efforts in recent years to modularize XHTML, and given the
> similarities between XHTML and various other formats including TEI
> (both text-encoding standards in XML format for the representation of
> texts in digital form, consist of modules, and can be subsetted or
> extended), why would a new text-encoding project settle on using TEI
> instead of XHTML or some other schema?
> Although extensible and customizable (like many other schema), is TEI
> likely to be increasingly used for any specialized domains or
> purposes (eg the verse, linguistic analysis, manuscript description
> modules not found elsewhere)?
> And a final related question - what is the likelihood that the core
> modules/elements TEI shares with other schema will be standardized
> further with their counterparts in those schema, and/or is there any
> such roadmap or work already under way?
> While the guidelines and the website are great at explaining the TEI
> guidelines, I have been unable to find any discussion why a project
> would use TEI in lieu of alternatives. I came across a mailing post
> discussion that broadly raises this issue without any clear
> I'm sure I am not the only one to ponder these questions as these
> issues seem to me to be enormously important for anyone contemplating
> a non-trivial encoding project well before any commitment is made to
> any particular schema/workflow or combination.
> Thanks in advance for any robust discussion on any or all of the