Dear Amy and TEI list,
One of the challenges (and exciting aspects) of working with TEI is transforming it to other formats via XSLT or XQuery. These languages pose a bit of a learning curve, since they build on a foundation in XPath--but exploring what they can do for you might be helpful in considering your question of what's best to use as a base/root for your project. When I first began exploring TEI, working with manuscript materials and poetry, I quickly realized I needed more technologies to *present and share* the material I was coding (in lots of descriptive detail), but it was also clear to me that I was storing lots of information in TEI markup that wasn't simply about *representing* a text as it might look in a single print publication. I was recording the kinds of things Syd Bauman describes: noting variant forms in other versions of a text, attempting to document insertions and deletions, and tracking a whole host of complexities (little narratives sometimes) regarding the location and dispersal of original documents--connected with what we call "metadata."

TEI offers (and, as a community, discusses, updates, and reflects intelligently on) specific guidelines for dealing systematically with textual complexities like these, and as I learned the TEI, I came to appreciate it as a system that would serve as a solid foundation for my projects--but one that was not optimal in its own right for web representation (though one can present XML in a web browser via CSS). Were I to attempt to document all the markup I apply in TEI on my own in XHTML, I doubt I'd be storing my information in as systematic a way--because I'd be working within a much more limited tagset and without the same kinds of semantic guidelines.

The issue there is that great semantic complexity is not directly or easily conveyed in the more representational markup of XHTML (even though one can add semantic elements and adapt XHTML, it's not designed for holding the kinds of complex information about texts that we've been describing). But the issue isn't closed if you can apply XSLT or XQuery--because you can write these to render *many* possible transformations of your TEI into HTML, SVG, and other formats--all from a *single* TEI document. One richly encoded TEI document can be transformed to represent a text, or to extract pieces of its TEI code into, say, a chart or table, or some combination of both.

For the TEI I've used to code a poem, for example, I can transform it with XSLT to present the full poem with HTML tags and write a little CSS to control its appearance in the various (and often-changing) web browsers, and I can write *another* transformation that pulls a chart of all the people's names I've marked in my text, and whether these are historical persons or fictional characters--all drawing from the same TEI I've marked. And I can pull out plain text in CSV or TSV formats, or pull information from my TEI elements and attributes into JSON arrays and export those files into some nifty data visualizations using network analysis or mapping software. And because I'm using a systematic coding structure to store the information, I can bundle TEI files together into an XML database to query hundreds of them at once--and generate output in HTML and SVG, or new XML, or text, etc. That adaptability, to me, is the reason I think of TEI as a foundation, on which I build many different kinds of web publications in other formats.
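
To make the "one source, many outputs" idea concrete, here is a minimal sketch. It is written in Python with the standard library rather than in XSLT or XQuery, purely for illustration, and everything specific in it is an assumption: the two-line sample poem, the "historical"/"fictional" values on the type attribute, and the function names are all invented for this example. It reads one small TEI fragment and produces an HTML reading view, a CSV table of the names, and a JSON array of the same names.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from html import escape

# The TEI namespace; elements must be looked up with it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample: a two-line "poem" with tagged personal names.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <lg type="stanza">
      <l n="1">Then <persName type="historical">Byron</persName> spoke of war,</l>
      <l n="2">and <persName type="fictional">Manfred</persName> answered him.</l>
    </lg>
  </body></text>
</TEI>"""


def poem_to_html(root):
    """Output 1: a reading view, one HTML span per TEI <l> element."""
    spans = []
    for line in root.iterfind(".//tei:l", TEI_NS):
        text = "".join(line.itertext())  # flatten nested markup to text
        spans.append(f'<span class="line">{escape(text)}</span>')
    return '<div class="poem">\n' + "<br/>\n".join(spans) + "\n</div>"


def people(root):
    """Collect every <persName> with its (assumed) type attribute."""
    return [
        {"name": "".join(p.itertext()).strip(), "type": p.get("type", "unknown")}
        for p in root.iterfind(".//tei:persName", TEI_NS)
    ]


def people_to_csv(rows):
    """Output 2: the same names as a CSV table."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "type"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


root = ET.fromstring(SAMPLE_TEI)
html_view = poem_to_html(root)
rows = people(root)
csv_view = people_to_csv(rows)
json_view = json.dumps(rows, indent=2)  # Output 3: JSON for visualization tools

print(html_view)
print(csv_view)
print(json_view)
```

The TEI itself is never altered; each function is just a different reading of the same markup, which is the role an XSLT stylesheet or an XQuery script would play in an actual project.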

Longevity is a tricky question, and here it may help to consider how quickly web browser technology changes, even under the guidance of the W3C. TEI introduces revisions and refinements, so it too changes, but always with an eye to (and great consciousness of) the need for reliable consistency within its community. The idea is that your foundational informational markup, guided by a clearly defined project schema, shouldn't have to change when web browsers do. All that changes would be the scripts you use to extract, transform, and publish from that data in the formats you need.

TEI experts have offered workshops to help get started learning XSLT and XQuery, the technologies for transformation, and they're worth investigating in this context--regarding what to consider as a base or root. They do take some serious time to learn, and for many people that can be a sticking point. It's worth looking around to see what opportunities are available, say at the DHSI or Oxford Summer School.

Hope this helps!
Elisa Beshero-Bondar
Associate Professor of English
University of Pittsburgh at Greensburg
Project Director: Digital Mitford and 



> On Aug 5, 2015, at 6:58 AM, Amy Mack <[log in to unmask]> wrote:
> 
>> On Tue, 4 Aug 2015 15:59:56 +0000, Sebastian Rahtz <[log in to unmask]> wrote:
>> 
>> I am not sure the formats are comparable
> 
> I am most interested in understanding which formats *are* comparable as feasible alternatives to be a base format. All other specialty formats were only mentioned to give an idea of what I have looked at to date.
> 
>> Mathml and svg are niche formats. Very useful if that is what you need
> 
> I came to the conclusion reading Chapter 14 of the guidelines that I will very likely need to make use of MathML (for some materials) and SVG (for most materials) for encoding formulae, charts and other figures. I will also have a need to encode complex tables and detailed linking.
> 
>> Nlm and dita anD docbook are for clear born digital materials
> ...
>> Tei is for ... all the shock of non-digitAl originals
> 
> If I were to treat all materials as though that were born digital (it is the content that is important to me, not the original format in which it was published), would DITA and/or DocBook be possible candidates to be a base format from what you know of these formats?
> 
>> where you want to identify precise structure and there isn't ambiguity
> ...
>> Tei is for when there is ambiguity
> 
> Can you elaborate on this concept of structural ambiguity? How does TEI in particular help handle/address this ambiguity?
> 
>> HTML is a compromise , which is perhaps best regarded as a rendering appearance
> 
> One of the issues causing me some confusion when comparing TEI with XHTML is that if XHTML is (for want of a better description) an XML version/extension of HTML and can be used to describe both document structure using div elements with the @class attribute, and document metadata, are there any specific advantages to using either schema over the other?
> 
> Am I correct in thinking that without modularity, XHTML would clearly be an inferior choice to TEI as a base format, but with modularity, flexibility to encode document structure and its relevance to EPUB and other web publishing that XHTML is a viable alternative?
> 
>> 
>> Of course you can crosswalk between all of the formats and represent , say, a modern article. But don't try to do a medieval ms in dita :-)
> 
> No medieval economics or financial markets materials in my collection. :)
> 
>> If in doubt stick with html, is my advice
> 
> Are you able to elaborate a little on your thoughts in this regard?