I've started a wiki page which
is meant to explain the advantages and challenges. I would be more than
happy to elaborate on or clarify any point, so please feel free to ask away.
I'd also like to offer this to our informal working group as a basis of
discussion. Let me know if anyone else is interested in joining the
discussion (or possibly online meetings, if the group prefers).
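
To make the idea a little more concrete, here is a minimal Python sketch of
one possible way a TEI fragment could be serialized as HTML5 with Microdata.
It is only an illustration, not an agreed algorithm. The itemtype URL
convention (TEI namespace plus element name), the "att-" prefix used to carry
TEI attributes in in-body <meta> elements, and the list of elements rendered
as divs rather than spans are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the TEI namespace
# Hypothetical choice: these TEI elements become <div>, everything else <span>.
BLOCK_ELEMENTS = {"TEI", "text", "body", "div", "p", "head", "lg", "l"}


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on names."""
    return tag.rsplit("}", 1)[-1]


def to_microdata(elem, is_root=True):
    """Serialize one TEI element and its subtree as HTML5 with Microdata."""
    name = local_name(elem.tag)
    html_tag = "div" if name in BLOCK_ELEMENTS else "span"
    # Assumed convention: itemtype is the TEI namespace plus the element name.
    attrs = ["itemscope", f'itemtype="{TEI_NS}/{name}"']
    if not is_root:
        # A nested element is exposed as a property of its parent item.
        attrs.insert(0, f'itemprop="{name}"')
    parts = [f"<{html_tag} {' '.join(attrs)}>"]
    # Assumed convention: TEI attributes travel as in-body <meta> elements.
    for key, value in elem.attrib.items():
        parts.append(
            f'<meta itemprop="att-{escape(local_name(key))}" '
            f'content="{escape(value)}">'
        )
    if elem.text:
        parts.append(escape(elem.text))
    for child in elem:
        parts.append(to_microdata(child, is_root=False))
        if child.tail:
            parts.append(escape(child.tail))
    parts.append(f"</{html_tag}>")
    return "".join(parts)


SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0" rend="indent">'
    "By <persName>Jane Austen</persName>.</p>"
)
print(to_microdata(ET.fromstring(SAMPLE)))
```

Because every element name and attribute is kept, a converter walking the
itemprop/itemtype values in the other direction could in principle rebuild
the TEI, provided the HTML sticks to this restricted subset.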
On 6/9/2011 3:34 AM, Martin Mueller wrote:
> This has been a very helpful and interesting discussion that I think I
> sort of understand. But I don't really understand it, and I am fairly sure
> that few of my friends and colleagues in English departments or academic
> libraries understand it either.
> Is there somebody out there who can explain in layman's language what is at
> stake in using microdata to manage the TEI/HTML5 relationship and why or
> how this would help scholars as they work with textual data in digital
> Martin Mueller
> On 6/8/11 1:11 PM, "Brett Zamir" wrote:
>> Hello Stuart,
>> On 6/7/2011 4:36 AM, stuart yeates wrote:
>>> A couple of points:
>>> * Given that TEI is significantly more expressive than HTML5, any
>>> serialization would be lossy, and my understanding of the genetic
>>> editing proposal is that this makes TEI even more expressive.
>> Although TEI is more expressive semantically than the default HTML5
>> semantics, HTML5 Microdata allows for preservation of every single
>> semantic feature of TEI: it literally allows use of the TEI namespace on
>> the "itemtype" attribute, and of all the TEI elements and attributes
>> (e.g., via markup including anonymous divs or spans with the "itemprop"
>> attribute, or in-body <meta/> and <link/> elements). I do think we
>> ought to narrow this down and settle on a single, specific algorithm.
>> So, technically, since it can subsume it, HTML5 can be more expressive than
>> TEI (of course, if your TEI embeds XHTML, then it's an even, if still
>> meaningless, battle).
>> Besides deciding on an algorithm, I think the problem will really be in
>> converting from this semantically enriched HTML to TEI, since HTML has
>> things like forms, progress meters, and other markup which just doesn't
>> translate or make much sense in TEI (or at least the TEI I'm familiar
>> with). Of course, wikis typically won't allow creation of HTML form
>> controls or the like anyway, so it is probably not a big concern in
>> this environment.
>> Another challenge will be deciding whether any of the existing and
>> recognized Microdata schemas (thanks to Felix for the http://schema.org
>> site reference) ought to be used in part or whole where there is a
>> one-to-one correspondence between TEI semantics and those lesser
>> schemas. This could prove advantageous in exposing TEI documents to
>> specialized search engines.
>>> * If you're looking for repositories, I suggest you start with
>> That is helpful, thank you, but I would prefer to see these:
>> 1) ...available (where the license permits) on a site like WikiSource
>> which is organized by work, genre, etc., rather than only on a
>> specialized site set up for a single language of encoding.
>> 2) ...with the ability to have a seamless experience moving from the
>> category browser directly to the documents. No need for searching the
>> targeted page and sifting through a site's idiosyncratic notes and
>> structure. Just discoverable in a uniform way. It may be a small thing,
>> but it is ideal to my taste.
>>> * The TEI community and the digital humanities community more
>>> generally are pretty closely tied to the concept of the book, so I
>>> suggest targeting the flavour of HTML5 as used in ePubs.
>> Do you have an online reference elaborating on an ePub HTML5 flavor? As
>> far as I can see, ePub is not an HTML format.
>>> On 06/06/11 00:45, Brett Zamir wrote:
>>>> Hello all,
>>>> I am interested in seeing web apps develop to support TEI. I haven't had
>>>> a chance to check out the XQuery tool someone mentioned here, though I
>>>> did a little work on my own (at http://brettz9.github.com/xqueryeditor/
>>>> ) utilizing the XQIB library in more of a proof of concept (though one
>>>> for which I have a perhaps vain hope of finding time to develop it into
>>>> something more).
>>>> Besides being written using web standards, I would hope such a tool
>>>> could obtain TEI texts from a central, open repository. While there may be
>>>> some custom initiatives to bring TEI to the web, my thinking was that it
>>>> might be more effective for open (and especially potentially
>>>> collaborative markup) projects to use an HTML5 serialization which
>>>> preserves all of the semantic data, especially as it would allow texts
>>>> to be shared without need for tools supporting stylesheets. While
>>>> microformats and RDFa seem to still be around, it looks more like
>>>> microdata is going to win out. Microdata offers a systematic way to
>>>> include the TEI namespace (albeit with itemtype instead of an xmlns attribute)
>>>> and besides the itemprop global attributes, the HTML5 spec also
>>>> specifically now allows <meta/> and <link/> in the document body, which I
>>>> think should be rich enough to define an official means of serializing
>>>> TEI into HTML (and, if contenting oneself with a subset of HTML,
>>>> serializing back into TEI). But whatever could get consensus I think
>>>> could work.
>>>> I think such a serialization would offer such benefits as:
>>>> a) It would _not require customizable software to be previewed_; TEI
>>>> texts could be made available at public sites such as Wikisource or
>>>> sites based on the same Mediawiki software, assuming
>>>> https://bugzilla.wikimedia.org/show_bug.cgi?id=28776 would be
>>>> implemented (something more likely to be possible than expecting the
>>>> less familiar and non-historically-web-oriented language TEI to be
>>>> accepted). The texts could thus be previewed as structured HTML+CSS,
>>>> allowing for conveniently succinct wiki markup to be used to create
>>>> documents, while still allowing incremental improvements (and revision
>>>> control and history) to the semantic mark-up as well.
>>>> b) It would be encoded in the format already most _familiar to the web
>>>> community_, albeit enhanced, in a standardly outlined manner, by
>>>> TEI-based semantic markup. It does add the additional burden that
>>>> mark-up creators must learn both HTML and TEI (though applications could
>>>> utilize TEI as the primary format, converting back to HTML when sending
>>>> text to the wikis, and merely storing HTML on the wiki).
>>>> c) _Search engines_ such as Google (see
>>>> http://www.google.com/support/webmasters/bin/answer.py?answer=99170 )
>>>> can discover such markup in a semantically-aware manner.
>>>> Does something like this interest anyone else?
>>>> I don't know whether it ought to be done as a modification of the
>>>> default TEI stylesheets, a simpler more predictable format (e.g., using
>>>> divs for pretty much everything rather than native HTML like
>>>> blockquote), a schema or what. Feedback on the idea is most welcome....
>>>> Best wishes,