On 2011-08-22 14:35, John A. Walsh wrote:
> A relentless charge towards interoperability would benefit
> the increasing tendency towards the Googleization of digital
> humanities, in which bigger is generally better, "good enough" OCR is
> good enough, and lip service is paid to beautifully crafted and
> carefully curated smaller projects as they are relegated to a
> necessary but increasingly irrelevant middle ages of digital
> humanities.

It's rather a shame that we can't imagine beyond this either/or view of 
the discipline where scale equals low quality and the value of 
large-scale text processing tasks is seen as an assault on the value of 
small-scale critical work. As I argued in a previous post on this 
thread, TEI does a very good job of enabling any number of 
heterogeneous, deeply encoded and carefully curated small projects to 
produce output which is eminently compatible with the large-scale, 
shallowly encoded output of bigger projects.

> Rather than have at my disposal millions of homogenous and
> interoperable TEI texts provided by Google or whomever, I would prefer
> to make my way through a smaller number of meticulously encoded texts,
> where the mind of the scholar(s)/editor(s) is present in the
> encoding, along with ingenious and clever encoding strategies that
> suggest important critical insights about the texts. This is the
> nanotechnology of digital humanities.

I personally also have more use for meticulously encoded TEI, and I'm 
much more interested in encoding meticulously than I am in automating 
vast amounts of data. However, in a world where nobody's trying to take 
your carefully encoded texts away from you, would the existence of an 
initiative to enable Google Books to export millions upon millions of 
out-of-copyright texts in a basic, structural TEI format not be better 
than no such initiative? As John knows (and anyone who doesn't can learn 
by browsing the open list archives), the TEI Council is even now working 
with a Google engineer to improve code that could in principle achieve 
precisely this. Do you really think this work does harm? Would it not be 
great to have millions of TEI texts online, in the public eye, raising 
our profile hugely, available for large-scale processing, NLP, entity 
recognition etc., if desired, or for taking as a starting point for 
anyone who wants to enrich a 19th-century novel with deeper structural 
or semantic markup, or... any of a number of other possibilities?

I really don't see how any of this takes away from what I am most 
interested in, which is the highly detailed markup possible in TEI.

> Having said that, I don't think the TEI as it currently exists is
> necessarily incompatible with greater interoperability, and certainly
> the TEI community has the expertise to provide an interoperable format
> in addition to an interchange format, but it would be something
> different and should supplement rather than replace what we currently
> have.

I think the interoperable core of 99% of TEI documents is not in any way 
incompatible with the development of new and ever more sophisticated 
features for improvement of TEI's expressiveness. Maybe we do need more 
explicit guidance (as Julia suggests in her email that I've just spotted 
after writing 3/4 of this one) on what that interoperable core should 
look like, but maybe that's also something which can be intuited from 
the core of what most people use (as Lou described the origins of TEI 
Lite earlier). Either way I'd have a lot of hope for and excitement 
about the interaction of these two views of TEI, which I don't by any 
means see as in conflict with one another.

G


-- 
Dr Gabriel BODARD
(Research Associate in Digital Epigraphy)

Department of Digital Humanities
King's College London
26-29 Drury Lane
London WC2B 5RL

Email: [log in to unmask]
Tel: +44 (0)20 7848 1388
Fax: +44 (0)20 7848 2980

http://www.digitalclassicist.org/
http://www.currentepigraphy.org/