Hi Gabby,

I didn't at all mean to suggest the current work with Google to
produce TEI output from Google Books does any harm. On the contrary, I
think it's important, beneficial, and exciting work.

I see that work as a completely different issue than what I was
referring to, perhaps too cheekily, as a potential Googleization of
digital humanities.


On Wed, Aug 24, 2011 at 11:05 AM, Gabriel Bodard wrote:
> On 2011-08-22 14:35, John A. Walsh wrote:
>> A relentless charge towards interoperability would benefit
>> the increasing tendency towards the Googleization of digital
>> humanities, in which bigger is generally better, "good enough" OCR is
>> good enough, and lip service is paid to beautifully crafted and
>> carefully curated smaller projects as they are relegated to a
>> necessary but increasingly irrelevant middle ages of digital
>> humanities.
> It's rather a shame that we can't imagine beyond this either/or view of the
> discipline where scale equals low quality and the value of large-scale text
> processing tasks is seen as an assault on the value of small-scale critical
> work. As I argued in a previous post on this thread, TEI does a very good
> job of enabling any number of heterogeneous, deeply encoded and carefully
> curated small projects to produce output which is eminently compatible with
> the large-scale, shallowly encoded output of bigger projects.
>> Rather than have at my disposal millions of homogenous and
>> interoperable TEI texts provided by Google or whomever, I would prefer
>> to make my way through a smaller number of meticulously encoded texts,
>> where the mind of the scholar(s)/editor(s) is present in the
>> encoding, along with ingenious and clever encoding strategies that
>> suggest important critical insights about the texts. This is the
>> nanotechnology of digital humanities.
> I personally also have more use for meticulously encoded TEI, and I'm much
> more interested in encoding meticulously than I am in automating vast
> amounts of data. However, in a world where nobody's trying to take your
> carefully encoded texts away from you, would the existence of an initiative
> to enable Google Books to export millions upon millions of out-of-copyright
> texts in a basic, structural TEI format not be better than no such
> initiative? As John knows (and anyone who doesn't can learn by browsing the
> open list archives) the TEI Council is even now working with a Google
> engineer to improve code that could in principle achieve precisely this. Do
> you really think this work does harm? Would it not be great to have millions
> of TEI texts online, in the public eye, raising our profile hugely,
> available for large-scale processing, NLP, entity recognition etc., if
> desired, or for taking as a starting point for anyone who wants to enrich a
> 19th century novel with deeper structural or semantic markup, or... any of a
> number of other possibilities?
> I really don't see how any of this takes away from what I am most interested
> in, which is the highly detailed markup possible in TEI.
>> Having said that, I don't think the TEI as it currently exists is
>> necessarily incompatible with greater interoperability, and certainly
>> the TEI community has the expertise to provide an interoperable format
>> in addition to an interchange format, but it would be something
>> different and should supplement rather than replace what we currently
>> have.
> I think the interoperable core of 99% of TEI documents is not in any way
> incompatible with the development of new and ever more sophisticated
> features for improvement of TEI's expressiveness. Maybe we do need more
> explicit guidance (as Julia suggests in her email that I've just spotted
> after writing 3/4 of this one) on what that interoperable core should look
> like, but maybe that's also something which can be intuited from the core of
> what most people use (as Lou described the origins of TEI Lite earlier).
> Either way I'd have a lot of hope for and excitement about the interaction
> of these two views of TEI, which I don't by any means see as in conflict
> with one another.
> G
> --
> Dr Gabriel BODARD
> (Research Associate in Digital Epigraphy)
> Department of Digital Humanities
> King's College London
> 26-29 Drury Lane
> London WC2B 5RL
> Tel: +44 (0)20 7848 1388
> Fax: +44 (0)20 7848 2980

| John A. Walsh
| Assistant Professor, School of Library and Information Science
| Indiana University, 1320 East Tenth Street, Bloomington, IN 47405
| Voice: 812-856-0707 Fax: 812-856-2062