Dear Frederik,
you make a good point, but I guess the answer is the good old "it
depends": for the use I was thinking of, i.e. visualization of an
edition text, possibly coupled with search, a TEI XML document would
work fine "as is" if the encoding is not too complex; otherwise you'd
have to edit or transform it, as Martin suggests, but the simple fact
that you'd have a starting point, building on the work of others,
looks like a great advantage to me.

You cite a different case, that of computational linguistics annotation:
as you note, there are specialized formats that would probably serve you
better than converting everything into TEI XML, so I think that the
strategy of providing TEI-encoded texts for "general" use and a specific
format for linguistic analysis makes perfect sense.

As a side note, looking at texts encoded by colleagues using the transcr
module, I noticed that I would often have made (almost) exactly the same
choices, so the end products look remarkably similar. The exceptions are
some cases where there are too many different ways to do the same thing
... but I guess not everything in the TEI can become SIMPLE ;) (although
some tightening here and there would be a good thing!).

Best regards,

R

On 08/10/2014 15:54, Frederik Elwert wrote:
> Dear Roberto, dear all,
>
> It is very interesting to follow all these examples. However, I wonder
> if there is another point to your colleague’s opinion. The argument is
> hardly new: the TEI allows for such a range of different encoding
> approaches and subtle differences that using TEI as a *machine
> readable* exchange format is actually really hard.
>
> So I think it is very easy to agree that having texts available online
> allows for data reuse, and that having a format like TEI is a good
> thing, as the format and its documentation make it easier for a *human*
> to understand how to interpret that specific encoding schema.
>
> But building tools that rely on the machine readability of available TEI
> sources beyond your control is a very different matter.
>
> Taking the Deutsches Textarchiv as an example: they provide their texts
> not only in TEI, but additionally in TCF (Text Corpus Format), a format
> from the computational linguistics community. So if machine readability
> is a top priority, it seems easier to use a more constrained format than
> to model this information in TEI. One could easily encode that
> information in TEI, using feature structures or @ana or something
> similar, but probably not in a way that one can be sure is understood by
> software beyond one’s control.
>
> So are there examples of tools that consume (at least a subset of) TEI
> and that work beyond the community that defines the respective TEI
> subset? Or would one simply export one’s data to other formats like TCF
> or RDF if machine readability is an issue?
>
> Best,
> Frederik



-- 

Roberto Rosselli Del Turco      roberto.rossellidelturco at unito.it
Dipartimento di Studi           rosselli at ling.unipi.it
Umanistici                      Then spoke the thunder  DA
Universita' di Torino           Datta: what have we given?  (TSE)

  Hige sceal the heardra,     heorte the cenre,
  mod sceal the mare,       the ure maegen litlath.  (Maldon 312-3)