> For example, when there is no sentence encoding available, an example
> rule we use is: even if at the end of the content of a <div>/<head>
> element there is no such thing as a "hard" punctuation character
> (which is often the case) - which is a heuristic we use
> to find the end of sentences - force the end of sentence there.
> This kind of rule is active if we decide that section titles are part
> of the text to index. I am not saying here that section titles must be
> made of sentences, but the fact is that sentence boundaries are used
> a lot by tools like POS taggers, syntactic chunkers, etc. and if it is
> decided that section titles must be indexed after lemmatization, for
> example, the need for a sentence context artificially arises, so we
> have to model them. In that example, the <head> element may be
> declared as a potential "implicit sentence splitter".

I'm not sure this is a question the TEI should address. You give "head"
a semantics which is not its TEI semantics. The meaning of "head" is
"head", not "implicit sentence splitter". You are completely right to use
head in a heuristic for sentence splitting. Nevertheless, a declarative
markup format and processing heuristics are inherently different.

What the TEI can do, IMHO, is only to provide a mechanism through which
the application can ask the user for information. For instance, a
mechanism by which the application lets the user say which elements
should be treated as sentence boundaries.
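
To make this concrete, here is a minimal sketch of such a mechanism. It
assumes the user marks the relevant elements with a hypothetical
"sentenceBoundary" attribute on <tagUsage>; that attribute is NOT part of
the TEI and is invented here for illustration only, as are the sample
header and the function name. The application merely reads the
declaration; what it does with it remains its own business.

```python
import xml.etree.ElementTree as ET

# Sample declaration. The "sentenceBoundary" attribute is hypothetical:
# it is not defined by the TEI and only illustrates the idea.
TAGS_DECL = """
<tagsDecl>
  <tagUsage gi="head" sentenceBoundary="true">Section titles.</tagUsage>
  <tagUsage gi="div" sentenceBoundary="true">Text divisions.</tagUsage>
  <tagUsage gi="hi">Highlighted phrases.</tagUsage>
</tagsDecl>
"""


def sentence_boundary_elements(tags_decl_xml):
    """Return the names of the elements the user declared as boundaries."""
    root = ET.fromstring(tags_decl_xml.strip())
    return {
        usage.get("gi")
        for usage in root.iter("tagUsage")
        if usage.get("sentenceBoundary") == "true"
    }


print(sorted(sentence_boundary_elements(TAGS_DECL)))  # ['div', 'head']
```

The markup itself still only says that a "head" is a "head"; the extra
declaration is information supplied by the user to one application.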

> So, it seems that <tagUsage> alone would be cumbersome to use to
> declare various processing parameters (my initial section B, section A
> being encoded in the <revisionDesc> element).

I don't see why tagUsage cannot be the natural place for storing more
formalised structures that a user could use to tell an application how
it should use a tag. I would rather preserve the global declarative logic
(information about an element should go in one place), with different
degrees of formalisation, than re-invent a more formalised header, for
another target, elsewhere. The benefit would be to keep the different
degrees of formalisation coupled, as far as possible.

> Fortunately, <tagUsage> has, in P4 and P5, a sibling called
> <rendition> which "supplies information about the intended rendition
> of one or more elements". If the <rendition> element was created to
> somewhat compensate for a TEI encoding practice more oriented toward
> logical than presentation information encoding, then it is a cousin
> of a <processing> element we could create, maybe to replace
> <rendition>, whose role would be to encode processing parameters not
> only related to presentation - as the "rendition" name suggests (aka
> CSS or XSLT style sheets) - but also to any specific processing. On
> the other hand, if the "rendition" of a document was the only process
> planned at one time to apply to TEI documents, maybe it is time to
> generalize a bit.

Rendition is not really a precisely defined element. It is a placeholder
(a "hook", say the Guidelines) for expressing things the TEI is not
responsible for.

> But I can see the size of the community supporting the (ISO
> capacified) ODT file format standard. And I think it would be a pity
> not to situate our discussion with respect to a standard that will
> store the metadata of the majority of our documents for a long time
> from now, even if it's "only" for word-processing.

This is exactly the class of argument I was trying to prevent. However
successful the ODT file format may be, that will never be a reason to
use it in a completely different area. The discussion is about coupling
a format intended for human-oriented annotation with processing tools
which need a serialisation format. I fear the case of ODT would bring
more confusion than clarity to this debate.

-- 
Sylvain Loiseau
http://www.limsi.fr/~sloiseau

One can pursue objectively, that is, impartially,
a line of research whose object cannot be conceived and constructed
without reference to a positive and negative qualification, whose
object is therefore not so much a fact as a value.

Canguilhem, /Le normal et le pathologique/, p. 157
