Serge HEIDEN wrote:
> Section B is relative to the various parameters that could be needed
> to process the text by a particular tool.
> For example, the analysis tools we are building here would need the
> following parameters to be encoded:
> - specific tokenization rules, including definitions of character classes and
> of particular element roles: elements composing or delimiting
> words (num, abbr, w...), rules for choosing among readings (choice, corr, sic...), etc.;
> - specific text processing parameters:
> -- definitions of "not to be processed" rules, for elements
> such as gap, note...;
> -- definitions of "specific index" rules, for elements such as
> head, foreign, hi...;
> - application-specific parameters:
> -- such as text/corpus partitioning definitions, section referencing policy, etc.
Yes, this is precisely the kind of information which any decent text
processing system needs to store in the header. For XAIRA we developed
a fairly complex <xairaSpecification> element, defined as an extension
to the <encodingDesc> element in the Header to do this job: if you're
interested, check out the gory details at
One of the things that worries me about Martin's proposal is that I
don't see how it can scale up to this sort of level. I don't for a
moment imagine we've got the xaira spec right, but we need something
with this degree of expressiveness, and what's on the table so far
doesn't approach it.
Lou (momentarily not wearing TEI-editor-hat)
> Since this kind of CONTEXTUAL information about a text or a corpus
> is new to the TEI Header, I would suggest creating a new
> child element named *processingDesc*, a sibling of fileDesc, encodingDesc,
> profileDesc and revisionDesc.
> Please note that *any* software should be able to store information
> there. For example, even general-purpose editors could store information such as:
> - printed by
> - last print date
> - editing duration
> - total editing duration
> - document model used
> - autoload on/off
> - etc.
> as can be seen, for example, in the metadata part of the ODT file format.
> To give each piece of software its own place within the processingDesc
> element, we could use a mechanism like Java package naming (reversed domain names).
> Serge Heiden, [log in to unmask], https://weblex.ens-lsh.fr
> ENS-LSH/CNRS - ICAR UMR5191, Institut de Linguistique Française
> 15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883
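A minimal sketch of the processingDesc idea, using Python's standard xml.etree.ElementTree. Each application gets its own section, keyed by a Java-package-style reverse-domain name, so tools only read and write their own parameters. The element and attribute names (software, package, param), the package names and the parameter names are all hypothetical illustrations. They are not TEI elements and not part of the proposal as written, and placing processingDesc after revisionDesc is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names: <processingDesc>, <software package="...">, <param>.
# None of these are defined by the TEI; they only illustrate the proposal.


def software_section(processing_desc: ET.Element, package: str) -> ET.Element:
    """Return the section for `package`, creating it if it does not exist yet."""
    section = processing_desc.find(f"software[@package='{package}']")
    if section is None:
        section = ET.SubElement(processing_desc, "software", {"package": package})
    return section


def set_params(processing_desc: ET.Element, package: str, params: dict[str, str]) -> None:
    """Append name/value parameters to the application's own section."""
    section = software_section(processing_desc, package)
    for name, value in params.items():
        ET.SubElement(section, "param", {"name": name, "value": value})


def get_params(processing_desc: ET.Element, package: str) -> dict[str, str]:
    """Read back one application's parameters; other sections are ignored."""
    section = processing_desc.find(f"software[@package='{package}']")
    if section is None:
        return {}
    return {p.get("name"): p.get("value") for p in section.findall("param")}


header = ET.Element("teiHeader")
for child in ("fileDesc", "encodingDesc", "profileDesc", "revisionDesc"):
    ET.SubElement(header, child)
# Placing processingDesc last is an arbitrary choice for this sketch.
proc = ET.SubElement(header, "processingDesc")

# Hypothetical package names in reverse-domain (Java package) style.
set_params(proc, "fr.ens-lsh.weblex",
           {"notProcessed": "gap note", "indexes": "head foreign hi"})
set_params(proc, "org.example.editor", {"autoload": "off"})

print(ET.tostring(header, encoding="unicode"))
```

Because each application looks up only the section carrying its own package name, a general-purpose editor and a corpus analysis tool could share one header without overwriting each other's settings.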