We absolutely need to encode, in the TEI Header, information about the processing
of a text or a corpus. Following the recent discussion on this topic, let me specify
what kinds of information we would like to store there:
A. historical context information: logging;
B. default processing context information: parameters.
Section A concerns the life cycle of the document (text or corpus). I agree
with Sylvain Loiseau that the revisionDesc element is a good place for that.
Here we have never differentiated human from machine activities in the way we
encode information in revisionDesc. I would argue that the need mentioned earlier,
to associate a tool version number with a specific element in the document (even if
that element is the entire text), is similar to the need to bind a responsibility to
a particular change in the text. And this is already provided for.
Section A is a good way to clarify a piece of information, by contacting
the person responsible for the change, or to decide how a change could be undone,
even if the change was made WITH a particular tool that is not a
general XML editor.
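As a minimal sketch of that idea, a revisionDesc could record a manual change and a tool-made change in exactly the same way, with the responsible person named in both cases. This assumes TEI P5-style change elements with when and who attributes. The identifier #SH, the dates, and the tool name and version are purely illustrative, and the who pointer is assumed to resolve to a declaration elsewhere in the header:

```xml
<revisionDesc>
  <!-- change made by hand; identifier and date are illustrative -->
  <change when="2007-01-10" who="#SH">Corrected the transcription of chapter 2.</change>
  <!-- change made WITH a tool; tool name and version are hypothetical -->
  <change when="2007-01-12" who="#SH">Tokenized the entire text with
    ExampleTokenizer, version 1.2.</change>
</revisionDesc>
```

Both entries bind a responsibility to a change, so the person to contact is known whether or not a tool was involved.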
Section B concerns the various parameters that a particular tool
might need in order to process the text.
For example, in the analysis tools we build here, we would need to encode
the following parameters:
- specific tokenization rules, including character class definitions and
particular element role definitions: elements composing or delimiting
words (num, abbr, w...), rules for choosing a reading (choice, corr, sic...), etc.;
- specific text processing parameters:
-- such as "not to be processed" rule definitions, focusing on elements
like gap, note...;
-- such as "specific indexes" rule definitions, focusing on elements like
head, foreign, hi...;
- application-specific parameters:
-- such as text/corpus partitioning definitions, section referencing policy, etc.
As this kind of CONTEXTUAL information about a text or a corpus
is something new in the TEI Header, I would suggest creating a new
child element named *processingDesc*, a sibling of fileDesc, encodingDesc,
profileDesc and revisionDesc.
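To make the proposal concrete, here is a rough sketch of where such an element could sit and what it might hold. Only processingDesc itself and the TEI element names listed in the attribute values come from this message. Every child element and attribute inside processingDesc (tokenization, wordElements, readingChoice, skip, index, partition) is a hypothetical name invented for illustration, not an existing TEI element:

```xml
<teiHeader>
  <fileDesc><!-- ... --></fileDesc>
  <encodingDesc><!-- ... --></encodingDesc>
  <profileDesc><!-- ... --></profileDesc>
  <!-- proposed element; all children below are hypothetical names -->
  <processingDesc>
    <tokenization>
      <!-- elements composing or delimiting words -->
      <wordElements names="num abbr w"/>
      <!-- which reading to keep inside a choice (illustrative value) -->
      <readingChoice within="choice" prefer="corr"/>
    </tokenization>
    <!-- "not to be processed" rule -->
    <skip names="gap note"/>
    <!-- "specific indexes" rule -->
    <index names="head foreign hi"/>
    <!-- application-specific partitioning (illustrative value) -->
    <partition by="div"/>
  </processingDesc>
  <revisionDesc><!-- ... --></revisionDesc>
</teiHeader>
```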
Please note that *any* software should be able to store information
there. For example, even general editors could store information such as:
- printed by
- last print date
- editing duration
- total editing duration
- document model used
- autoload on/off
as can be seen, for example, in the metadata part of the ODT file format.
To give every piece of software its own place in the processingDesc
element, we could use a mechanism similar to Java package namespaces.
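One possible reading of this, sketched under assumptions: each piece of software gets its own container, keyed by a reverse-domain name in the manner of Java packages. The application and param elements, the ident attribute, and the org.example.* names are all hypothetical and chosen only for illustration:

```xml
<processingDesc>
  <!-- one hypothetical container per software, keyed by a reverse-domain name -->
  <application ident="org.example.analysis.tokenizer">
    <param name="wordElements" value="num abbr w"/>
  </application>
  <!-- a general editor storing its own settings in its own container -->
  <application ident="org.example.editor">
    <param name="autoload" value="on"/>
  </application>
</processingDesc>
```

With such a scheme, two tools can use the same parameter name without colliding, and a tool can ignore every container whose identifier it does not recognize.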
Serge Heiden, https://weblex.ens-lsh.fr
ENS-LSH/CNRS - ICAR UMR5191, Institut de Linguistique Française
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883