Hi Eduard and Serge,
One approach I think more general-purpose TEI software
development should take is to specify its input format and
provide a generalised conversion from tei_all to the subset that
the software does something useful with. (OK, this is perhaps less
relevant for things like TXM, general editors, or database
frameworks.) But if we imagine a new tool to display, visualize,
or process TEI there is no reason it should necessarily cope with
the whole of the TEI. It can use the TEI ODD customisation
language to specify a meta-schema that it can handle. (And, as
Magdalena noted, one could include processing-model information
in that, so it could act as a sort of configuration file for
that processing.) If your software won't do anything with <w>
elements and will simply ignore them, then don't include them
in the schema; let people get errors/warnings about them. If
you have a fixed list of @type attributes your software expects
on <name>, then document that in TEI ODD. And then through schema
errors or schematron warnings a user can test if their source
documents are processable by that bit of software. Even better if
there is then a tei_all to MySpecialSoftware conversion script
which throws away all the stuff this piece of software is going
to ignore or fail on. I know the next question will be why are
we encoding it if we then throw it away -- and clearly the answer
is we may throw it away for _this_ bit of processing or
visualization or whatnot, but that doesn't mean it isn't crucial
for other bits of analysis and research. So your TEI Zero, for
example, I can validate against its schema and if I don't get any
warnings then I know that your software won't have a problem with
it. If I do, I can judge whether they are errors or warnings
like "Your <name type="thingy"> will be treated as <name
type="other"> in our software", and then make an informed decision about
how the software will work with my texts or whether I should
convert them to match your values. I realise, of course, some
people already do this, but it may be worth reiterating as it
seems a lot more practical than people trying to develop software
that will cope with any of the TEI vocabulary (never mind new
things a project adds...).
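
To make this concrete, an ODD customisation along these lines might look like the sketch below. The schema name, module choices, @type value list, and warning text are purely illustrative, not anyone's real configuration: it pulls in the modules the tool actually processes, deletes <w>, closes the value list on name/@type, and adds a Schematron warning of the "will be treated as" kind.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="mySpecialSoftware" start="TEI">
  <!-- modules the hypothetical tool actually processes -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="analysis"/>
  <!-- the tool ignores <w>, so exclude it from the schema -->
  <elementSpec ident="w" mode="delete"/>
  <!-- document the fixed list of @type values expected on <name> -->
  <elementSpec ident="name" mode="change" module="core">
    <attList>
      <attDef ident="type" mode="change">
        <valList type="closed" mode="replace">
          <valItem ident="person"/>
          <valItem ident="place"/>
          <valItem ident="other"/>
        </valList>
      </attDef>
    </attList>
    <!-- a Schematron warning rather than a hard validation error -->
    <constraintSpec ident="name-type-warning" scheme="schematron">
      <constraint>
        <sch:rule xmlns:sch="http://purl.oclc.org/dsdl/schematron"
                  context="tei:name">
          <sch:report test="@type = 'thingy'" role="warning">Your
            name/@type="thingy" will be treated as
            name/@type="other" in our software.</sch:report>
        </sch:rule>
      </constraint>
    </constraintSpec>
  </elementSpec>
</schemaSpec>
```

Running such an ODD through the usual tooling (e.g. Roma/oddbyexample) would yield both the schema and the Schematron, so the "can this software use my texts?" question becomes a validation run.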
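
And the tei_all-to-MySpecialSoftware conversion script could be little more than an identity transform plus a few strip/normalise rules. A minimal sketch (the element and value choices are again invented for illustration):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- identity template: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- unwrap <w>: keep its textual content, drop markup the tool ignores -->
  <xsl:template match="tei:w">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- normalise unexpected @type values on <name> to 'other' -->
  <xsl:template
      match="tei:name/@type[not(. = 'person' or . = 'place' or . = 'other')]">
    <xsl:attribute name="type">other</xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
```

The source files stay richly encoded; only the copy handed to this one piece of software is flattened.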
On 02/11/16 14:14, Serge Heiden wrote:
> Hi Eduard,
> Le 02/11/2016 à 12:46, Eduard Drenth a écrit :
>> TEI offers flexibility and freedom (e.g. <span type="lemma"
>> target="w1 w2"> instead of <lemma target="w1 w2">) that
>> complicates tool development. How big of a problem is this?
> From an IT perspective, working with TEI encoded texts is like
> catching chameleons hopping in XML trees.
> If you work with people feeding chameleons, you can negotiate
> some synchronized convergence of colors.
> Not necessarily the colors of the chameleons themselves (aka
> local encoding guidelines) but at least how
> you are supposed to see them.
> We develop a text analysis and publishing platform called TXM
> with which we regularly use this strategy through XSLT adapters
> to help colleagues analyze and publish their TEI texts.
> Often because projects tend to encode their texts before
> choosing a final analysis and publishing platform.
> However, it is not easy to choose a TEI aware analysis and
> publishing platform (established software or adaptable
> framework) because it is not easy to specify what analyzing and
> reading mean.
> A kind of "chicken or egg" dilemma, with chameleons...
> See this tutorial for an introduction to the TXM TEI import
> strategy, with examples (sorry, it is in French):
> Another strategy is to negotiate convergence to a manageable
> subset of TEI like TEI lite, tite, simple, zero...
> (we are designing the last of these for TXM work).
> Dr. Serge Heiden, [log in to unmask], http://textometrie.ens-lyon.fr
> ENS de Lyon/CNRS - IHRIM UMR5317
> 15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883
Dr James Cummings, Academic IT Services, University of Oxford,
TEI Consultations: [log in to unmask]