Sylvain Loiseau wrote:
> Yes, but it leads to problem of size: it would increase the size
> of the document by several orders of magnitude.
I don't really see why, unless your taxonomy is massive or hugely verbose.
You define/expand/annotate or whatever each item once only, no matter how
often it's referenced. With P5 and therefore XPointers, it can even (and
easily) be done in a separate document to which all your instances point.
And your reference pointers can be as terse as you like within the limits of
uniqueness, provided they are indeed pointers. Once more, there is the
advantage of indirection. If you have second thoughts about the terminology
of your classification (or if you want to label the items using a different
natural language), you just tweak the targeted element to which the ana
values point, once only for each category.
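
To make the indirection concrete, here is a rough sketch of the kind of
arrangement I have in mind. The file name taxonomy.xml, the ids "n" and "v",
the category descriptions and the sample words are all invented for
illustration, and I am assuming the P5 taxonomy/category/catDesc elements as
the targets; any element carrying an xml:id would serve the same purpose.

```xml
<!-- taxonomy.xml (hypothetical file): each category is described once only -->
<taxonomy xml:id="pos">
  <category xml:id="n"><catDesc>common noun</catDesc></category>
  <category xml:id="v"><catDesc>finite verb</catDesc></category>
</taxonomy>

<!-- corpus document: each instance carries only a short pointer -->
<s>
  <w ana="taxonomy.xml#n">dogs</w>
  <w ana="taxonomy.xml#v">bark</w>
</s>
```

The per-token cost is the pointer alone, however long the description of the
category becomes. If the taxonomy lives in the same document, the pointer
shrinks to ana="#n". Relabelling a category, or translating its description
into another language, means editing one catDesc and leaves every instance
untouched.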
> It may sound trivial, but with
> documents up to 30 MO, I'm not sure of the consequences.
30 MB is not large by modern corpus standards. Any linguistic analysis
software (or hardware) that balks at data dimensions of that order needs
some serious revision.
> On the other hand, define a new attribute with the very
> intended use of @ana is not satisfying neither.
Not sure what "very intended use" means; but if you need something, and it
isn't already there, why should defining it not be satisfying? As I've often
remarked here -- doubtless way beyond many readers' tolerance thresholds --
that's why all the sweat and toil behind the class system and the extension
mechanisms, old and new, was invested on our behalf.