Jean-Luc Benoit wrote:

[...]

> Is there a solution to keep a rich categorisation in my file?

"a" is something of a numerical underestimate.

One deceptively short answer would be that this is one of the things that
feature sets are designed to do. The reason why the brevity has to be
deceptive is that there is no concise way of explaining or illustrating the
use of feature sets to anything like an adequate degree. And feature sets
have been something of a moving target in recent years, precisely because
the ingenuity of the TEI scheme (which sadly remained unexploited in most
fields of practice) has inspired the creation of new international
standards, by which means this particular prodigal will be able to return in
due course to the P5 fold much enriched by adventures elsewhere.

Another route to explore would be the use of semantically rich compound
attribute values. BNC or its massively bouncing "baby" is broadly
representative of this direction, though it is perhaps worth remembering
that the BNC markup scheme was devised in days when one of its creators was
heard to remark that no matter how dramatically storage costs might fall,
there would always be a significant difference between the costs of 1
Gigabyte and 2 Gigabytes of online store. And of course all those present on
that occasion thought that was almost too obviously true to be worth
mentioning as a justification of terseness when encoding large corpora.

And I won't do more than mention in passing the problem that is currently
exercising me, namely the design of markup to allow on-the-fly configurable
generation of interlinear morphemic glosses, where "simple" labelling of
static grammatical properties on <w> elements won't come anywhere near to
getting the job done...

However, I drag that in, because it brings me back to a very important
aspect of feature sets, namely that they allow for significant levels of
indirection in assigning textual labels to the designated features. The
direct labelling procedures you sketch are pretty heavily tied to a specific
terminology and grammar and hence rather work against a wise TEI principle
that scholarly markup should as far as possible avoid foreclosing on
interpretive (or indeed descriptive) possibilities. It may well be that most
people agree on what the word classes of French are and on the terminology
required to categorise their grammatical relationships, but even there it
might be advisable to try to decouple the perception and discrimination of
lexical and grammatical phenomena from their taxonomical ascription. In the
case of languages where experts are unable to agree even on whether they
have any verbs or not, then it is more obviously important to devise markup
schemes that allow those-things-that-some-people-prefer-not-to-call-verbs to
be marked up, and attached to a configurable set of labels in a way that
keeps as many possibilities of nomenclature open while still allowing useful
processing by users of different terminological persuasions.
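
To make the indirection a little more concrete, here is a minimal sketch in
Python rather than in markup. Everything in it is hypothetical: the ids, the
neutral symbols and both label tables are invented for illustration and are
not taken from any TEI schema or existing corpus. The point is only that each
word carries a pointer to a feature set, and the textual labels are looked up
separately, so they can be swapped without touching the words.

```python
from dataclasses import dataclass

# Hypothetical feature sets, stored once and keyed by id. The values are
# neutral symbols, not terms from any one grammatical tradition.
FEATURE_SETS = {
    "fs1": {"class": "c1", "number": "n1"},
    "fs2": {"class": "c2", "tense": "t1"},
}

# Two interchangeable (invented) terminologies over the same symbols.
LABELS_A = {"c1": "noun", "c2": "verb", "n1": "singular", "t1": "present"}
LABELS_B = {"c1": "nominal", "c2": "predicator", "n1": "sg", "t1": "non-past"}


@dataclass
class Word:
    form: str
    ana: str  # pointer to a feature set id, never a label


def describe(word, labels):
    """Resolve a word's feature set, then render it in the chosen terminology."""
    feature_set = FEATURE_SETS[word.ana]
    return {name: labels[symbol] for name, symbol in feature_set.items()}


words = [Word("chat", "fs1"), Word("dort", "fs2")]

for w in words:
    print(w.form, describe(w, LABELS_A), describe(w, LABELS_B))
```

Changing what a category is called means editing one label table, and a user
who would rather not call "dort" a verb can supply a table of their own while
the words themselves stay exactly as they were.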

Michael Beddow