Beatrice Saletti wrote:
> I'm quite desperate
I'm not surprised. It sounds as though you've been given an extremely
difficult job to do with little assistance or support. I suspect that
whoever gave you this task didn't themselves realise the true proportions of
what you are expected to achieve. I'm not sure that with the best will in
the world we can give you the kind and amount of help you need via this
list.
The problem is that any XML vocabulary presupposes the existence in the
items to be encoded of some recurrent, if not necessarily all-pervasive and
wholly consistent, structure. In essence, analysis is about identifying such
structure; and markup is about expressing it. Sometimes, with a bit of guile
and maybe a measure of impudence, markup can impose structure: but it can't
create it ex nihilo or from tohu wabohu. What you describe sounds more like
a horrible mess than a collection of documents ripe for consistent encoding.
I can see the temptation to treat this as a TEI-Corpus, but aside from the
undeniable benefit of giving you separate metadata spaces for each group of
items, I don't think this will take you anywhere near as far as you need to
go if you are aspiring towards a single usable "xml database".
Maybe I've got the wrong impression about the degree and amount of
heterogeneity in your assemblage (let's not beg the question by calling it a
"corpus" just yet); but even if things aren't as chaotic as they seem, you
are unlikely to be able to master it merely by customisations of the kind
you seem to be hoping the trusty baker will produce for you. I think there
are two separate issues here. First, there's the "small" one of how you can
get MASTER-derived content models into a P4 framework; and that's indeed the
sort of thing this list can doubtless help with. But there's a much bigger
ambition in what you outline, and one that looks to me pretty
self-defeating:
> msEntry (manuscriptEntry) must contain an "identifier"part [...]
> and a "content" made by a new element, 'para', that may
> contain [...] every element for prose, verse, drama,
> transcriptions and elements for physical
> description of manuscripts [...] without hierarchy.
That calls to mind the most nightmarish beast in the entire P4 menagerie:
the <entryFree> element that lurks in Ch 12. The content model of this
extraordinary creation allows the presence of virtually any element in the
Print Dictionary tagset in any order and in any quantity and relationship.
It is just about possible to imagine a half-way sane use for <entryFree>
under an SGML regime, but once XML-ised it becomes merely a way of hiding
anarchy under a camouflage net of vacuous tags. But not even
<entryFree> aspires to allow nearly every element in the core tagsets
"without hierarchy". If you try to create customisation files that permit
that, you will break up the class system that is the matrix of all TEI DTDs
or (forthcoming) schemas, and instead of giving you a pizza the Baker
will go on strike in sheer chagrin at what you want him to do in his
orderly kitchen, leaving you at best with a pot of over-stirred and
badly-seasoned ragu.
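To make the point about flat content models concrete: an element's content model in a DTD is in effect a regular expression over the names of its children. A model of the form (a | b | c)* therefore accepts every sequence of those names, in any order, and validation against it tells you nothing about structure. The Python sketch below illustrates this. The element names and the "structured" alternative are purely hypothetical stand-ins, not TEI's actual declarations.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for "every element for prose, verse, drama,
# transcription and physical description"; not the real TEI lists.
FLAT_CHOICE = ["l", "sp", "stage", "del", "add", "dimensions", "foliation"]

# Flat, hierarchy-free model: (l | sp | stage | ... )* over child names.
flat_model = re.compile(r"(?:(?:%s) )*" % "|".join(map(re.escape, FLAT_CHOICE)))

# A hypothetical structured model: optional physical description first,
# then one or more speeches or stage directions.
structured_model = re.compile(r"(?:dimensions )?(?:foliation )?(?:sp |stage )+")

def child_names(xml_text):
    """Return the root's child element names as a space-terminated string."""
    root = ET.fromstring(xml_text)
    return "".join(child.tag + " " for child in root)

def valid(model, xml_text):
    """True if the sequence of child names fully matches the content model."""
    return model.fullmatch(child_names(xml_text)) is not None

ORDERLY = "<para><dimensions/><foliation/><sp/><stage/><sp/></para>"
JUMBLED = "<para><sp/><dimensions/><l/><foliation/><del/></para>"

for label, doc in [("orderly", ORDERLY), ("jumbled", JUMBLED)]:
    print(label, valid(flat_model, doc), valid(structured_model, doc))
# orderly True True
# jumbled True False
```

Both entries pass the flat model; only the structured one distinguishes them. That is why a "without hierarchy" content model amounts to hiding anarchy under a set of tags.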
I hope this doesn't sound utterly discouraging. And maybe others will see a
way to offer more concrete and positive advice on the basis of what you
outline. But I doubt whether any experienced TEI hand will feel that, in your
present situation, you are ready to enter the Bakery with any prospect of
getting anything palatable. In my estimation you have a considerable amount
of scrutiny, differentiation and analysis to carry out on your materials
before you are ready to bake a DTD.
Michael Beddow