On 23 Nov 2013, at 03:24, Stuart A. Yeates <[log in to unmask]> wrote:
> Your approach to validating a long list of random TEI files was to download the latest RNG and validate them against it (+ some very clever P4 stuff that I'd not thought of). If a file validates, then it is perfect. This is a perfectly sensible approach and, if our focus is entirely on keeping as much of the community as possible in sync with the latest and greatest version of the TEI, may be optimal.
> My approach (which I've not yet coded) was to identify the ODD files in the list (by attempting to generate RNG schemas from them) and check each file against its RNG. This is a slightly more complex approach, but it automatically supports the latest TEI version, ODD-building SIGs (e.g. TEI in Libraries), and people with highly customised ODDs, out of the box. Since it's unlikely that many TEI files will validate against every customisation, a single test for 'goodness' is significantly harder.
> So it boils down to a tension between supporting the community as a whole and supporting a community of communities.
Hmm. I think you’re inventing a tension which doesn’t exist.
There are many TPIs (TEI Performance Indicators) one could apply to a resource using a
purely computer-mediated approach (if humans are involved, one goes in different directions).
The first test is the simplest: is it actually accessible, complete, well-formed XML? If not, we stop there. Next,
we see if it talks TEI at all, and the simplest test for that is whether it conforms to tei_all (another test one might do is to see whether all its vocabulary, elements and attributes alike, is drawn from the TEI set; I’m intrigued
enough to think about writing an ODD for this).
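To make the first two TPIs concrete, here is a minimal sketch in stdlib Python (the function name is mine, and the namespace check is only a cheap proxy for "talks TEI" — a genuine tei_all test would validate against tei_all.rng with an RNG validator such as jing or lxml's RelaxNG support):

```python
from xml.etree import ElementTree as ET

# TEI P5 namespace; root membership is a cheap proxy for "talks TEI at all".
TEI_NS = "http://www.tei-c.org/ns/1.0"

def first_tpi_checks(path):
    """Apply the first two TPIs to a file: is it accessible,
    well-formed XML, and does its root element live in the TEI
    namespace?  Returns a (verdict, detail) pair."""
    try:
        root = ET.parse(path).getroot()
    except OSError as e:            # not accessible / not complete
        return ("not-accessible", str(e))
    except ET.ParseError as e:      # not well-formed XML
        return ("not-well-formed", str(e))
    # ElementTree reports namespaced tags as "{namespace}localname".
    ns = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    if ns != TEI_NS:
        return ("not-tei", ns or "no namespace")
    # A full conformance test would validate against tei_all.rng here
    # (e.g. via jing or lxml.etree.RelaxNG); not attempted in this sketch.
    return ("tei-namespace", root.tag)
```

The point of the staging is that each verdict short-circuits the rest: a file that is not even well-formed never reaches the TEI tests at all.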
Now we can move on to other, more useful, TPIs. You’re suggesting one which is “does it meet its own
standards”, i.e. does it conform to the TEI subset it claims to. The problem there is that you don’t know
which customisation it thinks it conforms to; finding an ODD in the same location is not a reliable method, I think.
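That said, one place a file does declare its own allegiance is the xml-model processing instruction in the prolog, the standard mechanism for associating an XML document with its RNG schema. A rough sketch of harvesting those declarations (the regex and the 4 KB prolog window are my assumptions; it will miss DOCTYPE-based associations, as in much P4 data):

```python
import re

# Matches the href pseudo-attribute of an xml-model processing
# instruction, e.g.
#   <?xml-model href="tei_all.rng" schematypens="..."?>
XML_MODEL = re.compile(r'<\?xml-model[^?]*?href=["\']([^"\']+)["\']')

def declared_schemas(path, prolog_chars=4096):
    """Return the schema hrefs a file claims to conform to, by
    scanning its prolog for xml-model processing instructions.
    Files with no such PI return an empty list."""
    with open(path, encoding="utf-8", errors="replace") as f:
        head = f.read(prolog_chars)  # PIs come before the root element
    return XML_MODEL.findall(head)
```

Of course, an href only tells you what the encoder *claimed*; whether that customisation is resolvable, current, or even the one the file actually conforms to is exactly the problem above.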
But we should really be discussing TPIs which get at quality and consistency of encoding, and
indicators which show the reader what they are getting. Depending, of course, on what the point
of all this is :-}