On 3 Nov 2013, at 16:23, Lou Burnard wrote:
> As you probably know, many TEI attributes have a declared value of "anyURI", so they can point almost anywhere. Out of curiosity, what techniques/tools do people recommend for the validation of said URIs? I can find plenty of tools which will check that the syntax of the value is correct, but what technique/tool would you recommend to find out whether it is actually valid -- in the sense that starting from here (where the document is) I can recover whatever it's pointing at?
There is no technique other than sucking it and seeing. As James says, you could do this in an XSLT validator if you thought the target had to be XML, but I assume you want something more general.
I'd write a script that uses curl to fetch the HTTP headers for each URL and sees what comes back (that's the technique the Batcomputer at Oxford used, if you recall it). But a lot of interesting things can come back. At a minimum, you have to allow for:
* authentication requests
* temporary unavailability
* content type negotiation
For example, on the command line: is "http://www.w3.org/1999/XSL/Transform" a 'valid' URI?
Sebastians-MacBook-Pro-2:rahtz$ curl -I http://www.w3.org/1999/XSL/Transform
HTTP/1.1 200 OK
Date: Sun, 03 Nov 2013 17:32:40 GMT
Last-Modified: Tue, 19 Jun 2012 14:22:15 GMT
Expires: Sun, 03 Nov 2013 23:32:40 GMT
Content-Type: text/html; charset=iso-8859-1
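A minimal sketch of the curl-based check described above, assuming bash and a curl that supports the `-w` write-out variables `%{http_code}` and `%{content_type}`. The script name (check-uris.sh), the verdict labels and the mapping from status codes onto the cases listed above (authentication, temporary unavailability, content negotiation) are hypothetical, illustrative choices, not an established tool. Note that some servers refuse HEAD requests, so a 405 or 501 probably means "retry with GET" rather than "broken".

```bash
#!/usr/bin/env bash
# Hypothetical sketch, e.g. saved as check-uris.sh: send a HEAD request
# for each URL given as an argument and print a rough verdict.
# The verdict labels below are illustrative, not any standard.

# Map an HTTP status code (as printed by curl) to a rough verdict.
classify_status() {
  case "$1" in
    2??)              echo "ok" ;;
    3??)              echo "redirect" ;;                # only seen if -L is dropped
    401|403|407)      echo "auth-required" ;;           # authentication requests
    404|410)          echo "missing" ;;
    405|501)          echo "head-not-allowed" ;;        # server may refuse HEAD; retry with GET
    406)              echo "negotiation-failed" ;;      # content type negotiation failed
    429|502|503|504)  echo "temporarily-unavailable" ;;
    000)              echo "no-response" ;;             # DNS failure, timeout, refused connection
    *)                echo "other" ;;
  esac
}

# Fetch headers for one URL and print: verdict, status, Content-Type, URL.
check_url() {
  local url=$1 out code ctype
  # -s silent, -I HEAD request, -L follow redirects, -o discard the headers
  # themselves, -w print final status code and Content-Type,
  # --max-time stop waiting on slow hosts after 20 seconds.
  out=$(curl -s -I -L -o /dev/null --max-time 20 \
        -w '%{http_code} %{content_type}' "$url")
  code=${out%% *}     # first field: status code ("000" if nothing answered)
  ctype=${out#* }     # remainder: Content-Type, possibly empty
  printf '%s\t%s\t%s\t%s\n' "$(classify_status "$code")" "$code" "$ctype" "$url"
}

# Check every URL passed on the command line (does nothing if none given).
for url in "$@"; do
  check_url "$url"
done
```

Usage might look like `./check-uris.sh http://www.w3.org/1999/XSL/Transform`, or `xargs ./check-uris.sh < urls.txt` for a file of URLs pulled out of the TEI document beforehand (that extraction step is not shown, and urls.txt is likewise hypothetical). A 200 with `text/html`, as in the session above, only says that something answered; whether an HTML page sitting at a namespace URI counts as 'valid' is exactly the judgment call the question raises.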
Director (Research) of Academic IT
University of Oxford IT Services
13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431