That's a really good idea. We'd have to allow for private URI schemes
too -- we want to dereference and check those -- and we're not going to
be checking external links, just project-internal ones, so we can
discard anything with a public protocol.
There are edge cases, of course; you could have this:
where the second is linking to a file with no extension. We might
stipulate that links to documents without extensions won't be checked. I
think we'll ignore XPointers, on the basis that this is a tool intended
for encoders who can't write their own diagnostic tools; any project
that uses a lot of XPointers probably has someone who can write XSLT.
So we could simply tokenize all attribute values on whitespace, check
each token to see if it looks like a pointer, and check it if it does.
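To make that concrete, here is a rough Python sketch of the tokenize-and-test idea. It is only an illustration: the list of "public" schemes, the regular expressions, and the function names (classify, candidate_pointers) are my own assumptions, not anything we've settled on. It skips public protocols, skips anything that looks like an XPointer, and skips relative paths with no file extension, as suggested above.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: these schemes count as "public" and are never checked.
PUBLIC_SCHEMES = {"http", "https", "ftp", "mailto"}

# A leading URI scheme such as "bibl:" or "https:".
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")

# Assumption: a relative path only counts if it ends in a file extension
# that starts with a letter, optionally followed by a #fragment.
FILE_RE = re.compile(r"^[^\s#()]+\.[A-Za-z][A-Za-z0-9]*(#[^\s()]*)?$")


def classify(token):
    """Return 'local', 'private' or 'file', or None if the token is not checked."""
    if "(" in token:
        # Looks like an XPointer scheme; deliberately ignored.
        return None
    match = SCHEME_RE.match(token)
    if match:
        scheme = match.group(1).lower()
        # Public protocols are discarded; anything else is treated as a
        # project-private URI scheme to be dereferenced later.
        return None if scheme in PUBLIC_SCHEMES else "private"
    if re.match(r"^#\S+$", token):
        return "local"
    if FILE_RE.match(token):
        return "file"
    # Includes extensionless paths, which are not checked.
    return None


def candidate_pointers(xml_text):
    """Tokenize every attribute value on whitespace and keep pointer-like tokens."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter():
        for name, value in element.attrib.items():
            for token in value.split():
                kind = classify(token)
                if kind is not None:
                    found.append((name, token, kind))
    return found
```

This only collects candidates; actually resolving each one (finding the xml:id, expanding the private scheme, opening the file) would be a separate step. Being a heuristic, it will also pick up the odd attribute value that merely resembles a pointer.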
On 2017-03-02 09:24 AM, Tomaž Erjavec wrote:
> Martin Holmes wrote on 02/03/2017 at 18:16:
>> What would you say is the simplest, cleanest way to generate an XPath
>> which selects all and only the teidata.pointer attributes in a
>> document which validates against tei_all?
> Not the cleanest, but it is definitely simple: treat as a pointer any attribute
> value that looks like a pointer, e.g. matches /^#/, /^http(s)?/ or
> That's what we did, and it works pretty well - most errors in pointers
> are to do with some typo or the target being 404. If link checking is
> what you are after, of course.