Hi Tomaž,

That's a really good idea. We'd have to allow for private URI schemes 
too, since we want to dereference and check those. We're not going to 
be checking external links, just project-internal ones, so we can 
discard anything with a public protocol such as http or https.

There are edge-cases, of course; you could have this:

<ref type="document">

versus this:

<ref target="document">

where the first is not a pointer at all, and the second is linking to a 
file with no extension. We might stipulate that links to documents 
without extensions won't be checked. I 
think we'll ignore XPointers, on the basis that this is a tool intended 
for encoders who can't write their own diagnostic tools; any project 
that uses a lot of XPointers probably has someone who can write XSLT.

So we could simply tokenize all attribute values on whitespace, test 
each token to see if it looks like a pointer, and check the link if it 
does.
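
Here is a rough Python sketch of that tokenize-and-test step, just to 
make the idea concrete. Everything named in it is my own assumption 
rather than anything we've agreed on: the function names, the list of 
"public" schemes to discard, the use of a parenthesis to spot XPointers, 
and an extension pattern that is a bit tighter than /\..{3,4}$/. It only 
finds candidate pointers; it doesn't dereference them.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: schemes we treat as "public" and therefore never check.
PUBLIC_SCHEMES = {"http", "https", "ftp", "mailto"}

# A URI scheme prefix, e.g. "https:" or a private one like "bibl:".
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")

# A 3-4 character file extension, optionally followed by a fragment.
EXTENSION_RE = re.compile(r"\.[A-Za-z][A-Za-z0-9]{2,3}(#.*)?$")


def classify(token):
    """Return 'fragment', 'private', 'file', or None for one token."""
    if "(" in token:
        return None  # looks like an XPointer: deliberately ignored
    if token.startswith("#"):
        return "fragment" if len(token) > 1 else None
    match = SCHEME_RE.match(token)
    if match:
        scheme = match.group(1).lower()
        return None if scheme in PUBLIC_SCHEMES else "private"
    if EXTENSION_RE.search(token):
        return "file"
    return None  # includes extensionless names such as "document"


def candidate_pointers(xml_text):
    """Yield (element, attribute, token, kind) for pointer-like tokens."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        for name, value in element.attrib.items():
            for token in value.split():  # tokenize on whitespace
                kind = classify(token)
                if kind is not None:
                    yield (element.tag, name, token, kind)
```

Run over a document containing both of the <ref> examples above, this 
reports neither "document" value, which matches the stipulation about 
extensionless links. A value like "#a1 notes.xml" yields two candidates, 
and a made-up private-scheme value like "bibl:Smith2001" is kept for 
dereferencing. The obvious weakness is false positives: any attribute 
value that happens to contain a colon or end in something 
extension-like will be picked up.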


On 2017-03-02 09:24 AM, Tomaž Erjavec wrote:
> Martin Holmes wrote on 02/03/2017 at 18:16:
>> What would you say is the simplest, cleanest way to generate an XPath
>> which selects all and only the teidata.pointer attributes in a
>> document which validates against tei_all?
> Not cleanest, but it is definitely simple: treat as a pointer any attribute
> value that looks like a pointer, e.g. matches /^#/, /^http(s)?/ or
> /\..{3,4}$/.
> That's what we did, and it works pretty well - most errors in pointers
> are to do with some typo or the target being 404. If link checking is
> what you are after, of course.
> Best,
> Tomaž