On 2017-03-03 01:10 AM, Tomaž Erjavec wrote:
> Martin Holmes wrote on 02/03/2017 at 19:37:
>> That's a really good idea. We'd have to allow for private URI schemes
>> too -- we want to dereference and check those -- and we're not going
>> to be checking external links, just project-internal ones, so we can
>> discard anything with a public protocol.
> Yes, of course, you can have private schemes, but hopefully you can
> catch them with some regular expression.
They're presumably documented with <prefixDef>s, and I've already
implemented the handling for this; it works nicely on the data we've
tested with so far. Checking all attributes is inevitably slow, but I
think you're right that it's the safest thing to do.
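
For readers who have not worked with private URI schemes, here is a
minimal Python sketch of how a pointer might be dereferenced through
<prefixDef>. It is not the implementation mentioned above. The "bibl"
scheme, the sample header, and the function names are all invented for
illustration; only the @ident, @matchPattern and @replacementPattern
attributes come from TEI.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical header fragment: the "bibl" scheme and target file are assumptions.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <listPrefixDef>
    <prefixDef ident="bibl" matchPattern="(.+)"
               replacementPattern="bibliography.xml#$1"/>
  </listPrefixDef>
</TEI>"""


def load_prefix_defs(root):
    """Map each scheme name to a list of (compiled pattern, template) pairs."""
    defs = {}
    for pd in root.iter(TEI_NS + "prefixDef"):
        # TEI templates write groups as $1; Python's re module wants \g<1>.
        template = re.sub(r"\$(\d)", r"\\g<\1>", pd.get("replacementPattern"))
        pattern = re.compile(pd.get("matchPattern"))
        defs.setdefault(pd.get("ident"), []).append((pattern, template))
    return defs


def resolve(token, defs):
    """Return the dereferenced pointer, or None if no prefixDef applies."""
    scheme, sep, rest = token.partition(":")
    if not sep or scheme not in defs:
        return None
    for pattern, template in defs[scheme]:
        match = pattern.fullmatch(rest)
        if match:
            return match.expand(template)
    return None


defs = load_prefix_defs(ET.fromstring(SAMPLE))
print(resolve("bibl:smith2001", defs))
```

The resolved value can then be checked like any other project-internal
link. Note that matchPattern is an XPath-flavoured regular expression,
so handing it to Python's re module is itself an assumption that only
holds for simple patterns.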
>> There are edge-cases, of course; you could have this:
>> <ref type="document">
>> versus this:
>> <ref target="document">
>> where the second is linking to a file with no extension. We might
>> stipulate that links to documents without extensions won't be checked.
> I'd think this is fair enough - it's a strange file that doesn't have an
Yes, and even stranger that you would link to it from an XML document.
>> I think we'll ignore XPointers, on the basis that this is a tool
>> intended for encoders who can't write their own diagnostic tools; any
>> project that uses a lot of XPointers probably has someone who can
>> write XSLT.
>> So we could simply tokenize all attribute values on whitespace, check
>> each token to see if it looks like a pointer, and check it if it does.
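
The tokenize-and-test approach just described could be sketched roughly
as follows in Python. This is an illustration under assumptions, not
either project's script: the regular expressions, the category names and
the function names are invented here, and the "must have an extension"
rule follows the stipulation discussed above.

```python
import re
import xml.etree.ElementTree as ET

# Public protocols are discarded rather than checked (assumed list).
PUBLIC = re.compile(r"^(https?|ftp|mailto):", re.IGNORECASE)
# A file name with an extension, optionally followed by "#id".
# Requiring a letter after the dot avoids treating "1.5" as a file.
LOCAL_FILE = re.compile(r"^[^\s#:]+\.[A-Za-z][A-Za-z0-9]*(#\S+)?$")


def classify(token, private_schemes=()):
    """Return a rough category for one token, or None if it is no pointer."""
    if PUBLIC.match(token):
        return "external"
    if token.startswith("#") and len(token) > 1:
        return "local-id"
    scheme, sep, _ = token.partition(":")
    if sep and scheme in private_schemes:
        return "private"
    if LOCAL_FILE.match(token):
        return "document"
    return None  # e.g. type="document" or an extensionless file name


def candidate_pointers(xml_text, private_schemes=()):
    """List (attribute, token, kind) for every project-internal pointer."""
    found = []
    for el in ET.fromstring(xml_text).iter():
        for name, value in el.attrib.items():
            for token in value.split():  # tokenize on whitespace
                kind = classify(token, private_schemes)
                if kind in ("local-id", "private", "document"):
                    found.append((name, token, kind))
    return found


sample = (
    '<text><ref type="document"/><ref target="document"/>'
    '<ref target="#a1 intro.xml#s2 http://example.org/x bibl:smith"/></text>'
)
for hit in candidate_pointers(sample, {"bibl"}):
    print(hit)
```

Both type="document" and target="document" are skipped here, which is
the trade-off of not checking links to files without extensions.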
> Exactly, and glad that you like the idea. In case it would help, you can
> find our script - not very elegant and might not cover all cases - at
Looks great. Ours is a bit longer, but it's trying to do a bit more
(check links to documents as well as ids in documents, and dereference
private URI schemes). It's here:
It's intended to work in Oxygen and outside it, and it's based on an ant
>> On 2017-03-02 09:24 AM, Tomaž Erjavec wrote:
>>> Martin Holmes wrote on 02/03/2017 at 18:16:
>>>> What would you say is the simplest, cleanest way to generate an XPath
>>>> which selects all and only the teidata.pointer attributes in a
>>>> document which validates against tei_all?
>>> Not cleanest, but it's definitely simple: treat as a pointer any attribute
>>> value that looks like a pointer, e.g. matches /^#/, /^http(s)?/ or
>>> That's what we did, and it works pretty well - most errors in pointers
>>> are to do with some typo or the target being 404. If link checking is
>>> what you are after, of course.
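
For the simplest case in this thread, a "#id" pointer with a typo, a
check within a single document might look like the Python sketch below.
It is only an illustration; the function name is invented, and it
assumes that any whitespace-separated token starting with "#" is a
pointer to an xml:id in the same file.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def dangling_local_pointers(xml_text):
    """Return (attribute, token) pairs whose "#id" has no matching xml:id."""
    root = ET.fromstring(xml_text)
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    missing = []
    for el in root.iter():
        for name, value in el.attrib.items():
            for token in value.split():
                if token.startswith("#") and len(token) > 1 and token[1:] not in ids:
                    missing.append((name, token))
    return missing


sample = '<text><p xml:id="p1"/><ref target="#p1 #p2"/></text>'
print(dangling_local_pointers(sample))
```

Because it looks at every attribute, a value such as a "#" colour code
would be reported as well, so the output needs a human eye.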