James Cummings wrote:

>I still think there is a problem where you have collections of text of 
>unknown or inconsistent structures and you want to index them so a user 
>just entering a keyword gets back sensible sections of any individual part 
>of a composite document.  One way to do it, as you know, is for any 
>individual hit to test ancestor nodes for a set of known element types 
>(sp/p/divs of particular types/etc.) and use that to know how much to give 
>the user.  Just strikes me as not the best solution because of fixing the 
>need in the logic (ie. the xslt) to know the types of document in the 
>collection.  If we only have prose and then add a poem, or a play, 
>suddenly then the logic needs to change.  If, of course, it has only been 
>written to handle the existing known types, rather than all probable 
>types.  I know I'm making too much of that and it is not too difficult, 
>but it is the principle behind it that I'm interested in.

If the logic is likely to need changing at some point, it is perhaps best 
to use a sort of "search sheet" which specifies the behaviour of a search 
engine, much as a style sheet specifies the behaviour of a rendering 
engine.  For example, such a search sheet could include a command that 
specifies the framing tag of the context in which a search word is 
displayed, as follows:

<DisplayContext FramingTag = "div p" />

which means: "use the 'div' tag and, if no 'div' tag was found, fall back 
on the 'p' tag".  If at some later stage a new document is added to the 
collection which uses <div1> instead of <div>, this document could be 
associated with a modified version of the search sheet which uses the 
command:

<DisplayContext FramingTag = "div1 p" />
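To make the fallback rule concrete, here is a minimal sketch in Python of how a search engine might apply such a command.  It is only an illustration under my own assumptions: the function names are hypothetical, the sample document is made up, and a "hit" is taken to be any element whose own leading text contains the keyword.  The element containing the hit is itself allowed to serve as the frame.

```python
import xml.etree.ElementTree as ET

# Sample search sheet command and a made-up document (both illustrative).
SHEET = '<DisplayContext FramingTag = "div p" />'
DOC = (
    "<text><body>"
    "<div><p>alpha</p><p>beta keyword</p></div>"
    "<p>loose keyword</p>"
    "</body></text>"
)


def framing_tags(search_sheet_xml):
    """Return the framing tags of a search sheet in priority order."""
    root = ET.fromstring(search_sheet_xml)
    # Accept either a bare DisplayContext element or one nested in a sheet.
    cmd = root if root.tag == "DisplayContext" else root.find(".//DisplayContext")
    if cmd is None:
        return []
    return cmd.get("FramingTag", "").split()


def find_context(doc_root, hit, tags):
    """Return the nearest ancestor-or-self of `hit` for the first tag that matches."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in doc_root.iter() for child in parent}
    chain = []
    node = hit
    while node is not None:
        chain.append(node)
        node = parents.get(node)
    # Try the tags in priority order: "div" first, then fall back to "p".
    for tag in tags:
        for candidate in chain:
            if candidate.tag == tag:
                return candidate
    return None


def contexts_for_keyword(doc_root, keyword, tags):
    """Return the framing element for every element whose text holds the keyword."""
    hits = [e for e in doc_root.iter() if keyword in (e.text or "")]
    return [find_context(doc_root, hit, tags) for hit in hits]


root = ET.fromstring(DOC)
frames = contexts_for_keyword(root, "keyword", framing_tags(SHEET))
print([f.tag for f in frames])  # ['div', 'p']
```

The first hit sits inside a <div>, so the whole <div> is returned; the second has no <div> ancestor, so the engine falls back on its <p>.  Only the search sheet has to change when a document uses different element names; the code stays the same.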

The association of a TEI document and a search sheet could either be 
hardcoded as a processing instruction in the TEI document or some external 
mechanism could be used (a catalogue file, autodetecting by file name, 
analysis of the document header, etc.), or a combination of both.
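As a rough sketch of such a combination, the following Python function first looks for a processing instruction in the document and, failing that, consults a catalogue of file name patterns.  Everything specific here is an assumption of mine for illustration: the processing instruction name "search-sheet", its "href" pseudo-attribute, the catalogue format and the sheet file names are all hypothetical.

```python
import fnmatch
import re

# Hypothetical processing instruction: <?search-sheet href="..."?>
PI_PATTERN = re.compile(r'<\?search-sheet\s+href="([^"]+)"\s*\?>')


def resolve_search_sheet(filename, document_text, catalogue, default=None):
    """Pick a search sheet for one document.

    `catalogue` is a list of (file name pattern, search sheet) pairs,
    checked in order; the first matching pattern wins.
    """
    # 1. A processing instruction inside the document takes precedence.
    match = PI_PATTERN.search(document_text)
    if match:
        return match.group(1)
    # 2. Otherwise fall back on the external catalogue.
    for pattern, sheet in catalogue:
        if fnmatch.fnmatchcase(filename, pattern):
            return sheet
    # 3. Nothing matched: use the caller's default, if any.
    return default


CATALOGUE = [("plays/*.xml", "drama-sheet.xml"), ("*.xml", "prose-sheet.xml")]

print(resolve_search_sheet("plays/hamlet.xml", "<TEI/>", CATALOGUE))
print(resolve_search_sheet("essay.xml", '<?search-sheet href="custom.xml"?><TEI/>', CATALOGUE))
```

The order of precedence (document first, catalogue second) is just one possible choice; a project could equally let the catalogue override the document.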

Dieter Köhler