Conal Tuohy wrote:
>> When a user searches a collection of TEI documents, and receives a
>> list of links (hits), what do those links point to? In some cases
>> they should point to (a representation of) the full TEI document. In
>> other cases they could point to a chapter (div) or sub-section from
>> a document. In general the desired granularity might depend on a
>> number of factors.
>> I would be interested to hear how other implementors of search
>> engines have dealt with this issue; what are the factors which
>> determine granularity of searching? how are they reflected at the
>> level of TEI encoding?
In our lexicometric projects, we decide this text type by text type for
various aspects of corpus analysis, and sometimes text by text.
The TEI encoding determines what is POSSIBLE in the texts of a collection,
and project-specific metadata determine what HAS to be done
in a corpus of texts for a given research project.
Specific processing metadata tune things for each text:
- what to look for: if the text is tagged, where the tags are (some texts
may have several tagsets applied simultaneously)
- where to look: count the number of <W> per <S> or
per <P>, per <SP WHO=...>, per <TEXT>...
- how to split the results: year by year, genre by genre, etc.,
or with nested loops: century by century and then genre by genre.
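
To make the three settings above concrete, here is a minimal Python sketch.
It is only an illustration under stated assumptions, not our actual tooling:
the sample fragment, the lowercase element names, and the function names
count_words_per_unit and partition_counts are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: a tiny TEI-like fragment (lowercase tag names assumed).
SAMPLE = """<text>
  <body>
    <sp who="HAMLET"><p><s><w>To</w><w>be</w></s><s><w>or</w><w>not</w><w>to</w><w>be</w></s></p></sp>
    <sp who="OPHELIA"><p><s><w>My</w><w>lord</w></s></p></sp>
  </body>
</text>"""


def count_words_per_unit(root, unit, key_attr=None, word_tag="w"):
    """Count word elements inside each <unit> element.

    With key_attr, counts are summed per attribute value (e.g. per speaker);
    otherwise each unit is keyed by its position in document order.
    """
    counts = Counter()
    for index, element in enumerate(root.iter(unit)):
        key = element.get(key_attr, "?") if key_attr else index
        counts[key] += sum(1 for _ in element.iter(word_tag))
    return counts


def partition_counts(texts, keys):
    """Sum word counts in nested groups, e.g. by century and then by genre.

    texts is a list of (metadata dict, word count) pairs; keys lists the
    metadata fields to nest by, outermost first.
    """
    result = {}
    for meta, n_words in texts:
        node = result
        for key in keys[:-1]:
            node = node.setdefault(meta[key], {})
        last = meta[keys[-1]]
        node[last] = node.get(last, 0) + n_words
    return result


root = ET.fromstring(SAMPLE)
per_sentence = count_words_per_unit(root, "s")                # "where to look": per <s>
per_speaker = count_words_per_unit(root, "sp", key_attr="who")  # per <sp who=...>

# Hypothetical per-text metadata and word counts for the partition step.
corpus = [
    ({"century": 17, "genre": "drama"}, 8),
    ({"century": 17, "genre": "verse"}, 5),
    ({"century": 18, "genre": "drama"}, 3),
    ({"century": 17, "genre": "drama"}, 2),
]
by_century_then_genre = partition_counts(corpus, ["century", "genre"])
```

The point of the design is that the unit to count in and the partition keys
are parameters chosen per text or per project, not fixed by the encoding.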
This is somewhat oriented toward quantitative text analysis.
But it also bears on qualitative text analysis, for example in the way
we build bibliographic references in KWIC concordances.
KWIC references need to be short enough to print on the same line
as the keywords in context. The pertinent information is chosen text type
by text type, or sometimes depending on the partitions involved
(in a year-by-year partition it is useless to put the year in the references).
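
As a rough illustration of that rule, the Python sketch below builds a short
reference from metadata fields and drops any field already fixed by the
current partition. The field names, the max_len limit, and the functions
build_reference and kwic_line are assumptions made for this example only.

```python
def build_reference(meta, fields, partition_keys=(), max_len=30):
    """Join the chosen metadata fields into a short reference string.

    Fields used as partition keys are skipped, since every line of a given
    partition would repeat the same value. The result is cut to max_len
    characters so that it fits beside the context on one line.
    """
    kept = [str(meta[f]) for f in fields if f in meta and f not in partition_keys]
    return ", ".join(kept)[:max_len]


def kwic_line(tokens, index, reference, width=3, ref_width=30):
    """Format one concordance line: reference, left context, [key], right context."""
    left = " ".join(tokens[max(0, index - width):index])
    right = " ".join(tokens[index + 1:index + 1 + width])
    return f"{reference:<{ref_width}} {left:>25} [{tokens[index]}] {right}"


# Hypothetical metadata for one text.
meta = {"author": "Racine", "title": "Andromaque", "year": 1667}
fields = ["author", "title", "year"]

full_ref = build_reference(meta, fields)
# In a year-by-year partition the year is left out of the reference.
ref_in_year_partition = build_reference(meta, fields, partition_keys=("year",))

tokens = "to be or not to be".split()
line = kwic_line(tokens, 2, ref_in_year_partition, width=2)
```

Letting the caller pass the field list is one simple way to let users "build"
their own references.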
We plan to give users the ability to "build" references themselves,
somewhat in the same way the TACT system does, but in
the XML-TEI universe.
Serge Heiden, https://weblex.ens-lsh.fr
ENS-LSH/CNRS - ICAR UMR5191, Institut de Linguistique Française
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)632010638