Jean-Luc Benoit wrote:

>M
>
>>>Is there a solution to keep a rich categorisation in my file?
>>
>>"a" is something of a numerical underestimate.
>
>I understand how difficult it is to get linguistic terms accepted by 
>all linguists. It is undoubtedly an impossible mission :-) But when 
>the TEI began to work on its recommendations for describing literary 
>texts, spoken texts, dictionaries and lexicons, the task was 
>considerable, and the result is creditable.

This debate is almost as old as the TEI. The committee charged with 
preparing recommendations for linguistic annotation certainly considered 
the possibility of recommending a specific set of linguistic tags[1] and 
even started work on it (I think the approach taken eventually became 
the basis of the EAGLES recommendations). But it finally decided that 
the only correct solution was to propose general facilities for the 
representation of linguistic annotations, which became the two chapters 
we now have: one on analysis and interpretation, and one on feature 
structures. As Michael Beddow says, "TEI is not a forum for debating 
linguistic terminology or analytic procedures (any more than it is a 
body that can decide the taxonomies and procedures of codicologists or 
prosopographers)". And even when apparently taking a view on how books 
are organized, the TEI does so in very general ("div") terms.

>The question is ("the" is something of an underestimate):
>* Is there a demand from linguists to work on categorized 
>corpora?

I don't understand the question. Linguists certainly do work on analysed 
corpora -- they use many different formalisms for representing their 
analyses though, some of them TEI conformant, others not.

>* Is it technically feasible? (An annotated file is very heavy.)
>*.........

Yes. But disk is cheap!

>*.........
>
>Reading the BNC DTD, I noticed that I could draw on it for a French 
>text.

The tagset [1] used by the BNC is specific to the POS tagger (CLAWS) 
which created the annotations. Although widely used, it has also been 
criticized in some quarters for its lack of internal structure.
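
To make the contrast concrete, here is a minimal Python sketch, offered 
only as an illustration and not as anything prescribed by the TEI or 
the BNC, of how a flat code such as "NN1" might be unpacked into a 
feature structure modelled on the TEI P5 <fs>, <f> and <symbol> 
elements. The mapping table and the feature names ("pos", "type", 
"number") are hypothetical; only the reading of NN1 as a singular 
common noun follows the CLAWS C5 tagset.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a few flat, CLAWS-style codes to feature bundles.
# The feature names and values are illustrative, not a published standard.
FLAT_TO_FEATURES = {
    "NN1": {"pos": "noun", "type": "common", "number": "singular"},
    "NN2": {"pos": "noun", "type": "common", "number": "plural"},
}


def to_feature_structure(code: str) -> ET.Element:
    """Build an <fs> element with one <f name=...><symbol value=.../></f> per feature."""
    fs = ET.Element("fs")
    for name, value in FLAT_TO_FEATURES[code].items():
        f = ET.SubElement(fs, "f", name=name)
        ET.SubElement(f, "symbol", value=value)
    return fs


# Serialize the structured equivalent of the opaque code "NN1".
print(ET.tostring(to_feature_structure("NN1"), encoding="unicode"))
```

The particular features chosen matter less than the fact that each 
component of the analysis (part of speech, noun type, number) becomes 
separately addressable, which a single opaque code does not allow.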

>Perhaps it would be worth launching a broad discussion on the subject, 
>with a view to building recommendations that enrich the chapter on 
>linguistic annotation?

Specific suggestions are always welcome -- but it must be said that 
linguistic annotation is a very well-trodden field, and a great deal of 
information on it is readily available elsewhere. You could talk to your 
colleagues in Nancy about the Linguistic Annotation Framework, for example!

[1] Not the least confusing aspect of this is that linguists typically 
use the word "tag" or "tagset" to refer specifically to the annotations 
themselves (such as "NN1"), which in TEI/XML terms are attribute values, 
not tags.
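
A minimal Python sketch of the distinction, assuming a <w> element 
whose "type" attribute carries the tagger's code (the attribute name 
is chosen only for illustration; different DTDs use different names):

```python
import xml.etree.ElementTree as ET

# One annotated word; "type" is an assumed attribute name for the POS code.
sentence = ET.fromstring('<s><w type="NN1">pencil</w></s>')
word = sentence.find("w")

print(word.tag)          # the XML tag, in the TEI/XML sense: w
print(word.get("type"))  # the linguist's "tag", i.e. an attribute value: NN1
```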

>Best regards,