Dear David,

Please have a look at the only slightly outdated[1] examples of the
National Corpus of Polish (NKJP) markup at http://nlp.ipipan.waw.pl/TEI4NKJP/

There are several syntax-related layers there, notably the "words" layer
(identifying the atomic pieces of syntactic constructions, not
necessarily in 1:1 relationship with the result of the morphosyntactic
analysis), and the "groups" layer, identifying the results of shallow
parsing into chunks.
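
To make the idea of layers pointing at one another more concrete, here is
a rough Python sketch. It is not the actual NKJP schema: the element names,
the "id", "label" and "target" attributes, the NG/VG chunk labels and the
resolve_groups helper are all assumptions made up for illustration. It only
shows how a "groups" layer can refer to items in a "words" layer by pointer
instead of wrapping them inline.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-off layers (NOT the real NKJP format).
WORDS_XML = """
<layer type="words">
  <seg id="w1">Лодку</seg>
  <seg id="w2">унесло</seg>
  <seg id="w3">ветром</seg>
</layer>
"""

# Each group points at one or more words; the labels are placeholders only.
GROUPS_XML = """
<layer type="groups">
  <seg id="g1" label="NG"><ptr target="#w1"/></seg>
  <seg id="g2" label="VG"><ptr target="#w2"/></seg>
  <seg id="g3" label="NG"><ptr target="#w3"/></seg>
</layer>
"""


def resolve_groups(words_xml, groups_xml):
    """Return (group id, label, member word forms) for every group."""
    # Index the words layer by identifier.
    words = {seg.get("id"): seg.text
             for seg in ET.fromstring(words_xml).iter("seg")}
    resolved = []
    for group in ET.fromstring(groups_xml).iter("seg"):
        # Follow each pointer back into the words layer.
        members = [words[ptr.get("target").lstrip("#")]
                   for ptr in group.iter("ptr")]
        resolved.append((group.get("id"), group.get("label"), members))
    return resolved


for group_id, label, members in resolve_groups(WORDS_XML, GROUPS_XML):
    print(group_id, label, " ".join(members))
```

Because the groups only point at the words, further layers can be added or
revised without touching the lower ones.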

There are more sophisticated structures encoded for NKJP now, but I'm
not sure what their license status is. They might be open and available,
because that is the general spirit behind most of Adam's work.

I also know that Laurent Romary and his team have been working on making
TEI more syntax-friendly.

One more source of information may be Damir Cavar, who produced inline
syntactic encoding in his corpus of Croatian. This may be a good
starting point, though I can't locate encoding examples there:
http://www.cavar.me/damir/work/researchproj/

HTH, I'm sure Adam, Laurent and Damir will be able to elaborate on the
above.

Best regards,

  Piotr

[1]: Please have a look at an up-to-date sample here:
http://nkjp.pl/index.php?page=14&lang=1

On 06/12/12 22:55, Birnbaum, David J wrote:
> Dear TEI-L,
> 
> Some colleagues have asked me for guidance in using TEI markup to support syntactic analysis. I'm looking for some general guidelines (e.g., "how does one represent linguistic relationships between words in a sentence?"), but in case it helps, their specific immediate object of study involves a type of pseudo-passive construction in Russian that uses passive-participle verbal morphology but also a direct object in the accusative:
> 
> Orthography: Лодку унесло ветром
> Romanization: Lodku uneslo vetrom
> Interlinear gloss: BOAT-accusative-sg CARRY-past-neuter-singular WIND-instrumental-singular
> Prose translation: 'The boat was carried away by the wind'
> 
> I'm not asking about the linguistics, of course. My question is whether there are TEI facilities that would enable someone to model syntactic structures (including odd-ball structures like this) in a useful way. Tagging the individual words for morphological category is easy, but I don't do this kind of linguistics myself, and I'm not sure what would be considered Best Practice in the TEI community for representing syntactic (e.g., subject~object, etc.) and thematic (e.g., agent~patient, etc.) relationships. There are really two parts to my question:
> 
> 1. How should one do this in TEI?
> 2. Should one do this in TEI, or in XML at all, for that matter, or is XML not the best tool for this sort of work?
> 
> Thanks,
> 
> David (Birnbaum)
>