Martin, David,

It seems that the TEI element w
(<http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-w.html>)
offers several ways of translating, via XSLT, all elements and
attributes of the Perseus treebank XML into valid TEI P5. This
would make an interesting experiment.
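As a rough test of that idea, here is a sketch of the mapping written in Python with the standard-library `xml.etree.ElementTree` rather than in XSLT. It uses a two-word excerpt of the Perseus sentence that Martin quotes further down and assumes one possible mapping, not an agreed one: `form` becomes the text of `<w>`, `lemma` stays `@lemma`, `postag` goes into `@msd` (available on `<w>` in current P5 through att.linguistic), and each `head`/`relation` pair becomes a typed `<link>` inside a `<linkGrp type="dependency">`. The `s2273668-w1`-style identifiers and the convention of pointing the root's link at the sentence itself are hypothetical choices.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id

# Two-word excerpt of the Perseus sentence quoted later in this thread.
PERSEUS = """<sentence id="2273668" document_id="Perseus:text:1999.01.0129">
<word id="1" form="mousa/wn" lemma="*mou=sa1" postag="n-p---fg-" head="3" relation="OBJ"/>
<word id="3" form="a)rxw/meq'" lemma="a)/rxw1" postag="v1ppse---" head="0" relation="PRED"/>
</sentence>"""


def tei(name):
    """Return a TEI-namespaced tag name."""
    return f"{{{TEI_NS}}}{name}"


def perseus_to_tei(sentence):
    """Map a Perseus <sentence> to a TEI <s> and a dependency <linkGrp>.

    Assumed mapping: postag -> @msd, head/relation -> <link type target>,
    with the sentence standing in as the governor of the root word.
    """
    sid = "s" + sentence.get("id")
    s = ET.Element(tei("s"), {XML_ID: sid})
    links = ET.Element(tei("linkGrp"), {"type": "dependency"})
    for word in sentence.iter("word"):
        wid = f"{sid}-w{word.get('id')}"
        w = ET.SubElement(s, tei("w"), {XML_ID: wid,
                                        "lemma": word.get("lemma"),
                                        "msd": word.get("postag")})
        w.text = word.get("form")
        head = word.get("head")
        # head="0" marks the root; point its link at the sentence instead.
        governor = sid if head == "0" else f"{sid}-w{head}"
        # Target lists the governor first, then the dependent.
        ET.SubElement(links, tei("link"), {"type": word.get("relation"),
                                           "target": f"#{governor} #{wid}"})
    return s, links


ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace
s, links = perseus_to_tei(ET.fromstring(PERSEUS))
print(ET.tostring(s, encoding="unicode"))
print(ET.tostring(links, encoding="unicode"))
```

An XSLT version would follow the same pattern, with one template producing the `<w>` and another producing the `<link>` for each `<word>`.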

Best,

Neven

Neven Jovanovic
Zagreb, Hrvatska / Croatia


On 8 December 2012 13:24, Martin Mueller <[log in to unmask]> wrote:
> Indeed. In the Perseus projects, undergraduates are playing a significant
> role in creating treebanks of Greek and Latin texts. You can download the
> XML for them at
> http://nlp.perseus.tufts.edu/syntax/treebank/greek.html and
> http://nlp.perseus.tufts.edu/syntax/treebank/latin.html
> You can also search them in ANNIS at
> http://annis.perseus.tufts.edu/
>
> Here is an example of a Greek sentence:
> <sentence id="2273668" document_id="Perseus:text:1999.01.0129"
> subdoc="card=1" span="mousa/wn0:.0">
> <primary>millermo</primary><primary>jfg</primary>
> <secondary>alexlessie</secondary>
> <word id="1" cid="34227684" form="mousa/wn" lemma="*mou=sa1"
> postag="n-p---fg-" head="3" relation="OBJ"/>
> <word id="2" cid="34227685" form="*(elikwnia/dwn" lemma="*(elikwnia/des1"
> postag="n-p---fg-" head="1" relation="ATR"/>
> <word id="3" cid="34227686" form="a)rxw/meq'" lemma="a)/rxw1"
> postag="v1ppse---" head="0" relation="PRED"/>
> <word id="4" cid="34227687" form="a)ei/dein" lemma="a)ei/dw1"
> postag="v--pna---" head="3" relation="OBJ"/>
> <word id="5" cid="34227688" form="," lemma="comma1" postag="u--------"
> head="14" relation="AuxX"/>
> <word id="6" cid="34227689" form="ai(/q'" lemma="o(/ste1"
> postag="p-p---fn-" head="8" relation="SBJ"/>
> <word id="7" cid="34227690" form="*(elikw=nos" lemma="(elikw/n1"
> postag="n-s---mg-" head="9" relation="ATR"/>
> <word id="8" cid="34227691" form="e)/xousin" lemma="e)/xw1"
> postag="v3ppia---" head="14" relation="ATR_CO"/>
> <word id="9" cid="34227692" form="o)/ros" lemma="o)/ros1"
> postag="n-s---na-" head="8" relation="OBJ"/>
> <word id="10" cid="34227693" form="me/ga" lemma="me/gas1"
> postag="a-s---na-" head="13" relation="ATR_CO"/>
> <word id="11" cid="34227694" form="te" lemma="te1" postag="g--------"
> head="13" relation="AuxY"/>
> <word id="12" cid="34227695" form="za/qeo/n" lemma="za/qeos1"
> postag="a-s---na-" head="13" relation="ATR_CO"/>
> <word id="13" cid="34227696" form="te" lemma="te1" postag="g--------"
> head="9" relation="COORD"/>
> <word id="14" cid="34227697" form="kai/" lemma="kai/1" postag="c--------"
> head="1" relation="COORD"/>
> <word id="15" cid="34227698" form="te" lemma="te1" postag="g--------"
> head="22" relation="AuxY"/>
> <word id="16" cid="34227699" form="peri\" lemma="peri/1"
> postag="r--------" head="21" relation="AuxP"/>
> <word id="17" cid="34227700" form="krh/nhn" lemma="krh/nh1"
> postag="n-s---fa-" head="22" relation="ADV_CO"/>
> <word id="18" cid="34227701" form="i)oeide/a" lemma="i)oeidh/s1"
> postag="a-s---fa-" head="17" relation="ATR"/>
> <word id="19" cid="34227702" form="po/ss'" lemma="pou/s1"
> postag="n-p---md-" head="21" relation="ADV"/>
> <word id="20" cid="34227703" form="a(paloi=sin" lemma="a(palo/s1"
> postag="a-p---md-" head="19" relation="ATR"/>
> <word id="21" cid="34227704" form="o)rxeu=ntai" lemma="o)rxe/omai1"
> postag="v3ppie---" head="14" relation="ATR_CO"/>
> <word id="22" cid="34227705" form="kai\" lemma="kai/1" postag="c--------"
> head="16" relation="COORD"/>
> <word id="23" cid="34227706" form="bwmo\n" lemma="bwmo/s1"
> postag="n-s---ma-" head="22" relation="ADV_CO"/>
> <word id="24" cid="34227707" form="e)risqene/os" lemma="e)risqenh/s1"
> postag="a-s---mg-" head="25" relation="ATR"/>
> <word id="25" cid="34227708" form="*kroni/wnos" lemma="*kroni/wn1"
> postag="n-s---mg-" head="23" relation="ATR"/>
> <word id="26" cid="34227709" form="." lemma="period1" postag="u--------"
> head="0" relation="AuxK"/></sentence>
>
> This is not P5, but it is quite sensible, very simple, and could be
> turned into P5 quite readily if the attributes 'pos', 'head' and
> 'relation' were permitted in pure P5. It offers a model of morphosyntactic
> annotation and shallow parsing that would work across a large number of
> Indo-European languages.
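Most of the morphosyntactic annotation in the sentence above sits in the nine-character `postag` value, one character per feature. The sketch below decodes it, assuming the slot order of the Ancient Greek and Latin Dependency Treebank tagset (part of speech, person, number, tense, mood, voice, gender, case, degree). The value tables are deliberately partial, covering only codes that occur in the sentence above, and should be completed from the Perseus treebank guidelines before real use.

```python
# Slot order assumed from the AGLDT tagset; "-" means "not applicable".
SLOTS = ["pos", "person", "number", "tense", "mood",
         "voice", "gender", "case", "degree"]

# Partial value tables: only codes that appear in the example sentence.
VALUES = {
    "pos": {"n": "noun", "v": "verb", "a": "adjective", "p": "pronoun",
            "c": "conjunction", "r": "preposition", "g": "particle",
            "u": "punctuation"},
    "person": {"1": "first", "3": "third"},
    "number": {"s": "singular", "p": "plural"},
    "tense": {"p": "present"},
    "mood": {"i": "indicative", "s": "subjunctive", "n": "infinitive"},
    "voice": {"a": "active", "e": "medio-passive"},
    "gender": {"m": "masculine", "f": "feminine", "n": "neuter"},
    "case": {"n": "nominative", "g": "genitive", "d": "dative",
             "a": "accusative"},
}


def decode(postag):
    """Expand a 9-character postag into a {feature: value} dict."""
    if len(postag) != len(SLOTS):
        raise ValueError(f"expected 9 characters, got {postag!r}")
    feats = {}
    for slot, code in zip(SLOTS, postag):
        if code != "-":
            # Fall back to the raw code for values missing from the tables.
            feats[slot] = VALUES.get(slot, {}).get(code, code)
    return feats


print(decode("v1ppse---"))
# {'pos': 'verb', 'person': 'first', 'number': 'plural', 'tense': 'present', 'mood': 'subjunctive', 'voice': 'medio-passive'}
```

The resulting dictionary maps naturally onto a TEI feature structure, one `<f>` per key.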
>
> On 12/8/12 5:07 AM, "Neven Jovanović" <[log in to unmask]> wrote:
>
>>Dear David,
>>
>>I'm no linguist, but I know that two classical-language projects,
>>Perseus and Alpheios, rely largely on treebanking, modelling it in
>>TEI XML. They have not only lots of experience, examples and
>>documentation, but also tools that you can use for your own
>>modelling.
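In the Perseus format Martin quotes above, a treebank is essentially a flat list of words, each pointing to its governor through `head`, with `0` standing for an artificial root. The sketch below rebuilds and prints the tree for words 1 to 4 of that sentence; the `build_tree` and `render` helpers are hypothetical names for illustration, not part of any Perseus or Alpheios tool.

```python
from collections import defaultdict

# (id, form, head, relation) for words 1-4 of the Perseus example sentence.
WORDS = [
    (1, "mousa/wn", 3, "OBJ"),
    (2, "*(elikwnia/dwn", 1, "ATR"),
    (3, "a)rxw/meq'", 0, "PRED"),
    (4, "a)ei/dein", 3, "OBJ"),
]


def build_tree(words):
    """Group word ids by governor; head 0 is the artificial root."""
    children = defaultdict(list)
    for wid, _form, head, _rel in words:
        children[head].append(wid)
    return children


def render(children, info, node=0, depth=0):
    """Return the subtree under `node` as indented 'form (RELATION)' lines."""
    lines = []
    for wid in children.get(node, []):
        form, rel = info[wid]
        lines.append("  " * depth + f"{form} ({rel})")
        lines.extend(render(children, info, wid, depth + 1))
    return lines


info = {wid: (form, rel) for wid, form, _head, rel in WORDS}
lines = render(build_tree(WORDS), info)
print("\n".join(lines))
```

This assumes well-formed input; a malformed `head` that creates a cycle would make `render` recurse indefinitely, so real data would need a check for that.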
>>
>>The addresses:
>><http://nlp.perseus.tufts.edu/syntax/treebank/>
>><http://treebank.alpheios.net/>
>><https://wiki.projectbamboo.org/pages/viewpage.action?pageId=24642179>
>>
>>Best,
>>
>>Neven
>>
>>Neven Jovanovic
>>Zagreb, Hrvatska / Croatia
>>
>>
>>On 7 December 2012 14:18, Frederik Elwert <[log in to unmask]> wrote:
>>> Dear David,
>>>
>>> thanks for bringing up this topic. I am currently involved in a project
>>> where we want to achieve something similar, so I am also interested in
>>> recommendations and best-practice examples.
>>>
>>> I am fairly new to the TEI, but from what I have learned so far, my first
>>> attempt would be something along the following lines:
>>>
>>>  1. Annotate the words/morphemes using ISOCat data categories (using
>>>     feature structures and dcr:datcat in TEI), and
>>>
>>>  2. Define the syntactic relations between elements as a graph.
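Step 1 could look like the following for the first word of David's example (quoted further down in this thread). The sketch builds a TEI feature structure (`<fs>`, `<f>`, `<symbol>`) with Python's ElementTree, taking the feature values from David's gloss; the `type="morpho"` label and the `feature_structure` helper are hypothetical. The `dcr:datcat` links to ISOcat categories are left out, since their identifiers would have to be looked up in ISOcat rather than guessed.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def feature_structure(features, fs_type="morpho"):
    """Build a TEI <fs> with one <f name=...><symbol value=.../></f> per feature.

    fs_type="morpho" is a hypothetical label, not a value TEI prescribes.
    """
    fs = ET.Element(f"{{{TEI_NS}}}fs", {"type": fs_type})
    for name, value in features.items():
        f = ET.SubElement(fs, f"{{{TEI_NS}}}f", {"name": name})
        ET.SubElement(f, f"{{{TEI_NS}}}symbol", {"value": value})
    return fs


# "Лодку" as glossed in David's example: BOAT-accusative-singular.
lodku = feature_structure({"case": "accusative", "number": "singular"})
print(ET.tostring(lodku, encoding="unicode"))
```

Once the `<fs>` is given an `xml:id`, the `<w>` for Лодку could point at it through `@ana`.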
>>>
>>> Step 2 seems to depend on the linguistic model you choose, in
>>> particular on whether you use a constituency grammar model or a
>>> dependency grammar model. If I read things correctly, the SynAF model
>>> uses the first approach, but both are probably viable choices.
>>>
>>> Currently, I guess it would be possible to store the syntactic graph in
>>> TEI using the module for graphs, networks, and trees. But I have to admit
>>> that my investigation in this direction has only just begun, so I'd like
>>> to learn more about this. There seem to be non-TEI XML representations
>>> for syntactic graphs, like TIGER2 for SynAF, but I have not yet fully
>>> evaluated the available options.
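The TEI graph elements (`<graph>`, `<node>`, `<arc>`) can hold a dependency analysis directly. The sketch below encodes David's example sentence (quoted further down) with the verb as the root. The arc labels borrow the Perseus relation names `OBJ` and `ADV` purely for illustration; they are an assumption, not a linguistic claim about this construction, and the node identifiers `n1` to `n3` are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id
ET.register_namespace("", TEI_NS)


def t(name):
    """Return a TEI-namespaced tag name."""
    return f"{{{TEI_NS}}}{name}"


def dependency_graph(tokens, arcs):
    """Encode tokens as <node>s and (head, dependent, label) triples as <arc>s."""
    graph = ET.Element(t("graph"), {"type": "directed",
                                    "order": str(len(tokens)),  # node count
                                    "size": str(len(arcs))})    # arc count
    for i, form in enumerate(tokens, start=1):
        node = ET.SubElement(graph, t("node"), {XML_ID: f"n{i}"})
        ET.SubElement(node, t("label")).text = form
    for head, dep, label in arcs:
        arc = ET.SubElement(graph, t("arc"),
                            {"from": f"#n{head}", "to": f"#n{dep}"})
        ET.SubElement(arc, t("label")).text = label
    return graph


# "Лодку унесло ветром": verb (token 2) as root; labels are illustrative.
g = dependency_graph(["Лодку", "унесло", "ветром"],
                     [(2, 1, "OBJ"), (2, 3, "ADV")])
print(ET.tostring(g, encoding="unicode"))
```

A constituency analysis could use the same elements, with extra nodes for phrases and arcs from each phrase to its parts.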
>>>
>>> This is not so much an answer to your questions as my preliminary
>>> thoughts on the subjects you raised. But maybe it already contains
>>> some hints, and maybe others can give more elaborate answers.
>>>
>>> Regards,
>>> Frederik
>>>
>>>
>>> On Thursday, 06.12.2012, 16:55 -0500, Birnbaum, David J wrote:
>>>> Dear TEI-L,
>>>>
>>>> Some colleagues have asked me for guidance in using TEI markup to
>>>>support syntactic analysis. I'm looking for some general guidelines
>>>>(e.g., "how does one represent linguistic relationships between words
>>>>in a sentence?"), but in case it helps, their specific immediate object
>>>>of study involves a type of pseudo-passive construction in Russian that
>>>>uses passive-participle verbal morphology but also a direct object in
>>>>the accusative:
>>>>
>>>> Orthography: Лодку унесло ветром
>>>> Romanization: Lodku uneslo vetrom
>>>> Interlinear gloss: BOAT-accusative-singular CARRY-past-neuter-singular WIND-instrumental-singular
>>>> Prose translation: 'The boat was carried away by the wind'
>>>>
>>>> I'm not asking about the linguistics, of course. My question is
>>>>whether there are TEI facilities that would enable someone to model
>>>>syntactic structures (including oddball structures like this) in a
>>>>useful way. Tagging the individual words for morphological category is
>>>>easy, but I don't do this kind of linguistics myself, and I'm not sure
>>>>what would be considered Best Practice in the TEI community for
>>>>representing syntactic (e.g., subject~object, etc.) and thematic (e.g.,
>>>>agent~patient, etc.) relationships. There are really two parts to my
>>>>question:
>>>>
>>>> 1. How should one do this in TEI?
>>>> 2. Should one do this in TEI, or in XML at all, for that matter, or is
>>>>XML not the best tool for this sort of work?
>>>>
>>>> Thanks,
>>>>
>>>> David (Birnbaum, [log in to unmask])
>>>>
>>>
>>> --
>>> Frederik Elwert M.A.
>>>
>>> Research Assistant
>>> Centre for Religious Studies
>>> Ruhr-University Bochum
>>>
>>> Universitätsstr. 150
>>> D-44780 Bochum
>>>
>>> Phone +49(0)234 32-24794
>
>