If you are interested in connecting your XML format question
to software that helps you transcribe, annotate or analyse
speech data, I suggest the following TEI journal article
as a good entry point:
Schmidt, Thomas (2011): A TEI-based Approach to Standardising Spoken
Language Transcription. In: Journal of the Text Encoding Initiative
Although, as Lou mentioned, your example seems somewhat related
to the annotation needs served by tools like 'SIL Toolbox' or
'CHILDES CLAN'(*), you may be interested in the TEI related
resources found at the
Those XSL or Java based tools already read and write XML-TEI
based representations, so their format and algorithms may be of
interest to you.
(*) A comprehensive list of speech data annotation tools
(including those whose primary interface or representation is
structured around time-based tiers or speech turns):
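
To make the alignment problem discussed in the quoted thread below more
concrete, here is a minimal Python sketch. It is not taken from any of
the tools mentioned above. The function names align_tiers and
parse_spans are hypothetical, and it rests on two assumptions read off
Joshua's examples: morphemes inside a word are separated by '-', and
the 'start,end=LABEL' items use half-open token offsets (so 0,2=NP
covers the first two tokens).

```python
import re


def align_tiers(morph_line, gloss_line):
    """Pair each morpheme with its gloss, word by word.

    Assumption: words are whitespace-separated and morphemes within a
    word are separated by '-', as in the example from the thread.
    """
    words = morph_line.split()
    glosses = gloss_line.split()
    if len(words) != len(glosses):
        raise ValueError("tiers have different numbers of words")
    aligned = []
    for word, gloss in zip(words, glosses):
        morphs = word.split("-")
        parts = gloss.split("-")
        if len(morphs) != len(parts):
            raise ValueError(f"cannot align {word!r} with {gloss!r}")
        aligned.append(list(zip(morphs, parts)))
    return aligned


def parse_spans(span_line, tokens):
    """Resolve 'start,end=LABEL' items against a list of tokens.

    Assumption: offsets are half-open, i.e. '0,2=NP' means tokens[0:2].
    """
    spans = []
    for item in span_line.split():
        match = re.fullmatch(r"(\d+),(\d+)=(\S+)", item)
        if match is None:
            raise ValueError(f"unrecognised span item: {item!r}")
        start, end = int(match.group(1)), int(match.group(2))
        spans.append((match.group(3), tokens[start:end]))
    return spans


if __name__ == "__main__":
    # Morpheme tier against gloss tier.
    for word in align_tiers("No me dig-as", "NEG 1SG.ACC say-SBJN.2SG"):
        print(word)

    # Syntactic tier against the token tier.
    tokens = "three cats danced the waltz".split()
    for label, covered in parse_spans("0,5=S 0,2=NP 3,5=NP 2,5=VP", tokens):
        print(label, covered)
```

Once the pairs and spans are explicit like this, they can be serialised
with whatever pointing or feature structure mechanism you settle on. The
sketch only shows that the inference rules have to be written down
somewhere, which is the point Lou makes below.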
On 19/04/2013 13:16, Lou Burnard wrote:
> Tiered annotation like this is something I trace back to the Summer
> Institute's ancient "Toolbox" software. I vaguely recall some work by
> Gary Simons on interfacing it with SGML/XML way back in the day. The
> difficulty with this kind of formalism is that it hides almost as much
> as it reveals : the actual alignments between the tokens in each tier
> are not explicit, and have to be inferred either by "just looking" or by
> weird rules like those exemplified in your line of syntactic
> annotation. However, it's clear that lots of people think of linguistic
> annotation like this -- also, for example, when analysing spoken
> language where you have a prosodic or phonemic "layer" as well as all
> the others. And there *are* mechanisms for doing this in TEI -- no
> better or worse than those available in other systems which want to
> represent fundamentally non-hierarchic structures using hierarchic
> There's a lot of useful and interesting stuff available on the TEI Wiki
> about ways of doing or not doing this in TEI: jump off at
> http://wiki.tei-c.org/index.php/LingSIG and keep us posted!
> On 19/04/13 07:59, Joshua Crowgey wrote:
>> Sure, I mean this kind of structure:
>> No me digas.
>> No  me      dig-as
>> NEG 1SG.ACC say-SBJN.2SG
>> Don't tell me.
>> One or more 'source' lines, with one or more annotation lines.
>> The basic trick involves capturing the particular properties of
>> alignment between the tiers. The example above is very basic; a more
>> complex example might have POS tags on one tier and syntactic structure
>> on another.
>> three cats danced the waltz
>> NUM   NN   VBP    DET NN
>> 0,5=S 0,2=NP 3,5=NP 2,5=VP
>> I can provide some references that describe this stuff in more detail
>> if it's helpful for anyone. In fact, I'm currently working on a
>> review of formats for this kind of data, so I'm trying to figure out
>> how TEI fits into that review.
>> On 04/18/2013 11:49 PM, Laurent Romary wrote:
>>> One possibility would be to use <cit> as a construct for representing
>>> the source text and further glosses, annotations, etc. Maybe you
>>> could provide an example to see the kind of use case you have in mind.
>>> On 19 Apr 2013 at 07:22, Joshua Crowgey wrote:
>>>> I'm researching XML formats for interlinear glossed text as found in
>>>> linguistic corpora. I've found a few pre-xml TEI discussions on the
>>>> Is the feature structure tagset still the relevant one for IGT? What's
>>>> the current best practice? Are there any linguistic projects using TEI
>>>> of note?
>>>> Happy Friday,
>>> Laurent Romary
>>> INRIA & HUB-IDSL
Dr. Serge Heiden, http://textometrie.ens-lyon.fr
ENS de Lyon/CNRS - ICAR UMR5191, Institut de Linguistique Française
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883