Dear all,
I have been working for a while on a way to produce nice PDF files
from speech corpora encoded in XML following the recommendations of the
TEI (Transcription of Speech, and Language Corpora). I have eventually
obtained some satisfactory results, with some restrictions on how
overlaps are coded with the synchronisation mechanisms. There are
obviously many things that need to be improved, and I would like to
have your opinion on this work and your suggestions or help for
improving it.
Above all, I would like to know whether part of my work (the XSLT
stylesheet) could be merged into Sebastian Rahtz's PassiveTeX
implementation of TEI Lite since, as far as I know, it cannot deal
with transcriptions of speech.
Basic description of the way it works:
A transcription of speech should be coded within a text division <div>.
It consists of utterances, events or pauses (and also kinesic, vocal or
shift events, but I think these should be embedded in an utterance).
These elements are formatted as an indented list, with the name of
the speaker (the 'who' attribute of <u>) as the label in the indentation.
When parts of utterances are to be synchronised because they overlap
each other, they are vertically aligned (along a vertical rule) to
render the synchronisation graphically.
To encode the synchronisation, I need the overlapping utterances to set
their 'trans' attribute to "overlap" and the utterances to be broken
down into segments:
<u who="Q" id="u1">
<seg id="u1s1">I would like to introduce you Miss</seg>
<seg id="u1s2">Moneypenny</seg>
</u>
<u who="B" id="u2" trans="overlap">
<seg id="u2s1" synch="u1s2">How do you do</seg>
</u>
<u who="M" id="u3" trans="overlap">
<seg id="u3s1" synch="u1s2">Pleased to meet you</seg>
</u>
The stylesheet brings together overlapped and overlapping utterances,
gets the ordered list of the segments to which the others are synched,
and produces LaTeX code corresponding to a list of columns (one
column being the set of synchronous segments):
Q: I would like to introduce you Miss [Moneypenny
B:                                    [How do you do
M:                                    [Pleased to meet you
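
To make the grouping step concrete, here is a minimal Python sketch of the
idea. It is not the XSLT stylesheet itself, only an illustration under some
assumptions: the elements carry no namespace, the first <u> of the division
is the overlapped utterance, and the names build_columns and render are
hypothetical helpers invented for this example. It emits plain text rather
than LaTeX.

```python
import xml.etree.ElementTree as ET

# The example from this message, wrapped in a <div> (no namespace assumed).
SAMPLE = """<div>
<u who="Q" id="u1">
<seg id="u1s1">I would like to introduce you Miss</seg>
<seg id="u1s2">Moneypenny</seg>
</u>
<u who="B" id="u2" trans="overlap">
<seg id="u2s1" synch="u1s2">How do you do</seg>
</u>
<u who="M" id="u3" trans="overlap">
<seg id="u3s1" synch="u1s2">Pleased to meet you</seg>
</u>
</div>"""


def build_columns(div):
    """Return (speakers, columns); each column maps speaker -> text."""
    utterances = div.findall("u")
    base = utterances[0]  # assumption: the first <u> is the overlapped one
    # One column per segment of the overlapped utterance, in document order.
    order = [seg.get("id") for seg in base.findall("seg")]
    columns = {seg_id: {} for seg_id in order}
    for seg in base.findall("seg"):
        columns[seg.get("id")][base.get("who")] = seg.text or ""
    speakers = [base.get("who")]
    for u in utterances[1:]:
        if u.get("trans") != "overlap":
            continue
        speakers.append(u.get("who"))
        for seg in u.findall("seg"):
            target = seg.get("synch")
            if target in columns:  # attach the segment to its synch target
                columns[target][u.get("who")] = seg.text or ""
    return speakers, [columns[seg_id] for seg_id in order]


def render(speakers, columns):
    """Render the columns as text lines, marking overlaps with '['."""
    # A column holds an overlap when more than one speaker has text in it.
    widths = []
    for col in columns:
        extra = 1 if len(col) > 1 else 0  # room for the '[' marker
        widths.append(max(len(text) for text in col.values()) + extra)
    lines = []
    for who in speakers:
        cells = []
        for col, width in zip(columns, widths):
            text = col.get(who, "")
            if text and len(col) > 1:
                text = "[" + text
            cells.append(text.ljust(width))
        lines.append((who + ": " + " ".join(cells)).rstrip())
    return lines


speakers, columns = build_columns(ET.fromstring(SAMPLE))
print("\n".join(render(speakers, columns)))
```

Padding every cell to the width of its column is what keeps the brackets of
one overlap on the same vertical line, which is the role the vertical rule
plays in the LaTeX output.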
My package consists of three files:
-- the XSLT stylesheet for converting XML to LaTeX
-- the dialogue.sty LaTeX file for formatting dialogues
-- the corpustei.sty LaTeX file for rendering options (choosing symbols
for rendering rising tone, latching, truncation, ...).
I can send it by e-mail on request.
Matthieu
--
Matthieu QUIGNARD, Chargé de recherche au CNRS
Equipe "Langue et Dialogue", UMR 7503 LORIA
Campus Scientifique, BP 239
54506 Vandoeuvre-lès-Nancy Cedex (France)
Tel +33 383 59 20 34 Fax +33 383 41 30 79
http://www.loria.fr/~quignard