Time for me to delurk: many issues about the future data format of
our corpus are already resolved, but there are still some points we have
not yet found the best way around, and these have not been posted here yet.
We use <seg> for the temporal alignment within <u> elements. Consider
the following made-up example:
<u who="#S1">i'd like to tha<seg xml:id="ol_321">nk you for the many
years you've worked</seg> for our company</u>
<u who="#S2"><seg synch="#ol_321">well, <pause dur="PT1S"/> you're
This fails validation against the full TEI schema as well as against
schemas containing only the necessary elements. The offending part is
the <pause/> in the second <u>: for some reason it is not part of the
possible content of <seg>. It would of course be possible to add the
spoken-language-specific elements to the content model of <seg>, or to
add <pause> and the other spoken-language elements to a model class
such as model.phrase, so that they may be used within <seg> elements.
Another method we thought of would be to stick to <anchor> elements.
But then, why shouldn't a <seg> contain a <pause>? And if it shouldn't,
what could one do to circumvent this?
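
To make the model.phrase option concrete, here is a minimal ODD sketch of
what we have in mind. It is untested and rests on assumptions: the
schema name "voice_sketch" is hypothetical, the list of modules is only
our guess at "the necessary elements", and we assume that <seg> accepts
members of model.phrase through its content model.

```xml
<schemaSpec ident="voice_sketch" start="TEI">
  <!-- Assumed module selection; adjust to the modules actually needed. -->
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="spoken"/>   <!-- provides <u> and <pause> -->
  <moduleRef key="linking"/>  <!-- provides <seg> and <anchor> -->

  <!-- Assumption: adding <pause> to model.phrase is enough to make it
       valid inside <seg>. Other spoken-language elements would need
       the same treatment. -->
  <elementSpec ident="pause" module="spoken" mode="change">
    <classes mode="change">
      <memberOf key="model.phrase" mode="add"/>
    </classes>
  </elementSpec>
</schemaSpec>
```

A side effect of this approach is that <pause> would become valid
wherever model.phrase is allowed, not only inside <seg>, which may or
may not be desirable.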
Highly interested in your opinions,
with kind regards,
| Stefan Majewski | Department of English, University of Vienna |
| VOICE Corpus | Spitalgasse 2-4, Universitätscampus AAKH, Hof 8 |
| | A-1090 Vienna |
| Research Ass.(IT)| Phone: +43 1 4277 424 46 |