On Thu, 2006-01-05 at 11:00, Tomaz Erjavec wrote:
> Hi,
> another track could be that the language ids are extended, and you
> have e.g. (taken from the http://nl.ijs.si/elan/ corpus):
>
> <language id="sl-en">Translation from Slovene to English</language>
> <language id="en-sl">Translation from English to Slovene</language>
> <language id="sl">Slovene</language> <!--SLAVIC-->
> <language id="en">English</language> <!--GERMANIC-->
I looked at this, but it's open to misinterpretation as the common
language-country format. sl-en would be "Slovenian as
spoken in England" :-)
> An objection would be that the processing software couldn't easily
> know that 'sl-en' is in fact 'en'; but if you add a corresp (sameAs?),
> e.g.
>
> <language id="sl-en" corresp="en">Translation from Slovene to English</language>
corresp is an IDREFS attribute, so you'd need some element somewhere
with id="en" to identify "English". Not a problem, but an added layer.
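To make that extra layer concrete, here is a rough sketch (Python, standard-library ElementTree) of what processing software would have to do to learn that 'sl-en' is really 'en': collect every <language> by id, then follow its corresp. The langUsage fragment and the helper name base_language are hypothetical; they just mirror the ELAN-style example quoted above, using P4-style id attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical langUsage fragment modelled on the example quoted above;
# the corresp attributes are the proposal under discussion, not real corpus markup.
LANG_USAGE = """
<langUsage>
  <language id="sl-en" corresp="en">Translation from Slovene to English</language>
  <language id="en-sl" corresp="sl">Translation from English to Slovene</language>
  <language id="sl">Slovene</language>
  <language id="en">English</language>
</langUsage>
"""

def base_language(root, lang_id):
    """Hypothetical helper: follow corresp (IDREFS) from a translation id
    to the plain language entry it points at."""
    by_id = {el.get("id"): el for el in root.iter("language")}
    refs = (by_id[lang_id].get("corresp") or "").split()
    if not refs:
        return lang_id  # no corresp: already a plain language entry
    target = refs[0]  # IDREFS may hold several ids; take the first here
    if target not in by_id:
        # An IDREF must resolve, hence the extra <language id="en"> element
        raise KeyError(f"corresp points at undeclared id {target!r}")
    return target

root = ET.fromstring(LANG_USAGE)
print(base_language(root, "sl-en"))
```

Drop the <language id="en"> entry and the lookup fails, which is exactly the extra dependency a plain CDATA attribute avoids.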
For the moment I just added a "source" CDATA attribute to <language> in
our customisation layer. I think it'll do until we move to P5.
///Peter