On Thu, 2006-01-05 at 11:00, Tomaz Erjavec wrote:
> Hi,
> another track could be that the language ids are extended, and you
> have e.g. (taken from the http://nl.ijs.si/elan/ corpus):
> 
> <language id="sl-en">Translation from Slovene to English</language>
> <language id="en-sl">Translation from English to Slovene</language>
> <language id="sl">Slovene</language>        <!--SLAVIC-->
> <language id="en">English</language>        <!--GERMANIC-->

I looked at this, but it's open to misinterpretation as the common 
language-country combination format (as in en-GB): sl-en would read as 
"Slovenian as spoken in England" :-)

> An objection would be that the processing software couldn't easily
> know that 'sl-en' is in fact 'en'; but if you add a corresp (sameAs?),
> e.g.
> 
> <language id="sl-en" corresp="en">Translation from Slovene to English</language>

corresp is an IDREFS attribute, so you'd need some element somewhere to 
be the id="en" to identify "English". Not a problem, but an added layer.

For the moment I just added a "source" CDATA attribute to <language> in
our customisation layer. I think it'll do until we move to P5.
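
Roughly, the idea is something like the sketch below. The ATTLIST line and 
the sample values are only illustrative, assuming the attribute simply 
holds the id of the language translated from; it is not copied from our 
actual DTD or header.

```xml
<!-- In the DTD customisation (illustrative declaration only) -->
<!ATTLIST language source CDATA #IMPLIED>

<!-- In the header (made-up sample values) -->
<langUsage>
  <language id="sl">Slovene</language>
  <language id="en" source="sl">English, translated from Slovene</language>
</langUsage>
```

That keeps id as a plain language code, so software that only looks at 
id still sees 'en'.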

///Peter