This would certainly work with P4, if you're willing to accept the
premise that "English translated from Slovenian" is a different
"language" from "English" per se. P4 makes no firm guarantees about the
values you use for language identification. That is not so in P5, however:
there you'd have to use something like "x-EN-SL" to show that the whole
thing is "privately defined". Or maybe you could persuade the world's
character-set geeks that EN-s-ENSL was meaningful. See further discussion
at http://www.tei-c.org/P5/Guidelines/CH.html#CHSH
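
As a rough, untested sketch of the P5 side: the declaration and its use
might look something like the fragment below. The identifier "x-en-sl" is
only an illustrative private-use tag, not anything registered, and the
<div> and <p> are placeholders for whatever elements actually carry the
translated text.

```xml
<!-- In the TEI header: declare the privately defined "language".
     "x-en-sl" is a hypothetical private-use tag, chosen for illustration. -->
<langUsage>
  <language ident="x-en-sl">English translated from Slovenian</language>
  <language ident="en">English</language>
</langUsage>

<!-- In the text: refer to the declared value with xml:lang. -->
<div xml:lang="x-en-sl">
  <p>...</p>
</div>
```

A generic processor would see only an opaque private tag here, so it
would not know the content is English without consulting the header.
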
Tomaz Erjavec wrote:
> Hi,
> another track could be that the language ids are extended, and you
> have e.g. (taken from the http://nl.ijs.si/elan/ corpus):
>
> <language id="sl-en">Translation from Slovene to English</language>
> <language id="en-sl">Translation from English to Slovene</language>
> <language id="sl">Slovene</language> <!--SLAVIC-->
> <language id="en">English</language> <!--GERMANIC-->
>
> An objection would be that the processing software couldn't easily
> know that 'sl-en' is in fact 'en'; but if you add a corresp (sameAs?),
> e.g.
>
> <language id="sl-en" corresp="en">Translation from Slovene to English</language>
>
> then the complexity for processing an element marked with @lang
> is only a little greater:
>
> <xsl:if test="@lang=$lang or id(@lang)/@corresp=$lang">...</xsl:if>
>
> I didn't test the above, so I can only hope it actually works.
> Still, I can see the objection that this would break generic
> stylesheets that rely on the lang attribute indicating the actual
> language of the element.
>
> Best,
> Tomaz
>
> Peter Flynn writes:
> > On Wed, 2006-01-04 at 15:36, Lou Burnard wrote:
> > > Peter Flynn wrote:
> > >
> > > >>I think the only way to do this cleanly with P4 would be to add a <note>
> > > >>to the <notesStmt> saying something like "Translated from the Spanish"
> > > >>or whatever.
> > > >
> > > >
> > > > In the P4 interim, though, I need a machine-identifiable location (eg
> > > > an element or an attribute) for the language code, to assist in the
> > > > machine collation of the corpus.
> > > >
> > >
> > > hmm... how about
> > >
> > > <note>Translated from the <name xml:lang="es">español</name></note>
> >
> > That would certainly do it, although it's already clear from other text
> > that it's a translation. I just need something like source="es".
> >
> > ///Peter
>
>