Hi Piotr,

You're beginning to convince me about an attribute on <textLang>. Maybe 
alongside @mainLang and @otherLangs, we could have @sourceLang?

Or should it be @sourceLangs, since a text might have multiple 
original-language sources from which it's translated?
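
For illustration only, here is a minimal sketch using the title from Conal's example. It assumes a hypothetical @sourceLangs attribute, which does not exist in TEI P5. I'm also assuming it would take a space-separated list of language tags, the same way @otherLangs does:

<bibl>
   <title>Selected Little-known Greek Poems</title>
   <!-- @sourceLangs is hypothetical: not part of TEI P5 -->
   <textLang mainLang="en" sourceLangs="grc"/>
</bibl>

A poem claiming to be translated from both Greek and Latin originals would then carry sourceLangs="grc la". The plural form covers both the single-source and multi-source cases with one attribute.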

Cheers,
Martin

On 2018-10-23 3:23 a.m., Piotr Bański wrote:
> Hi Martin,
> 
> I don't think what I have suggested has any deep potential, actually. 
> For "deep" stuff, you would want to e.g. sub-type meaning equivalence 
> relationships and have much more flexibility than a single attribute can 
> offer. Essentially, you'd go for triples, whether clad in some angle 
> brackets or not.
> 
> What I have suggested is rather simplistic, actually (I feel). It offers 
> the kind of simplicity that is needed for rapid production of large text 
> resources for linguistic processing (as we mentioned in the Tokyo talk), 
> and if it also addresses your simple needs, then maybe it's worth 
> considering as a synergistic proposal.
> 
> Best,
> 
>     Piotr
> 
> On 10/22/18 6:55 PM, Martin Holmes wrote:
>> Hi Piotr,
>>
>> If we were going to be attempting any kind of deep dive into the 
>> source or the mechanics of translation, then this would make a lot of 
>> sense, but we don't have the resources for that. In some cases, I 
>> suspect even the claims themselves are spurious or satirical. So what 
>> we're doing is not respectable linguistics by any stretch of the 
>> imagination; the sort of question we imagine (eventually) asking of 
>> the data would be "how many poets between date X and date Y claim to 
>> be translating from German?"
>>
>> Cheers,
>> Martin
>>
>> On 2018-10-22 9:40 a.m., Piotr Bański wrote:
>>> Hi Martin,
>>>
>>> Great to see you address this concern, this information is also very 
>>> useful in e.g. parallel corpora.
>>>
>>> You are currently postulating extending the content model of 
>>> `<derivation>` (#1830) in order to be able to construct a relation 
>>> (of derivation) that has the property of "translation" and that 
>>> points to a virtual `<bibl>` object, whose only postulated property 
>>> is having been written in some language. I do see some potential 
>>> advantages of that (though the required verbosity is somewhat scary). 
>>> And as Conal says, it feels RDF-y (I'm not saying that that's good or 
>>> bad; it just does).
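>>>
>>> To make this concrete, here is one possible shape. It is 
>>> hypothetical: it assumes #1830 would let <derivation> contain such 
>>> a <bibl> directly, and the actual proposal may differ:
>>>
>>> <derivation type="translation">
>>>    <bibl>
>>>       <textLang mainLang="grc"/>
>>>    </bibl>
>>> </derivation>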
>>>
>>> I wonder if you have considered a slightly different way of tackling 
>>> this issue, by treating the original language as a property of the 
>>> resulting text (arguably expressed by e.g. structural interference of 
>>> the original in the result, the possible presence of calques and so 
>>> on). If that were an acceptable approach then maybe just another 
>>> attribute of `<textLang>` would cut it for you (at least in the 
>>> proverbial 80 % of cases)?
>>>
>>> I mean, both would need a ticket and potential extension of the 
>>> Guidelines, but the latter is more compact and I know I could try to 
>>> convince (some, maybe many) colleagues to use such a device, while I 
>>> sense a difficulty in pushing across the idea of exploding the markup 
>>> even more only to encode this sort of property.
>>>
>>> Best,
>>>
>>>    Piotr
>>>
>>>
>>>
>>>
>>> On 10/22/18 6:02 PM, Martin Holmes wrote:
>>>> Hi Thomas,
>>>>
>>>> I'm trying to find a solution that uses language codes (à la BCP 47) 
>>>> in a parsable way, so I want to avoid an ad-hoc note-type 
>>>> solution.
>>>>
>>>> Cheers,
>>>> Martin
>>>>
>>>> On 2018-10-22 2:26 a.m., Thomas Stäcker wrote:
>>>>> Sorry, now also to the list.
>>>>> Thomas
>>>>>
>>>>> Am 22.10.2018 um 11:25 schrieb Thomas Stäcker:
>>>>>> Martin,
>>>>>> from a bibliographical angle I don't see any necessity to refer to 
>>>>>> a different bibliographic item. It would suffice to note that your 
>>>>>> text is a translation from the Greek. This can be done by adding a 
>>>>>> bibliographical note such as <bibl><title>Selected Little-known 
>>>>>> Greek Poems</title><note>translated from the Greek</note></bibl>. 
>>>>>> See e.g. the "translation note" in 
>>>>>> https://www.loc.gov/item/12032596/.
>>>>>> Best,
>>>>>> Thomas
>>>>>>
>>>>>>
>>>>>> Am 22.10.2018 um 04:51 schrieb Martin Holmes:
>>>>>>> Hi Conal,
>>>>>>>
>>>>>>> I take your point, but I think my unease derives from the fact that:
>>>>>>>
>>>>>>> <relatedItem type="translatedFrom">
>>>>>>>
>>>>>>> is ad-hoc (there's no such standardized, recommended or sample 
>>>>>>> value for relatedItem/@type), while
>>>>>>>
>>>>>>> <derivation type="translation">
>>>>>>>
>>>>>>> is there in the spec, as a sample value. If <derivation> allows 
>>>>>>> me to specify that something is a translation, why can't I 
>>>>>>> specify what language it was translated from in the same place?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Martin
>>>>>>>
>>>>>>> On 2018-10-21 3:56 a.m., Conal Tuohy wrote:
>>>>>>>> Hi Martin
>>>>>>>>
>>>>>>>> I agree it is a little unusual to have a <bibl> containing 
>>>>>>>> nothing but a <textLang>, but personally I don't see anything 
>>>>>>>> wrong with that if it accurately reflects the (unusual) state of 
>>>>>>>> your knowledge about this particular text, which I think you 
>>>>>>>> said you know nothing about except that it was in Greek. My 
>>>>>>>> conservative inclination is not to add new syntax to TEI where 
>>>>>>>> it is already adequate (as it seems to me to be). Of course, in 
>>>>>>>> the event that you DID find out more about the text which was 
>>>>>>>> the source for the translation, and you had used a <bibl> to 
>>>>>>>> describe it, then you could enrich that description very easily 
>>>>>>>> simply by inserting new elements into that <bibl>.
>>>>>>>>
>>>>>>>> Personally I don't find it too prolix or "roundabout"; but 
>>>>>>>> perhaps this is because this style is more akin to the way it 
>>>>>>>> would be encoded in RDF, which is something I've come to feel 
>>>>>>>> very comfortable with.
>>>>>>>>
>>>>>>>> <bibl>
>>>>>>>>     <title>Selected Little-known Greek Poems</title>
>>>>>>>>     <textLang mainLang="en"/>
>>>>>>>>     <relatedItem type="translatedFrom">
>>>>>>>>        <bibl>
>>>>>>>>           <textLang mainLang="grc"/>
>>>>>>>>        </bibl>
>>>>>>>>     </relatedItem>
>>>>>>>> </bibl>
>>>>>>>>
>>>>>>>> The clincher, for me, is that <bibl> and related elements 
>>>>>>>> already provide a standard way to encode many kinds of 
>>>>>>>> bibliographic metadata including the language of the source 
>>>>>>>> text. Personally I don't see what you stand to gain by adding 
>>>>>>>> another alternative method whose utility would be more restricted.
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> Conal
>>>>>>>>
>>>>>>>> On Sun, 21 Oct 2018 at 06:46, Martin Holmes <[log in to unmask]> wrote:
>>>>>>>>
>>>>>>>>     Hi Conal,
>>>>>>>>
>>>>>>>>     On 2018-10-19 11:18 p.m., Conal Tuohy wrote:
>>>>>>>>      > The element <relatedItem> may be helpful: if you have a <bibl> which
>>>>>>>>      > describes your English-language text, it could use a <relatedItem> to
>>>>>>>>      > point to a <bibl> which describes the original text of which it is a
>>>>>>>>      > translation (even if only to the extent of naming the language, with a
>>>>>>>>      > <textLang> element). e.g.
>>>>>>>>      > http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CO.html#index-egXML-d53e48477
>>>>>>>>
>>>>>>>>
>>>>>>>>     That seems a really roundabout way to do something that really
>>>>>>>>     should be quite simple, don't you think? A <bibl> containing
>>>>>>>>     nothing but a <textLang> would be a bit weird:
>>>>>>>>
>>>>>>>>     <bibl><textLang mainLang="la"/></bibl>
>>>>>>>>
>>>>>>>>     And it seems to me that <derivation> is where this should
>>>>>>>>     really go.
>>>>>>>>
>>>>>>>>     I think I'd like to see the definition of <lang> expanded so
>>>>>>>>     that it isn't constrained to linguistic contexts alone. I also
>>>>>>>>     think it needs an attribute on which the language can be
>>>>>>>>     specified using BCP 47, rather than encouraging the use of
>>>>>>>>     ad-hoc textual language descriptors in the content of the
>>>>>>>>     element, as we do now.
>>>>>>>>
>>>>>>>>     Cheers,
>>>>>>>>     Martin
>>>>>>>>
>>>>>>>>      >
>>>>>>>>      >
>>>>>>>>      >
>>>>>>>>      > On Sat, 20 Oct 2018 at 01:45, Martin Holmes <[log in to unmask]> wrote:
>>>>>>>>      >
>>>>>>>>      >     Hi all,
>>>>>>>>      >
>>>>>>>>      >     We're encoding some poems in English, some of which are
>>>>>>>>      >     translations of original texts in other languages. We don't
>>>>>>>>      >     necessarily know the source text ("Translated from the Greek"
>>>>>>>>      >     might be the only info we have, and we don't have the
>>>>>>>>      >     resources to chase down all the actual original sources,
>>>>>>>>      >     assuming they still exist). But we'd like to include
>>>>>>>>      >     information about the language from which the translation
>>>>>>>>      >     was made, using IANA language subtag codes, somewhere in the
>>>>>>>>      >     header. (Each poem gets its own TEI file with its own header.)
>>>>>>>>      >
>>>>>>>>      >     I think the obvious place to do this is:
>>>>>>>>      >
>>>>>>>>      >     <derivation type="translation">[something in here...]</derivation>
>>>>>>>>      >
>>>>>>>>      >     The <lang> element looks like it should do the job here, but
>>>>>>>>      >     it seems to be restricted to "etymological or linguistic"
>>>>>>>>      >     uses, which isn't quite right. In any case, adding
>>>>>>>>      >     @xml:lang to <lang> would apply it to the content of the
>>>>>>>>      >     <lang> tag itself.
>>>>>>>>      >
>>>>>>>>      >     Has anyone dealt with this? Do you have any suggestions? Do
>>>>>>>>      >     we need something like <origLang>, analogous to <origDate>
>>>>>>>>      >     and <origPlace>, to record the language of origin of a text
>>>>>>>>      >     which is a translation?
>>>>>>>>      >
>>>>>>>>      >     Cheers,
>>>>>>>>      >     Martin
>>>>>>>>      >
>>>>>>>>      >
>>>>>>>>      >
>>>>>>>>      > --
>>>>>>>>      > Conal Tuohy
>>>>>>>>      > http://conaltuohy.com/
>>>>>>>>      > @conal_tuohy
>>>>>>>>      > +61-466-324297
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> ***************************************
>>>>>> Prof. Dr. Thomas Stäcker
>>>>>> Direktor der
>>>>>> Universitäts- und Landesbibliothek Darmstadt
>>>>>> Magdalenenstr. 8
>>>>>> 64289 Darmstadt
>>>>>> +49 (0)6151 16-76200
>>>>>> [log in to unmask]
>>>>>
>>>>>