Hi Martin and Laurent,
Thanks for helping me clarify this --
On 2010-11-18 18:27, Laurent Romary wrote:
> :-) The two suggestions I made at the conference.
Indeed :-) And let me try to answer in a better-organized way this
time.
> On 18 Nov 2010, at 17:53, Martin Holmes wrote:
>
>> Hi Piotr,
>>
>> I wasn't at your presentation, so I'm coming in from a position of
>> ignorance here, but it seems to me that you're using <tagUsage> in
>> ways that could also be achieved through:
>>
>> - use of schema constraints (couldn't you avoid schematron completely
>> just by encoding these value lists in the ODD?)
No, not really. I don't know the value lists in advance: they are not
even language-dependent, they are dictionary-dependent. This is fully
dynamic: FreeDict now has 74 dictionaries, new ones are coming, and
some of the old ones will hopefully get upgraded, which should mean
modifications to their gramGrp. There's no way for me to capture that
variation at the level of the ODD.
In the ODD, I can only provide the (Schematron) infrastructure that
looks at the tagUsage "plugin" in each header and verifies the usage
of gramGrp/* against the values that the "plugin" lists. Each such
list is potentially unique, even for, say, English-Polish and
English-French dictionaries, for two reasons: (1) one may e.g. have
<pos>vt</pos>, while the other may use <pos>v</pos><subc>trans</subc>
for the same transitive verb entry, and (2) Polish and French differ,
among other things, in the inventory of gender values, and the
equivalents may have their own gramGrps, which also have to be
regularized in their own tagUsage plugins.
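
To make the mechanism concrete, here is a rough sketch of the same check
written in Python rather than Schematron. It is only an illustration of
the idea, not the actual FreeDict rules: the function names, the sample
document, and the decision to skip gramGrp children that have no
tagUsage entry at all are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def allowed_values(root):
    """Map each tagUsage/@gi to the set of values listed in its <list>."""
    allowed = {}
    for usage in root.iterfind(".//tei:tagUsage", NS):
        items = usage.iterfind("tei:list/tei:item", NS)
        allowed[usage.get("gi")] = {(i.text or "").strip() for i in items}
    return allowed


def check_gramgrp(root):
    """Return (element name, value) pairs that the header does not license."""
    allowed = allowed_values(root)
    problems = []
    for child in root.iterfind(".//tei:gramGrp/*", NS):
        name = child.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
        value = (child.text or "").strip()
        # Assumption: elements without any tagUsage entry are not checked.
        if name in allowed and value not in allowed[name]:
            problems.append((name, value))
    return problems


# Hypothetical miniature dictionary: the header licenses n/v/imit and f/m.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><encodingDesc><tagsDecl>
    <tagUsage gi="pos"><list>
      <item>n</item><item>v</item><item>imit</item>
    </list></tagUsage>
    <tagUsage gi="gen"><list>
      <item>f</item><item>m</item>
    </list></tagUsage>
  </tagsDecl></encodingDesc></teiHeader>
  <text><body>
    <entry><gramGrp><pos>n</pos><gen>f</gen></gramGrp></entry>
    <entry><gramGrp><pos>vt</pos></gramGrp></entry>
  </body></text>
</TEI>"""

problems = check_gramgrp(ET.fromstring(SAMPLE))
print(problems)  # [('pos', 'vt')]
```

The point is that the checking code never contains a value list itself:
it reads whatever the header of the given dictionary declares, which is
why the lists can differ from one dictionary to the next.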
>> - use of feature structures. The kind of data you're encoding here can
>> also be done with feature structures -- we have a dictionary project
>> that's doing that. They do tend to be a bit verbose in actual use,
>> though.
By all means -- I have gone for full abuse (or: adaptation of the
existing element content with no regard to its intended semantics), but
I can easily imagine an <fs> instead of <list>. This seems to me to be
a cosmetic issue at this point, because it doesn't affect my
general question: am I violating all kinds of good TEI taste in using
tagsDecl and tagUsage for such purposes? Should I be (ab)using some
other part of the header, perhaps? (and if so, which?) Or is this kind
of new functionality OK where I've put it, after all?
Best regards,
Piotr
>> On 10-11-18 06:45 AM, Piotr Bański wrote:
>>> Dear All,
>>>
>>> The purpose of this e-mail is to probe the general sentiments concerning
>>> my usage of tagUsage and to gather the bits of feedback that I missed
>>> after my TEI-MM presentation.
>>>
>>> Context: FreeDict, a project hosting numerous diverse bilingual
>>> dictionaries that badly need common constraints. Some of the constraints
>>> refer to the usage of gramGrp children. In particular, I have made it so
>>> that if you choose to use e.g. <pos>, its contents have to be uniform
>>> throughout the dictionary, and restricted to what you enumerate in your
>>> <tagUsage>. Example:
>>>
>>> <tagUsage gi="pos">
>>> <list>
>>> <item ana="FreeDict_ontology.xml#f_pos_noun">n</item>
>>> <item ana="FreeDict_ontology.xml#f_pos_verb">v</item>
>>> <item ana="FreeDict_ontology.xml#f_pos_imit">imit</item>
>>> </list>
>>> </tagUsage>
>>> <tagUsage gi="gen">
>>> <list>
>>> <item ana="FreeDict_ontology.xml#f_gen_fem">f</item>
>>> <item ana="FreeDict_ontology.xml#f_gen_masc">m</item>
>>> </list>
>>> </tagUsage>
>>>
>>> In this particular dictionary, three values are possible for <pos>, and
>>> two for its sister <gen>. If others appear, Schematron complains. The
>>> @ana attributes are a separate part of the general story: they align the
>>> values that the particular dictionary uses ("m", "msc", "masc", etc.) to
>>> a single reference value (in this case, "masculine").
>>>
>>> Question: how outraged are you after looking at the above? My point in
>>> the TEI-MM presentation was that this particular decision might be
>>> counted as re-use of tagUsage[1] rather than its *ab*use.
>>>
>>> Lou voted for the latter, and suggested that I am confusing the XML
>>> sense of "tag" (roughly, "label for an XML element")[2], and the
>>> linguistic sense (= "label for various grammatical features"). But it
>>> seems to me that whether I do is a matter of perspective: indeed I
>>> regularise the usage of linguistic tags (for part of speech and gender),
>>> which happen to be the content of XML elements <pos> and <gen>. Thus, I
>>> also regularise the usage of TEI tags/elements "pos" and "gen" in this
>>> particular dictionary. Is this enough to defend my handling of tagUsage
>>> as *re-*use? In other words, is the information on the content of <pos>
>>> and its kin completely out-of-place under tagsDecl?
>>>
>>> I got a suggestion for an alternative placement, and I thought I heard
>>> "appInfo", but now I think I must have misunderstood: I don't see how
>>> appInfo[3] could serve my purpose, which has nothing to do with
>>> applications modifying dictionaries. May I ask the person who suggested
>>> this (it may have been Lou) to possibly elaborate, or more likely to
>>> correct my recollection of the potential alternative placement?
>>>
>>> Many thanks in advance,
>>>
>>> Piotr
>>>
>>> [1]:
>>> http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-tagUsage.html
>>>
>>> [2]: I'm ignoring the sense implied in terms "opening tag" and "closing
>>> tag", which refer to the physical markup. It seems to me that tagsDecl
>>> and tagUsage refer to the more abstract sense of "identifier".
>>>
>>> [3]:
>>> http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-appInfo.html
>>> .