Many thanks for both replies.
@copyOf would work fine for my purposes, but <ptr> seems more in line
with other de facto and proposed standards for the encoding of
linguistic units, including TIGER-XML, SynAF and LAF (their <edge>),
as well as PAULA (its <rel>). Would there be any specific reason not
to use <ptr> here?
As to <seg> instead of <w>, that was my initial idea, but I reasoned
that more specific elements should rather be used, if available, and
it looked like the semantics of <w> is more or less right ("<w>
(word) represents a grammatical (not necessarily orthographic) word").
For the same reason I was going to use <phr> at the syntactic layer,
and <orgName>, <persName>, <date>, etc., at the named entity layer.
But if using <seg type="lex">, <seg type="syntactic_group">, etc.,
seems kosher here, I'll go for it, especially since it solves the
problem of <ptr> not being available within <w> in the standard and
leads to a more uniform set of schemata for the various linguistic layers.
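To make this concrete, here is a minimal sketch of what such a unit might
look like. It assumes that <fs> and <ptr> are both acceptable inside <seg>
and that my ODD would permit @lemma there, as in Lou's suggestion; the
xml:id value is made up for illustration.

```xml
<!-- Sketch only: xml:id is hypothetical; @lemma on <seg> assumes an ODD customisation -->
<seg type="lex" lemma="do" xml:id="lex_1">
  <fs> <!-- desc of this word --> </fs>
  <ptr target="tokens.xml#seg3"/> <!-- will -->
  <ptr target="tokens.xml#seg4"/> <!-- have -->
  <ptr target="tokens.xml#seg5"/> <!-- done -->
</seg>
```

The same shape, with a different @type value, would then serve for the
syntactic and named entity layers.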
Thanks again, best,
Lou Burnard wrote:
> I agree with Laurent's proposal, except that I feel a bit uneasy about
> tagging the multi-word phrase "will have done" as a single <w>.
> Although I can see the force of wanting to use a single element for
> the unit of analysis, whether or not it is realised as multiple
> tokens, this seems somewhat inconsistent with the definition of <w>,
> as a "word-like" unit. I think I'd prefer to use a <seg type="lex"
> lemma="do"> to wrap the whole thing. Using a different element would
> also enable me to define in my ODD that an <fs> is required as well as
> pointers to the content.
> On another question, won't the same <fs> be required for many
> different words? In that case it might be better to invoke that
> definition by reference as well, using e.g. the @ana attribute.
> To answer the original question, however, I am not sure why <ptr> is
> not allowed within <w> and will investigate later. It doesn't seem on
> the face of it unreasonable, if you accept that <w> is meant to
> contain just low level word forms, orthographically defined. Remember
> that all these segmentation elements (phr, w, m, c etc) are
> specialisations of <seg> for particular rather narrowly defined purposes.
> You shouldn't hesitate to propose a differently narrowly-defined
> specialisation element, or use <seg>.
> Laurent Romary wrote:
>> Hi Adam,
>> Would it not be better practice to make the word structure you
>> want explicit and use @copyOf in your example, as follows:
>> <w lemma="do">
>> <!-- desc of this word -->
>> <w copyOf="tokens.xml#seg3"/>
>> <!-- will -->
>> <w copyOf="tokens.xml#seg4"/>
>> <!-- have -->
>> <w copyOf="tokens.xml#seg5"/>
>> <!-- done -->
>> </w>
>> Best wishes,
>> On 29 Jul 09 at 13:44, Adam Przepiorkowski wrote:
>>> Dear All,
>>> Some analytical elements, incl. <phr>, allow for <ptr> in their
>>> content model, but others, e.g., <w>, do not. Is there a particular
>>> reason for that?
>>> Adam P.
>>> P.S. The background of this question is that I would like <w>ords at
>>> one layer of annotation to make stand-off reference to potentially
>>> smaller token-like units at a different layer, and I envisage typical
>>> <w> content to look like this:
>>> <w lemma="do">
>>> <fs> <!-- desc of this word --> </fs>
>>> <ptr target="tokens.xml#seg3"/> <!-- will -->
>>> <ptr target="tokens.xml#seg4"/> <!-- have -->
>>> <ptr target="tokens.xml#seg5"/> <!-- done -->
>>> </w>
>>> I know I can use <seg type="word"> this way, but why use <seg> when
>>> the more specific <w> is available?
>>> Adam Przepiórkowski ˈadam ˌpʃɛpjurˈkɔfskʲi
>>> http://nlp.ipipan.waw.pl/ ___ Linguistic Engineering Group
>>> http://korpus.pl/ _____________ IPI PAN Corpus of Polish
>>> http://nkjp.pl/ _________________ National Corpus of Polish