I agree with Laurent's proposal, except that I feel a bit uneasy about
tagging the multi-word phrase "will have done" as a single <w>.
Although I can see the force of wanting to use a single element for the
unit of analysis, whether or not it is realised as multiple tokens, this
seems somewhat inconsistent with the definition of <w> as a "word-like"
unit. I think I'd prefer to use a <seg type="lex" lemma="do"> to wrap
the whole thing. Using a different element would also enable me to
define in my ODD that an <fs> is required as well as pointers to the
On another question, won't the same <fs> be required for many different
words? If so, it might be better to invoke that definition by
reference as well, using e.g. the @ana attribute.
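As a rough sketch of what I mean (the xml:id value and the feature names are invented for illustration, and allowing @lemma on <seg> would itself need declaring in the ODD), the shared analysis could be stated once and pointed to from each unit:

```xml
<!-- Shared analysis, declared once; the id and features are hypothetical -->
<fs xml:id="fs.do.futperf">
  <f name="pos"><symbol value="verb"/></f>
  <f name="tense"><symbol value="future-perfect"/></f>
</fs>

<!-- A lexical unit invoking that analysis by reference via @ana;
     @lemma on <seg> assumes a customisation that permits it -->
<seg type="lex" lemma="do" ana="#fs.do.futperf">
  <ptr target="tokens.xml#seg3"/> <!-- will -->
  <ptr target="tokens.xml#seg4"/> <!-- have -->
  <ptr target="tokens.xml#seg5"/> <!-- done -->
</seg>
```

Any other unit with the same analysis would carry the same @ana value rather than repeating the <fs>.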
To answer the original question, however, I am not sure why <ptr> is
not allowed within <w> and will investigate later. The restriction
doesn't seem unreasonable on the face of it, if you accept that <w> is
meant to contain just low-level word forms, orthographically defined.
Remember that all these segmentation elements (<phr>, <w>, <m>, <c>,
etc.) are specialisations of
<seg> for particular rather narrowly defined purposes.
You shouldn't hesitate to propose a different, narrowly defined
specialisation element, or to use <seg>.
Laurent Romary wrote:
> Hi Adam,
> Would it not be better practice to make the word structure you want
> explicit and use @copyOf, adapting your example as follows:
> <w lemma="do">
> <!-- desc of this word -->
> <w copyOf="tokens.xml#seg3"/>
> <!-- will -->
> <w copyOf="tokens.xml#seg4"/>
> <!-- have -->
> <w copyOf="tokens.xml#seg5"/>
> <!-- done -->
> </w>
> Best wishes,
> On 29 Jul 09 at 13:44, Adam Przepiorkowski wrote:
>> Dear All,
>> Some analytical elements, incl. <phr>, allow for <ptr> in their
>> content model, but others, e.g., <w>, do not. Is there a particular
>> reason for that?
>> Adam P.
>> P.S. The background of this question is that I would like <w>ords at
>> one layer of annotation to make stand-off reference to potentially
>> smaller token-like units at a different layer, and I envisage typical
>> <w> content to look like this:
>> <w lemma="do">
>> <fs> <!-- desc of this word --> </fs>
>> <ptr target="tokens.xml#seg3"/> <!-- will -->
>> <ptr target="tokens.xml#seg4"/> <!-- have -->
>> <ptr target="tokens.xml#seg5"/> <!-- done -->
>> </w>
>> I know I can use <seg type="word"> this way, but why use <seg> when
>> the more specific <w> is available?
>> Adam Przepiórkowski ˈadam ˌpʃɛpjurˈkɔfskʲi
>> http://nlp.ipipan.waw.pl/ ___ Linguistic Engineering Group
>> http://korpus.pl/ _____________ IPI PAN Corpus of Polish
>> http://nkjp.pl/ _________________ National Corpus of Polish