Dear Gregory,

You might consider separating the conflicting grammatical information
into physically separate layers, i.e. stand-off annotation documents that
each point at the same base text. This can be done in the TEI, as shown
at http://nlp.ipipan.waw.pl/TEI4NKJP/

If you only expect two concurrent layers of interpretation, an approach
close to that outlined for the National Corpus of Polish (above) may be
sufficient. If you expect more complexity... well, let us know :-)
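
To make the idea concrete, here is a rough sketch of the two readings
kept as separate layers over one base transcription. The file names, the
xml:id values and the resp pointers (#authA, #authB) are invented for
illustration, and the fragments have not been validated against a TEI
schema or against the exact NKJP format, so please treat this as a
starting point only. The pointers use the TEI string-range() scheme
(target, offset, length).

```xml
<!-- base.xml (hypothetical name): the transcription only,
     with no grammatical markup -->
<ab xml:id="ms1-l1">bım</ab>

<!-- layer_A.xml (hypothetical name): authority A reads two words -->
<ab type="grammar" resp="#authA">
  <!-- "b": 1 character starting at offset 0 -->
  <w type="prep" lemma="b"
     corresp="base.xml#string-range(ms1-l1,0,1)"/>
  <!-- "ım": 2 characters starting at offset 1 -->
  <w type="noun" subtype="ms-cstr" lemma="ım"
     corresp="base.xml#string-range(ms1-l1,1,2)"/>
</ab>

<!-- layer_B.xml (hypothetical name): authority B reads one word -->
<ab type="grammar" resp="#authB">
  <!-- "bım": all 3 characters starting at offset 0 -->
  <w type="prep" lemma="bım"
     corresp="base.xml#string-range(ms1-l1,0,3)"/>
</ab>
```

Since neither layer touches the base text, the two analyses never have
to nest inside each other, and the question of what <choice> or <app>
may contain does not arise.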

In the KorAP project, we use sets of related interpretational layers
(from tokenization all the way up to syntactic hierarchies or
dependencies and named entities), and each of them may be independently
imposed on the base text. Recasting that into pure TEI might take some
time and thinking, but I'd still look at the NCP above for initial guidance.

Best regards,

  Piotr

-- 
Piotr Bański
Senior Researcher,
Institut für Deutsche Sprache,
R5 6-13
68-161 Mannheim, Germany
-----------------------------------------
KorAP blog: http://korap.ids-mannheim.de/

On 31/10/14 15:01, Murray, Gregory wrote:
> What is the appropriate element for marking differing interpretations of grammatical analysis?
> 
> A project I'm working on includes ancient Aramaic texts where there can be more than one way to interpret the grammatical usage of a given word or phrase. For example, on a given line of a given manuscript, one authority says that the characters bım form a prepositional phrase consisting of two words:
> 
> <w type="prep" lemma="b">b</w><w type="noun" subtype="ms-cstr" lemma="ım">ım</w>
> 
> Another authority says those characters form a compound preposition and thus a single word:
> 
> <w type="prep" lemma="bım">bım</w>
> 
> In the digital edition, we'd like to retain and indicate both interpretations.
> 
> My first thought was <choice>. In its simplest definition it seems applicable: <choice> "groups a number of alternative encodings for the same point in a text." But the discussion of <choice> occurs within section 3.4 "Simple Editorial Changes" where it is shown as a way to group corrections (<sic> and <corr>) or regularizations (<orig> and <reg>) made by an editor. Here we're not dealing with an editorial change. Also, <choice> doesn't allow <w>.
> 
> My second thought is that the situation calls for some kind of critical apparatus. As I understand it, <app> is normally used to record textual variants from multiple witnesses. In our project, there is only one witness, but we have used <app> to record differing interpretations of what a given character actually is (authority A says it's an aleph, authority B says it's a yod). Our thinking is that this use of <app> doesn't stretch its intended semantics too far, because it still pertains to establishing the text. But grammatical interpretation is a different case. Instead of a textual ambiguity, we have a grammatical one. Also, <app> doesn't allow <w>.
> 
> There's <interp>, but that seems to be a kind of annotation. I don't view this situation as making an annotation, but rather as indicating two (equally defensible) ways of marking the same characters.
> 
> Is there an element I'm overlooking? Or should I circle back to <choice> and come up with a way to use <w> within it -- such as customizing my schema, or using choice/seg/w (which is valid but feels like a hack) instead of choice/w (which is invalid)?
> 
> Many thanks,
> Greg
> 
> Gregory Murray
> Head of Digital Initiatives
> Princeton Theological Seminary Library
>